ClickHouse/docs/tools/concatenate.py

# -*- coding: utf-8 -*-

# - Single-page document.
#   - Requirements for the md-sources:
#       - Don't use links without anchors. That is, you cannot link to a bare file: put an anchor at the top of the target file and link to that anchor.
#       - Anchors must be unique across the whole document.
#   - Implementation:
#       - The script reads the list of files from `toc_<lang>.yml` in the docs directory, taking every line that mentions a `.md` file except `single_page` entries. Commented-out entries are picked up too, and that is correct.
#       - Files are concatenated in order, with the header level incremented by one on every line after the first line of the first file.
#       - The script converts links to other files into in-page links.
#         - Links that start with 'http' or contain '.jpeg', '.jpg', '.png' or '.gif' are left unchanged.
#         - Other links with an anchor are cut down to the anchor part (from the `#` sign onward).
#         - For other links without an anchor, the script raises a RuntimeError that names the link and the file.

import logging
import re
import os


def concatenate(lang, docs_path, single_page_file):

    proj_config = os.path.join(docs_path, 'toc_%s.yml' % lang)
    lang_path = os.path.join(docs_path, lang)

    with open(proj_config) as cfg_file:
        files_to_concatenate = []
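        for line in cfg_file:
            # Any line mentioning a .md file counts, except single-page entries
            if '.md' in line and 'single_page' not in line:
                # Take everything after the first ':' and trim spaces,
                # single quotes and the trailing newline
                path = (line[line.index(':') + 1:]).strip(" '\n")
                files_to_concatenate.append(path)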

    logging.info(
        '%d files will be concatenated into single md-file.',
        len(files_to_concatenate))
    logging.debug('Concatenating: %s', ', '.join(files_to_concatenate))

    first_file = True

    for path in files_to_concatenate:

        single_page_file.write('\n\n')

        with open(os.path.join(lang_path, path)) as f:

            # function is passed into re.sub() to process links
            def link_proc(matchObj):
                text, link = matchObj.group().strip('[)').split('](')
                if link.startswith('http') or '.jpeg' in link or '.jpg' in link or '.png' in link or '.gif' in link:
                    return '[' + text + '](' + link + ')'
                else:
                    sharp_pos = link.find('#')
                    if sharp_pos > -1:
                        return '[' + text + '](' + link[sharp_pos:] + ')'
                    else:
                        raise RuntimeError(
                            'ERROR: Link [' + text + '](' + link + ') in file ' +
                            path + ' has no anchor. Please provide it.')

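            for line in f:
                # Rewrite the Markdown links found on this line
                line = re.sub(r'\[.+?\]\(.+?\)', link_proc, line)

                # Correct header levels: the flag is cleared on the very first
                # line written, so only that line keeps its original level and
                # every later line starting with '#' gets one extra '#'.
                if not first_file:
                    if line.startswith('#'):
                        line = '#' + line
                else:
                    first_file = False

                single_page_file.write(line)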

    single_page_file.flush()