ClickHouse/docs/tools/concatenate.py

# -*- coding: utf-8 -*-

# - Single-page document.
#   - Requirements for the md-sources:
#       - Don't use links without anchors. This means you cannot link to a file as a whole: specify an anchor at the top of the file and link to that anchor instead.
#       - Anchors must be unique across the whole document.
#   - Implementation:
#       - The script gets the list of files from the table of contents file (`toc_<lang>.yml`). It picks up commented-out files too, and that is intended.
#       - Files are concatenated in order, with header levels incremented in all files except the first one.
#       - The script converts links to other files into in-page links.
#         - Links starting with 'http' are skipped.
#         - Non-http links with an anchor are cut down to the part starting at the anchor sign (#).
#         - For non-http links without an anchor, the script logs an error and cuts them from the resulting single-page document.

import logging
import re
import os


def concatenate(lang, docs_path, single_page_file):

    proj_config = os.path.join(docs_path, 'toc_%s.yml' % lang)
    lang_path = os.path.join(docs_path, lang)

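    # Collect every Markdown path mentioned in the ToC file, skipping
    # entries that refer to the single-page document itself.
    with open(proj_config) as cfg_file:
        files_to_concatenate = []
        for line in cfg_file:
            if '.md' in line and 'single_page' not in line:
                # Take the text after the first ':' and strip spaces,
                # single quotes and the trailing newline.
                path = (line[line.index(':') + 1:]).strip(" '\n")
                files_to_concatenate.append(path)

    logging.info(
        '%d files will be concatenated into single md-file.',
        len(files_to_concatenate))
    logging.debug('Concatenating: %s', ', '.join(files_to_concatenate))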

    first_file = True

    for path in files_to_concatenate:
        with open(os.path.join(lang_path, path)) as f:
            # Build the set of anchor names for this file: its own
            # directory-style name plus the path with up to three '../'
            # prefixes. A set keeps duplicate anchors out of the output.
            anchors = set()
            tmp_path = path.replace('/index.md', '/').replace('.md', '/')
            prefixes = ['', '../', '../../', '../../../']
            parts = tmp_path.split('/')
            anchors.add(parts[-2] + '/')
            anchors.add('/'.join(parts[1:]))

            for part in parts[0:-2]:
                for prefix in prefixes:
                    anchors.add(prefix + tmp_path)
                tmp_path = tmp_path.replace(part, '..')

            for anchor in anchors:
                single_page_file.write('<a name="%s"></a>\n' % anchor)

            single_page_file.write('\n\n')

            # Copy the file, incrementing the level of every line that
            # starts with '#'.
            for line in f:
                if line.startswith('#'):
                    line = '#' + line
                single_page_file.write(line)

    single_page_file.flush()
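
The three per-line steps above (picking a path out of a ToC line, generating anchors for a file, and incrementing header levels) can be tried in isolation. The following is a minimal sketch, not part of the script: the helper names `parse_toc_line`, `anchors_for` and `bump_header` are hypothetical, and the sample ToC line format is an assumption, since the real `toc_<lang>.yml` is not shown here.

```python
def parse_toc_line(line):
    """Return the .md path from a ToC line, or None if the line is skipped.

    Mirrors the filter in concatenate(); the line format is assumed.
    """
    if '.md' in line and 'single_page' not in line:
        return line[line.index(':') + 1:].strip(" '\n")
    return None


def anchors_for(path):
    """Return the set of anchor names concatenate() would emit for `path`."""
    anchors = set()
    tmp_path = path.replace('/index.md', '/').replace('.md', '/')
    prefixes = ['', '../', '../../', '../../../']
    parts = tmp_path.split('/')
    anchors.add(parts[-2] + '/')
    anchors.add('/'.join(parts[1:]))
    for part in parts[0:-2]:
        for prefix in prefixes:
            anchors.add(prefix + tmp_path)
        tmp_path = tmp_path.replace(part, '..')
    return anchors


def bump_header(line):
    """Increment the header level of a line that starts with '#'."""
    return '#' + line if line.startswith('#') else line


if __name__ == '__main__':
    # Hypothetical ToC line, used only for illustration.
    print(parse_toc_line("    - 'Settings': 'operations/settings.md'\n"))
    for anchor in sorted(anchors_for('operations/settings.md')):
        print(anchor)
    print(bump_header('# Settings\n'), end='')
```

For a path one directory deep, such as `operations/settings.md`, this yields five anchors: `settings/` and `operations/settings/` with zero to three `../` prefixes.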