ClickHouse/docs/tools/concatenate.py
Ivan Blinkov 0a4a5b36cc
Some WIP on documentation refactoring (#2659)
* Additional .gitignore entries

* Merge a bunch of small articles about system tables into single one

* Merge a bunch of small articles about formats into single one

* Adapt table with formats to English docs too

* Add SPb meetup link to main page

* Move Utilities out of top level of docs (the location is probably not yet final) + translate couple articles

* Merge MacOS.md into build_osx.md

* Move Data types higher in ToC

* Publish changelog on website alongside documentation

* Few fixes for en/table_engines/file.md

* Use smaller header sizes in changelogs

* Group up table engines inside ToC

* Move table engines out of top level too

* Specify in ToC that query language is SQL based. That's a bit excessive, but catches eye.

* Move stuff that is part of query language into respective folder

* Move table functions lower in ToC

* Lost redirects.txt update

* Do not rely on comments in yaml + fix few ru titles

* Extract major parts of queries.md into separate articles

* queries.md has been supposed to be removed

* Fix weird translation

* Fix a bunch of links

* There is only table of contents left

* "Query language" is actually part of SQL abbreviation

* Change filename in README.md too

* fix mistype
2018-07-18 13:00:53 +03:00

# -*- coding: utf-8 -*-
# - Single-page document.
#     - Requirements for the md-sources:
#         - Don't use links without anchors. You cannot link to a file as a whole: put an anchor at the top of the file and link to that anchor.
#         - Anchors must be unique across the whole document.
#     - Implementation:
#         - The script reads the list of files from `toc_<lang>.yml`, taking every line that mentions an `.md` file except the `single_page` entries. Commented-out entries are picked up too, which is intended.
#         - Files are concatenated in order, with the header level incremented in all files except the first one.
#         - The script converts links to other files into in-page links:
#             - Links starting with 'http' are left unchanged.
#             - Non-http links with an anchor are cut down to the part starting at the anchor sign (#).
#             - For a non-http link without an anchor the script raises a RuntimeError that names the link and the file.

import logging
import re
import os


def concatenate(lang, docs_path, single_page_file):

    proj_config = os.path.join(docs_path, 'toc_%s.yml' % lang)
    lang_path = os.path.join(docs_path, lang)

    with open(proj_config) as cfg_file:
        files_to_concatenate = []
        for l in cfg_file:
            if '.md' in l and 'single_page' not in l:
                path = (l[l.index(':') + 1:]).strip(" '\n")
                files_to_concatenate.append(path)

    logging.info(
        str(len(files_to_concatenate)) +
        ' files will be concatenated into single md-file.')
    logging.debug('Concatenating: ' + ', '.join(files_to_concatenate))

    first_file = True

    for path in files_to_concatenate:

        single_page_file.write('\n\n')

        with open(os.path.join(lang_path, path)) as f:

            # function is passed into re.sub() to process links
            def link_proc(matchObj):
                text, link = matchObj.group().strip('[)').split('](')
                if link.startswith('http'):
                    return '[' + text + '](' + link + ')'
                else:
                    sharp_pos = link.find('#')
                    if sharp_pos > -1:
                        return '[' + text + '](' + link[sharp_pos:] + ')'
                    else:
                        raise RuntimeError(
                            'ERROR: Link [' + text + '](' + link + ') in file ' +
                            path + ' has no anchor. Please provide it.')

            for l in f:
                # Processing links in a string
                l = re.sub(r'\[.+?\]\(.+?\)', link_proc, l)

                # Correcting headers levels
                if not first_file:
                    if l.startswith('#'):
                        l = '#' + l
                else:
                    first_file = False

                single_page_file.write(l)

    single_page_file.flush()
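
The link and header rules used by the script can be tried in isolation on in-memory strings. The following is a minimal sketch and not part of concatenate.py: `LINK_RE`, `rewrite_link`, and `demote_header` are hypothetical names, the regex uses capture groups instead of the strip/split approach in the script, and the sample paths and URL are invented for illustration.

```python
import re

# Assumption: same link shape as the script's pattern, but with capture groups.
LINK_RE = re.compile(r'\[(.+?)\]\((.+?)\)')


def rewrite_link(match):
    """Apply the single-page link rules to one Markdown link."""
    text, link = match.group(1), match.group(2)
    if link.startswith('http'):
        # External links are kept unchanged.
        return '[' + text + '](' + link + ')'
    sharp_pos = link.find('#')
    if sharp_pos == -1:
        # A link to another file must carry an anchor.
        raise RuntimeError('Link [' + text + '](' + link + ') has no anchor.')
    # Keep only the anchor so the link points inside the single page.
    return '[' + text + '](' + link[sharp_pos:] + ')'


def demote_header(line):
    """Push a Markdown header one level deeper; leave other lines alone."""
    return '#' + line if line.startswith('#') else line


if __name__ == '__main__':
    sample = '# Formats\nSee [the list](../interfaces/formats.md#formats).\n'
    for line in sample.splitlines(True):
        print(demote_header(LINK_RE.sub(rewrite_link, line)), end='')
```

Passing a function as the replacement argument of `re.sub` (here via the compiled pattern's `sub` method) is the same mechanism the script relies on: the function receives each match object and returns the replacement string.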