# ClickHouse/docs/tools/concatenate.py
# -*- coding: utf-8 -*-
# - Builds a single-page document.
# - Requirements for the Markdown sources:
#   - Don't use links without anchors. That is, don't link to a bare file:
#     put an anchor at the top of the file and link to that anchor.
#   - Anchors must be unique across the whole document.
# - Implementation:
#   - The script reads the list of files from `toc_<lang>.yml`: every line
#     that mentions a `.md` file, except `single_page` entries. Commented-out
#     entries are picked up too, which is intended.
#   - Files are concatenated in ToC order, and header levels are incremented
#     by one in every file except the first.
#   - Links to other files are converted into in-page links:
#     - Links starting with `http:`/`https:` and links to images
#       (.jpeg, .jpg, .png, .gif) are left as they are.
#     - Non-http links with an anchor are cut down to the anchor
#       (the part starting at `#`).
#     - For non-http links without an anchor, the script guesses an anchor
#       from the file path by stripping `../`, `/index.md` and `.md`,
#       so `../foo/bar.md` becomes `#foo/bar`.
import logging
import re
import os
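

# Illustrative sketch (hypothetical helper, not called by concatenate()):
# the link-target rules from the header comment, written as a pure function
# so they can be checked in isolation. It mirrors the branches of link_proc()
# inside concatenate(); the name rewrite_link_target and the sample paths in
# the examples below are made up for illustration.
def rewrite_link_target(link):
    """Return the target a Markdown link gets in the single-page document."""
    if link.startswith(('http:', 'https:')) or any(
            ext in link for ext in ('.jpeg', '.jpg', '.png', '.gif')):
        # External links and images stay untouched.
        return link
    sharp_pos = link.find('#')
    if sharp_pos > -1:
        # Cross-file link with an anchor: keep only the anchor part.
        return link[sharp_pos:]
    # Cross-file link without an anchor: guess one from the file path.
    return '#' + link.replace('../', '').replace('/index.md', '').replace('.md', '')

# Examples (hypothetical paths):
#   rewrite_link_target('https://example.com/')               -> 'https://example.com/'
#   rewrite_link_target('../query_language/select.md#limit')  -> '#limit'
#   rewrite_link_target('../operations/index.md')             -> '#operations'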


def concatenate(lang, docs_path, single_page_file):
    proj_config = os.path.join(docs_path, 'toc_%s.yml' % lang)
    lang_path = os.path.join(docs_path, lang)

    # Collect the .md files listed in the ToC (commented-out entries included).
    files_to_concatenate = []
    with open(proj_config) as cfg_file:
        for l in cfg_file:
            if '.md' in l and 'single_page' not in l:
                path = (l[l.index(':') + 1:]).strip(" '\n")
                files_to_concatenate.append(path)

    logging.info(
        '%d files will be concatenated into a single md-file.',
        len(files_to_concatenate))
    logging.debug('Concatenating: %s', ', '.join(files_to_concatenate))

    # Passed to re.sub() to rewrite each Markdown link found in a line.
    def link_proc(match_obj):
        text, link = match_obj.group().strip('[)').split('](', 1)
        if link.startswith(('http:', 'https:')) or any(
                ext in link for ext in ('.jpeg', '.jpg', '.png', '.gif')):
            # External links and images are kept as they are.
            return '[' + text + '](' + link + ')'
        sharp_pos = link.find('#')
        if sharp_pos > -1:
            # Cross-file link with an anchor: keep only the anchor.
            return '[' + text + '](' + link[sharp_pos:] + ')'
        # Cross-file link without an anchor: guess one from the file path.
        return '[' + text + '](#' + link.replace('../', '').replace('/index.md', '').replace('.md', '') + ')'

    first_file = True

    for path in files_to_concatenate:
        single_page_file.write('\n\n')

        with open(os.path.join(lang_path, path)) as f:
            for l in f:
                # Rewrite links to other files into in-page links.
                l = re.sub(r'\[.+?\]\(.+?\)', link_proc, l)

                # Demote headers by one level in every file except the first.
                if not first_file and l.startswith('#'):
                    l = '#' + l

                single_page_file.write(l)

        # Reset only after the whole first file has been copied, so none of
        # its headers are demoted.
        first_file = False

    single_page_file.flush()
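

# Illustrative sketch (hypothetical helper, not part of the docs build): the
# header-level rule from the header comment, applied to in-memory Markdown
# strings instead of the files listed in toc_<lang>.yml. Each document is
# preceded by a blank separator, and every document after the first has its
# headers pushed down one level, so each file's top-level title nests under
# the first file's title. The name demote_headers is made up for illustration.
def demote_headers(documents):
    """Join Markdown documents, demoting headers in all but the first one."""
    out = []
    for i, doc in enumerate(documents):
        out.append('\n\n')
        # keepends=True preserves each line's newline, like iterating a file.
        for line in doc.splitlines(True):
            if i > 0 and line.startswith('#'):
                line = '#' + line
            out.append(line)
    return ''.join(out)

# Example:
#   demote_headers(['# Title\n## Intro\n', '# Other\n'])
#   -> '\n\n# Title\n## Intro\n\n\n## Other\n'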