ClickHouse/docs/concatenate.py
Ivan Blinkov 361a27485d Some progress on documentation (#1942)
2018-02-21 21:44:33 +03:00

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# - Single-page document.
#   - Requirements for the md-sources:
#     - Don't use links without anchors. You can not link to a file as a whole: put an anchor at the top of the file and link to that anchor.
#     - Anchors should be unique across the whole document.
#   - Implementation:
#     - The script gets the list of files from the `pages` section of `mkdocs_<language_dir>.yml`. Commented-out entries are picked up too, and that is intended.
#     - Files are concatenated in that order, and header levels are incremented by one in all files except the first one.
#     - The script converts links to other files into in-page links:
#       - Links starting with 'http' are left as they are.
#       - Non-http links with an anchor are cut down to the anchor part (from the # sign on).
#       - For non-http links without an anchor, the script logs an error and cuts them from the resulting single-page document.
import os
import re
import sys

# Validate the command-line arguments before touching any files
if len(sys.argv) < 2:
    print("Usage: concatenate.py language_dir")
    print("Example: concatenate.py ru")
    sys.exit(1)

if not os.path.exists(sys.argv[1]):
    print("Pass language_dir correctly. For example, 'ru'.")
    sys.exit(2)
# Configuration
PROJ_CONFIG = 'mkdocs_' + sys.argv[1] + '.yml'
SINGLE_PAGE = sys.argv[1] + '_single_page/index.md'
DOCS_DIR = sys.argv[1] + '/'
# 1. Open the mkdocs config file and read the `pages` section to get an ordered list of files
files_to_concatenate = []
with open(PROJ_CONFIG) as cfg_file:
    for line in cfg_file:
        # Every entry that names an .md file is taken, except the single-page output itself
        if '.md' in line and 'single_page' not in line:
            path = line[line.index(':') + 1:].strip(" '\n")
            files_to_concatenate.append(path)

print(str(len(files_to_concatenate)) + " files will be concatenated into single md-file.\nFiles:")
print(files_to_concatenate)
# 2. Concatenate all of the files in the list
def link_proc(match_obj, path):
    """Callback for re.sub(): rewrite a single Markdown link found in `path`."""
    text, link = match_obj.group().strip('[)').split('](')
    if link.startswith('http'):
        # External links are kept as they are
        return '[' + text + '](' + link + ')'
    sharp_pos = link.find('#')
    if sharp_pos > -1:
        # Keep only the anchor, so the link points inside the single page
        return '[' + text + '](' + link[sharp_pos:] + ')'
    print('ERROR: Link [' + text + '](' + link + ') in file ' + path + ' has no anchor. Please provide it.')
    # Cut the link from the output, as described in the header comment
    # return '['+text+'](#'+link.replace('/','-')+')'
    return ''


first_file = True
with open(SINGLE_PAGE, 'w') as single_page_file:
    for path in files_to_concatenate:
        single_page_file.write('\n\n')
        with open(DOCS_DIR + path) as md_file:
            for line in md_file:
                # Processing links in a string
                line = re.sub(r'\[.+?\]\(.+?\)', lambda m: link_proc(m, path), line)
                # Correcting header levels in every file except the first one
                if not first_file and line.startswith('#'):
                    line = '#' + line
                single_page_file.write(line)
        # The whole first file keeps its original header levels
        first_file = False
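
# Illustration only, not part of the script: a minimal sketch of the per-line
# transformation described in the header comment, so it can be tried without a
# mkdocs config or a docs directory. The names `rewrite_link` and
# `process_line`, and the sample path used in the usage note, are assumptions
# made for this example.

```python
import re

# Same pattern the script uses to find Markdown links
LINK_RE = re.compile(r'\[.+?\]\(.+?\)')


def rewrite_link(match):
    """Turn a cross-file Markdown link into an in-page anchor link."""
    text, link = match.group().strip('[)').split('](')
    if link.startswith('http'):
        # External links are kept as they are
        return '[' + text + '](' + link + ')'
    sharp_pos = link.find('#')
    if sharp_pos > -1:
        # Drop the file part, keep the anchor
        return '[' + text + '](' + link[sharp_pos:] + ')'
    # No anchor: the link is cut from the output
    return ''


def process_line(line, shift_headers):
    """Rewrite links and, if requested, push a header one level down."""
    line = LINK_RE.sub(rewrite_link, line)
    if shift_headers and line.startswith('#'):
        line = '#' + line
    return line
```

# Under these assumptions, process_line('See [formats](../interfaces/formats.md#formats).\n', True)
# keeps only the anchor part of the link, and process_line('## Table engines\n', True)
# adds one '#' to the header. Passing shift_headers=False corresponds to the first file.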