WikiStat
========
See: http://dumps.wikimedia.org/other/pagecounts-raw/

Creating the table:

.. code-block:: sql

    CREATE TABLE wikistat
    (
        date Date,
        time DateTime,
        project String,
        subproject String,
        path String,
        hits UInt64,
        size UInt64
    ) ENGINE = MergeTree(date, (path, time), 8192);

Loading the data:

.. code-block:: bash

    # Collect the list of hourly dump file names for 2007-2016 into links.txt.
    for i in {2007..2016}; do for j in {01..12}; do echo $i-$j >&2; curl -sS "http://dumps.wikimedia.org/other/pagecounts-raw/$i/$i-$j/" | grep -oE 'pagecounts-[0-9]+-[0-9]+\.gz'; done; done | sort | uniq | tee links.txt
    # Download each file; the year and year-month directories are derived from the file name.
    cat links.txt | while read link; do wget http://dumps.wikimedia.org/other/pagecounts-raw/$(echo $link | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})[0-9]{2}-[0-9]+\.gz/\1/')/$(echo $link | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})[0-9]{2}-[0-9]+\.gz/\1-\2/')/$link; done
    # Unpack each file from /opt/wikistat/, pass it through wikistat-loader with the hour taken from the file name, and insert the result into the table.
    ls -1 /opt/wikistat/ | grep gz | while read i; do echo $i; gzip -cd /opt/wikistat/$i | ./wikistat-loader --time="$(echo -n $i | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})([0-9]{2})-([0-9]{2})([0-9]{2})([0-9]{2})\.gz/\1-\2-\3 \4-00-00/')" | clickhouse-client --query="INSERT INTO wikistat FORMAT TabSeparated"; done
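
The file-name handling in the loading commands can be checked in isolation before any data is downloaded. The sketch below is only illustrative: ``pagecounts_time`` and ``pagecounts_url`` are hypothetical helper names, not part of ClickHouse or ``wikistat-loader``, and ``sed -E`` is used as an assumed equivalent of the ``sed -r`` flag in the commands above. The regular expressions themselves are copied from those commands.

```shell
# Hypothetical helper: derive the --time argument from a dump file name,
# using the same sed expression as the loading loop.
pagecounts_time() {
    printf '%s' "$1" | sed -E 's/pagecounts-([0-9]{4})([0-9]{2})([0-9]{2})-([0-9]{2})([0-9]{2})([0-9]{2})\.gz/\1-\2-\3 \4-00-00/'
}

# Hypothetical helper: build the download URL for a dump file name.
# The year and year-month directories are taken from the name itself.
pagecounts_url() {
    _dir=$(printf '%s' "$1" | sed -E 's#pagecounts-([0-9]{4})([0-9]{2})[0-9]{2}-[0-9]+\.gz#\1/\1-\2#')
    printf 'http://dumps.wikimedia.org/other/pagecounts-raw/%s/%s\n' "$_dir" "$1"
}

pagecounts_time pagecounts-20071209-180000.gz
# prints: 2007-12-09 18-00-00

pagecounts_url pagecounts-20071209-180000.gz
# prints: http://dumps.wikimedia.org/other/pagecounts-raw/2007/2007-12/pagecounts-20071209-180000.gz
```

The hyphenated time part (``18-00-00``) is reproduced exactly from the loading command; the sketch makes no claim about which time formats ``wikistat-loader`` accepts.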