mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-27 01:51:59 +00:00
2d2bc052e1
* Typo fix. * Links fix. * Fixed links in docs. * More fixes. * docs/en: cleaning some files * docs/en: cleaning data_types * docs/en: cleaning database_engines * docs/en: cleaning development * docs/en: cleaning getting_started * docs/en: cleaning interfaces * docs/en: cleaning operations * docs/en: cleaning query_lamguage * docs/en: cleaning en * docs/ru: cleaning data_types * docs/ru: cleaning index * docs/ru: cleaning database_engines * docs/ru: cleaning development * docs/ru: cleaning general * docs/ru: cleaning getting_started * docs/ru: cleaning interfaces * docs/ru: cleaning operations * docs/ru: cleaning query_language * docs: cleaning interfaces/http * Update docs/en/data_types/array.md decorated ``` Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/getting_started/example_datasets/nyc_taxi.md fixed typo Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/getting_started/example_datasets/ontime.md fixed typo Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/interfaces/formats.md fixed error Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/table_engines/custom_partitioning_key.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/utils/clickhouse-local.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/dicts/external_dicts_dict_sources.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/utils/clickhouse-local.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/json_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/json_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/other_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/other_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/date_time_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/table_engines/jdbc.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * docs: fixed error * docs: fixed error
1.2 KiB
1.2 KiB
WikiStat
See: http://dumps.wikimedia.org/other/pagecounts-raw/
Creating a table:
CREATE TABLE wikistat
(
date Date,
time DateTime,
project String,
subproject String,
path String,
hits UInt64,
size UInt64
) ENGINE = MergeTree(date, (path, time), 8192);
Loading data:
$ for i in {2007..2016}; do for j in {01..12}; do echo $i-$j >&2; curl -sSL "http://dumps.wikimedia.org/other/pagecounts-raw/$i/$i-$j/" | grep -oE 'pagecounts-[0-9]+-[0-9]+\.gz'; done; done | sort | uniq | tee links.txt
$ cat links.txt | while read link; do wget http://dumps.wikimedia.org/other/pagecounts-raw/$(echo $link | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})[0-9]{2}-[0-9]+\.gz/\1/')/$(echo $link | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})[0-9]{2}-[0-9]+\.gz/\1-\2/')/$link; done
$ ls -1 /opt/wikistat/ | grep gz | while read i; do echo $i; gzip -cd /opt/wikistat/$i | ./wikistat-loader --time="$(echo -n $i | sed -r 's/pagecounts-([0-9]{4})([0-9]{2})([0-9]{2})-([0-9]{2})([0-9]{2})([0-9]{2})\.gz/\1-\2-\3 \4-00-00/')" | clickhouse-client --query="INSERT INTO wikistat FORMAT TabSeparated"; done