mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-18 05:32:52 +00:00
2d2bc052e1
* Typo fix. * Links fix. * Fixed links in docs. * More fixes. * docs/en: cleaning some files * docs/en: cleaning data_types * docs/en: cleaning database_engines * docs/en: cleaning development * docs/en: cleaning getting_started * docs/en: cleaning interfaces * docs/en: cleaning operations * docs/en: cleaning query_lamguage * docs/en: cleaning en * docs/ru: cleaning data_types * docs/ru: cleaning index * docs/ru: cleaning database_engines * docs/ru: cleaning development * docs/ru: cleaning general * docs/ru: cleaning getting_started * docs/ru: cleaning interfaces * docs/ru: cleaning operations * docs/ru: cleaning query_language * docs: cleaning interfaces/http * Update docs/en/data_types/array.md decorated ``` Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/getting_started/example_datasets/nyc_taxi.md fixed typo Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/getting_started/example_datasets/ontime.md fixed typo Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/interfaces/formats.md fixed error Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/table_engines/custom_partitioning_key.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/utils/clickhouse-local.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/dicts/external_dicts_dict_sources.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/utils/clickhouse-local.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/json_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/json_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/other_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/other_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/query_language/functions/date_time_functions.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * Update docs/en/operations/table_engines/jdbc.md Co-Authored-By: BayoNet <da-daos@yandex.ru> * docs: fixed error * docs: fixed error
3.7 KiB
3.7 KiB
AMPLab Big Data Benchmark
See https://amplab.cs.berkeley.edu/benchmark/
Sign up for a free account at https://aws.amazon.com. You will need a credit card, email and phone number.Get a new access key at https://console.aws.amazon.com/iam/home?nc2=h_m_sc#security_credential
Run the following in the console:
$ sudo apt-get install s3cmd
$ mkdir tiny; cd tiny;
$ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/tiny/ .
$ cd ..
$ mkdir 1node; cd 1node;
$ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/1node/ .
$ cd ..
$ mkdir 5nodes; cd 5nodes;
$ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/5nodes/ .
$ cd ..
Run the following ClickHouse queries:
CREATE TABLE rankings_tiny
(
pageURL String,
pageRank UInt32,
avgDuration UInt32
) ENGINE = Log;
CREATE TABLE uservisits_tiny
(
sourceIP String,
destinationURL String,
visitDate Date,
adRevenue Float32,
UserAgent String,
cCode FixedString(3),
lCode FixedString(6),
searchWord String,
duration UInt32
) ENGINE = MergeTree(visitDate, visitDate, 8192);
CREATE TABLE rankings_1node
(
pageURL String,
pageRank UInt32,
avgDuration UInt32
) ENGINE = Log;
CREATE TABLE uservisits_1node
(
sourceIP String,
destinationURL String,
visitDate Date,
adRevenue Float32,
UserAgent String,
cCode FixedString(3),
lCode FixedString(6),
searchWord String,
duration UInt32
) ENGINE = MergeTree(visitDate, visitDate, 8192);
CREATE TABLE rankings_5nodes_on_single
(
pageURL String,
pageRank UInt32,
avgDuration UInt32
) ENGINE = Log;
CREATE TABLE uservisits_5nodes_on_single
(
sourceIP String,
destinationURL String,
visitDate Date,
adRevenue Float32,
UserAgent String,
cCode FixedString(3),
lCode FixedString(6),
searchWord String,
duration UInt32
) ENGINE = MergeTree(visitDate, visitDate, 8192);
Go back to the console:
$ for i in tiny/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_tiny FORMAT CSV"; done
$ for i in tiny/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_tiny FORMAT CSV"; done
$ for i in 1node/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_1node FORMAT CSV"; done
$ for i in 1node/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_1node FORMAT CSV"; done
$ for i in 5nodes/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_5nodes_on_single FORMAT CSV"; done
$ for i in 5nodes/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_5nodes_on_single FORMAT CSV"; done
Queries for obtaining data samples:
SELECT pageURL, pageRank FROM rankings_1node WHERE pageRank > 1000
SELECT substring(sourceIP, 1, 8), sum(adRevenue) FROM uservisits_1node GROUP BY substring(sourceIP, 1, 8)
SELECT
sourceIP,
sum(adRevenue) AS totalRevenue,
avg(pageRank) AS pageRank
FROM rankings_1node ALL INNER JOIN
(
SELECT
sourceIP,
destinationURL AS pageURL,
adRevenue
FROM uservisits_1node
WHERE (visitDate > '1980-01-01') AND (visitDate < '1980-04-01')
) USING pageURL
GROUP BY sourceIP
ORDER BY totalRevenue DESC
LIMIT 1