--- toc_priority: 17 toc_title: "AMPLab Big Data ベンチマーク" --- # AMPLab Big Data ベンチマーク {#amplab-big-data-benchmark} https://amplab.cs.berkeley.edu/benchmark/ を参照して下さい。 https://aws.amazon.com で無料アカウントにサインアップしてください。クレジットカード、電子メール、電話番号が必要です。 https://console.aws.amazon.com/iam/home?nc2=h\_m\_sc\#security\_credential で新しいアクセスキーを取得します。 コンソールで以下を実行します: ``` bash $ sudo apt-get install s3cmd $ mkdir tiny; cd tiny; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/tiny/ . $ cd .. $ mkdir 1node; cd 1node; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/1node/ . $ cd .. $ mkdir 5nodes; cd 5nodes; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/5nodes/ . $ cd .. ``` 次のClickHouseクエリを実行します: ``` sql CREATE TABLE rankings_tiny ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_tiny ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); CREATE TABLE rankings_1node ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_1node ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); CREATE TABLE rankings_5nodes_on_single ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_5nodes_on_single ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); ``` コンソールに戻って以下を実行します: ``` bash $ for i in tiny/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_tiny FORMAT CSV"; done $ for i in tiny/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_tiny FORMAT CSV"; done $ for i in 1node/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_1node FORMAT CSV"; done $ for i in 1node/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_1node FORMAT CSV"; done $ for i in 5nodes/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_5nodes_on_single FORMAT CSV"; done $ for i in 5nodes/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_5nodes_on_single FORMAT CSV"; done ``` データの取得サンプルクエリ: ``` sql SELECT pageURL, pageRank FROM rankings_1node WHERE pageRank > 1000 SELECT substring(sourceIP, 1, 8), sum(adRevenue) FROM uservisits_1node GROUP BY substring(sourceIP, 1, 8) SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank) AS pageRank FROM rankings_1node ALL INNER JOIN ( SELECT sourceIP, destinationURL AS pageURL, adRevenue FROM uservisits_1node WHERE (visitDate > '1980-01-01') AND (visitDate < '1980-04-01') ) USING pageURL GROUP BY sourceIP ORDER BY totalRevenue DESC LIMIT 1 ``` [元の記事](https://clickhouse.tech/docs/en/getting_started/example_datasets/amplab_benchmark/)