--- machine_translated: true machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd toc_priority: 17 toc_title: Referencia de Big Data de AMPLab --- # Referencia de Big Data de AMPLab {#amplab-big-data-benchmark} Ver https://amplab.cs.berkeley.edu/benchmark/ Regístrese para obtener una cuenta gratuita en https://aws.amazon.com. Requiere una tarjeta de crédito, correo electrónico y número de teléfono. Obtenga una nueva clave de acceso en https://console.aws.amazon.com/iam/home?nc2=h_m_sc#security_credential Ejecute lo siguiente en la consola: ``` bash $ sudo apt-get install s3cmd $ mkdir tiny; cd tiny; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/tiny/ . $ cd .. $ mkdir 1node; cd 1node; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/1node/ . $ cd .. $ mkdir 5nodes; cd 5nodes; $ s3cmd sync s3://big-data-benchmark/pavlo/text-deflate/5nodes/ . $ cd .. ``` Ejecute las siguientes consultas de ClickHouse: ``` sql CREATE TABLE rankings_tiny ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_tiny ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); CREATE TABLE rankings_1node ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_1node ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); CREATE TABLE rankings_5nodes_on_single ( pageURL String, pageRank UInt32, avgDuration UInt32 ) ENGINE = Log; CREATE TABLE uservisits_5nodes_on_single ( sourceIP String, destinationURL String, visitDate Date, adRevenue Float32, UserAgent String, cCode FixedString(3), lCode FixedString(6), searchWord String, duration UInt32 ) ENGINE = MergeTree(visitDate, visitDate, 8192); ``` Volver a la consola: ``` bash $ for i in tiny/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_tiny FORMAT CSV"; done $ for i in tiny/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_tiny FORMAT CSV"; done $ for i in 1node/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_1node FORMAT CSV"; done $ for i in 1node/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_1node FORMAT CSV"; done $ for i in 5nodes/rankings/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO rankings_5nodes_on_single FORMAT CSV"; done $ for i in 5nodes/uservisits/*.deflate; do echo $i; zlib-flate -uncompress < $i | clickhouse-client --host=example-perftest01j --query="INSERT INTO uservisits_5nodes_on_single FORMAT CSV"; done ``` Consultas para obtener muestras de datos: ``` sql SELECT pageURL, pageRank FROM rankings_1node WHERE pageRank > 1000 SELECT substring(sourceIP, 1, 8), sum(adRevenue) FROM uservisits_1node GROUP BY substring(sourceIP, 1, 8) SELECT sourceIP, sum(adRevenue) AS totalRevenue, avg(pageRank) AS pageRank FROM rankings_1node ALL INNER JOIN ( SELECT sourceIP, destinationURL AS pageURL, adRevenue FROM uservisits_1node WHERE (visitDate > '1980-01-01') AND (visitDate < '1980-04-01') ) USING pageURL GROUP BY sourceIP ORDER BY totalRevenue DESC LIMIT 1 ``` [Artículo Original](https://clickhouse.tech/docs/en/getting_started/example_datasets/amplab_benchmark/)