---
toc_priority: 16
toc_title: "New York Taxi Data"
---

# New York Taxi Data {#new-york-taxi-data}

This dataset can be obtained in two ways:

- Import from raw data
- Download of prepared partitions

## How to Import the Raw Data {#how-to-import-the-raw-data}

See https://github.com/toddwschneider/nyc-taxi-data and http://tech.marksblogg.com/billion-nyc-taxi-rides-redshift.html for a description of the dataset and instructions for downloading it.

Downloading produces about 227 GB of uncompressed data in CSV files. The download takes over an hour on a 1 Gbit connection (parallel downloads from s3.amazonaws.com reach at least half of a 1 Gbit channel).

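As a rough sanity check of the download estimate, the transfer time can be worked out from the data volume and the effective bandwidth. This is a minimal sketch, assuming decimal units (1 GB = 8000 Mbit); the helper name `transfer_minutes` is hypothetical.

```bash
#!/usr/bin/env bash
# transfer_minutes GIGABYTES MEGABITS_PER_SECOND
# Prints the transfer time in whole minutes (integer division, rounded down).
# Assumes decimal units: 1 GB = 8 * 1000 Mbit.
transfer_minutes() {
    local gigabytes="$1" mbit_per_s="$2"
    echo $(( gigabytes * 8 * 1000 / mbit_per_s / 60 ))
}

# 227 GB at half of a 1 Gbit channel (500 Mbit/s)
transfer_minutes 227 500
```

At half of the channel the result is about an hour, which is in line with the estimate given for the download.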
Some files might not download completely. Check the file sizes and re-download any that look suspicious.

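One way to spot incomplete downloads is to list the CSV files that are smaller than some threshold. The sketch below is only an illustration: the helper name `list_small_csv` and the 1 MB threshold are assumptions, and a sensible threshold depends on the files you expect.

```bash
#!/usr/bin/env bash
# list_small_csv DIR MIN_BYTES
# Prints, sorted by name, the *.csv files under DIR that are smaller than MIN_BYTES.
list_small_csv() {
    local dir="$1" min_bytes="$2"
    find "$dir" -type f -name '*.csv' -size "-${min_bytes}c" | sort
}

# Example: flag CSV files under data/ that are smaller than 1 MB (assumed threshold).
if [ -d data ]; then
    list_small_csv data 1000000
fi
```

Any file that is listed is a candidate for re-downloading.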
Some files might contain invalid rows. You can fix them as follows:

``` bash
sed -E '/(.*,){18,}/d' data/yellow_tripdata_2010-02.csv > data/yellow_tripdata_2010-02.csv_
sed -E '/(.*,){18,}/d' data/yellow_tripdata_2010-03.csv > data/yellow_tripdata_2010-03.csv_
mv data/yellow_tripdata_2010-02.csv_ data/yellow_tripdata_2010-02.csv
mv data/yellow_tripdata_2010-03.csv_ data/yellow_tripdata_2010-03.csv
```

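The pattern `(.*,){18,}` matches any line that contains at least 18 commas, so the `sed` filter deletes rows with 19 or more comma-separated fields (quoting is not taken into account). The following sketch demonstrates this on synthetic rows; the helper names `make_row` and `drop_wide_rows` are hypothetical.

```bash
#!/usr/bin/env bash
# make_row N: print a synthetic CSV row that contains exactly N commas.
make_row() {
    local n="$1" row="x" i
    for (( i = 0; i < n; i++ )); do
        row+=",x"
    done
    printf '%s\n' "$row"
}

# drop_wide_rows: the same sed filter as used for the taxi files, reading stdin.
drop_wide_rows() {
    sed -E '/(.*,){18,}/d'
}

# A row with 17 commas is kept; rows with 18 or more commas are deleted.
# The pipeline prints how many of the three rows were kept.
{ make_row 17; make_row 18; make_row 20; } | drop_wide_rows | wc -l
```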
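Then the data must be pre-processed in PostgreSQL. This step performs point-in-polygon selections (to match points on the map with the boroughs of New York City) and combines all the data into a single denormalized flat table using a JOIN. To do this, you need to install PostgreSQL with PostGIS support.

Be careful when running `initialize_database.sh`, and manually re-check that all the tables were created correctly.

It takes about 20-30 minutes to process each month's worth of data in PostgreSQL, for a total of about 48 hours.

You can check the number of downloaded rows as follows:
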
``` bash
$ time psql nyc-taxi-data -c "SELECT count(*) FROM trips;"
   count
------------
 1298979494
(1 row)

real    7m9.164s
```

(This is slightly more than the 1.1 billion rows reported by Mark Litwintschik in a series of blog posts.)

The data in PostgreSQL uses 370 GB of space.

Exporting the data from PostgreSQL:

``` sql
COPY
(
    SELECT trips.id,
        trips.vendor_id,
        trips.pickup_datetime,
        trips.dropoff_datetime,
        trips.store_and_fwd_flag,
        trips.rate_code_id,
        trips.pickup_longitude,
        trips.pickup_latitude,
        trips.dropoff_longitude,
        trips.dropoff_latitude,
        trips.passenger_count,
        trips.trip_distance,
        trips.fare_amount,
        trips.extra,
        trips.mta_tax,
        trips.tip_amount,
        trips.tolls_amount,
        trips.ehail_fee,
        trips.improvement_surcharge,
        trips.total_amount,
        trips.payment_type,
        trips.trip_type,
        trips.pickup,
        trips.dropoff,

        cab_types.type cab_type,

        weather.precipitation_tenths_of_mm rain,
        weather.snow_depth_mm,
        weather.snowfall_mm,
        weather.max_temperature_tenths_degrees_celsius max_temp,
        weather.min_temperature_tenths_degrees_celsius min_temp,
        weather.average_wind_speed_tenths_of_meters_per_second wind,

        pick_up.gid pickup_nyct2010_gid,
        pick_up.ctlabel pickup_ctlabel,
        pick_up.borocode pickup_borocode,
        pick_up.boroname pickup_boroname,
        pick_up.ct2010 pickup_ct2010,
        pick_up.boroct2010 pickup_boroct2010,
        pick_up.cdeligibil pickup_cdeligibil,
        pick_up.ntacode pickup_ntacode,
        pick_up.ntaname pickup_ntaname,
        pick_up.puma pickup_puma,

        drop_off.gid dropoff_nyct2010_gid,
        drop_off.ctlabel dropoff_ctlabel,
        drop_off.borocode dropoff_borocode,
        drop_off.boroname dropoff_boroname,
        drop_off.ct2010 dropoff_ct2010,
        drop_off.boroct2010 dropoff_boroct2010,
        drop_off.cdeligibil dropoff_cdeligibil,
        drop_off.ntacode dropoff_ntacode,
        drop_off.ntaname dropoff_ntaname,
        drop_off.puma dropoff_puma
    FROM trips
    LEFT JOIN cab_types
        ON trips.cab_type_id = cab_types.id
    LEFT JOIN central_park_weather_observations_raw weather
        ON weather.date = trips.pickup_datetime::date
    LEFT JOIN nyct2010 pick_up
        ON pick_up.gid = trips.pickup_nyct2010_gid
    LEFT JOIN nyct2010 drop_off
        ON drop_off.gid = trips.dropoff_nyct2010_gid
) TO '/opt/milovidov/nyc-taxi-data/trips.tsv';
```

The data snapshot is created at a speed of about 50 MB per second. While creating the snapshot, PostgreSQL reads from the disk at a speed of about 28 MB per second.
This takes about 5 hours. The resulting TSV file is 590612904969 bytes.

Create a temporary table in ClickHouse:

``` sql
CREATE TABLE trips
(
    trip_id UInt32,
    vendor_id String,
    pickup_datetime DateTime,
    dropoff_datetime Nullable(DateTime),
    store_and_fwd_flag Nullable(FixedString(1)),
    rate_code_id Nullable(UInt8),
    pickup_longitude Nullable(Float64),
    pickup_latitude Nullable(Float64),
    dropoff_longitude Nullable(Float64),
    dropoff_latitude Nullable(Float64),
    passenger_count Nullable(UInt8),
    trip_distance Nullable(Float64),
    fare_amount Nullable(Float32),
    extra Nullable(Float32),
    mta_tax Nullable(Float32),
    tip_amount Nullable(Float32),
    tolls_amount Nullable(Float32),
    ehail_fee Nullable(Float32),
    improvement_surcharge Nullable(Float32),
    total_amount Nullable(Float32),
    payment_type Nullable(String),
    trip_type Nullable(UInt8),
    pickup Nullable(String),
    dropoff Nullable(String),
    cab_type Nullable(String),
    precipitation Nullable(UInt8),
    snow_depth Nullable(UInt8),
    snowfall Nullable(UInt8),
    max_temperature Nullable(UInt8),
    min_temperature Nullable(UInt8),
    average_wind_speed Nullable(UInt8),
    pickup_nyct2010_gid Nullable(UInt8),
    pickup_ctlabel Nullable(String),
    pickup_borocode Nullable(UInt8),
    pickup_boroname Nullable(String),
    pickup_ct2010 Nullable(String),
    pickup_boroct2010 Nullable(String),
    pickup_cdeligibil Nullable(FixedString(1)),
    pickup_ntacode Nullable(String),
    pickup_ntaname Nullable(String),
    pickup_puma Nullable(String),
    dropoff_nyct2010_gid Nullable(UInt8),
    dropoff_ctlabel Nullable(String),
    dropoff_borocode Nullable(UInt8),
    dropoff_boroname Nullable(String),
    dropoff_ct2010 Nullable(String),
    dropoff_boroct2010 Nullable(String),
    dropoff_cdeligibil Nullable(String),
    dropoff_ntacode Nullable(String),
    dropoff_ntaname Nullable(String),
    dropoff_puma Nullable(String)
) ENGINE = Log;
```

This temporary table is needed to convert fields to more appropriate data types and, where possible, to eliminate NULLs.

``` bash
$ time clickhouse-client --query="INSERT INTO trips FORMAT TabSeparated" < trips.tsv

real    75m56.214s
```

Data is read at a speed of 112-140 MB/second.
Loading data into a Log type table in one stream took 76 minutes.
The data in this table uses 142 GB.

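The load speed can be cross-checked from figures already given in this section: the TSV file is 590612904969 bytes and the load took 75m56s. This is a minimal sketch, assuming 1 MB = 10^6 bytes; `throughput_mb_per_s` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# throughput_mb_per_s BYTES SECONDS
# Prints the average throughput in whole MB/s (1 MB = 10^6 bytes), rounded down.
throughput_mb_per_s() {
    local bytes="$1" seconds="$2"
    echo $(( bytes / seconds / 1000000 ))
}

# 75m56s expressed in seconds
load_seconds=$(( 75 * 60 + 56 ))
throughput_mb_per_s 590612904969 "$load_seconds"
```

The result is an average over the whole load, so it is expected to fall inside the quoted range of 112 to 140 rather than at either end of it.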
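(Importing data directly from Postgres is also possible using `COPY ... TO PROGRAM`.)

Unfortunately, all the fields associated with the weather (precipitation ... average_wind_speed) were filled with NULL. Because of this, we will remove them from the final dataset.

To start, we will create a table on a single server. Later we will make the table distributed.

Create and populate a summary table:
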
``` sql
CREATE TABLE trips_mergetree
ENGINE = MergeTree(pickup_date, pickup_datetime, 8192)
AS SELECT

trip_id,
CAST(vendor_id AS Enum8('1' = 1, '2' = 2, 'CMT' = 3, 'VTS' = 4, 'DDS' = 5, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14)) AS vendor_id,
toDate(pickup_datetime) AS pickup_date,
ifNull(pickup_datetime, toDateTime(0)) AS pickup_datetime,
toDate(dropoff_datetime) AS dropoff_date,
ifNull(dropoff_datetime, toDateTime(0)) AS dropoff_datetime,
assumeNotNull(store_and_fwd_flag) IN ('Y', '1', '2') AS store_and_fwd_flag,
assumeNotNull(rate_code_id) AS rate_code_id,
assumeNotNull(pickup_longitude) AS pickup_longitude,
assumeNotNull(pickup_latitude) AS pickup_latitude,
assumeNotNull(dropoff_longitude) AS dropoff_longitude,
assumeNotNull(dropoff_latitude) AS dropoff_latitude,
assumeNotNull(passenger_count) AS passenger_count,
assumeNotNull(trip_distance) AS trip_distance,
assumeNotNull(fare_amount) AS fare_amount,
assumeNotNull(extra) AS extra,
assumeNotNull(mta_tax) AS mta_tax,
assumeNotNull(tip_amount) AS tip_amount,
assumeNotNull(tolls_amount) AS tolls_amount,
assumeNotNull(ehail_fee) AS ehail_fee,
assumeNotNull(improvement_surcharge) AS improvement_surcharge,
assumeNotNull(total_amount) AS total_amount,
CAST((assumeNotNull(payment_type) AS pt) IN ('CSH', 'CASH', 'Cash', 'CAS', 'Cas', '1') ? 'CSH' : (pt IN ('CRD', 'Credit', 'Cre', 'CRE', 'CREDIT', '2') ? 'CRE' : (pt IN ('NOC', 'No Charge', 'No', '3') ? 'NOC' : (pt IN ('DIS', 'Dispute', 'Dis', '4') ? 'DIS' : 'UNK'))) AS Enum8('CSH' = 1, 'CRE' = 2, 'UNK' = 0, 'NOC' = 3, 'DIS' = 4)) AS payment_type_,
assumeNotNull(trip_type) AS trip_type,
ifNull(toFixedString(unhex(pickup), 25), toFixedString('', 25)) AS pickup,
ifNull(toFixedString(unhex(dropoff), 25), toFixedString('', 25)) AS dropoff,
CAST(assumeNotNull(cab_type) AS Enum8('yellow' = 1, 'green' = 2, 'uber' = 3)) AS cab_type,

assumeNotNull(pickup_nyct2010_gid) AS pickup_nyct2010_gid,
toFloat32(ifNull(pickup_ctlabel, '0')) AS pickup_ctlabel,
assumeNotNull(pickup_borocode) AS pickup_borocode,
CAST(assumeNotNull(pickup_boroname) AS Enum8('Manhattan' = 1, 'Queens' = 4, 'Brooklyn' = 3, '' = 0, 'Bronx' = 2, 'Staten Island' = 5)) AS pickup_boroname,
toFixedString(ifNull(pickup_ct2010, '000000'), 6) AS pickup_ct2010,
toFixedString(ifNull(pickup_boroct2010, '0000000'), 7) AS pickup_boroct2010,
CAST(assumeNotNull(ifNull(pickup_cdeligibil, ' ')) AS Enum8(' ' = 0, 'E' = 1, 'I' = 2)) AS pickup_cdeligibil,
toFixedString(ifNull(pickup_ntacode, '0000'), 4) AS pickup_ntacode,

CAST(assumeNotNull(pickup_ntaname) AS Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195)) AS pickup_ntaname,

toUInt16(ifNull(pickup_puma, '0')) AS pickup_puma,

assumeNotNull(dropoff_nyct2010_gid) AS dropoff_nyct2010_gid,
toFloat32(ifNull(dropoff_ctlabel, '0')) AS dropoff_ctlabel,
assumeNotNull(dropoff_borocode) AS dropoff_borocode,
CAST(assumeNotNull(dropoff_boroname) AS Enum8('Manhattan' = 1, 'Queens' = 4, 'Brooklyn' = 3, '' = 0, 'Bronx' = 2, 'Staten Island' = 5)) AS dropoff_boroname,
toFixedString(ifNull(dropoff_ct2010, '000000'), 6) AS dropoff_ct2010,
toFixedString(ifNull(dropoff_boroct2010, '0000000'), 7) AS dropoff_boroct2010,
CAST(assumeNotNull(ifNull(dropoff_cdeligibil, ' ')) AS Enum8(' ' = 0, 'E' = 1, 'I' = 2)) AS dropoff_cdeligibil,
toFixedString(ifNull(dropoff_ntacode, '0000'), 4) AS dropoff_ntacode,

CAST(assumeNotNull(dropoff_ntaname) AS Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195)) AS dropoff_ntaname,

toUInt16(ifNull(dropoff_puma, '0')) AS dropoff_puma

FROM trips
```

This takes 3030 seconds at a speed of about 428,000 rows per second.
To load it faster, you can create the table with the `Log` engine instead of `MergeTree`. In this case, the load takes less than 200 seconds.

The table uses 126 GB of disk space.

``` sql
SELECT formatReadableSize(sum(bytes)) FROM system.parts WHERE table = 'trips_mergetree' AND active
```

``` text
┌─formatReadableSize(sum(bytes))─┐
│ 126.18 GiB                     │
└────────────────────────────────┘
```

Among other things, you can run the OPTIMIZE query on MergeTree. It is not required, though, since everything works fine without it.

## Download of Prepared Partitions {#download-of-prepared-partitions}

``` bash
$ curl -O https://clickhouse-datasets.s3.yandex.net/trips_mergetree/partitions/trips_mergetree.tar
$ tar xvf trips_mergetree.tar -C /var/lib/clickhouse # path to ClickHouse data directory
$ # check permissions of unpacked data, fix if required
$ sudo service clickhouse-server restart
$ clickhouse-client --query "select count(*) from datasets.trips_mergetree"
```

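The permission check mentioned in the comment of the download commands could look like the following sketch. It assumes that the server runs as a user named `clickhouse` (an assumption here; verify it for your installation), and `list_foreign_owned` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# list_foreign_owned DIR USER
# Prints every path under DIR that is NOT owned by USER.
list_foreign_owned() {
    local dir="$1" user="$2"
    find "$dir" ! -user "$user" -print
}

# Assumed defaults: the data directory used above and a service user named "clickhouse".
data_dir=/var/lib/clickhouse
if [ -d "$data_dir" ] && id clickhouse > /dev/null 2>&1; then
    list_foreign_owned "$data_dir" clickhouse | head -n 20
    # If anything is listed, ownership can be fixed with:
    #   sudo chown -R clickhouse:clickhouse "$data_dir"
fi
```

The sketch only reports mismatches; the `chown` step is left as a comment so that nothing is changed without review.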
!!! info "Info"
    If you run the queries described below, you must use the full table name, `datasets.trips_mergetree`.

## Results on Single Server {#results-on-single-server}

Q1:

``` sql
SELECT cab_type, count(*) FROM trips_mergetree GROUP BY cab_type
```

0.490 seconds.

Q2:

``` sql
SELECT passenger_count, avg(total_amount) FROM trips_mergetree GROUP BY passenger_count
```

1.224 seconds.

Q3:

``` sql
SELECT passenger_count, toYear(pickup_date) AS year, count(*) FROM trips_mergetree GROUP BY passenger_count, year
```

2.104 seconds.

Q4:

``` sql
SELECT passenger_count, toYear(pickup_date) AS year, round(trip_distance) AS distance, count(*)
FROM trips_mergetree
GROUP BY passenger_count, year, distance
ORDER BY year, count(*) DESC
```

3.593 seconds.

The following server was used:

Two Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 16 physical cores in total, 128 GiB RAM, 8 x 6 TB HDD in RAID-5.

Execution time is the best of three runs. Starting from the second run, queries read data from the file system cache. No further caching occurs: the data is read out and processed in each run.

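A best-of-three measurement can be scripted along the following lines. This is a sketch under stated assumptions: `best_of_three_ms` is a hypothetical helper, it relies on GNU `date` for nanosecond timestamps, and it measures wall-clock time of the whole command (including client start-up), so its numbers are not directly comparable to the timings listed for Q1-Q4.

```bash
#!/usr/bin/env bash
# best_of_three_ms COMMAND [ARGS...]
# Runs the command three times and prints the shortest wall-clock time in milliseconds.
# Relies on GNU date for nanosecond timestamps (%N), so it is Linux-specific.
best_of_three_ms() {
    local best=-1 start end elapsed run
    for run in 1 2 3; do
        start=$(date +%s%N)
        "$@" > /dev/null
        end=$(date +%s%N)
        elapsed=$(( (end - start) / 1000000 ))
        if (( best < 0 || elapsed < best )); then
            best=$elapsed
        fi
    done
    echo "$best"
}

# Example (needs a running server, so it is left commented out):
#   best_of_three_ms clickhouse-client --query "SELECT cab_type, count(*) FROM trips_mergetree GROUP BY cab_type"
```

Taking the minimum rather than the average reflects the point made above: after the first run the data comes from the file system cache, so the fastest run shows the processing speed without disk latency.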
Creating a table on three servers:

On each server:

``` sql
CREATE TABLE default.trips_mergetree_third ( trip_id UInt32, vendor_id Enum8('1' = 1, '2' = 2, 'CMT' = 3, 'VTS' = 4, 'DDS' = 5, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14), pickup_date Date, pickup_datetime DateTime, dropoff_date Date, dropoff_datetime DateTime, store_and_fwd_flag UInt8, rate_code_id UInt8, pickup_longitude Float64, pickup_latitude Float64, dropoff_longitude Float64, dropoff_latitude Float64, passenger_count UInt8, trip_distance Float64, fare_amount Float32, extra Float32, mta_tax Float32, tip_amount Float32, tolls_amount Float32, ehail_fee Float32, improvement_surcharge Float32, total_amount Float32, payment_type_ Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4), trip_type UInt8, pickup FixedString(25), dropoff FixedString(25), cab_type Enum8('yellow' = 1, 'green' = 2, 'uber' = 3), pickup_nyct2010_gid UInt8, pickup_ctlabel Float32, pickup_borocode UInt8, pickup_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5), pickup_ct2010 FixedString(6), pickup_boroct2010 FixedString(7), pickup_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2), pickup_ntacode FixedString(4), pickup_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195), pickup_puma UInt16, dropoff_nyct2010_gid UInt8, dropoff_ctlabel Float32, dropoff_borocode UInt8, dropoff_boroname Enum8('' = 0, 'Manhattan' = 1, 'Bronx' = 2, 'Brooklyn' = 3, 'Queens' = 4, 'Staten Island' = 5), dropoff_ct2010 FixedString(6), dropoff_boroct2010 FixedString(7), dropoff_cdeligibil Enum8(' ' = 0, 'E' = 1, 'I' = 2), dropoff_ntacode FixedString(4), dropoff_ntaname Enum16('' = 0, 'Airport' = 1, 'Allerton-Pelham Gardens' = 2, 'Annadale-Huguenot-Prince\'s Bay-Eltingville' = 3, 'Arden Heights' = 4, 'Astoria' = 5, 'Auburndale' = 6, 'Baisley Park' = 7, 'Bath Beach' = 8, 'Battery Park City-Lower Manhattan' = 9, 'Bay Ridge' = 10, 'Bayside-Bayside Hills' = 11, 'Bedford' = 12, 'Bedford Park-Fordham North' = 13, 'Bellerose' = 14, 'Belmont' = 15, 'Bensonhurst East' = 16, 'Bensonhurst West' = 17, 'Borough Park' = 18, 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel' = 19, 'Briarwood-Jamaica Hills' = 20, 'Brighton Beach' = 21, 'Bronxdale' = 22, 'Brooklyn Heights-Cobble Hill' = 23, 'Brownsville' = 24, 'Bushwick North' = 25, 'Bushwick South' = 26, 'Cambria Heights' = 27, 'Canarsie' = 28, 'Carroll Gardens-Columbia Street-Red Hook' = 29, 'Central Harlem North-Polo Grounds' = 30, 'Central Harlem South' = 31, 'Charleston-Richmond Valley-Tottenville' = 32, 'Chinatown' = 33, 'Claremont-Bathgate' = 34, 'Clinton' = 35, 'Clinton Hill' = 36, 'Co-op City' = 37, 'College Point' = 38, 'Corona' = 39, 'Crotona Park East' = 40, 'Crown Heights North' = 41, 'Crown Heights South' = 42, 'Cypress Hills-City Line' = 43, 'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill' = 44, 'Douglas Manor-Douglaston-Little Neck' = 45, 'Dyker Heights' = 46, 'East Concourse-Concourse Village' = 47, 'East Elmhurst' = 48, 'East Flatbush-Farragut' = 49, 'East Flushing' = 50, 'East Harlem North' = 51, 'East Harlem South' = 52, 'East New York' = 53, 'East New York (Pennsylvania Ave)' = 54, 'East Tremont' = 55, 'East Village' = 56, 'East Williamsburg' = 57, 'Eastchester-Edenwald-Baychester' = 58, 'Elmhurst' = 59, 'Elmhurst-Maspeth' = 60, 'Erasmus' = 61, 'Far Rockaway-Bayswater' = 62, 'Flatbush' = 63, 'Flatlands' = 64, 'Flushing' = 65, 'Fordham South' = 66, 'Forest Hills' = 67, 'Fort Greene' = 68, 'Fresh Meadows-Utopia' = 69, 'Ft. Totten-Bay Terrace-Clearview' = 70, 'Georgetown-Marine Park-Bergen Beach-Mill Basin' = 71, 'Glen Oaks-Floral Park-New Hyde Park' = 72, 'Glendale' = 73, 'Gramercy' = 74, 'Grasmere-Arrochar-Ft. Wadsworth' = 75, 'Gravesend' = 76, 'Great Kills' = 77, 'Greenpoint' = 78, 'Grymes Hill-Clifton-Fox Hills' = 79, 'Hamilton Heights' = 80, 'Hammels-Arverne-Edgemere' = 81, 'Highbridge' = 82, 'Hollis' = 83, 'Homecrest' = 84, 'Hudson Yards-Chelsea-Flatiron-Union Square' = 85, 'Hunters Point-Sunnyside-West Maspeth' = 86, 'Hunts Point' = 87, 'Jackson Heights' = 88, 'Jamaica' = 89, 'Jamaica Estates-Holliswood' = 90, 'Kensington-Ocean Parkway' = 91, 'Kew Gardens' = 92, 'Kew Gardens Hills' = 93, 'Kingsbridge Heights' = 94, 'Laurelton' = 95, 'Lenox Hill-Roosevelt Island' = 96, 'Lincoln Square' = 97, 'Lindenwood-Howard Beach' = 98, 'Longwood' = 99, 'Lower East Side' = 100, 'Madison' = 101, 'Manhattanville' = 102, 'Marble Hill-Inwood' = 103, 'Mariner\'s Harbor-Arlington-Port Ivory-Graniteville' = 104, 'Maspeth' = 105, 'Melrose South-Mott Haven North' = 106, 'Middle Village' = 107, 'Midtown-Midtown South' = 108, 'Midwood' = 109, 'Morningside Heights' = 110, 'Morrisania-Melrose' = 111, 'Mott Haven-Port Morris' = 112, 'Mount Hope' = 113, 'Murray Hill' = 114, 'Murray Hill-Kips Bay' = 115, 'New Brighton-Silver Lake' = 116, 'New Dorp-Midland Beach' = 117, 'New Springville-Bloomfield-Travis' = 118, 'North Corona' = 119, 'North Riverdale-Fieldston-Riverdale' = 120, 'North Side-South Side' = 121, 'Norwood' = 122, 'Oakland Gardens' = 123, 'Oakwood-Oakwood Beach' = 124, 'Ocean Hill' = 125, 'Ocean Parkway South' = 126, 'Old Astoria' = 127, 'Old Town-Dongan Hills-South Beach' = 128, 'Ozone Park' = 129, 'Park Slope-Gowanus' = 130, 'Parkchester' = 131, 'Pelham Bay-Country Club-City Island' = 132, 'Pelham Parkway' = 133, 'Pomonok-Flushing Heights-Hillcrest' = 134, 'Port Richmond' = 135, 'Prospect Heights' = 136, 'Prospect Lefferts Gardens-Wingate' = 137, 'Queens Village' = 138, 'Queensboro Hill' = 139, 'Queensbridge-Ravenswood-Long Island City' = 140, 'Rego Park' = 141, 'Richmond Hill' = 142, 'Ridgewood' = 143, 'Rikers Island' = 144, 'Rosedale' = 145, 'Rossville-Woodrow' = 146, 'Rugby-Remsen Village' = 147, 'Schuylerville-Throgs Neck-Edgewater Park' = 148, 'Seagate-Coney Island' = 149, 'Sheepshead Bay-Gerritsen Beach-Manhattan Beach' = 150, 'SoHo-TriBeCa-Civic Center-Little Italy' = 151, 'Soundview-Bruckner' = 152, 'Soundview-Castle Hill-Clason Point-Harding Park' = 153, 'South Jamaica' = 154, 'South Ozone Park' = 155, 'Springfield Gardens North' = 156, 'Springfield Gardens South-Brookville' = 157, 'Spuyten Duyvil-Kingsbridge' = 158, 'St. 
Albans' = 159, 'Stapleton-Rosebank' = 160, 'Starrett City' = 161, 'Steinway' = 162, 'Stuyvesant Heights' = 163, 'Stuyvesant Town-Cooper Village' = 164, 'Sunset Park East' = 165, 'Sunset Park West' = 166, 'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' = 167, 'Turtle Bay-East Midtown' = 168, 'University Heights-Morris Heights' = 169, 'Upper East Side-Carnegie Hill' = 170, 'Upper West Side' = 171, 'Van Cortlandt Village' = 172, 'Van Nest-Morris Park-Westchester Square' = 173, 'Washington Heights North' = 174, 'Washington Heights South' = 175, 'West Brighton' = 176, 'West Concourse' = 177, 'West Farms-Bronx River' = 178, 'West New Brighton-New Brighton-St. George' = 179, 'West Village' = 180, 'Westchester-Unionport' = 181, 'Westerleigh' = 182, 'Whitestone' = 183, 'Williamsbridge-Olinville' = 184, 'Williamsburg' = 185, 'Windsor Terrace' = 186, 'Woodhaven' = 187, 'Woodlawn-Wakefield' = 188, 'Woodside' = 189, 'Yorkville' = 190, 'park-cemetery-etc-Bronx' = 191, 'park-cemetery-etc-Brooklyn' = 192, 'park-cemetery-etc-Manhattan' = 193, 'park-cemetery-etc-Queens' = 194, 'park-cemetery-etc-Staten Island' = 195), dropoff_puma UInt16) ENGINE = MergeTree(pickup_date, pickup_datetime, 8192)
```

On the source server:

``` sql
CREATE TABLE trips_mergetree_x3 AS trips_mergetree_third ENGINE = Distributed(perftest, default, trips_mergetree_third, rand())
```

The following query redistributes the data:

``` sql
INSERT INTO trips_mergetree_x3 SELECT * FROM trips_mergetree
```

This takes 2454 seconds.

On three servers:

Q1: 0.212 seconds.
Q2: 0.438 seconds.
Q3: 0.733 seconds.
Q4: 1.241 seconds.

No surprises here, since the queries scale linearly.
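
As a rough check of the linear-scaling claim, the sketch below divides the single-server timings (0.490, 1.224, 2.104 and 3.593 seconds for Q1 to Q4, as listed in the summary table) by the three-server timings above. `speedup` is a hypothetical helper name introduced only for this illustration, and the script assumes `awk` is available. Ideal linear scaling on three servers would give a ratio of 3; the measured ratios fall between roughly 2.3 and 2.9, with the shortest query (Q1) furthest from the ideal.

```bash
#!/usr/bin/env bash
# speedup: hypothetical helper, not part of ClickHouse.
# Prints (single-server time / multi-server time) rounded to two decimals.
speedup() {
    awk -v one="$1" -v many="$2" 'BEGIN { printf "%.2f\n", one / many }'
}

# Timings in seconds, taken from the text: 1 server vs. 3 servers.
speedup 0.490 0.212   # Q1
speedup 1.224 0.438   # Q2
speedup 2.104 0.733   # Q3
speedup 3.593 1.241   # Q4
```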

We also have results from a cluster of 140 servers:

Q1: 0.028 seconds.
Q2: 0.043 seconds.
Q3: 0.051 seconds.
Q4: 0.072 seconds.

In this case, the query processing time is determined above all by network latency.
We ran the queries from a client located in a Yandex datacenter in Finland against a cluster in Russia, which added about 20 ms of latency.

## Summary {#summary}

Query times in seconds:

| Servers | Q1    | Q2    | Q3    | Q4    |
|---------|-------|-------|-------|-------|
| 1       | 0.490 | 1.224 | 2.104 | 3.593 |
| 3       | 0.212 | 0.438 | 0.733 | 1.241 |
| 140     | 0.028 | 0.043 | 0.051 | 0.072 |

[Original article](https://clickhouse.tech/docs/en/getting_started/example_datasets/nyc_taxi/) <!--hide-->