--- slug: /ja/integrations/jupysql sidebar_label: Jupyter notebooks description: JupysqlはJupyter向けのマルチプラットフォームデータベースツールです。 --- # ClickHouseでJupySQLを使用する このガイドでは、ClickHouseとの統合について説明します。 Jupysqlを使ってClickHouse上でクエリを実行します。データがロードされた後、SQLプロットでデータを可視化します。 JupysqlとClickHouseの統合は、clickhouse_sqlalchemyライブラリを使用することで可能になります。このライブラリは、両システム間のコミュニケーションを容易にし、ClickHouseに接続してSQL方言を渡すことを可能にします。接続されたら、ClickhouseのネイティブUIまたはJupyterノートブックから直接SQLクエリを実行できます。 ```python # 必要なパッケージをインストール %pip install --quiet jupysql clickhouse_sqlalchemy ``` 注: 更新されたパッケージを使用するにはカーネルを再起動する必要があるかもしれません。 ```python import pandas as pd from sklearn_evaluation import plot # jupysql Jupyter拡張機能をインポートしてSQLセルを作成 %load_ext sql %config SqlMagic.autocommit=False ``` **次の段階に進むには、Clickhouseが起動してアクセス可能であることを確認してください。ローカル版またはクラウド版のどちらでも利用できます。** **注:** 接続文字列は、接続しようとしているインスタンスタイプ(URL、ユーザー、パスワード)に応じて調整する必要があります。以下の例ではローカルインスタンスを使用しています。詳しくは、[こちらのガイド](https://clickhouse.com/docs/ja/getting-started/quick-start)をご覧ください。 ```python %sql clickhouse://default:@localhost:8123/default ``` ```sql %%sql CREATE TABLE trips ( `trip_id` UInt32, `vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15), `pickup_date` Date, `pickup_datetime` DateTime, `dropoff_date` Date, `dropoff_datetime` DateTime, `store_and_fwd_flag` UInt8, `rate_code_id` UInt8, `pickup_longitude` Float64, `pickup_latitude` Float64, `dropoff_longitude` Float64, `dropoff_latitude` Float64, `passenger_count` UInt8, `trip_distance` Float64, `fare_amount` Float32, `extra` Float32, `mta_tax` Float32, `tip_amount` Float32, `tolls_amount` Float32, `ehail_fee` Float32, `improvement_surcharge` Float32, `total_amount` Float32, `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4), `trip_type` UInt8, `pickup` FixedString(25), `dropoff` FixedString(25), `cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3), `pickup_nyct2010_gid` Int8, `pickup_ctlabel` Float32, `pickup_borocode` Int8, `pickup_ct2010` String, `pickup_boroct2010` String, `pickup_cdeligibil` String, `pickup_ntacode` FixedString(4), `pickup_ntaname` String, `pickup_puma` UInt16, `dropoff_nyct2010_gid` UInt8, `dropoff_ctlabel` Float32, `dropoff_borocode` UInt8, `dropoff_ct2010` String, `dropoff_boroct2010` String, `dropoff_cdeligibil` String, `dropoff_ntacode` FixedString(4), `dropoff_ntaname` String, `dropoff_puma` UInt16 ) ENGINE = MergeTree PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime; ``` * clickhouse://default:***@localhost:8123/default Done.
```sql %%sql INSERT INTO trips SELECT * FROM s3( 'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{1..2}.gz', 'TabSeparatedWithNames', " `trip_id` UInt32, `vendor_id` Enum8('1' = 1, '2' = 2, '3' = 3, '4' = 4, 'CMT' = 5, 'VTS' = 6, 'DDS' = 7, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14, '' = 15), `pickup_date` Date, `pickup_datetime` DateTime, `dropoff_date` Date, `dropoff_datetime` DateTime, `store_and_fwd_flag` UInt8, `rate_code_id` UInt8, `pickup_longitude` Float64, `pickup_latitude` Float64, `dropoff_longitude` Float64, `dropoff_latitude` Float64, `passenger_count` UInt8, `trip_distance` Float64, `fare_amount` Float32, `extra` Float32, `mta_tax` Float32, `tip_amount` Float32, `tolls_amount` Float32, `ehail_fee` Float32, `improvement_surcharge` Float32, `total_amount` Float32, `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4), `trip_type` UInt8, `pickup` FixedString(25), `dropoff` FixedString(25), `cab_type` Enum8('yellow' = 1, 'green' = 2, 'uber' = 3), `pickup_nyct2010_gid` Int8, `pickup_ctlabel` Float32, `pickup_borocode` Int8, `pickup_ct2010` String, `pickup_boroct2010` String, `pickup_cdeligibil` String, `pickup_ntacode` FixedString(4), `pickup_ntaname` String, `pickup_puma` UInt16, `dropoff_nyct2010_gid` UInt8, `dropoff_ctlabel` Float32, `dropoff_borocode` UInt8, `dropoff_ct2010` String, `dropoff_boroct2010` String, `dropoff_cdeligibil` String, `dropoff_ntacode` FixedString(4), `dropoff_ntaname` String, `dropoff_puma` UInt16 ") SETTINGS input_format_try_infer_datetimes = 0 ``` * clickhouse://default:***@localhost:8123/default Done.
```python %sql SELECT count() FROM trips limit 5; ``` * clickhouse://default:***@localhost:8123/default Done.
count()
1999657
```python %sql SELECT DISTINCT(pickup_ntaname) FROM trips limit 5; ``` * clickhouse://default:***@localhost:8123/default Done.
pickup_ntaname
Morningside Heights
Hudson Yards-Chelsea-Flatiron-Union Square
Midtown-Midtown South
SoHo-TriBeCa-Civic Center-Little Italy
Murray Hill-Kips Bay
```python %sql SELECT round(avg(tip_amount), 2) FROM trips ``` * clickhouse://default:***@localhost:8123/default Done.
round(avg(tip_amount), 2)
1.68
```sql %%sql SELECT passenger_count, ceil(avg(total_amount),2) AS average_total_amount FROM trips GROUP BY passenger_count ``` * clickhouse://default:***@localhost:8123/default Done.
passenger_count average_total_amount
0 22.69
1 15.97
2 17.15
3 16.76
4 17.33
5 16.35
6 16.04
7 59.8
8 36.41
9 9.81
```sql %%sql SELECT pickup_date, pickup_ntaname, SUM(1) AS number_of_trips FROM trips GROUP BY pickup_date, pickup_ntaname ORDER BY pickup_date ASC limit 5; ``` * clickhouse://default:***@localhost:8123/default Done.
pickup_date pickup_ntaname number_of_trips
2015-07-01 Bushwick North 2
2015-07-01 Brighton Beach 1
2015-07-01 Briarwood-Jamaica Hills 3
2015-07-01 Williamsburg 1
2015-07-01 Queensbridge-Ravenswood-Long Island City 9
```python # %sql DESCRIBE trips; ``` ```python # %sql SELECT DISTINCT(trip_distance) FROM trips limit 50; ``` ```sql %%sql --save short-trips --no-execute SELECT * FROM trips WHERE trip_distance < 6.3 ``` * clickhouse://default:***@localhost:8123/default Skipping execution... ```python %sqlplot histogram --table short-trips --column trip_distance --bins 10 --with short-trips ``` ![histogram example](images/jupysql-plot-1.png) ```python ax = %sqlplot histogram --table short-trips --column trip_distance --bins 50 --with short-trips ax.grid() ax.set_title("Trip distance from trips < 6.3") _ = ax.set_xlabel("Trip distance") ``` ![histogram second example](images/jupysql-plot-1.png)