ClickHouse/docs/en/getting_started/example_datasets/star_schema.rst
2017-08-16 00:33:07 +03:00

87 lines
3.0 KiB
ReStructuredText

Star Schema
===========
Compile dbgen: https://github.com/vadimtk/ssb-dbgen
.. code-block:: bash
git clone git@github.com:vadimtk/ssb-dbgen.git
cd ssb-dbgen
make
You will see some warnings. It's Ok.
Place ``dbgen`` and ``dists.dss`` to some place with at least 800 GB free space available.
Generate data:
.. code-block:: bash
./dbgen -s 1000 -T c
./dbgen -s 1000 -T l
Create tables in ClickHouse:
.. code-block:: sql
CREATE TABLE lineorder (
LO_ORDERKEY UInt32,
LO_LINENUMBER UInt8,
LO_CUSTKEY UInt32,
LO_PARTKEY UInt32,
LO_SUPPKEY UInt32,
LO_ORDERDATE Date,
LO_ORDERPRIORITY String,
LO_SHIPPRIORITY UInt8,
LO_QUANTITY UInt8,
LO_EXTENDEDPRICE UInt32,
LO_ORDTOTALPRICE UInt32,
LO_DISCOUNT UInt8,
LO_REVENUE UInt32,
LO_SUPPLYCOST UInt32,
LO_TAX UInt8,
LO_COMMITDATE Date,
LO_SHIPMODE String
) Engine = MergeTree(LO_ORDERDATE,(LO_ORDERKEY,LO_LINENUMBER,LO_ORDERDATE),8192);
CREATE TABLE customer (
C_CUSTKEY UInt32,
C_NAME String,
C_ADDRESS String,
C_CITY String,
C_NATION String,
C_REGION String,
C_PHONE String,
C_MKTSEGMENT String,
C_FAKEDATE Date
) Engine = MergeTree(C_FAKEDATE,(C_CUSTKEY,C_FAKEDATE),8192);
CREATE TABLE part (
P_PARTKEY UInt32,
P_NAME String,
P_MFGR String,
P_CATEGORY String,
P_BRAND String,
P_COLOR String,
P_TYPE String,
P_SIZE UInt8,
P_CONTAINER String,
P_FAKEDATE Date
) Engine = MergeTree(P_FAKEDATE,(P_PARTKEY,P_FAKEDATE),8192);
CREATE TABLE lineorderd AS lineorder ENGINE = Distributed(perftest_3shards_1replicas, default, lineorder, rand());
CREATE TABLE customerd AS customer ENGINE = Distributed(perftest_3shards_1replicas, default, customer, rand());
CREATE TABLE partd AS part ENGINE = Distributed(perftest_3shards_1replicas, default, part, rand());
For single-node setup, create just MergeTree tables.
For Distributed setup, you must configure cluster ``perftest_3shards_1replicas`` in configuration file.
Then create MergeTree tables on each node and then create Distributed tables.
Load data (change customer to customerd in case of distributed setup):
.. code-block:: bash
cat customer.tbl | sed 's/$/2000-01-01/' | clickhouse-client --query "INSERT INTO customer FORMAT CSV"
cat lineorder.tbl | clickhouse-client --query "INSERT INTO lineorder FORMAT CSV"