
---
slug: /en/development/building_and_benchmarking_deflate_qpl
sidebar_position: 73
sidebar_label: Building and Benchmarking DEFLATE_QPL
description: How to build Clickhouse and run benchmark with DEFLATE_QPL Codec
---

# Build Clickhouse with DEFLATE_QPL

- Make sure your host machine meets the QPL required prerequisites.
- deflate_qpl is enabled by default during the cmake build. In case you accidentally changed it, please double-check the build flag `ENABLE_QPL=1` (a build sketch follows this list).
- For generic requirements, please refer to the ClickHouse generic build instructions.
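
As a minimal sketch, assuming an out-of-source build with `ninja` as in the generic build instructions (the directory layout and generator here are assumptions; adapt them to your setup):

```bash
# Sketch: Release build with the QPL codec flag pinned explicitly.
# ENABLE_QPL=1 is already the default; passing it just makes the intent visible.
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_QPL=1 ..
ninja clickhouse
```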

# Run Benchmark with DEFLATE_QPL

## Files list

The folder `benchmark_sample` under `qpl-cmake` gives an example of running the benchmark with Python scripts:

`client_scripts` contains Python scripts for running the typical benchmark, for example:

- `client_stressing_test.py`: The Python script for query stress testing with [1~4] server instances.
- `queries_ssb.sql`: The file listing all queries for the Star Schema Benchmark.
- `allin1_ssb.sh`: This shell script executes the whole benchmark workflow automatically.

`database_files` stores the database files for the lz4/deflate/zstd codecs.

## Run benchmark automatically for Star Schema:

```bash
$ cd ./benchmark_sample/client_scripts
$ sh run_ssb.sh
```

After it completes, please check all the results in this folder: `./output/`

In case you run into failures, please run the benchmark manually as described in the sections below.

## Definition

`[CLICKHOUSE_EXE]` means the path of the ClickHouse executable program.
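
For instance, if the binary lives at a hypothetical path `/usr/local/bin/clickhouse`, a command written as `[CLICKHOUSE_EXE] client` below would be typed as:

```bash
# [CLICKHOUSE_EXE] expanded with a hypothetical install path:
$ /usr/local/bin/clickhouse client
```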

## Environment

```bash
pip3 install clickhouse_driver numpy
```
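
A quick sanity check that both packages installed cleanly (just a convenience, not part of the original flow):

```bash
# Should print OK without raising ImportError.
$ python3 -c "import clickhouse_driver, numpy; print('OK')"
```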

## [Self-check for IAA]

```bash
$ accel-config list | grep -P 'iax|state'
```

Expected output is like this:

    "dev":"iax1",
    "state":"enabled",
            "state":"enabled",

If you see no output, it means IAA is not ready to work. Please check the IAA setup again.

## Generate raw data

```bash
$ cd ./benchmark_sample
$ mkdir rawdata_dir && cd rawdata_dir
```

Use `dbgen` to generate 100 million rows of data with the parameter: `-s 20`

Files like `*.tbl` are expected to be output under `./benchmark_sample/rawdata_dir/ssb-dbgen`.
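
As a sketch of this step, assuming `dbgen` is built from the ssb-dbgen sources inside `rawdata_dir` (the clone URL and the per-table `-T` flags are assumptions based on common ssb-dbgen usage; adjust for your copy):

```bash
# Hypothetical: fetch and build ssb-dbgen, then generate each SSB table at scale factor 20.
$ git clone https://github.com/vadimtk/ssb-dbgen.git && cd ssb-dbgen
$ make
$ ./dbgen -s 20 -T c   # customer
$ ./dbgen -s 20 -T l   # lineorder
$ ./dbgen -s 20 -T p   # part
$ ./dbgen -s 20 -T s   # supplier
```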

## Database setup

### Set up database with LZ4 codec

```bash
$ cd ./database_dir/lz4
$ [CLICKHOUSE_EXE] server -C config_lz4.xml >&/dev/null&
$ [CLICKHOUSE_EXE] client
```

Here you should see the message `Connected to ClickHouse server` on the console, which means the client has successfully set up a connection with the server.

Complete the three steps below, as described in the Star Schema Benchmark:

- Creating tables in ClickHouse
- Inserting data. Here `./benchmark_sample/rawdata_dir/ssb-dbgen/*.tbl` should be used as the input data (a sketch follows this list).
- Converting the "star schema" to the de-normalized "flat schema"
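
As a sketch of the insertion step (here `customer` stands for any one of the SSB tables; repeat for the others):

```bash
# Feed a raw .tbl file into its matching table over the default port.
$ [CLICKHOUSE_EXE] client --query "INSERT INTO customer FORMAT CSV" < ./benchmark_sample/rawdata_dir/ssb-dbgen/customer.tbl
```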

### Set up database with IAA Deflate codec

```bash
$ cd ./database_dir/deflate
$ [CLICKHOUSE_EXE] server -C config_deflate.xml >&/dev/null&
$ [CLICKHOUSE_EXE] client
```

Complete the three steps in the same way as for lz4 above.

### Set up database with ZSTD codec

```bash
$ cd ./database_dir/zstd
$ [CLICKHOUSE_EXE] server -C config_zstd.xml >&/dev/null&
$ [CLICKHOUSE_EXE] client
```

Complete the three steps in the same way as for lz4 above.

[Self-check] For each codec (lz4/zstd/deflate), please execute the query below to make sure the databases were created successfully:

```sql
select count() from lineorder_flat
```

You are expected to see the output below:

```sql
┌───count()─┐
│ 119994608 │
└───────────┘
```
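
The same check can also be run non-interactively, if that is more convenient (assuming the server for the codec under test is listening on the default port):

```bash
$ [CLICKHOUSE_EXE] client --query "SELECT count() FROM lineorder_flat"
```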

### [Self-check for IAA Deflate codec]

The first time you execute an insertion or a query from the client, the ClickHouse server console is expected to print this log:

```text
Hardware-assisted DeflateQpl codec is ready!
```

If you never see it, but instead see the log below:

```text
Initialization of hardware-assisted DeflateQpl codec failed
```

that means the IAA devices are not ready, and you need to check the IAA setup again.

## Benchmark with single instance

- Before starting the benchmark, please disable C6 and set the CPU frequency governor to `performance` (an optional verification sketch follows this list):

```bash
$ cpupower idle-set -d 3
$ cpupower frequency-set -g performance
```

- To eliminate the impact of memory-bound behavior across sockets, we use `numactl` to bind the server on one socket and the client on another socket.
- Single instance means a single server connected with a single client.
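
Optionally, you can confirm both settings took effect before launching anything (plain `cpupower` queries; this verification step is an addition, not part of the original flow):

```bash
# Show the active frequency governor and the idle-state table (C6 should report as disabled).
$ cpupower frequency-info | grep governor
$ cpupower idle-info
```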

Now run the benchmark for LZ4/Deflate/ZSTD respectively:

LZ4:

```bash
$ cd ./database_dir/lz4
$ numactl -m 0 -N 0 [CLICKHOUSE_EXE] server -C config_lz4.xml >&/dev/null&
$ cd ./client_scripts
$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 1 > lz4.log
```

IAA deflate:

```bash
$ cd ./database_dir/deflate
$ numactl -m 0 -N 0 [CLICKHOUSE_EXE] server -C config_deflate.xml >&/dev/null&
$ cd ./client_scripts
$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 1 > deflate.log
```

ZSTD:

```bash
$ cd ./database_dir/zstd
$ numactl -m 0 -N 0 [CLICKHOUSE_EXE] server -C config_zstd.xml >&/dev/null&
$ cd ./client_scripts
$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 1 > zstd.log
```

Now three logs should be output as expected:

```text
lz4.log
deflate.log
zstd.log
```

How to check performance metrics:

We focus on QPS. Please search for the keyword `QPS_Final` and collect the statistics.
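
For example, to pull those lines out of all three logs at once (a plain grep; it assumes `QPS_Final` appears literally in each log, as stated above):

```bash
$ grep "QPS_Final" lz4.log deflate.log zstd.log
```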

## Benchmark with multi-instances

- To reduce the impact of memory-bound behavior with too many threads, we recommend running the benchmark with multiple instances.
- Multi-instance means multiple (2 or 4) servers connected with their respective clients.
- The cores of one socket need to be divided equally and assigned to the servers respectively.
- For multi-instances, you must create a new folder for each codec and insert the dataset by following steps similar to the single instance.

There are 2 differences:

- For the client side, you need to launch clickhouse with the assigned port during table creation and data insertion.
- For the server side, you need to launch clickhouse with the specific xml config file in which the port has been assigned. All customized xml config files for multi-instances are provided under `./server_config`.

Here we assume there are 60 cores per socket and take 2 instances as an example.

Launch the server for the first instance.

LZ4:

```bash
$ cd ./database_dir/lz4
$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_lz4.xml >&/dev/null&
```

ZSTD:

```bash
$ cd ./database_dir/zstd
$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_zstd.xml >&/dev/null&
```

IAA Deflate:

```bash
$ cd ./database_dir/deflate
$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_deflate.xml >&/dev/null&
```

[Launch server for second instance]

LZ4:

```bash
$ cd ./database_dir && mkdir lz4_s2 && cd lz4_s2
$ cp ../../server_config/config_lz4_s2.xml ./
$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_lz4_s2.xml >&/dev/null&
```

ZSTD:

```bash
$ cd ./database_dir && mkdir zstd_s2 && cd zstd_s2
$ cp ../../server_config/config_zstd_s2.xml ./
$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_zstd_s2.xml >&/dev/null&
```

IAA Deflate:

```bash
$ cd ./database_dir && mkdir deflate_s2 && cd deflate_s2
$ cp ../../server_config/config_deflate_s2.xml ./
$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_deflate_s2.xml >&/dev/null&
```

Creating tables && Inserting data for second instance

Creating tables:

```bash
$ [CLICKHOUSE_EXE] client -m --port=9001
```

Inserting data:

```bash
$ [CLICKHOUSE_EXE] client --port=9001 --query "INSERT INTO [TBL_FILE_NAME] FORMAT CSV" < [TBL_FILE_NAME].tbl
```
- `[TBL_FILE_NAME]` represents the name of a file matching the pattern `*.tbl` under `./benchmark_sample/rawdata_dir/ssb-dbgen` (see the loop sketch after this list).
- `--port=9001` stands for the assigned port of the server instance, which is also defined in `config_lz4_s2.xml`/`config_zstd_s2.xml`/`config_deflate_s2.xml`. For even more instances, you need to replace it with the values 9002/9003, which stand for the s3/s4 instances respectively. If you don't assign it, the port defaults to 9000, which is already used by the first instance.
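
Since there are several `.tbl` files, a small loop saves repetition (a sketch; it assumes each table name matches its file's basename, as in the Star Schema Benchmark):

```bash
# Insert every .tbl file into the same-named table on the second instance.
$ for f in ./benchmark_sample/rawdata_dir/ssb-dbgen/*.tbl; do
    tbl=$(basename "$f" .tbl)
    [CLICKHOUSE_EXE] client --port=9001 --query "INSERT INTO $tbl FORMAT CSV" < "$f"
  done
```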

Benchmarking with 2 instances

LZ4:

```bash
$ cd ./database_dir/lz4
$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_lz4.xml >&/dev/null&
$ cd ./database_dir/lz4_s2
$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_lz4_s2.xml >&/dev/null&
$ cd ./client_scripts
$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 2 > lz4_2insts.log
```

ZSTD:

```bash
$ cd ./database_dir/zstd
$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_zstd.xml >&/dev/null&
$ cd ./database_dir/zstd_s2
$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_zstd_s2.xml >&/dev/null&
$ cd ./client_scripts
$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 2 > zstd_2insts.log
```

IAA Deflate:

```bash
$ cd ./database_dir/deflate
$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_deflate.xml >&/dev/null&
$ cd ./database_dir/deflate_s2
$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_deflate_s2.xml >&/dev/null&
$ cd ./client_scripts
$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 2 > deflate_2insts.log
```

Here the last argument (`2`) of `client_stressing_test.py` stands for the number of instances. For more instances, you need to replace it with the value 3 or 4. This script supports up to 4 instances.

Now three logs should be output as expected:

```text
lz4_2insts.log
deflate_2insts.log
zstd_2insts.log
```

How to check performance metrics:

We focus on QPS. Please search for the keyword `QPS_Final` and collect the statistics, as in the grep example for the single-instance logs above.

The benchmark setup for 4 instances is similar to that for 2 instances above. We recommend using the 2-instance benchmark data as the final report for review.

## Tips

Each time before launching a new clickhouse server, please make sure no background clickhouse process is running; check for and kill any old ones:

```bash
$ ps -aux | grep clickhouse
$ kill -9 [PID]
```

By comparing the query list in `./client_scripts/queries_ssb.sql` with the official Star Schema Benchmark, you will find that 3 queries are not included: Q1.2/Q1.3/Q3.4. This is because the CPU utilization is very low (< 10%) for these queries, which means they cannot demonstrate performance differences between the codecs.