Mirror of https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-18 13:42:02 +00:00
e6167d6b36
Reasons:

1. The original Gorilla paper proposed a compression schema for pairs of timestamps and double-precision FP values. ClickHouse's Gorilla codec only implements compression of the latter, and it does not impose any data type restrictions.
   - Data types other than Float* or (U)Int* (e.g. Decimal, Point, etc.) are definitely not supposed to be used with Gorilla.
   - (U)Int* types are debatable. The paper only considers integers stored as FP values, a practical use case for which Gorilla works well. Standalone integers are not considered, which makes them at least suspicious.
2. Achieve consistency with FPC, another specialized floating-point time-series codec, which rejects non-float data.
3. On practical datasets, ZSTD is often "good enough" (**), so it should be okay to disincentivize non-ZSTD codecs a little. If needed, the Delta and DoubleDelta codecs are viable alternatives for slowly changing (time-series-like) integer sequences.

Since on-prem and hosted users may still have Gorilla-compressed non-float data, this combination is only deprecated for now. No warning or error will be emitted. Users are encouraged to migrate Gorilla-compressed non-float data to an alternative codec.

It is planned to treat Gorilla-compressed non-float columns as "suspicious" six months after this commit (i.e. in v23.6). Even then, it will still be possible to set "allow_suspicious_codecs = true" and read and write Gorilla-compressed non-float data.

(*) Sec. 4.1.2, "Gorilla restricts the value element in its tuple to a double floating point type.", https://doi.org/10.14778/2824032.2824078
(**) https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema
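The migration encouraged above could look roughly like the following in ClickHouse SQL. This is a minimal sketch, assuming a hypothetical MergeTree table `metrics` with a UInt64 column `n` that currently uses `CODEC(Gorilla)`. The table and column names are illustrative only, and `Delta` followed by `ZSTD` is just one of the alternatives mentioned for slowly changing integer sequences.

```sql
-- Hypothetical table `metrics` with a UInt64 column `n` stored as CODEC(Gorilla).
-- Switch the column to Delta (store differences) followed by ZSTD (general-purpose compression).
ALTER TABLE metrics MODIFY COLUMN n UInt64 CODEC(Delta, ZSTD);

-- Existing parts may keep the old codec until they are rewritten by a merge;
-- forcing a merge is one way to recompress them.
OPTIMIZE TABLE metrics FINAL;

-- Check which codec the column now declares.
SELECT name, type, compression_codec
FROM system.columns
WHERE database = currentDatabase() AND table = 'metrics';
```

Chaining a transforming codec (Delta) with a general-purpose one (ZSTD) is the usual pattern. Delta on its own only rewrites values and does not reduce their size.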
54 lines
2.0 KiB
XML
<test>
    <settings>
        <allow_suspicious_codecs>1</allow_suspicious_codecs>
    </settings>

    <substitutions>
        <substitution>
            <name>codec</name>
            <values>
                <value>NONE</value> <!-- as a baseline -->
                <value>LZ4</value>
                <value>ZSTD</value>
                <value>Delta</value>
                <value>T64</value>
                <value>DoubleDelta</value>
            </values>
        </substitution>
        <substitution>
            <name>type</name>
            <values>
                <value>UInt64</value>
            </values>
        </substitution>
        <substitution>
            <name>seq_type</name>
            <values>
                <value>seq</value>
                <value>mon</value>
                <value>rnd</value>
            </values>
        </substitution>
        <substitution>
            <name>num_rows</name>
            <values>
                <value>20000000</value>
            </values>
        </substitution>
    </substitutions>

    <create_query>CREATE TABLE IF NOT EXISTS codec_{seq_type}_{type}_{codec} (n {type} CODEC({codec}))
        ENGINE = MergeTree PARTITION BY tuple() ORDER BY tuple()
        SETTINGS parts_to_delay_insert = 5000, parts_to_throw_insert = 5000;</create_query>
    <create_query>system stop merges</create_query>

    <!-- Using limit to make query finite, allowing it to be run multiple times in a loop, reducing mean error -->
    <query>INSERT INTO codec_seq_{type}_{codec} (n) SELECT number FROM system.numbers LIMIT {num_rows} SETTINGS max_threads=1</query>
    <query>INSERT INTO codec_mon_{type}_{codec} (n) SELECT number*512+(intHash64(number)%512) FROM system.numbers LIMIT {num_rows} SETTINGS max_threads=1</query>
    <query>INSERT INTO codec_rnd_{type}_{codec} (n) SELECT intHash64(number) FROM system.numbers LIMIT {num_rows} SETTINGS max_threads=1</query>

    <drop_query>system start merges</drop_query>
    <drop_query>DROP TABLE IF EXISTS codec_{seq_type}_{type}_{codec}</drop_query>

</test>
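This test times the inserts. To also see how well each codec compresses the three sequence types (`seq`, `mon`, `rnd`), one option is to query `system.columns` while the test tables still exist. That means running the statements by hand before the `drop_query` step, because the test drops its tables at the end. The sketch below assumes that setup. The `codec_` prefix follows the `codec_{seq_type}_{type}_{codec}` naming in `create_query`.

```sql
-- Compression statistics for the tables created by this test.
-- Assumes the tables still exist, i.e. the drop_query step has not run yet.
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND startsWith(table, 'codec_')
GROUP BY table
ORDER BY table;
```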