1067 Commits

Author SHA1 Message Date
taiyang-li
36a98a1628 add performance tests 2023-02-02 20:16:16 +08:00
Nikita Taranov
e7ca90adab fix perf test 2023-01-30 17:11:56 +00:00
Nikita Taranov
ac77808133 fix perf test 2023-01-30 17:11:56 +00:00
Nikita Taranov
52fe7edbd9 better key analysis 2023-01-30 17:11:56 +00:00
Nikita Taranov
2057db68a2 cosmetics 2023-01-30 17:10:45 +00:00
Nikita Taranov
1d45cce03c support for aggr in order 2023-01-30 17:10:45 +00:00
Nikita Taranov
a2c9aeb7c9 stash 2023-01-30 17:10:45 +00:00
taiyang-li
d25740da83 change as request 2023-01-30 16:13:12 +08:00
Alexey Milovidov
bc2f454522
Merge branch 'master' into block-non-float-gorilla-v2 2023-01-28 03:30:12 +03:00
Igor Nikonov
300f78df96
Merge pull request #45567 from ClickHouse/enable-remove-redundant-sorting
Enable query_plan_remove_redundant_sorting optimization by default
2023-01-27 19:14:36 +01:00
Igor Nikonov
41b94b4954 Enable query_plan_remove_redundant_sorting optimization by default 2023-01-24 13:38:21 +00:00
Robert Schulze
97d1bed114
Merge branch 'master' into improve_week_day 2023-01-21 20:40:33 +01:00
Robert Schulze
e6167d6b36
Deprecate Gorilla compression of non-float columns
Reasons:

1. The original Gorilla paper proposed a compression scheme for pairs of
   time stamps and double-precision FP values (*). ClickHouse's Gorilla
   codec only implements compression of the latter and it does not
   impose any data type restrictions.
   - Data types != Float* or (U)Int* (e.g. Decimal, Point etc.) are
     definitely not supposed to be used with Gorilla.
   - (U)Int* types are debatable. The paper only considers
     integers-stored-as-FP-values, a practical use case for which
     Gorilla works well. Standalone integers are not considered which
     makes them at least suspicious.

2. Achieve consistency with FPC, another specialized floating-point
   timeseries codec, which rejects non-float data.

3. On practical datasets, ZSTD is often "good enough" (**) so it should
   be okay to disincentivize non-ZSTD codecs a little bit. If needed,
   Delta and DoubleDelta codecs are viable alternatives for slowly
   changing (time-series-like) integer sequences.

Since on-prem and hosted users may still have Gorilla-compressed
non-float data, this combination is only deprecated for now. No warning
or error will be emitted. Users are encouraged to migrate
Gorilla-compressed non-float data to an alternative codec. It is planned
to treat Gorilla-compressed non-float columns as "suspicious" six months
after this commit (i.e. in v23.6). Even then, it will still be possible
to set "allow_suspicious_codecs = true" and read and write
Gorilla-compressed non-float data.

(*) Sec. 4.1.2, "Gorilla restricts the value element in its tuple to a
    double floating point type.", https://doi.org/10.14778/2824032.2824078

(**) https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema
2023-01-20 17:31:16 +00:00
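Reason 3 in the commit message above names Delta as an alternative for slowly changing integer sequences. A minimal Python sketch of the general delta-encoding idea follows. It is not ClickHouse's codec implementation, and the function names `delta_encode` and `delta_decode` are hypothetical, chosen only for illustration.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from the previous value."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original sequence with a running sum over the deltas."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values
```

For a slowly changing sequence the deltas are small numbers, which a general-purpose compressor such as ZSTD can typically pack more tightly than the raw values.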
Igor Nikonov
7ed8fec94f
Revert "Remove redundant sorting" 2023-01-18 18:38:25 +01:00
Igor Nikonov
72066846cf
Merge pull request #43905 from ClickHouse/igor/remove_redundant_order_by
Remove redundant sorting
2023-01-18 13:25:03 +01:00
Igor Nikonov
0cfa08df7a Merge remote-tracking branch 'origin/master' into igor/remove_redundant_order_by 2023-01-17 16:28:17 +00:00
Alexander Tokmakov
df75c24f01
Revert "Disallow Gorilla codec on non-float columns" 2023-01-16 19:14:28 +03:00
Igor Nikonov
a34991cb65 Merge remote-tracking branch 'origin/master' into igor/remove_redundant_order_by 2023-01-16 12:14:02 +00:00
Robert Schulze
bd41c74ddf
Various test, code and docs fixups 2023-01-15 13:47:34 +00:00
Robert Schulze
7023d68536
Fix codecs_int_*.xml 2023-01-15 13:31:45 +00:00
Azat Khuzhin
925fd2c33a tests/performance: do not use scientific notation in hashed_dictionary_sharded
v2: fix few mistakes
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
345c422e28 Add ability to load hashed dictionaries using multiple threads
Right now dictionaries (here I will talk only about
HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED)
can load data only in one thread, since they use one hash table that
cannot be filled from multiple threads.

And if you have a very big dictionary (i.e. 10e9 elements), it can
take a while to load them, especially for SPARSE_HASHED variants (and
if you have that many elements, you are likely to use
SPARSE_HASHED, since it requires less memory). In my env it takes ~4
hours, which is an enormous amount of time.

So this patch adds support for shards in dictionaries. The number of
shards determines how many hash tables the dictionary uses and, more
importantly, how many threads it can use to load the data.

With 16 threads this works 2x faster; not perfect though, see the
follow-up patches in this series.

v0: PARTITION BY
v1: SHARDS 1
v2: SHARDS(1)
v3: tried optimized mod - logical and, but it does not gain even 10%
v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either
v5: move SHARDS into layout parameters (unknown simply ignored)
v6: tune params for perf tests (to avoid too long queries)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:25 +01:00
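The sharding idea described in the commit message above can be sketched in Python. This is only an illustration under stated assumptions, not the ClickHouse implementation: `load_sharded` and `lookup` are hypothetical names, plain dicts stand in for the hash tables, and Python's built-in `hash` stands in for the real hash function.

```python
from concurrent.futures import ThreadPoolExecutor


def load_sharded(rows, num_shards):
    """Load (key, value) rows into num_shards independent dicts, one task per shard."""
    # Partition rows by shard first, so each worker fills only its own table
    # and no table is ever written from two threads.
    buckets = [[] for _ in range(num_shards)]
    for key, value in rows:
        buckets[hash(key) % num_shards].append((key, value))

    def fill(bucket):
        table = {}
        for key, value in bucket:
            table[key] = value
        return table

    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        return list(pool.map(fill, buckets))


def lookup(shards, key):
    """Route the key to its shard the same way the loader did."""
    return shards[hash(key) % len(shards)].get(key)
```

Because each shard owns a separate table, the fill step needs no locking. In CPython the global interpreter lock prevents a real speedup here, so the sketch shows only the structure, not the performance gain reported in the commit.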
Nikolai Kochetov
30310df5be
Merge branch 'master' into logical-optimizer-lowcardinality 2023-01-12 18:51:05 +01:00
Nikita Taranov
006fdd32d4
Apply preallocation optimisation more carefully (#44455)
* impl

* add perf test

* fix

* review fixes
2023-01-09 13:30:48 +01:00
Igor Nikonov
2187bdd4cc Disable diagnostics
+ cleanup
+ disable optimization in sort performance test since it removes sorting
  entirely
2023-01-06 17:00:05 +00:00
Nikolay Degterinsky
dfe93b5d82
Merge pull request #42284 from Algunenano/perf_experiment
Performance experiment
2022-12-30 03:14:22 +01:00
Alexey Milovidov
79f2e747e4 Remove QuestDB (flaky test) 2022-12-28 12:42:14 +01:00
Raúl Marín
fc1fa82a39
Merge branch 'master' into perf_experiment 2022-12-27 10:51:58 +01:00
Raúl Marín
45d27f461b
Merge branch 'master' into perf_experiment 2022-12-20 09:07:48 +00:00
Kruglov Pavel
37df9b9990
Merge branch 'master' into refactor-schema-inference 2022-12-16 19:13:15 +01:00
Azat Khuzhin
53bac4de71 tests/perf: fix dependency check during DROP
CI [1]:

    DB::Exception: Cannot drop or rename default.hierarchical_dictionary_source_table, because some tables depend on it: default.hierarchical_hashed_array_dictionary, default.hierarchical_flat_dictionary, default.hierarchical_hashed_dictionary. Stack trace:

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/44256/8e67a361a8f14abec6717af09ee997eb25151685/performance_comparison_[1/4]/report.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-12-16 15:15:15 +01:00
Nikolay Degterinsky
9b6d31b95d
Merge branch 'master' into perf_experiment 2022-12-13 17:15:07 +01:00
avogar
7375a7d429 Refactor and improve schema inference for text formats 2022-12-07 21:19:27 +00:00
Guo Wangyang
b86686b3f8
Merge branch 'master' into logical-optimizer-lowcardinality 2022-12-07 13:33:25 +08:00
Maksim Kita
1cdc7ab62a
Merge pull request #43556 from Algunenano/interpretation_benchmark
Add benchmark for query interpretation with JOINs
2022-12-01 22:53:02 +03:00
Vladimir C
53dc70a2d0
Merge pull request #38191 from BigRedEye/grace_hash_join
Closes https://github.com/ClickHouse/ClickHouse/issues/11596
2022-11-30 17:01:00 +01:00
Nikolai Kochetov
51439e2c19
Merge pull request #43260 from ClickHouse/read-from-mt-in-io-pool
Read from MergeTree in I/O pool
2022-11-29 12:09:03 +01:00
Nikolai Kochetov
d9fc13b230
Update async_remote_read.xml 2022-11-28 14:00:49 +01:00
Nikita Taranov
8ed5cfc265
Memory bound merging for distributed aggregation in order (#40879)
* impl

* fix style

* make executeQueryWithParallelReplicas similar to executeQuery

* impl for parallel replicas

* cleaner code for remote sorting properties

* update test

* fix

* handle when nodes of old versions participate

* small fixes

* temporary enable for testing

* fix after merge

* Revert "temporary enable for testing"

This reverts commit cce7f8884c.

* review fixes

* add bc test

* Update src/Core/Settings.h
2022-11-28 00:41:31 +01:00
Nikita Taranov
d1c258cf20
Add xxh3 hash function (#43411)
* impl

* try fix

* add docs

* add test

* rm unused file

* excellent
2022-11-26 00:14:08 +01:00
Nikolai Kochetov
4632e7c644 Add max_streams_for_merge_tree_reading setting. 2022-11-25 17:14:22 +00:00
Nikolai Kochetov
dfd3976040
Update async_remote_read.xml 2022-11-25 14:53:45 +01:00
Igor Nikonov
236e7e3989 Small fixes 2022-11-25 12:04:12 +00:00
Igor Nikonov
20e67b7140 Merge remote-tracking branch 'origin/master' into HEAD 2022-11-24 13:10:37 +00:00
Nikolai Kochetov
e79c91947a
Update async_remote_read.xml 2022-11-24 12:35:02 +01:00
Raúl Marín
e910648c5d Add benchmark for query interpretation with JOINs 2022-11-23 13:15:35 +01:00
Raúl Marín
ed0c174c0c Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-11-21 11:02:31 +01:00
Guo Wangyang
7d6ff90e34
Merge branch 'master' into logical-optimizer-lowcardinality 2022-11-20 09:56:50 +08:00
Nikolai Kochetov
5da1d893fd
Merge branch 'master' into read-from-mt-in-io-pool 2022-11-18 21:10:45 +01:00
Nikita Taranov
7beb58b0cf
Optimize merge of uniqExact without_key (#43072)
* impl for uniqExact

* rm unused (read|write)Text methods

* fix style

* small fixes

* impl for variadic uniqExact

* refactor

* fix style

* more aggressive inlining

* disable if max_threads=1

* small improvements

* review fixes

* Revert "rm unused (read|write)Text methods"

This reverts commit a7e7480584.

* encapsulate is_able_to_parallelize_merge in Data

* encapsulate is_exact & argument_is_tuple in Data
2022-11-17 13:19:02 +01:00
Kruglov Pavel
1b68f605a2
Merge pull request #42761 from AlfVII/fix-slow-json-extract-with-low-cardinality
Fixed slowness in JSONExtract with LowCardinality(String) tuples
2022-11-17 12:49:18 +01:00
Raúl Marín
97d6fc3071 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-11-17 11:48:46 +01:00
Nikolai Kochetov
10f449c6c1 Add a query to perftest. 2022-11-15 18:08:03 +00:00
李扬
1de5bb2392
Add function canonicalRand (#43124)
* add function canonicalRand

* add perf test

* revert rand.xml
2022-11-15 00:27:19 +01:00
Wangyang Guo
887779e8d8 Add perftest: low_cardinality_query 2022-11-08 17:19:18 +08:00
Kruglov Pavel
e9a01a1946
Merge branch 'master' into fix-slow-json-extract-with-low-cardinality 2022-11-04 11:13:46 +01:00
Nikolay Degterinsky
30ad1a6826
Merge branch 'master' into perf_experiment 2022-11-03 02:18:21 +03:00
vdimir
6a4247ca32
Merge branch 'master' into grace_hash_join 2022-10-31 09:54:37 +00:00
Alfonso Martinez
9e33b13737 Merge remote-tracking branch 'upstream/master' into fix-slow-json-extract-with-low-cardinality 2022-10-31 08:46:55 +01:00
avogar
fe0aea2e3a Support parallel parsing for LineAsString input format 2022-10-28 21:56:09 +00:00
Alfonso Martinez
c37b154254 Added reverted files and fixes for failing fuzzer tests 2022-10-28 12:37:59 +02:00
Raúl Marín
891484b462 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-10-27 13:17:07 +02:00
Vladimir C
31e8f92cd9
Merge pull request #42664 from ClickHouse/vdimir/followup-42274 2022-10-27 12:20:46 +02:00
vdimir
14d0f6457b
Add tests and doc for some url-related functions 2022-10-26 10:52:57 +00:00
Raúl Marín
9395f77421 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-10-26 11:46:17 +02:00
Raúl Marín
6e0a9452e7 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-10-25 15:25:06 +02:00
Anton Popov
eed21ad4ca
Revert "Low cardinality cases moved to the function for its corresponding type" 2022-10-25 01:30:32 +02:00
vdimir
adb63a5583
Merge branch 'master' into grace_hash_join 2022-10-17 12:32:56 +00:00
Raúl Marín
46616d341c Make explain_ast even larger 2022-10-14 14:06:44 +02:00
AlfVII
5b2703c412
Merge branch 'master' into fix-slow-json-extract-with-low-cardinality 2022-10-11 13:28:07 +02:00
vdimir
ff55c369bc
Merge branch 'tmp-data-followup' 2022-10-05 18:10:05 +00:00
BoloniniD
9dd15998c7 Print nicer exception if BLAKE3 is unavailable 2022-10-05 00:11:41 +03:00
Sergei Trifonov
a592150ae7
Merge branch 'master' into fix-slow-json-extract-with-low-cardinality 2022-10-03 18:10:07 +02:00
Vitaly Baranov
65c61877c7
Merge pull request #33435 from BoloniniD/BLAKE3
Integrating Rust code into ClickHouse
2022-10-03 15:25:06 +02:00
Anton Popov
77eacfbbe0
Update bitmap_array_element.xml 2022-10-03 14:56:34 +02:00
BoloniniD
f5c57cd4a8 Fix test queries 2022-10-03 00:20:44 +03:00
flynn
7109aff2f0 fix style 2022-09-30 18:04:51 +08:00
flynn
1f51a86285 add test 2022-09-30 18:04:51 +08:00
vdimir
7ebc297f4c
Merge branch 'master' into pr/BigRedEye/38191 2022-09-30 09:40:47 +00:00
BoloniniD
55c79230b3 Merge branch 'master' of github.com:ClickHouse/ClickHouse into BLAKE3 2022-09-29 23:53:25 +03:00
Sergei Trifonov
976804b5db
Merge branch 'master' into fix-slow-json-extract-with-low-cardinality 2022-09-26 12:48:49 +02:00
Alfonso Martinez
9cb74c7807 Low cardinality cases moved to the function for its corresponding type 2022-09-23 14:12:37 +02:00
Igor Nikonov
8c93a9adda Merge remote-tracking branch 'origin/master' into distinct_in_order_wo_order_by 2022-09-22 07:40:14 +00:00
Nikita Taranov
930d050b55
fix (#41648) 2022-09-21 19:04:03 +02:00
Nikita Taranov
100c055510
Prefetching in aggregation (#39304)
* impl

* stash

* clean up

* do not apply when HT is small

* make branch static

* also in merge

* do not hardcode look ahead value

* fix

* apply to methods with cheap key calculation

* more tests

* silence tidy

* fix build

* support HashMethodKeysFixed

* apply during merge only for cheap

* stash

* fixes

* rename method

* add feature flag

* cache prefetch threshold value

* fix

* fix

* Update HashMap.h

* fix typo

* 256KB as default l2 size

Co-authored-by: Alexey Milovidov <milovidov@clickhouse.com>
2022-09-21 18:59:07 +02:00
BoloniniD
55fcb98f29 Merge branch 'master' of github.com:ClickHouse/ClickHouse into BLAKE3 2022-09-19 21:53:14 +03:00
Igor Nikonov
aca810ba62 Merge remote-tracking branch 'origin/master' into distinct_in_order_wo_order_by 2022-09-19 18:34:38 +00:00
Kruglov Pavel
4c3194eefe
Merge pull request #41286 from azat/utf8-fix
Do not allow invalid sequences influence other rows in lowerUTF8/upperUTF8
2022-09-19 14:07:10 +02:00
Azat Khuzhin
bd54a6c45d tests: add perf test for lowerUTF8()/upperUTF8()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-09-17 11:16:45 +02:00
BoloniniD
452ef4435b Merge branch 'master' of github.com:ClickHouse/ClickHouse into BLAKE3 2022-09-16 20:05:56 +03:00
Nikita Taranov
ee31be4286 impl 2022-09-16 15:41:15 +02:00
Igor Nikonov
eeecaf7a31 Merge remote-tracking branch 'origin/master' into distinct_in_order_wo_order_by 2022-09-16 10:30:52 +00:00
Raúl Marín
c3ff66bd9d
Implement batch processing for aggregate functions with multiple nullable arguments (#41058)
* Implement batch processing for aggregate functions with multiple nullable arguments

* Fix broken perf test

* Improve filter handling in addBatchSinglePlace with nullable arguments

* Fix detecting the Null filter usage
2022-09-15 23:51:38 +02:00
Raúl Marín
6dac509739
Speed up reading uniqState (#41089)
* Speed up reading UniquesHashSet

* Improve uniq serialization tests
2022-09-15 23:41:15 +02:00
Igor Nikonov
8a4806e8c0 Fix test
- remove performance queries which can be unstable
2022-09-15 10:53:42 +00:00
BoloniniD
e8bcbcd016
Merge branch 'master' into BLAKE3 2022-09-09 11:48:31 +03:00
vdimir
6d4b6c452a
Merge branch 'master' into grace_hash_join 2022-09-07 08:00:14 +00:00
Nikita Taranov
7c4f42d014
Skip empty literals in lz4 decompression (#40142) 2022-09-06 13:58:26 +02:00
Alexey Milovidov
193cd1b3b2
Merge pull request #39138 from nickitat/control_block_size_in_aggregator
Control block size in aggregator
2022-09-04 04:51:00 +03:00
vdimir
e21763e759
remove new setting from join_set_filter.xml 2022-08-29 09:49:13 +00:00