ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-13 09:52:38 +00:00

Author	SHA1	Message	Date
Azat Khuzhin	64e3677961	Avoid double hash calculation in HashedDictionary::getShard(StringRef). Previously it was written this way because getShard() was a simple modulo operation. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	2783850f08	Minor review fixes in HashedDictionary Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	6e0a7add93	Completely exception-safe HashedDictionary dtor. Previously there was one (albeit very unlikely) case in which the dtor could throw: the logging code or ThreadPool::wait. Just guard the dtor with try/catch and be done with it. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	74def83c5d	Destroy hashtables for hashed dictionary in parallel only for sharded dict. There can be multiple hashtables, since each attribute uses its own hashtable. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	1c0e0ea1e4	Disable sharded dictionaries with updatable sources. Support of sharded dictionaries for updatable sources is questionable since: - sharded dictionaries were developed for hashed dictionaries with a huge number of keys; - an updatable source requires storing the whole table in memory (due to how reload works); - it is also an open question whether sharding would benefit from an updatable source at all: using an updatable source with a huge number of changes in the source does not look optimal, and if there is only a small number of changes then you don't need a sharded dictionary at all. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	c97991fce1	Use shared arena for HashedDictionary::blockToAttributes() This should decrease number of allocations. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	01b100da61	Use shared arena in ParallelDictionaryLoader::createShardSelector() (and add missing rollback) This should decrease number of allocations. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	64874824b4	Minor review fixes in HashedDictionary Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	77c1f07636	Make HashedDictionary::~HashedDictionary exception safe. Previously the destructor could throw if thread allocation failed; rewrite it to use trySchedule() and destroy sequentially in that case. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	a3f189e191	Optimize sharded dictionaries with skewed distribution. With a skewed distribution, a simple modulo will not distribute keys well between shards, and eventually performance can degrade to that of a non-sharded dictionary (except that it will occupy +1 thread for Block::scatter). But if HashedDictionary::blockToAttributes() does not call HashedDictionary::getShard(), this can be fixed by using a more complex key-to-shard (getShard()) mapping. And actually you do not need to call getShard() in blockToAttributes(); you can simply use the passed shard, and that's it. And by wrapping the key with intHash64() in getShard() the skewed distribution can be fixed. Note that previously I tried a similar approach but did not remove getShard() from blockToAttributes(), which is why it failed. Now it works almost as fast as with the simple createBlockSelector(), just 13.6% slower (18.75min vs 16.5min, with 16 threads). Note that I've also tried to add libdivide for this, but it does not improve the performance. I've also tried the approach without scatter, and it works 20% slower than this one (22.5min vs 18.75min, with 16 threads). v2: Use intHashCRC32() over intHash64() for HashedDictionary::getShard() (with intHash64() it is much slower, almost 2x; it took 18min with 32 threads) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	655a564280	Parallel hash tables destroy for hashed dictionaries Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	99063b152f	Allow to configure queue backlog of the parallel hashed dictionary loader v2: Decrease default parallel_queue_backlog to 10000 (same speed) v3: Rename parallel_queue_backlog to per_shard_load_backlog v3: Rename per_shard_load_backlog to shard_load_queue_backlog v4: Fix documentation Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	79ad81dfdf	Implement separate queue for parallel loader of hashed dictionaries. Previous patches in this series have a bottleneck in rehash(). This is the slowest operation when inserting lots of rows into the hashtable, and eventually the whole thread pool sometimes works only as fast as the slowest thread, since we did not have any queue of blocks. This patch adds such a queue, and now it scales linearly: initially, with 1 thread, I had ~4 hours for 10e9 elements (UInt64 key, UInt16 value); after this patch it works in 16 minutes with 16 threads (well, actually I have to use 32 threads because of the distribution of data in the source table). And now with 16 threads it works 16 times faster. Also, this patch adds more optimal block splitting for the non-complex dictionaries, and the usual block splitting for complex dictionaries. But anyway, this moves the overhead from the threads loading into the hashtable out to the reader thread, and this is better, since the reader does not use that much CPU. v2: fix use-after-free on failed load (add missing wait in dtor) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	5d0fd3cdc4	Remove sharded overhead for non-sharded hashed dictionaries by adding one more template parameter - HashedDictionary<sharded> (yes, there are already too many of them for a template class that has explicit instantiation), since perf tests [1] show a 20% slowdown. [1]: https://s3.amazonaws.com/clickhouse-test-reports/40003/8f0cf2d6b8a7df511afe901331d5e2c7b06c0b4d/performance_comparison_[1/4]/report.html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	345c422e28	Add ability to load hashed dictionaries using multiple threads. Right now dictionaries (here I will talk only about HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED) can load data only in one thread, since they use one hash table that cannot be filled from multiple threads. And in case you have a very big dictionary (i.e. 10e9 elements), it can take a while to load, especially for SPARSE_HASHED variants (and if you have that many elements, you are likely to use SPARSE_HASHED, since it requires less memory); in my env it takes ~4 hours, which is an enormous amount of time. So this patch adds support for shards for dictionaries; the number of shards determines how many hash tables the dictionary will use and, more importantly, how many threads it can use to load the data. And with 16 threads this works 2x faster, not perfect though, see the follow-up patches in this series. v0: PARTITION BY v1: SHARDS 1 v2: SHARDS(1) v3: tried optimized mod - logical and, but it does not gain even 10% v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either v5: move SHARDS into layout parameters (unknown simply ignored) v6: tune params for perf tests (to avoid too long queries) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:25 +01:00
Han Fei	6ed4570f73	Merge branch 'master' into regexp-tree-dictionary	2023-01-10 15:36:30 +01:00
Maksim Kita	83a8d3ed25	RangeHashedDictionary update field primary key fix	2023-01-09 13:52:15 +01:00
Anton Popov	1f32ffedf8	Merge pull request #43221 from ClickHouse/refactoring-ip-types Replace domain IP types (IPv4, IPv6) with native	2023-01-07 12:01:21 +01:00
Han Fei	a4427a05c2	fix build	2023-01-06 14:30:00 +01:00
Kseniia Sumarokova	573d3283b0	Merge pull request #44327 from kssenii/use-new-named-collections-code-2 Replace old named collections code with new (from #43147) part 2	2023-01-06 13:06:26 +01:00
Han Fei	cac7f65b40	fix build	2023-01-06 11:49:34 +01:00
Han Fei	744084375c	fix build	2023-01-05 22:27:45 +01:00
Han Fei	ae5ee8194b	fix check style	2023-01-05 17:52:05 +01:00
Han Fei	f2a9eea995	write docs and optimize regex compile	2023-01-05 17:38:01 +01:00
Yakov Olkhovskiy	7a5a36cbed	Merge branch 'master' into refactoring-ip-types	2023-01-04 11:11:06 -05:00
Han Fei	65ef7b4adc	fix build	2023-01-04 12:45:12 +01:00
Nikolay Degterinsky	aa41e9b775	Merge pull request #44857 from evillique/fix-msan-build Try to fix MSan build	2023-01-04 04:31:28 +01:00
Han Fei	00e717d7ce	some improvement	2023-01-03 21:41:51 +01:00
kssenii	67509aa2d5	Merge remote-tracking branch 'upstream/master' into use-new-named-collections-code-2	2023-01-03 16:41:30 +01:00
Nikolay Degterinsky	c4431e9931	Fix MSan build	2023-01-03 02:21:26 +00:00
Alexey Milovidov	e855d3519a	Merge branch 'master' into refactoring-ip-types	2023-01-02 21:58:53 +03:00
Han Fei	97cdfdceea	fix style check	2022-12-31 20:36:23 +01:00
Han Fei	c25207fc21	Merge branch 'master' into regexp-tree-dictionary	2022-12-30 17:31:44 +01:00
Han Fei	83c6517fcf	try to fix flaky tests	2022-12-30 17:31:28 +01:00
Nikolay Degterinsky	dfe93b5d82	Merge pull request #42284 from Algunenano/perf_experiment Performance experiment	2022-12-30 03:14:22 +01:00
Han Fei	fa1baef448	add check sanitizer	2022-12-29 23:00:55 +01:00
Han Fei	50905e2005	address comments	2022-12-29 20:13:46 +01:00
Alexey Milovidov	33bcd07be5	Remove old code	2022-12-28 19:02:06 +01:00
Raúl Marín	fc1fa82a39	Merge branch 'master' into perf_experiment	2022-12-27 10:51:58 +01:00
mayamika	f66a0c01ad	Add null dictionary source	2022-12-24 17:11:30 +03:00
Han Fei	4859197c34	fix build	2022-12-22 23:59:04 +01:00
Han Fei	2bb952a796	fix build	2022-12-22 23:27:10 +01:00
Han Fei	efa963fb0e	support regex tree dictionary	2022-12-22 22:42:11 +01:00
Yakov Olkhovskiy	a8cb29da4b	Merge branch 'master' into refactoring-ip-types	2022-12-21 23:56:24 -05:00
Raúl Marín	45d27f461b	Merge branch 'master' into perf_experiment	2022-12-20 09:07:48 +00:00
kssenii	30547d2dcd	Replace old named collections code for url	2022-12-17 00:24:05 +01:00
Vitaly Baranov	fb8aca8319	Merge pull request #44158 from vitlibar/improve-referential-deps Improve referential dependencies	2022-12-14 21:17:02 +01:00
Han Fei	d3f8bb3f52	Merge branch 'master' into regexp-tree-dictionary	2022-12-14 16:29:17 +01:00
Han Fei	2272d712e2	reimplement	2022-12-14 16:28:57 +01:00
Nikolay Degterinsky	9b6d31b95d	Merge branch 'master' into perf_experiment	2022-12-13 17:15:07 +01:00
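
Commit a3f189e191 above notes that a plain modulo balances shards poorly when keys are skewed, and that hashing the key inside getShard() fixes this. The following is a minimal Python sketch of that idea, not ClickHouse code. `mix64` is a splitmix64-style finalizer used here only as a stand-in for intHashCRC32()/intHash64(), and `shard_by_modulo`, `shard_by_hash`, and `shard_sizes` are hypothetical names chosen for illustration.

```python
from collections import Counter

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """splitmix64-style bit mixer (assumption: a stand-in, not ClickHouse's hash)."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def shard_by_modulo(key: int, shards: int) -> int:
    """Naive key-to-shard mapping: the raw key modulo the shard count."""
    return key % shards


def shard_by_hash(key: int, shards: int) -> int:
    """Mix the key bits first, then take the modulo."""
    return mix64(key) % shards


def shard_sizes(keys, shards, selector):
    """Return how many keys land in each shard for the given selector."""
    counts = Counter(selector(k, shards) for k in keys)
    return [counts.get(i, 0) for i in range(shards)]


SHARDS = 16
# Skewed input: every key is a multiple of the shard count.
skewed_keys = [i * SHARDS for i in range(10_000)]

modulo_sizes = shard_sizes(skewed_keys, SHARDS, shard_by_modulo)
hashed_sizes = shard_sizes(skewed_keys, SHARDS, shard_by_hash)

print("modulo:", modulo_sizes)
print("hashed:", hashed_sizes)
```

With keys that are all multiples of the shard count, the modulo selector sends every key to shard 0, so only one loader thread would do any work. The hashed selector spreads the same keys over more than one shard, at the cost of an extra hash per key, which matches the trade-off the commit message reports.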


1269 Commits