ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-12 09:22:05 +00:00

Author	SHA1	Message	Date
Kseniia Sumarokova	6a0d9a37ce	Merge branch 'master' into fix-mysql-named-collection	2023-04-14 12:03:51 +02:00
kssenii	ad48e1d010	Fox	2023-04-13 19:36:25 +02:00
Raúl Marín	2b70e08f23	Don't count unreserved bytes in Arenas as read_bytes	2023-04-13 12:43:24 +02:00
Robert Schulze	7a21d5888c	Remove -Wshadow suppression which leaked into global namespace	2023-04-13 08:46:40 +00:00
Raúl Marín	da9a539cf7	Reduce the usage of Arena.h	2023-04-13 10:31:32 +02:00
Robert Schulze	f41354ccd6	Merge pull request #48671 from ClickHouse/rs/gcc-removal Remove GCC remainders	2023-04-13 10:15:35 +02:00
Alexander Tokmakov	75f18b1198	Revert "Check simple dictionary key is native unsigned integer"	2023-04-13 01:32:19 +03:00
Robert Schulze	3f7ce60e03	Merge branch 'master' into rs/gcc-removal	2023-04-12 22:17:04 +02:00
Anton Popov	1520f3e924	Merge pull request #48335 from lzydmxy/check_sample_dict_key_is_correct Check simple dictionary key is native unsigned integer	2023-04-12 14:27:39 +02:00
Robert Schulze	05606a8835	Clean up GCC warning pragmas	2023-04-11 18:21:08 +00:00
Han Fei	bf28be8837	fix 02504_regexp_dictionary_table_source	2023-04-11 17:07:44 +02:00
Han Fei	6c33180ac8	Merge branch 'master' into hanfei/refine-expmsg	2023-04-11 13:19:38 +02:00
Han Fei	363b97fab8	refine some messages of exception in regexp tree	2023-04-11 11:45:29 +02:00
Alexey Milovidov	d259217cf3	Merge pull request #48570 from azat/build/logger_useful Remove superfluous includes of logger_userful.h from headers	2023-04-11 03:56:39 +03:00
Alexey Milovidov	93b8fc74ef	Merge pull request #48571 from azat/dict/hashed/uncaught-exception-fix Fix uncaught exception in case of parallel loader for hashed dictionaries	2023-04-10 23:11:01 +03:00
Azat Khuzhin	79b83c4fd2	Remove superfluous includes of logger_userful.h from headers Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-04-10 17:59:30 +02:00
Azat Khuzhin	211cea5e7c	Fix uncaught exception in case of parallel loader for hashed dictionaries Since ThreadPool::wait() rethrows the first exception (if any): <details> <summary>stacktrace</summary> 2023.04.09 12:53:33.629333 [ 22361 ] {} <Fatal> BaseDaemon: (version 22.13.1.1, build id: 5FB01DCAAFFF19F0A9A61E253567F90685989D2F) (from thread 23032) Terminate called for uncaught exception: 2023.04.09 12:53:33.630179 [ 23645 ] {} <Fatal> BaseDaemon: 2023.04.09 12:53:33.630213 [ 23645 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f68b00baccc 0x7f68b006bef2 0x7f68b0056472 0x112a42fe 0x1c17f2a3 0x1c17f238 0xbf4bc3b 0x13961c6d 0x138ee529 0x138ed6bc 0x138dd2f0 0x138dd9c6 0x1571d0dd 0x16197c1f 0x161a231e 0x1619fc93 0x161a51b9 0x11151759 0x1115454e 0x7f68b00b8fd4 0x7f68b013966c 2023.04.09 12:53:33.630247 [ 23645 ] {} <Fatal> BaseDaemon: 3. ? @ 0x7f68b00baccc in ? 2023.04.09 12:53:33.630263 [ 23645 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x7f68b006bef2 in ? 2023.04.09 12:53:33.630273 [ 23645 ] {} <Fatal> BaseDaemon: 5. abort @ 0x7f68b0056472 in ? 2023.04.09 12:53:33.648815 [ 23645 ] {} <Fatal> BaseDaemon: 6. ./.build/./src/Daemon/BaseDaemon.cpp:456: terminate_handler() @ 0x112a42fe in /usr/lib/debug/usr/bin/clickhouse.debug 2023.04.09 12:53:33.651484 [ 23645 ] {} <Fatal> BaseDaemon: 7. ./.build/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:61: std::__terminate(void (*)()) @ 0x1c17f2a3 in /usr/lib/debug/usr/bin/clickhouse.debug 2023.04.09 12:53:33.654080 [ 23645 ] {} <Fatal> BaseDaemon: 8. ./.build/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:79: std::terminate() @ 0x1c17f238 in /usr/lib/debug/usr/bin/clickhouse.debug 2023.04.09 12:53:35.025565 [ 23645 ] {} <Fatal> BaseDaemon: 9. ? @ 0xbf4bc3b in /usr/lib/debug/usr/bin/clickhouse.debug 2023.04.09 12:53:36.495557 [ 23645 ] {} <Fatal> BaseDaemon: 10. 
DB::ParallelDictionaryLoader<(DB::DictionaryKeyType)0, true, true>::~ParallelDictionaryLoader() @ 0x13961c6d in /usr/lib/debug/usr/bin/clickhouse.debug 2023.04.09 12:53:37.833142 [ 23645 ] {} <Fatal> BaseDaemon: 11. DB::HashedDictionary<(DB::DictionaryKeyType)0, true, true>::loadData() @ 0x138ee529 in /usr/lib/debug/usr/bin/clickhouse.debug 2023.04.09 12:53:39.124989 [ 23645 ] {} <Fatal> BaseDaemon: 12. DB::HashedDictionary<(DB::DictionaryKeyType)0, true, true>::HashedDictionary(DB::StorageID const&, DB::DictionaryStructure const&, std::__1::shared_ptr<DB::IDictionarySource>, DB::HashedDictionaryStorageConfiguration const&, std::__1::shared_ptr<DB::Block>) @ 0x138ed6bc in /usr/lib/debug/usr/bin/clickhouse.debug </details> Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-04-09 22:52:51 +02:00
Alexey Milovidov	09ea79aaf7	Add support for {server_uuid} macro	2023-04-09 03:04:26 +02:00
ltrk2	4544abc7d6	Remove dead code and unused dependencies	2023-04-06 11:37:12 -07:00
lzydmxy	529e1466df	use check_dictionary_primary_key instead of check_sample_dict_key_is_correct	2023-04-04 12:04:17 +08:00
lzydmxy	368c120f42	check sample dictionary key is native unsigned integer	2023-04-03 15:48:40 +08:00
kssenii	f96e7b59a2	Better	2023-04-01 13:19:07 +02:00
kssenii	1721b70070	Merge remote-tracking branch 'upstream/master' into ilejn-dict-named-collection	2023-04-01 13:18:26 +02:00
Alexey Milovidov	e982fb9f1c	Merge pull request #47880 from azat/threadpool-introspection ThreadPool metrics introspection	2023-03-30 01:27:31 +03:00
Azat Khuzhin	f38a7aeabe	ThreadPool metrics introspection There are lots of thread pools and simple local-vs-global is not enough already, it is good to know which one in particular uses threads. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-03-29 10:46:59 +02:00
Vladimir C	d32c285d17	Merge branch 'master' into vdimir/direct-dict-async-read	2023-03-28 12:41:20 +02:00
Han Fei	e3afa5090f	Merge pull request #47218 from hanfei1991/hanfei/optimize-regexp-tree-1 Refine OptimizeRegularExpression Function and RegexpTreeDict	2023-03-27 15:23:01 +02:00
vdimir	f6de216041	PullingAsyncPipelineExecutor for Direct dictionary with ClickHouse source	2023-03-27 09:52:26 +00:00
Kruglov Pavel	3ee12e21fb	Merge branch 'master' into non-blocking-connect	2023-03-23 20:53:44 +01:00
Han Fei	575c4263a3	address comments	2023-03-22 17:47:25 +01:00
avogar	38e44861ae	Fix possible race conditions	2023-03-21 16:01:54 +00:00
Kseniia Sumarokova	3c550b4314	Merge pull request #46647 from kssenii/named-collections-finish Named collections: finish replacing old code for storages	2023-03-21 12:36:46 +01:00
Robert Schulze	5b036a1a3b	More preparation for libcxx(abi), llvm, clang-tidy 16 (follow-up to #47722 )	2023-03-20 12:55:03 +00:00
kssenii	cae3b335d6	Merge remote-tracking branch 'upstream/master' into named-collections-finish	2023-03-20 11:23:22 +01:00
kssenii	bb0beb7449	Merge remote-tracking branch 'upstream/master' into named-collections-finish	2023-03-17 13:02:36 +01:00
Sema Checherinda	3c6deddd1d	work with comments on PR	2023-03-16 19:55:58 +01:00
Han Fei	e0954ce7be	fix compile	2023-03-16 00:22:05 +01:00
Han Fei	a532503466	Merge branch 'master' into hanfei/optimize-regexp-tree-1	2023-03-15 17:56:01 +01:00
Han Fei	424e8df9ad	fix style	2023-03-15 16:01:12 +01:00
Han Fei	d78a9e03ad	refine	2023-03-15 15:38:11 +01:00
Kseniia Sumarokova	a9a0d2f5c4	Merge pull request #46524 from artem-yadr/master Support for MongoDB Replica Set URI with readPreference and host:port enum in MongoDB dictionaries	2023-03-07 11:40:33 +01:00
Kseniia Sumarokova	386663953c	Merge branch 'master' into named-collections-finish	2023-03-03 12:23:38 +01:00
artem-yadr	e1352adced	Update MongoDBDictionarySource.cpp	2023-03-01 12:50:03 +03:00
Han Fei	e77dd81036	fix	2023-02-24 19:48:46 +01:00
Han Fei	e8527e720b	refine regexp tree dictionary	2023-02-24 13:08:27 +01:00
kssenii	a54b011670	Finish for mysql	2023-02-20 21:37:38 +01:00
Alexey Milovidov	d8cda3dbb8	Remove PVS-Studio	2023-02-19 23:30:05 +01:00
artem-yadr	08734d4dc0	poco changes are now used in MongoDBDictionarySource	2023-02-17 14:56:21 +03:00
Maksim Kita	c469e10092	Dictionaries DictionaryStorageFetchRequest fix	2023-02-16 12:17:02 +01:00
Nikolay Degterinsky	6e4b660033	Move MongoDB and PostgreSQL sources to Sources folder	2023-02-14 22:35:10 +00:00
Ilya Golshtein	38ea27489c	secure in named collections - small cleanup	2023-02-13 01:04:38 +03:00
Alexey Milovidov	44bd95a410	Merge pull request #46167 from ClickHouse/rs/reject-dos-patterns Reject hyperscan regexes which are prone to ReDoS	2023-02-11 06:04:03 +03:00
Ilya Golshtein	3b72b3f13b	secure in named collection - switched to specific_args, tests added	2023-02-10 13:42:11 +03:00
Robert Schulze	74937cf27b	Reject DoS-prone hyperscan regexes	2023-02-09 17:17:35 +00:00
Robert Schulze	e490ec91d9	Merge branch 'master' into rs/fix-fragile-linking	2023-02-09 11:33:59 +01:00
Alexander Tokmakov	8101b044fa	Merge pull request #46091 from azat/sanity-assertions Sanity assertions for closing file descriptors	2023-02-09 01:02:03 +03:00
Ilya Golshtein	f9e81ca7de	secure in named collections - initial	2023-02-08 23:30:16 +03:00
Maksim Kita	77ec255d7c	Merge pull request #45396 from kitaisreal/hashed-dictionary-sharded-nullable-fix HashedDictionary sharded fix nullable values	2023-02-08 17:15:10 +03:00
Robert Schulze	6ff232d782	Merge branch 'master' into rs/fix-fragile-linking	2023-02-08 12:51:12 +01:00
Alexey Milovidov	55c3bbb739	Fix assertion in statistical functions	2023-02-08 00:09:41 +01:00
Robert Schulze	10af0b3e49	Reduce redundancies	2023-02-07 12:27:23 +00:00
Azat Khuzhin	8cc41b7f41	Check return value of ::close() Note, that according close(2), EINTR should not be retriable for close() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-02-07 11:28:22 +01:00
Han Fei	d1d893275a	fix	2023-02-06 18:46:23 +01:00
Han Fei	eb76041312	address comments and add one more test	2023-02-06 17:26:20 +01:00
Maksim Kita	e8d66fb1a2	HashedDictionary sharded fix nullable values	2023-02-06 10:50:58 +01:00
Robert Schulze	84b9ff450f	Fix terribly broken, fragile and potentially cyclic linking Sorry for the clickbaity title. This is about static method ConnectionTimeouts::getHTTPTimeouts(). It was be declared in header IO/ConnectionTimeouts.h, and defined in header IO/ConnectionTimeoutsContext.h (!). This is weird and caused issues with linking on s390x (##45520). There was an attempt to fix some inconsistencies (#45848) but neither did @Algunenano nor me at first really understand why the definition is in the header. Turns out that ConnectionTimeoutsContext.h is only #include'd from source files which are part of the normal server build BUT NOT part of the keeper standalone build (which must be enabled via CMake -DBUILD_STANDALONE_KEEPER=1). This dependency was not documented and as a result, some misguided workarounds were introduced earlier, e.g. `0341c6c54b` The deeper cause was that getHTTPTimeouts() is passed a "Context". This class is part of the "dbms" libary which is deliberately not linked by the standalone build of clickhouse-keeper. The context is only used to read the settings and the "Settings" class is part of the clickhouse_common library which is linked by clickhouse-keeper already. To resolve this mess, this PR - creates source file IO/ConnectionTimeouts.cpp and moves all ConnectionTimeouts definitions into it, including getHTTPTimeouts(). - breaks the wrong dependency by passing "Settings" instead of "Context" into getHTTPTimeouts(). - resolves the previous hacks	2023-02-05 20:49:34 +00:00
Han Fei	baa345fa64	remove logs	2023-02-05 18:06:06 +01:00
Han Fei	9ea3de14ce	use re2 by default	2023-02-04 10:53:54 +01:00
Han Fei	2656027c9f	make it work if we dont define use_vectorscan macro	2023-02-03 14:25:53 +01:00
Han Fei	a2e43bc333	log for ci	2023-02-02 14:40:09 +01:00
Han Fei	90153c11fc	fix matching priority	2023-02-01 15:09:04 +01:00
Han Fei	24b8322bc9	Merge branch 'master' into hanfei/regexp-refine	2023-01-31 17:03:51 +01:00
Bharat Nallan	f1d6e3b908	Merge branch 'master' into ncb/odbc-connection-pool-fixes	2023-01-30 15:49:04 -08:00
Alexander Tokmakov	d7c697ee38	fix	2023-01-26 15:24:39 +01:00
Alexander Tokmakov	14db798191	fix	2023-01-26 13:56:16 +01:00
Alexander Tokmakov	9b670946db	Merge branch 'master' into exception_message_patterns5	2023-01-26 00:41:32 +01:00
Han Fei	7df30b1d82	remove trivial logs	2023-01-25 23:57:20 +01:00
Han Fei	f4d38b82e6	RegExpTreeDict use re2 engines when processing heavy regexps	2023-01-25 23:49:00 +01:00
Nikolay Degterinsky	6b2f3de293	Merge pull request #45512 from evillique/fix-msan-build Fix MSan build once again (too heavy translation units)	2023-01-25 16:11:21 +01:00
Alexander Tokmakov	6eb557b2ba	Merge branch 'master' into exception_message_patterns4	2023-01-25 13:49:17 +01:00
Sergei Trifonov	0d1ea05ff6	Merge pull request #45007 from ClickHouse/cancellable-mutex-integration Fast shared mutex integration	2023-01-25 11:15:46 +01:00
Bharat Nallan	2ef8fcb318	Merge branch 'master' into ncb/odbc-connection-pool-fixes	2023-01-24 21:27:20 -08:00
Nikolay Degterinsky	97aef55a7f	Fix typo	2023-01-24 23:00:02 +00:00
Nikolay Degterinsky	d8d85d9bbd	Merge remote-tracking branch 'upstream/master' into fix-msan-build	2023-01-24 22:57:47 +00:00
Nikolay Degterinsky	fb6838b043	Review suggestions	2023-01-24 22:54:01 +00:00
Alexander Tokmakov	3f6594f4c6	forbid old ctor of Exception	2023-01-23 22:18:05 +01:00
Alexander Tokmakov	70d1adfe4b	Better formatting for exception messages (#45449 ) * save format string for NetException * format exceptions * format exceptions 2 * format exceptions 3 * format exceptions 4 * format exceptions 5 * format exceptions 6 * fix * format exceptions 7 * format exceptions 8 * Update MergeTreeIndexGin.cpp * Update AggregateFunctionMap.cpp * Update AggregateFunctionMap.cpp * fix	2023-01-24 00:13:58 +03:00
Nikolay Degterinsky	f9960361db	Fix MSan build	2023-01-23 14:38:07 +00:00
Sergei Trifonov	0fbfa17863	Merge branch 'master' into cancellable-mutex-integration	2023-01-23 12:44:09 +01:00
Maksim Kita	758c8f2776	Merge branch 'master' into dict/remove-preallocate	2023-01-20 13:15:37 +03:00
Han Fei	94336a9b66	fix typo	2023-01-19 13:55:29 +01:00
Han Fei	2884b8837b	fix regexp logical error in stress tests	2023-01-19 12:03:54 +01:00
Bharat Nallan Chakravarthy	16a2585d55	fixes to odbc connection pooling	2023-01-18 16:38:56 -08:00
Azat Khuzhin	4366f7fb3b	Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries It does not give significant benefit, but now, you hashed/sparse_hashed dictionaries can be filled in parallel (#40003), using sharded dictionaries, and this should be used instead of PREALLOCATE. Note, that dictionaries, that had been created with PREALLOCATE will work, but simply ignore this attribute. Fixes: #41985 (cc @alexey-milovidov) Reverts: #23979 (cc @kitaisreal) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-18 20:18:37 +01:00
Azat Khuzhin	64e3677961	Avoid double hash calculation in HashedDictionary::getShard(StringRef) Previously it was written this way because getShard() was a simple module operation. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	2783850f08	Minor review fixes in HashedDictionary Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	6e0a7add93	Completelly exception safe HashedDictionary dtor Previously there was one (even though very unlikely) case when the dtor can throw - logging code or ThreadPool::wait. Just guard the dtor with try/catch and done with it. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	74def83c5d	Destroy hashtables for hashed dictionary in parallel only for sharded dict Since there can be multiple hashtables, since each attribute uses it's own hashtable. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	1c0e0ea1e4	Disable sharded dictionaries with updatable sources Support of sharded dictionary for updatable sources is questionable since: - sharded dictionary developed for hashed dictionary with a huge number of keys - updatable source requires storing the whole table in memory (due to how reload works) - also it is an open question will it have some benefits from the updatable source or not, since using updatable source with a huge number of changes in the source does not looks optimal and on the other side if there are small amount of changes the you don't need sharded dictionary at all Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	c97991fce1	Use shared arena for HashedDictionary::blockToAttributes() This should decrease number of allocations. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	01b100da61	Use shared arena in ParallelDictionaryLoader::createShardSelector() (and add missing rollback) This should decrease number of allocations. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	64874824b4	Minor review fixes in HashedDictionary Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	77c1f07636	Make HashedDictionary::~HashedDictionary exception safe Before it was possible for the desturctor to throw, in case of thread allocation fails, rewrite it to trySchedule() and do sequential destroy in this case. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	a3f189e191	Optimize sharded dictionaries with skewed distribution In case of skewed distribution simple division by module will not give you good distribution between shards and eventually this can lead to performance the same as non-sharded dictionary (except for it will occupy +1 thread for Block::scatter). But if HashedDictionary::blockToAttributes() will not have calls to HashedDictionary::getShard() this can be fixed by using a more complex key-to-shard (getShard()) mapping. And actually you do not need to call getShard() in blockToAttributes() you can simply use passed shard, and that's it. And by wrapping key with intHash64() in getShard() skewed distribution can be fixed. Note, that previously I tried similar approach but did not removed getShard() from blockToAttributes(), that's why it failed. And now it works almost as fast as with simple createBlockSelector(), just 13.6% slower (18.75min vs 16.5min, with 16 threads). Note, that I've also tried to add libdivide for this, but it does not improves the performance. I've also tried the approach without scatter, and it works 20% slower then this one (22.5min VS 18.75min, with 16 threads). v2: Use intHashCRC32() over intHash64() for HashedDictionary::getShard() (with intHash64() it works very slower, almost 2x slower, there was 18min with 32 threads) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	655a564280	Parallel hash tables destroy for hashed dictionaries Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	99063b152f	Allow to configure queue backlog of the parallel hashed dictionary loader v2: Decrease default parallel_queue_backlog to 10000 (same speed) v3: Rename parallel_queue_backlog to per_shard_load_backlog v3: Rename per_shard_load_backlog to shard_load_queue_backlog v4: Fix documentation Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	79ad81dfdf	Implement separate queue for parallel loader of hashed dictionaries Previous patches in this series has a bottleneck in rehash(). This is the most slowest operation when insert lots of rows into the hashtable and eventually all that thread pool sometimes work as the most slowest thread since we did not have any queue of blocks. This patch adds such queue and now it scales linearly, so initialy with 1 thread I had ~4 hours for 10e9 elements (UInt64 key, UInt16 value), after this patch it works in 16 minutes with 16 threads (well actually I have to use 32 threads because of distribution of data in the source table). And now with 16 threads it works 16 times faster. Also this patch adds more optimal block splitting for the non-complex dictionaries, and usual block splitting for complex dictionaries. But anyway this moves the overhead from the loading into the hashtable threads out to the reader thread, and this is better, since reader does not uses that much CPU. v2: fix use-after-free on failed load (add missing wait in dtor) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	5d0fd3cdc4	Remove sharded overhead for non-sharded hashed dictionaries By adding one more template parameter - HashedDictionary<sharded> (yes, it is already too much of them, for the template class that has explicit instantion). Since perf tests [1] shows 20% slowdown. [1]: https://s3.amazonaws.com/clickhouse-test-reports/40003/8f0cf2d6b8a7df511afe901331d5e2c7b06c0b4d/performance_comparison_[1/4]/report.html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	345c422e28	Add ability to load hashed dictionaries using multiple threads Right now dictionaries (here I will talk about only HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED) can load data only in one thread, since it uses one hash table that cannot be filled from multiple threads. And in case you have very big dictionary (i.e. 10e9 elements), it can take a awhile to load them, especially for SPARSE_HASHED variants (and if you have such amount of elements there, you are likely use SPARSE_HASHED, since it requires less memory), in my env it takes ~4 hours, which is enormous amount of time. So this patch add support of shards for dictionaries, number of shards determine how much hash tables will use this dictionary, also, and which is more important, how much threads it can use to load the data. And with 16 threads this works 2x faster, not perfect though, see the follow up patches in this series. v0: PARTITION BY v1: SHARDS 1 v2: SHARDS(1) v3: tried optimized mod - logical and, but it does not gain even 10% v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either v5: move SHARDS into layout parameters (unknown simply ignored) v6: tune params for perf tests (to avoid too long queries) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:25 +01:00
serxa	693489a8ad	review fixes	2023-01-12 15:51:04 +00:00
Sergei Trifonov	12d8543578	Merge branch 'master' into cancellable-mutex-integration	2023-01-12 16:03:49 +01:00
Han Fei	6ed4570f73	Merge branch 'master' into regexp-tree-dictionary	2023-01-10 15:36:30 +01:00
Maksim Kita	83a8d3ed25	RangeHashedDictionary update field primary key fix	2023-01-09 13:52:15 +01:00
Sergei Trifonov	81d2ea30ba	Merge branch 'master' into cancellable-mutex-integration	2023-01-07 19:37:46 +01:00
Anton Popov	1f32ffedf8	Merge pull request #43221 from ClickHouse/refactoring-ip-types Replace domain IP types (IPv4, IPv6) with native	2023-01-07 12:01:21 +01:00
serxa	15bb127b01	replace every `std::shared_mutex` with `DB::FastSharedMutex`	2023-01-06 23:35:38 +00:00
Han Fei	a4427a05c2	fix build	2023-01-06 14:30:00 +01:00
Kseniia Sumarokova	573d3283b0	Merge pull request #44327 from kssenii/use-new-named-collections-code-2 Replace old named collections code with new (from #43147) part 2	2023-01-06 13:06:26 +01:00
Han Fei	cac7f65b40	fix build	2023-01-06 11:49:34 +01:00
Han Fei	744084375c	fix build	2023-01-05 22:27:45 +01:00
Han Fei	ae5ee8194b	fix check style	2023-01-05 17:52:05 +01:00
Han Fei	f2a9eea995	write docs and optimize regex compile	2023-01-05 17:38:01 +01:00
Yakov Olkhovskiy	7a5a36cbed	Merge branch 'master' into refactoring-ip-types	2023-01-04 11:11:06 -05:00
Han Fei	65ef7b4adc	fix build	2023-01-04 12:45:12 +01:00
Nikolay Degterinsky	aa41e9b775	Merge pull request #44857 from evillique/fix-msan-build Try to fix MSan build	2023-01-04 04:31:28 +01:00
Han Fei	00e717d7ce	some improvement	2023-01-03 21:41:51 +01:00
kssenii	67509aa2d5	Merge remote-tracking branch 'upstream/master' into use-new-named-collections-code-2	2023-01-03 16:41:30 +01:00
Nikolay Degterinsky	c4431e9931	Fix MSan build	2023-01-03 02:21:26 +00:00
Alexey Milovidov	e855d3519a	Merge branch 'master' into refactoring-ip-types	2023-01-02 21:58:53 +03:00
Han Fei	97cdfdceea	fix style check	2022-12-31 20:36:23 +01:00
Han Fei	c25207fc21	Merge branch 'master' into regexp-tree-dictionary	2022-12-30 17:31:44 +01:00
Han Fei	83c6517fcf	try to fix flaky tests	2022-12-30 17:31:28 +01:00
Nikolay Degterinsky	dfe93b5d82	Merge pull request #42284 from Algunenano/perf_experiment Performance experiment	2022-12-30 03:14:22 +01:00
Han Fei	fa1baef448	add check sanitizer	2022-12-29 23:00:55 +01:00
Han Fei	50905e2005	address comments	2022-12-29 20:13:46 +01:00
Alexey Milovidov	33bcd07be5	Remove old code	2022-12-28 19:02:06 +01:00
Raúl Marín	fc1fa82a39	Merge branch 'master' into perf_experiment	2022-12-27 10:51:58 +01:00
mayamika	f66a0c01ad	Add null dictionary source	2022-12-24 17:11:30 +03:00
Han Fei	4859197c34	fix build	2022-12-22 23:59:04 +01:00
Han Fei	2bb952a796	fix build	2022-12-22 23:27:10 +01:00
Han Fei	efa963fb0e	support regex tree dictionary	2022-12-22 22:42:11 +01:00
Yakov Olkhovskiy	a8cb29da4b	Merge branch 'master' into refactoring-ip-types	2022-12-21 23:56:24 -05:00
Raúl Marín	45d27f461b	Merge branch 'master' into perf_experiment	2022-12-20 09:07:48 +00:00
kssenii	30547d2dcd	Replace old named collections code for url	2022-12-17 00:24:05 +01:00
Vitaly Baranov	fb8aca8319	Merge pull request #44158 from vitlibar/improve-referential-deps Improve referential dependencies	2022-12-14 21:17:02 +01:00
Han Fei	d3f8bb3f52	Merge branch 'master' into regexp-tree-dictionary	2022-12-14 16:29:17 +01:00
Han Fei	2272d712e2	reimplement	2022-12-14 16:28:57 +01:00
Nikolay Degterinsky	9b6d31b95d	Merge branch 'master' into perf_experiment	2022-12-13 17:15:07 +01:00
Vitaly Baranov	5aaff60650	Fix referential dependencies when host & post in a clickHouse dictionary source are set by default.	2022-12-12 18:22:14 +01:00
Alexander Tokmakov	db7f2ed42b	Update getDictionaryConfigurationFromAST.cpp	2022-12-12 16:46:04 +03:00
Han Fei	ae91d0c78e	fix check-style	2022-12-01 13:29:55 +01:00
Han Fei	e28c98d4ba	fix macro flag	2022-12-01 11:50:18 +01:00
Han Fei	41f726ce83	fix compile	2022-11-30 11:21:45 +01:00
Yakov Olkhovskiy	77266ea754	cleanup	2022-11-29 17:34:16 +00:00
Han Fei	27ec6bc42a	Merge branch 'master' into regexp-tree-dictionary	2022-11-29 16:33:30 +01:00
Yakov Olkhovskiy	770b520ded	Merge branch 'master' into refactoring-ip-types	2022-11-28 08:50:19 -05:00
Yakov Olkhovskiy	fd58719271	IPAddressDictionary fixed, test fixed	2022-11-24 20:42:36 +00:00
Yakov Olkhovskiy	de20d58f6b	style fix	2022-11-22 21:55:07 +00:00
Yakov Olkhovskiy	2cbe748e68	functions fixed, test fixed	2022-11-22 21:34:32 +00:00
Raúl Marín	4aa29b6a63	Merge remote-tracking branch 'blessed/master' into perf_experiment	2022-11-22 19:09:00 +01:00
Yakov Olkhovskiy	4d144be39c	replace domain IP types (IPv4, IPv6) with native	2022-11-14 14:17:17 +00:00
avogar	9e89af28c6	Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference	2022-11-10 20:15:14 +00:00
Raúl Marín	6e0a9452e7	Merge remote-tracking branch 'blessed/master' into perf_experiment	2022-10-25 15:25:06 +02:00
Azat Khuzhin	4e76629aaf	Fixes for -Wshorten-64-to-32 - lots of static_cast - add safe_cast - types adjustments - config - IStorage::read/watch - ... - some TODO's (to convert types in future) P.S. That was quite a journey... v2: fixes after rebase v3: fix conflicts after #42308 merged Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-10-21 13:25:19 +02:00
Raúl Marín	e60415d07d	Make clang-tidy happy	2022-10-18 11:40:12 +02:00
Alexander Tokmakov	fa1134f299	Merge branch 'master' into fix_loading_dependencies	2022-10-10 16:30:52 +02:00
Alexander Tokmakov	4175f8cde6	abort instead of __builtin_unreachable in debug builds	2022-10-07 21:49:08 +02:00
Alexander Tokmakov	014784a9ca	Merge branch 'master' into fix_loading_dependencies	2022-10-07 18:58:11 +02:00
Alexander Tokmakov	690ec74bf2	better handling for expressions in dictGet	2022-10-05 20:58:27 +02:00
Robert Schulze	fd86829824	Consolidate config_core.h into config.h Less duplication, less confusion ...	2022-09-28 13:31:57 +00:00
Robert Schulze	78fc36ca49	Generate config.h into ${CONFIG_INCLUDE_PATH} This makes the target location consistent with other auto-generated files like config_formats.h, config_core.h, and config_functions.h and simplifies the build of clickhouse_common.	2022-09-28 12:48:26 +00:00
Kseniia Sumarokova	ea43cb5648	Merge pull request #41261 from kssenii/s3-header-auth Support s3 authorisation headers from ast arguments	2022-09-23 12:48:08 +02:00
Alexey Milovidov	bb6f1bfce2	Fix 9/10 of trash	2022-09-19 08:53:20 +02:00
Alexey Milovidov	730655d4fd	Fix 8/9 of trash	2022-09-19 08:53:20 +02:00
Alexey Milovidov	42b0d444da	Fix 7/8 of trash	2022-09-19 08:53:20 +02:00
Alexey Milovidov	91baedf03a	Fix 6/7 of trash	2022-09-19 08:53:20 +02:00
Alexey Milovidov	84f42e0874	Fix 3/4 of trash	2022-09-19 08:50:53 +02:00
kssenii	420ac4eb43	s3 header auth in ast	2022-09-13 15:13:28 +02:00
Alexey Milovidov	fd235919aa	Remove some methods	2022-09-10 05:04:40 +02:00
kssenii	83514fa2ef	Refactor	2022-09-05 20:08:22 +02:00
Antonio Andelic	e64436fef3	Fix typos with new codespell	2022-09-02 08:54:48 +00:00
Vage Ogannisian	9b2326cc6c	Add RegExpTree dictionary	2022-08-31 19:34:50 +00:00
Alexey Milovidov	1ff535a128	One step back	2022-08-28 02:00:09 +02:00
Vage Ogannisian	540fa7fe5b	Add YAML regular expressions tree dictionary source	2022-08-24 14:01:56 +00:00
vdimir	5fea2091ac	Embed IKeyValue impl into IDictionary.h	2022-08-10 15:58:15 +00:00
vdimir	90fa2ed8c1	better code for join with dict	2022-08-10 14:20:29 +00:00
vdimir	7073067d40	check attributes for join with dict	2022-08-10 14:20:26 +00:00
Yakov Olkhovskiy	d39e9f65de	Merge branch 'master' into fix-quota-key	2022-08-08 11:54:21 -04:00
Robert Schulze	ad0d060dc1	Merge pull request #39904 from ClickHouse/library-bridge-refactoring Prepare library-bridge for catboost integration	2022-08-08 12:15:01 +02:00
Nikolai Kochetov	c6e3e14bcc	Merge pull request #39184 from ClickHouse/respect-remote_url_allow_hosts-for-other-dict-sources Respect remote_url_allow_hosts in relevant dictionary sources.	2022-08-05 17:25:53 +02:00
Robert Schulze	cb146ee6f1	Rename LibraryBridgeHelper to ExternalDictionaryLibraryBridgeHelper - ExternalDictionaryLibraryBridgeHelper provides the server-side interface to access the dictionary part of the library bridge. - In a later commit, CatBoostLibraryBridgeHelper will be added to access the catboost part of the library bridge from the server.	2022-08-04 19:58:37 +00:00
Robert Schulze	ea73b98fb9	Prepare library-bridge for catboost integration - Rename generic file and identifier names in library-bridge to something more dictionary-specific. This is needed because later on, catboost will be integrated into library-bridge. - Also: Some smaller fixes like typos and un-inlining non-performance critical code. - The logic remains unchanged in this commit.	2022-08-04 19:26:51 +00:00
Yakov Olkhovskiy	23037daf17	Merge branch 'master' into fix-quota-key	2022-08-04 12:14:49 -04:00
Yakov Olkhovskiy	2e34b384c1	update tcp protocol, add quota_key	2022-08-03 15:44:08 -04:00
Alexander Tokmakov	82b50e79cf	Merge branch 'master' into tsan_clang_15	2022-08-02 13:00:55 +03:00
Alexander Tokmakov	0d68b1c67f	fix build with clang-15	2022-08-01 18:00:54 +02:00
Azat Khuzhin	3e627e2861	Add profile events for fsync The following new provile events had been added: - FileSync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for files. - DirectorySync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for directories. - FileSyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for files. - DirectorySyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for directories. v2: rewrite test to sh with retries Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2022-07-31 23:19:30 +03:00
mergify[bot]	e53cf7fd9f	Merge branch 'master' into direct-dictionary-dict-has-multiple-same-keys-fix	2022-07-25 11:41:58 +00:00
Alexander Tokmakov	c77117eadf	Update SSDCacheDictionaryStorage.h	2022-07-22 12:54:20 +03:00
Maksim Kita	6443116e80	DirectDictionary improve performance of dictHas with duplicate keys	2022-07-21 16:29:28 +02:00
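
Note on commit a3f189e191 above: its message says that with a skewed key distribution, plain division by modulo does not spread keys well across shards, and that wrapping the key in a hash inside getShard() fixes this. The following is a minimal sketch of that idea only, not the code from the commit. The names shardByModulo, shardByHash and mixKey are hypothetical, and the multiplicative mixer is an assumed stand-in for intHashCRC32(), which the commit message says is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain modulo mapping: keys that share a factor with shard_count
// (for example, all multiples of 16 with 16 shards) land in one shard.
inline std::size_t shardByModulo(std::uint64_t key, std::size_t shard_count)
{
    return static_cast<std::size_t>(key % shard_count);
}

// Hypothetical stand-in mixer (multiplicative hashing).
// This is NOT ClickHouse's intHashCRC32(); it only illustrates the technique.
inline std::uint64_t mixKey(std::uint64_t key)
{
    return key * 0x9E3779B97F4A7C15ULL; // unsigned overflow wraps modulo 2^64
}

// Hash the key first, then reduce to a shard index.
// The high half of the product is used because its low bits still
// mirror the low-bit pattern of the original key.
inline std::size_t shardByHash(std::uint64_t key, std::size_t shard_count)
{
    return static_cast<std::size_t>((mixKey(key) >> 32) % shard_count);
}
```

With 16 shards, keys 16 and 32 both map to shard 0 under shardByModulo, while shardByHash sends them to different shards. Whether a real hash costs more than it gains depends on the workload; the commit message reports its own measurements for the actual implementation.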

1517 Commits