Commit Graph

1588 Commits

Nikolay Degterinsky
575a1a4907 Add header checks to HTTP dictionary source 2023-06-20 13:29:25 +00:00
Dmitry Kardymon
806176d88e Add input_format_csv_missing_as_default setting and tests 2023-06-15 11:23:08 +00:00
kssenii
25ae93bbf8 Merge remote-tracking branch 'upstream/master' into add-separate-access-for-use-named-collections 2023-06-14 13:33:56 +02:00
JackyWoo
a1641aa25d
Merge branch 'master' into support_redis 2023-06-12 09:53:06 +08:00
Nikolay Degterinsky
9ad8e022a8
Merge branch 'master' into update-mongo 2023-06-10 10:58:02 +02:00
pufit
55d228e78e
Merge branch 'master' into support_redis 2023-06-09 11:45:12 -04:00
kssenii
63f8a3275b Merge remote-tracking branch 'upstream/master' into add-separate-access-for-use-named-collections 2023-06-09 14:32:41 +02:00
johanngan
be8e048799 Revert invalid RegExpTreeDictionary optimization
This reverts the following commits:
- e77dd81036
- e8527e720b

Additionally, functional tests are added.

When scanning complex regexp nodes sequentially with RE2, the old code
had an optimization to break out of the loop early upon finding a leaf
node that matches. This is an invalid optimization: there is no
guarantee that it is actually a VALID match, because its parents might
NOT have matched. Semantically, a user would expect such a match to be
discarded and the search to continue. Instead, since we skipped
matching after the first false positive, subsequent nodes that would
have matched are missing from the output value. This affects both
dictGet and dictGetAll.

It's difficult to distinguish a true positive from a false positive
while looping through complex_regexp_nodes because we would have to scan
all the parents of a matching node to confirm a true positive. Trying to
do this might actually end up being slower than just scanning every
complex regexp node, because complex_regexp_nodes is only a subset of
all the tree nodes; we may end up duplicating work with scanning
that Vectorscan has already done, depending on whether the parent nodes
are "simple" or "complex". So instead of trying to fix this
optimization, just remove it entirely.
2023-06-06 16:28:44 -05:00
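
For illustration, here is a minimal standalone sketch of the ancestor check the message above describes (the struct and function names are hypothetical, not the actual RegExpTreeDictionary code): a leaf hit is only a true positive if every parent pattern matched too, which is exactly the extra scan the commit decides is not worth doing.

    #include <cstdint>
    #include <unordered_set>

    // Hypothetical node of a regexp tree; the real dictionary code differs.
    struct RegexpNode
    {
        uint64_t id = 0;
        const RegexpNode * parent = nullptr;
    };

    // A leaf hit reported by the scanner is only a true positive if every
    // ancestor pattern matched as well; otherwise it must be discarded and
    // the scan has to continue instead of breaking out of the loop early.
    bool isTruePositive(const RegexpNode & leaf, const std::unordered_set<uint64_t> & matched_ids)
    {
        for (const RegexpNode * node = leaf.parent; node != nullptr; node = node->parent)
            if (!matched_ids.contains(node->id))
                return false;
        return true;
    }
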
kssenii
adfedb4df0 Add USE NAMED COLLECTION access 2023-06-06 14:46:34 +02:00
johanngan
c0f162c5b6 Add dictGetAll function for RegExpTreeDictionary
This function outputs an array of attribute values from all regexp nodes
that matched in a regexp tree dictionary. An optional final argument can
be passed to limit the array size.
2023-06-04 23:46:04 -05:00
JackyWoo
e6d1b3c351 little fix 2023-06-02 10:05:54 +08:00
JackyWoo
f4f939162d new redis engine schema design 2023-06-02 10:05:54 +08:00
JackyWoo
357df40c8f fix tests 2023-06-02 10:05:54 +08:00
JackyWoo
b35867d907 unify storage type 2023-06-02 10:05:54 +08:00
JackyWoo
40cc8d2107 fix code style 2023-06-02 10:05:54 +08:00
JackyWoo
ce203b5ce6 Check redis table structure 2023-06-02 10:05:54 +08:00
JackyWoo
9a495cbf99 Push down filter into Redis 2023-06-02 10:05:54 +08:00
JackyWoo
e91867373c Add table function Redis 2023-06-02 10:05:54 +08:00
xiebin
28d2269661
Merge branch 'master' into master 2023-05-30 16:13:52 +08:00
xiebin
beb3690c7e if dictionary id is number, do not convert layout to complex 2023-05-30 16:09:01 +08:00
Alexander Tokmakov
876490ff40
Merge pull request #50065 from azat/dict/load-factor-range-fix
Fix hashed/sparse_hashed dictionaries max_load_factor upper range
2023-05-22 15:04:56 +03:00
Nikolay Degterinsky
183f90e45a Update MongoDB protocol 2023-05-22 09:05:23 +00:00
Azat Khuzhin
c30658a9ed Fix hashed/sparse_hashed dictionaries max_load_factor upper range
Previously, due to comparison of floats with doubles, it worked
incorrectly for the upper range:

    (lldb) p (float)0.99 > (float)0.99
    (bool) $0 = false
    (lldb) p (float)0.99 > (double)0.99
    (bool) $1 = true

This should also fix performance test errors on CI:

    clickhouse_driver.errors.ServerException: Code: 36.
    DB::Exception: default.simple_key_HASHED_dictionary_l0_99: max_load_factor parameter should be within [0.5, 0.99], got 0.99. Stack trace:

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-22 08:59:48 +02:00
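
The root cause can be demonstrated in a few lines of standalone C++ (this only shows the float-vs-double promotion, not the dictionary code itself): the nearest float to 0.99 is slightly above it, while the nearest double is slightly below it, so a mixed comparison wrongly rejects 0.99.

    int main()
    {
        constexpr float  user_value  = 0.99f;  // value parsed from the layout definition
        constexpr double upper_bound = 0.99;   // literal used in the range check

        // float vs float: 0.99 is not above the limit, as expected.
        static_assert(!(user_value > static_cast<float>(upper_bound)));

        // float vs double: user_value is promoted to double; since
        // (float)0.99 ~ 0.9900000095 and (double)0.99 ~ 0.9899999999,
        // the check wrongly reports that 0.99 is outside [0.5, 0.99].
        static_assert(user_value > upper_bound);

        return 0;
    }
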
Azat Khuzhin
0586a27432 Charge only server memory for dictionaries
Right now the memory is counted against the query/user for a dictionary,
but only if it is loaded by the user (via SYSTEM RELOAD DICTIONARY or via
dictGet()). However, it could also be loaded in the background (due to
lifetime or update_field), so, like Buffer, only server memory should be
charged.

v2: mark test as long
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-21 22:53:52 +02:00
Azat Khuzhin
e1e2a83a9e Print type of the structure that will be used for HASHED/SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
f8e7d2cb1f Remove part of the HashTableGrowerWithPrecalculationAndMaxLoadFactor comment
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c9cde110cd Add initial degree as parameter for HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
42eac6bfbc Wrap implementation helpers into HashedDictionaryImpl namespace
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
6f351851ad Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
1ab130132c Add more comments into HashedDictionaryCollectionType.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
422cbe08fe Do not use PackedHashMap for non-POD for the purposes of layout
In clang-16 the behaviour for POD types was changed in [1]; this
does not allow us to use PackedHashMap for some types.

  [1]: 277123376c

Note that I tried to come up with a more generic solution than
enumerating types, but failed. Though now I think this is fine, since
it shows explicitly which types are not allowed for PackedHashMap.

Another option is to use -fclang-abi-compat=13.0 but I doubt it is a
good idea.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
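
A standalone sketch of the kind of compile-time guard this implies (illustrative only; the actual trait used in HashedDictionaryCollectionType.h may differ): packing is restricted to trivial, standard-layout key/value types, and everything else keeps the regular HashMap cell layout.

    #include <cstdint>
    #include <string>
    #include <type_traits>

    // Illustrative packed key/value cell: no padding between fields.
    template <typename Key, typename Value>
    struct __attribute__((packed)) PackedCell
    {
        Key key;
        Value value;
    };

    // Compile-time switch: only types that are safe to pack get the packed
    // cell; everything else keeps the regular (padded) cell layout.
    template <typename Key, typename Value>
    inline constexpr bool use_packed_cell =
        std::is_trivial_v<Key> && std::is_standard_layout_v<Key>
        && std::is_trivial_v<Value> && std::is_standard_layout_v<Value>;

    static_assert(use_packed_cell<uint64_t, uint16_t>);      // plain integers: packing is fine
    static_assert(!use_packed_cell<uint64_t, std::string>);  // non-trivial value: fall back to HashMap
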
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean at least that it works faster than
google sparsehash, not to mention its friendliness to the memory
allocator (it has zero fragmentation since it works with a contiguous
memory region, in comparison to sparsehash, which does lots of
realloc, which jemalloc does not like due to its slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **93.6**   | **19.883** | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As the table shows, PackedHashMap with a 0.95 max_load_factor eats 2.6x
less memory than SPARSE_HASHED in upstream, and it is also 2x faster for
reads!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
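
As a rough sketch of what a configurable max_load_factor means for the grower (purely illustrative, not the actual HashTableGrowerWithPrecalculationAndMaxLoadFactor): the table only resizes once occupancy crosses bucket_count * max_load_factor, so a factor of 0.95-0.99 trades a few extra probes for a much denser table.

    #include <cstddef>

    // Illustrative grower only: the real grower precalculates its thresholds
    // and works with power-of-two bucket counts.
    struct GrowerWithMaxLoadFactor
    {
        double max_load_factor = 0.95; // accepted range per this commit: [0.5, 0.99]
        size_t bucket_count = 256;

        // Grow only once occupancy crosses the configured load factor.
        bool overflow(size_t size) const
        {
            return static_cast<double>(size) > static_cast<double>(bucket_count) * max_load_factor;
        }

        void increase_size()
        {
            bucket_count *= 2; // keep the bucket count a power of two
        }
    };
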
Azat Khuzhin
3698302ddb Accept float values for dictionary layouts configurations
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
In case you want a dictionary optimized for memory, SPARSE_HASHED does
not always give you what you need.

Consider the following example: <UInt64, UInt16> as <Key, Value>. This
pair will also have 6 bytes of padding (on amd64), so almost 40% of the
space is wasted.

And because of this padding, even google::sparse_hash_map does not make
the picture better; in fact, sparse_hash_map is not very friendly to
memory allocators (especially jemalloc).

Here are some numbers for a dictionary with 1e9 elements, UInt64 as
key, and UInt16 as value:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see, PackedHashMap looks much better than HASHED, and even
better than SPARSE_HASHED, but slightly worse than sparse_hash_map with
the packed allocator (which is done with a custom patch to google
sparse_hash_map).

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
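
The "almost 40%" figure follows directly from the padded pair layout; a standalone check (assuming a typical amd64 ABI) illustrates it:

    #include <cstdint>
    #include <utility>

    // <UInt64, UInt16> as <Key, Value>: alignment of the 8-byte key forces the
    // 2-byte value to be padded up, so each cell occupies 16 bytes instead of 10.
    static_assert(sizeof(std::pair<uint64_t, uint16_t>) == 16);

    // A packed cell removes the padding: 8 + 2 = 10 bytes per element,
    // i.e. the padded layout wastes 6 of 16 bytes (~38%, "almost 40%").
    struct __attribute__((packed)) PackedCell
    {
        uint64_t key;
        uint16_t value;
    };
    static_assert(sizeof(PackedCell) == 10);
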
xiebin
b358b53d31
Merge branch 'ClickHouse:master' into master 2023-05-15 11:25:54 +08:00
Alexey Milovidov
5a44dc26e7 Fixes for clang-17 2023-05-13 02:57:31 +02:00
Han Fei
ef74e64336 address comments 2023-05-11 22:18:08 +02:00
xiebin
f9eb6ca6fd if the data type of numeric key is not native uint, convert to complex. 2023-05-10 23:47:15 +08:00
Alexey Milovidov
dc3aca6e98
Merge branch 'master' into master 2023-05-10 07:44:10 +03:00
Han Fei
ddce47f79e refine table source for regexp tree dictionary 2023-05-09 20:17:54 +02:00
Han Fei
72fc567d4a
Merge branch 'master' into hanfei/regexp-dict-read 2023-05-08 16:20:12 +02:00
Han Fei
92e57817a2 Support dictionary table function for RegExpTreeDictionary 2023-05-08 16:14:08 +02:00
Alexander Tokmakov
1224ac9eda fix build 2023-05-08 00:57:13 +02:00
Alexander Tokmakov
abf6c60ad2 Merge branch 'master' into fix_dictionaries_loading_order 2023-05-08 00:31:03 +02:00
Alexey Milovidov
e633ebee85 Merge branch 'master' into concurrency-control-controllable 2023-05-07 20:07:07 +02:00
xiebin
7f9b21849c Fixed a lowercase initial letter and removed needless data 2023-05-07 19:06:06 +08:00
xbthink
72dd039d1c add comments and functional test 2023-05-07 16:22:05 +08:00
Alexey Milovidov
6fddb5bad3 Simplification 2023-05-07 06:31:00 +02:00
Alexey Milovidov
a695d6227d Make concurrency control controllable 2023-05-07 06:16:30 +02:00
Michael Kolupaev
49394a097e Fix 'noisy Warning messages' failing when there are no Warning messages 2023-05-06 10:39:59 -07:00
Bin Xie
bc16cc59ff If a dictionary is created with a complex key, automatically choose the "complex key" layout variant. 2023-05-06 11:09:45 +08:00
Alexander Tokmakov
846abe95e9 fix 2023-05-05 20:50:13 +02:00
Alexander Tokmakov
dd1bbf7c78 fix another issue with dependencies 2023-05-05 16:27:12 +02:00
Azat Khuzhin
2fd1a73812 Fix element_count for HASHED/SPARSE_HASHED with multiple attributes
Previously element_count was multiplied by the number of attributes.

Fixes: #5440
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-01 12:44:41 +02:00
Azat Khuzhin
93201f21d9 Fix load_factor for HASHED/SPARSE_HASHED dictionaries with SHARDS
Previously, bucket_count was set only for one shard, and hence
load_factor was > 1.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-01 12:44:41 +02:00
MikhailBurdukov
5c9959af49 Resolve conversation 2023-04-28 12:40:47 +00:00
MikhailBurdukov
40ad8499a0 fix 2023-04-26 21:03:27 +00:00
MikhailBurdukov
b229a28e94
Merge branch 'master' into mongo_dict_tls 2023-04-26 23:39:27 +03:00
MikhailBurdukov
389c0af922 Fix style 2023-04-26 19:36:34 +00:00
MikhailBurdukov
baaee66e85 Missing files 2023-04-26 19:29:29 +00:00
MikhailBurdukov
d76430fe90 Added options handling for mongo dict 2023-04-26 19:19:10 +00:00
Raúl Marín
f0e045bb3d Merge remote-tracking branch 'blessed/master' into arenita 2023-04-24 10:42:56 +02:00
Rory Crispin
99175aefae
trailing whitespace 2023-04-22 17:02:05 +01:00
Rory Crispin
66300402ee
Bracket on newline 2023-04-22 15:42:57 +01:00
Rory Crispin
5e80e9263e
Remove spaces around -> 2023-04-22 15:29:20 +01:00
Rory Crispin
32dcc2e37b
Merge branch 'master' into dict-lifetime-validation 2023-04-22 14:55:11 +01:00
Rory Crispin
006af1dfa1 validate direct dictionary lifetime is unset during creation 2023-04-22 15:49:04 +02:00
Kruglov Pavel
2ad161d2b7
Merge branch 'master' into non-blocking-connect 2023-04-19 13:39:40 +02:00
robot-clickhouse-ci-2
45f4a5f74c
Merge pull request #47964 from ClickHouse/fast-parquet
Read Parquet files faster
2023-04-17 19:27:38 +02:00
Raúl Marín
39f8c43a60 Merge remote-tracking branch 'blessed/master' into arenita 2023-04-17 10:33:38 +02:00
Michael Kolupaev
2d4fe85513 Something 2023-04-17 04:58:32 +00:00
Kseniia Sumarokova
6a0d9a37ce
Merge branch 'master' into fix-mysql-named-collection 2023-04-14 12:03:51 +02:00
kssenii
ad48e1d010 Fix 2023-04-13 19:36:25 +02:00
Raúl Marín
2b70e08f23 Don't count unreserved bytes in Arenas as read_bytes 2023-04-13 12:43:24 +02:00
Robert Schulze
7a21d5888c
Remove -Wshadow suppression which leaked into global namespace 2023-04-13 08:46:40 +00:00
Raúl Marín
da9a539cf7 Reduce the usage of Arena.h 2023-04-13 10:31:32 +02:00
Robert Schulze
f41354ccd6
Merge pull request #48671 from ClickHouse/rs/gcc-removal
Remove GCC remainders
2023-04-13 10:15:35 +02:00
Alexander Tokmakov
75f18b1198
Revert "Check simple dictionary key is native unsigned integer" 2023-04-13 01:32:19 +03:00
Robert Schulze
3f7ce60e03
Merge branch 'master' into rs/gcc-removal 2023-04-12 22:17:04 +02:00
Anton Popov
1520f3e924
Merge pull request #48335 from lzydmxy/check_sample_dict_key_is_correct
Check simple dictionary key is native unsigned integer
2023-04-12 14:27:39 +02:00
Robert Schulze
05606a8835
Clean up GCC warning pragmas 2023-04-11 18:21:08 +00:00
Han Fei
bf28be8837 fix 02504_regexp_dictionary_table_source 2023-04-11 17:07:44 +02:00
Han Fei
6c33180ac8
Merge branch 'master' into hanfei/refine-expmsg 2023-04-11 13:19:38 +02:00
Han Fei
363b97fab8 refine some messages of exception in regexp tree 2023-04-11 11:45:29 +02:00
Alexey Milovidov
d259217cf3
Merge pull request #48570 from azat/build/logger_useful
Remove superfluous includes of logger_userful.h from headers
2023-04-11 03:56:39 +03:00
Alexey Milovidov
93b8fc74ef
Merge pull request #48571 from azat/dict/hashed/uncaught-exception-fix
Fix uncaught exception in case of parallel loader for hashed dictionaries
2023-04-10 23:11:01 +03:00
Azat Khuzhin
79b83c4fd2 Remove superfluous includes of logger_userful.h from headers
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-10 17:59:30 +02:00
Azat Khuzhin
211cea5e7c Fix uncaught exception in case of parallel loader for hashed dictionaries
Since ThreadPool::wait() rethrows the first exception (if any):

<details>

<summary>stacktrace</summary>

    2023.04.09 12:53:33.629333 [ 22361 ] {} <Fatal> BaseDaemon: (version 22.13.1.1, build id: 5FB01DCAAFFF19F0A9A61E253567F90685989D2F) (from thread 23032) Terminate called for uncaught exception:
    2023.04.09 12:53:33.630179 [ 23645 ] {} <Fatal> BaseDaemon:
    2023.04.09 12:53:33.630213 [ 23645 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f68b00baccc 0x7f68b006bef2 0x7f68b0056472 0x112a42fe 0x1c17f2a3 0x1c17f238 0xbf4bc3b 0x13961c6d 0x138ee529 0x138ed6bc 0x138dd2f0 0x138dd9c6 0x1571d0dd 0x16197c1f 0x161a231e 0x1619fc93 0x161a51b9 0x11151759 0x1115454e 0x7f68b00b8fd4 0x7f68b013966c
    2023.04.09 12:53:33.630247 [ 23645 ] {} <Fatal> BaseDaemon: 3. ? @ 0x7f68b00baccc in ?
    2023.04.09 12:53:33.630263 [ 23645 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x7f68b006bef2 in ?
    2023.04.09 12:53:33.630273 [ 23645 ] {} <Fatal> BaseDaemon: 5. abort @ 0x7f68b0056472 in ?
    2023.04.09 12:53:33.648815 [ 23645 ] {} <Fatal> BaseDaemon: 6. ./.build/./src/Daemon/BaseDaemon.cpp:456: terminate_handler() @ 0x112a42fe in /usr/lib/debug/usr/bin/clickhouse.debug
    2023.04.09 12:53:33.651484 [ 23645 ] {} <Fatal> BaseDaemon: 7. ./.build/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:61: std::__terminate(void (*)()) @ 0x1c17f2a3 in /usr/lib/debug/usr/bin/clickhouse.debug
    2023.04.09 12:53:33.654080 [ 23645 ] {} <Fatal> BaseDaemon: 8. ./.build/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:79: std::terminate() @ 0x1c17f238 in /usr/lib/debug/usr/bin/clickhouse.debug
    2023.04.09 12:53:35.025565 [ 23645 ] {} <Fatal> BaseDaemon: 9. ? @ 0xbf4bc3b in /usr/lib/debug/usr/bin/clickhouse.debug
    2023.04.09 12:53:36.495557 [ 23645 ] {} <Fatal> BaseDaemon: 10. DB::ParallelDictionaryLoader<(DB::DictionaryKeyType)0, true, true>::~ParallelDictionaryLoader() @ 0x13961c6d in /usr/lib/debug/usr/bin/clickhouse.debug
    2023.04.09 12:53:37.833142 [ 23645 ] {} <Fatal> BaseDaemon: 11. DB::HashedDictionary<(DB::DictionaryKeyType)0, true, true>::loadData() @ 0x138ee529 in /usr/lib/debug/usr/bin/clickhouse.debug
    2023.04.09 12:53:39.124989 [ 23645 ] {} <Fatal> BaseDaemon: 12. DB::HashedDictionary<(DB::DictionaryKeyType)0, true, true>::HashedDictionary(DB::StorageID const&, DB::DictionaryStructure const&, std::__1::shared_ptr<DB::IDictionarySource>, DB::HashedDictionaryStorageConfiguration const&, std::__1::shared_ptr<DB::Block>) @ 0x138ed6bc in /usr/lib/debug/usr/bin/clickhouse.debug

</details>

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-09 22:52:51 +02:00
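
A minimal sketch of the hardening this implies (illustrative, not the actual ParallelDictionaryLoader code): anything that waits for workers inside a destructor must catch what the wait rethrows, because a destructor is implicitly noexcept and an escaping exception terminates the process exactly as in the stack trace above.

    #include <future>
    #include <iostream>
    #include <stdexcept>
    #include <vector>

    // Stand-in for a parallel loader that joins its workers in the destructor.
    struct ParallelLoaderSketch
    {
        std::vector<std::future<void>> workers;

        ~ParallelLoaderSketch()
        {
            // Like ThreadPool::wait(), future::get() rethrows a worker's exception.
            // The destructor is implicitly noexcept, so letting that exception
            // escape would call std::terminate().
            for (auto & worker : workers)
            {
                try
                {
                    worker.get();
                }
                catch (...)
                {
                    std::cerr << "dictionary shard failed to load\n"; // log and keep destroying
                }
            }
        }
    };

    int main()
    {
        ParallelLoaderSketch loader;
        loader.workers.push_back(std::async(std::launch::async, [] { throw std::runtime_error("bad block"); }));
        // The destructor absorbs the worker's exception instead of terminating the process.
    }
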
Alexey Milovidov
09ea79aaf7 Add support for {server_uuid} macro 2023-04-09 03:04:26 +02:00
ltrk2
4544abc7d6 Remove dead code and unused dependencies 2023-04-06 11:37:12 -07:00
lzydmxy
529e1466df use check_dictionary_primary_key instead of check_sample_dict_key_is_correct 2023-04-04 12:04:17 +08:00
lzydmxy
368c120f42 check sample dictionary key is native unsigned integer 2023-04-03 15:48:40 +08:00
kssenii
f96e7b59a2 Better 2023-04-01 13:19:07 +02:00
kssenii
1721b70070 Merge remote-tracking branch 'upstream/master' into ilejn-dict-named-collection 2023-04-01 13:18:26 +02:00
Alexey Milovidov
e982fb9f1c
Merge pull request #47880 from azat/threadpool-introspection
ThreadPool metrics introspection
2023-03-30 01:27:31 +03:00
Azat Khuzhin
f38a7aeabe ThreadPool metrics introspection
There are lots of thread pools, and a simple local-vs-global split is no
longer enough; it is good to know which pool in particular uses threads.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-03-29 10:46:59 +02:00
Vladimir C
d32c285d17
Merge branch 'master' into vdimir/direct-dict-async-read 2023-03-28 12:41:20 +02:00
Han Fei
e3afa5090f
Merge pull request #47218 from hanfei1991/hanfei/optimize-regexp-tree-1
Refine OptimizeRegularExpression Function and RegexpTreeDict
2023-03-27 15:23:01 +02:00
vdimir
f6de216041
PullingAsyncPipelineExecutor for Direct dictionary with ClickHouse source 2023-03-27 09:52:26 +00:00
Kruglov Pavel
3ee12e21fb
Merge branch 'master' into non-blocking-connect 2023-03-23 20:53:44 +01:00
Han Fei
575c4263a3 address comments 2023-03-22 17:47:25 +01:00
avogar
38e44861ae Fix possible race conditions 2023-03-21 16:01:54 +00:00
Kseniia Sumarokova
3c550b4314
Merge pull request #46647 from kssenii/named-collections-finish
Named collections: finish replacing old code for storages
2023-03-21 12:36:46 +01:00
Robert Schulze
5b036a1a3b
More preparation for libcxx(abi), llvm, clang-tidy 16 (follow-up to #47722) 2023-03-20 12:55:03 +00:00
kssenii
cae3b335d6 Merge remote-tracking branch 'upstream/master' into named-collections-finish 2023-03-20 11:23:22 +01:00
kssenii
bb0beb7449 Merge remote-tracking branch 'upstream/master' into named-collections-finish 2023-03-17 13:02:36 +01:00
Sema Checherinda
3c6deddd1d work with comments on PR 2023-03-16 19:55:58 +01:00
Han Fei
e0954ce7be fix compile 2023-03-16 00:22:05 +01:00
Han Fei
a532503466
Merge branch 'master' into hanfei/optimize-regexp-tree-1 2023-03-15 17:56:01 +01:00
Han Fei
424e8df9ad fix style 2023-03-15 16:01:12 +01:00
Han Fei
d78a9e03ad refine 2023-03-15 15:38:11 +01:00
Kseniia Sumarokova
a9a0d2f5c4
Merge pull request #46524 from artem-yadr/master
Support for MongoDB Replica Set URI with readPreference and host:port enum in MongoDB dictionaries
2023-03-07 11:40:33 +01:00
Kseniia Sumarokova
386663953c
Merge branch 'master' into named-collections-finish 2023-03-03 12:23:38 +01:00
artem-yadr
e1352adced
Update MongoDBDictionarySource.cpp 2023-03-01 12:50:03 +03:00
Han Fei
e77dd81036 fix 2023-02-24 19:48:46 +01:00
Han Fei
e8527e720b refine regexp tree dictionary 2023-02-24 13:08:27 +01:00
kssenii
a54b011670 Finish for mysql 2023-02-20 21:37:38 +01:00
Alexey Milovidov
d8cda3dbb8 Remove PVS-Studio 2023-02-19 23:30:05 +01:00
artem-yadr
08734d4dc0 poco changes are now used in MongoDBDictionarySource 2023-02-17 14:56:21 +03:00
Maksim Kita
c469e10092 Dictionaries DictionaryStorageFetchRequest fix 2023-02-16 12:17:02 +01:00
Nikolay Degterinsky
6e4b660033 Move MongoDB and PostgreSQL sources to Sources folder 2023-02-14 22:35:10 +00:00
Ilya Golshtein
38ea27489c secure in named collections - small cleanup 2023-02-13 01:04:38 +03:00
Alexey Milovidov
44bd95a410
Merge pull request #46167 from ClickHouse/rs/reject-dos-patterns
Reject hyperscan regexes which are prone to ReDoS
2023-02-11 06:04:03 +03:00
Ilya Golshtein
3b72b3f13b secure in named collection - switched to specific_args, tests added 2023-02-10 13:42:11 +03:00
Robert Schulze
74937cf27b
Reject DoS-prone hyperscan regexes 2023-02-09 17:17:35 +00:00
Robert Schulze
e490ec91d9
Merge branch 'master' into rs/fix-fragile-linking 2023-02-09 11:33:59 +01:00
Alexander Tokmakov
8101b044fa
Merge pull request #46091 from azat/sanity-assertions
Sanity assertions for closing file descriptors
2023-02-09 01:02:03 +03:00
Ilya Golshtein
f9e81ca7de secure in named collections - initial 2023-02-08 23:30:16 +03:00
Maksim Kita
77ec255d7c
Merge pull request #45396 from kitaisreal/hashed-dictionary-sharded-nullable-fix
HashedDictionary sharded fix nullable values
2023-02-08 17:15:10 +03:00
Robert Schulze
6ff232d782
Merge branch 'master' into rs/fix-fragile-linking 2023-02-08 12:51:12 +01:00
Alexey Milovidov
55c3bbb739 Fix assertion in statistical functions 2023-02-08 00:09:41 +01:00
Robert Schulze
10af0b3e49
Reduce redundancies 2023-02-07 12:27:23 +00:00
Azat Khuzhin
8cc41b7f41 Check return value of ::close()
Note that, according to close(2), close() should not be retried on EINTR

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-02-07 11:28:22 +01:00
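
A hedged illustration of such a sanity assertion (the helper name is made up, not the actual ClickHouse function): the return value of ::close() is checked, and EINTR is deliberately not retried, since the descriptor may already have been released and reused by another thread.

    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>
    #include <unistd.h>

    // Illustrative helper: treat a failed close() as a programming error.
    void closeWithAssertion(int fd)
    {
        if (0 == ::close(fd))
            return;

        // Per close(2), even on EINTR the descriptor must not be closed again:
        // it may have been released and already reused by another thread.
        std::fprintf(stderr, "close(%d) failed: errno=%d\n", fd, errno);
        std::abort();
    }
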
Han Fei
d1d893275a fix 2023-02-06 18:46:23 +01:00
Han Fei
eb76041312 address comments and add one more test 2023-02-06 17:26:20 +01:00
Maksim Kita
e8d66fb1a2 HashedDictionary sharded fix nullable values 2023-02-06 10:50:58 +01:00
Robert Schulze
84b9ff450f
Fix terribly broken, fragile and potentially cyclic linking
Sorry for the clickbaity title. This is about static method
ConnectionTimeouts::getHTTPTimeouts(). It was declared in header
IO/ConnectionTimeouts.h, and defined in header
IO/ConnectionTimeoutsContext.h (!). This is weird and caused issues with
linking on s390x (#45520). There was an attempt to fix some
inconsistencies (#45848), but at first neither @Algunenano nor I really
understood why the definition was in the header.

Turns out that ConnectionTimeoutsContext.h is only #include'd from
source files which are part of the normal server build BUT NOT part of
the keeper standalone build (which must be enabled via CMake
-DBUILD_STANDALONE_KEEPER=1). This dependency was not documented and as
a result, some misguided workarounds were introduced earlier, e.g.
0341c6c54b

The deeper cause was that getHTTPTimeouts() is passed a "Context". This
class is part of the "dbms" library, which is deliberately not linked by
the standalone build of clickhouse-keeper. The context is only used to
read the settings and the "Settings" class is part of the
clickhouse_common library which is linked by clickhouse-keeper already.

To resolve this mess, this PR

- creates source file IO/ConnectionTimeouts.cpp and moves all
  ConnectionTimeouts definitions into it, including getHTTPTimeouts().

- breaks the wrong dependency by passing "Settings" instead of "Context"
  into getHTTPTimeouts().

- resolves the previous hacks
2023-02-05 20:49:34 +00:00
Han Fei
baa345fa64 remove logs 2023-02-05 18:06:06 +01:00
Han Fei
9ea3de14ce use re2 by default 2023-02-04 10:53:54 +01:00
Han Fei
2656027c9f make it work if we dont define use_vectorscan macro 2023-02-03 14:25:53 +01:00
Han Fei
a2e43bc333 log for ci 2023-02-02 14:40:09 +01:00
Han Fei
90153c11fc fix matching priority 2023-02-01 15:09:04 +01:00
Han Fei
24b8322bc9 Merge branch 'master' into hanfei/regexp-refine 2023-01-31 17:03:51 +01:00
Bharat Nallan
f1d6e3b908
Merge branch 'master' into ncb/odbc-connection-pool-fixes 2023-01-30 15:49:04 -08:00
Alexander Tokmakov
d7c697ee38 fix 2023-01-26 15:24:39 +01:00
Alexander Tokmakov
14db798191 fix 2023-01-26 13:56:16 +01:00
Alexander Tokmakov
9b670946db Merge branch 'master' into exception_message_patterns5 2023-01-26 00:41:32 +01:00
Han Fei
7df30b1d82 remove trivial logs 2023-01-25 23:57:20 +01:00
Han Fei
f4d38b82e6 RegExpTreeDict use re2 engines when processing heavy regexps 2023-01-25 23:49:00 +01:00
Nikolay Degterinsky
6b2f3de293
Merge pull request #45512 from evillique/fix-msan-build
Fix MSan build once again (too heavy translation units)
2023-01-25 16:11:21 +01:00
Alexander Tokmakov
6eb557b2ba Merge branch 'master' into exception_message_patterns4 2023-01-25 13:49:17 +01:00
Sergei Trifonov
0d1ea05ff6
Merge pull request #45007 from ClickHouse/cancellable-mutex-integration
Fast shared mutex integration
2023-01-25 11:15:46 +01:00
Bharat Nallan
2ef8fcb318
Merge branch 'master' into ncb/odbc-connection-pool-fixes 2023-01-24 21:27:20 -08:00
Nikolay Degterinsky
97aef55a7f Fix typo 2023-01-24 23:00:02 +00:00
Nikolay Degterinsky
d8d85d9bbd Merge remote-tracking branch 'upstream/master' into fix-msan-build 2023-01-24 22:57:47 +00:00
Nikolay Degterinsky
fb6838b043 Review suggestions 2023-01-24 22:54:01 +00:00
Alexander Tokmakov
3f6594f4c6 forbid old ctor of Exception 2023-01-23 22:18:05 +01:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
* save format string for NetException

* format exceptions

* format exceptions 2

* format exceptions 3

* format exceptions 4

* format exceptions 5

* format exceptions 6

* fix

* format exceptions 7

* format exceptions 8

* Update MergeTreeIndexGin.cpp

* Update AggregateFunctionMap.cpp

* Update AggregateFunctionMap.cpp

* fix
2023-01-24 00:13:58 +03:00
Nikolay Degterinsky
f9960361db Fix MSan build 2023-01-23 14:38:07 +00:00
Sergei Trifonov
0fbfa17863
Merge branch 'master' into cancellable-mutex-integration 2023-01-23 12:44:09 +01:00
Maksim Kita
758c8f2776
Merge branch 'master' into dict/remove-preallocate 2023-01-20 13:15:37 +03:00
Han Fei
94336a9b66 fix typo 2023-01-19 13:55:29 +01:00
Han Fei
2884b8837b fix regexp logical error in stress tests 2023-01-19 12:03:54 +01:00
Bharat Nallan Chakravarthy
16a2585d55 fixes to odbc connection pooling 2023-01-18 16:38:56 -08:00
Azat Khuzhin
4366f7fb3b Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries
It does not give a significant benefit, and now hashed/sparse_hashed
dictionaries can be filled in parallel (#40003) using sharded
dictionaries, which should be used instead of PREALLOCATE.

Note that dictionaries that had been created with PREALLOCATE will
still work, but will simply ignore this attribute.

Fixes: #41985 (cc @alexey-milovidov)
Reverts: #23979 (cc @kitaisreal)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-18 20:18:37 +01:00
Azat Khuzhin
64e3677961 Avoid double hash calculation in HashedDictionary::getShard(StringRef)
Previously it was written this way because getShard() was a simple
modulo operation.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
2783850f08 Minor review fixes in HashedDictionary
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
6e0a7add93 Completelly exception safe HashedDictionary dtor
Previously there was one (even though very unlikely) case where the dtor
could throw: logging code or ThreadPool::wait().

Just guard the dtor with try/catch and be done with it.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
74def83c5d Destroy hashtables for hashed dictionary in parallel only for sharded dict
There can be multiple hash tables, since each attribute uses its
own hash table.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
1c0e0ea1e4 Disable sharded dictionaries with updatable sources
Support for sharded dictionaries with updatable sources is questionable
since:
- sharded dictionaries were developed for hashed dictionaries with a
  huge number of keys
- an updatable source requires storing the whole table in memory (due to
  how reload works)
- it is also an open question whether a sharded dictionary benefits from
  an updatable source at all, since using an updatable source with a
  huge number of changes does not look optimal, and on the other hand,
  if there is only a small amount of changes, you don't need a sharded
  dictionary at all

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
c97991fce1 Use shared arena for HashedDictionary::blockToAttributes()
This should decrease the number of allocations.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
01b100da61 Use shared arena in ParallelDictionaryLoader::createShardSelector() (and add missing rollback)
This should decrease the number of allocations.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
64874824b4 Minor review fixes in HashedDictionary
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
77c1f07636 Make HashedDictionary::~HashedDictionary exception safe
Before, it was possible for the destructor to throw if thread
allocation fails; rewrite it to use trySchedule() and fall back to a
sequential destroy in this case.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
a3f189e191 Optimize sharded dictionaries with skewed distribution
In case of a skewed distribution, a simple modulo will not give you a
good distribution between shards, and eventually this can lead to the
same performance as a non-sharded dictionary (except that it will
occupy +1 thread for Block::scatter).

But if HashedDictionary::blockToAttributes() has no calls to
HashedDictionary::getShard(), this can be fixed by using a more complex
key-to-shard (getShard()) mapping. And actually you do not need to call
getShard() in blockToAttributes(); you can simply use the passed shard,
and that's it.

And by wrapping the key with intHash64() in getShard(), the skewed
distribution can be fixed.

Note that I previously tried a similar approach but did not remove
getShard() from blockToAttributes(); that's why it failed.

And now it works almost as fast as with simple createBlockSelector(),
just 13.6% slower (18.75min vs 16.5min, with 16 threads).

Note that I've also tried to add libdivide for this, but it does not
improve the performance.

I've also tried the approach without scatter, and it works 20% slower
than this one (22.5min vs 18.75min, with 16 threads).

v2: Use intHashCRC32() over intHash64() for HashedDictionary::getShard()
    (with intHash64() it works much slower, almost 2x slower; it took
    18min with 32 threads)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
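
A standalone illustration of the key-to-shard mapping change (hash_mix below is a splitmix64-style stand-in for the CRC32-based mixer the commit settles on): a plain modulo preserves whatever clustering the keys already have, while mixing the key first spreads a skewed key set evenly across the shards.

    #include <cstddef>
    #include <cstdint>

    // Stand-in 64-bit mixer (splitmix64 finalizer); the commit uses intHashCRC32().
    inline uint64_t hash_mix(uint64_t x)
    {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        x ^= x >> 31;
        return x;
    }

    // Skew-resistant shard selection: mix first, then take the modulo.
    inline size_t getShardSketch(uint64_t key, size_t shard_count)
    {
        return static_cast<size_t>(hash_mix(key) % shard_count);
    }

    // With a plain `key % shard_count`, keys like 0, 16, 32, ... (a skewed but
    // realistic pattern) all land in shard 0 when shard_count is 16; after mixing,
    // they are distributed roughly uniformly, so all loader threads stay busy.
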
Azat Khuzhin
655a564280 Parallel hash tables destroy for hashed dictionaries
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
99063b152f Allow to configure queue backlog of the parallel hashed dictionary loader
v2: Decrease default parallel_queue_backlog to 10000 (same speed)
v3: Rename parallel_queue_backlog to per_shard_load_backlog
v3: Rename per_shard_load_backlog to shard_load_queue_backlog
v4: Fix documentation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
79ad81dfdf Implement separate queue for parallel loader of hashed dictionaries
Previous patches in this series had a bottleneck in rehash(). This is
the slowest operation when inserting lots of rows into the hash table,
and eventually the whole thread pool sometimes works only as fast as the
slowest thread, since we did not have any queue of blocks.

This patch adds such a queue, and now it scales linearly: initially,
with 1 thread, I had ~4 hours for 10e9 elements (UInt64 key, UInt16
value); after this patch it works in 16 minutes with 16 threads (well,
actually I have to use 32 threads because of the distribution of data in
the source table).

And now with 16 threads it works 16 times faster.

This patch also adds more optimal block splitting for the non-complex
dictionaries, and the usual block splitting for complex dictionaries.
Anyway, this moves the overhead from the threads loading into the hash
tables out to the reader thread, and this is better, since the reader
does not use that much CPU.

v2: fix use-after-free on failed load (add missing wait in dtor)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
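
A compact generic sketch of the per-shard queue idea (not the actual ThreadPool-based code): the reader pushes blocks into a bounded queue per shard, so a shard stuck in rehash() only stalls its own queue rather than the whole pipeline; the shard_load_queue_backlog setting from the commit above plays the role of the backlog capacity here.

    #include <condition_variable>
    #include <cstddef>
    #include <mutex>
    #include <optional>
    #include <queue>

    // Bounded blocking queue, one instance per shard. The reader thread produces
    // blocks; the shard's loader thread consumes them and fills its hash table,
    // so a slow rehash() in one shard does not block the other shards.
    template <typename Block>
    class ShardLoadQueue
    {
    public:
        explicit ShardLoadQueue(size_t backlog) : backlog_(backlog) {}

        void push(Block block)
        {
            std::unique_lock lock(mutex_);
            not_full_.wait(lock, [&] { return queue_.size() < backlog_; });
            queue_.push(std::move(block));
            not_empty_.notify_one();
        }

        // Returns std::nullopt once the producer called finish() and the queue drained.
        std::optional<Block> pop()
        {
            std::unique_lock lock(mutex_);
            not_empty_.wait(lock, [&] { return !queue_.empty() || finished_; });
            if (queue_.empty())
                return std::nullopt;
            Block block = std::move(queue_.front());
            queue_.pop();
            not_full_.notify_one();
            return block;
        }

        void finish()
        {
            std::lock_guard lock(mutex_);
            finished_ = true;
            not_empty_.notify_all();
        }

    private:
        std::mutex mutex_;
        std::condition_variable not_empty_, not_full_;
        std::queue<Block> queue_;
        size_t backlog_;
        bool finished_ = false;
    };
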
Azat Khuzhin
5d0fd3cdc4 Remove sharded overhead for non-sharded hashed dictionaries
By adding one more template parameter, HashedDictionary<sharded> (yes,
there are already too many of them for a template class that has
explicit instantiation).

This is done because perf tests [1] showed a 20% slowdown otherwise.

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/40003/8f0cf2d6b8a7df511afe901331d5e2c7b06c0b4d/performance_comparison_[1/4]/report.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
345c422e28 Add ability to load hashed dictionaries using multiple threads
Right now dictionaries (here I will talk only about
HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED)
can load data only in one thread, since they use one hash table that
cannot be filled from multiple threads.

And in case you have a very big dictionary (i.e. 10e9 elements), it can
take a while to load, especially for the SPARSE_HASHED variants (and if
you have that many elements, you likely use SPARSE_HASHED, since it
requires less memory); in my env it takes ~4 hours, which is an enormous
amount of time.

So this patch adds support for shards in dictionaries: the number of
shards determines how many hash tables this dictionary will use and,
more importantly, how many threads it can use to load the data.

And with 16 threads this works 2x faster; not perfect though, see the
follow-up patches in this series.

v0: PARTITION BY
v1: SHARDS 1
v2: SHARDS(1)
v3: tried optimized mod - logical and, but it does not gain even 10%
v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either
v5: move SHARDS into layout parameters (unknown simply ignored)
v6: tune params for perf tests (to avoid too long queries)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:25 +01:00
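
A toy illustration of the sharding idea (not the actual HashedDictionary code, which scatters blocks from the reader instead of rescanning the source per thread): the number of shards determines both how many independent hash tables exist and how many threads can fill them concurrently.

    #include <cstdint>
    #include <thread>
    #include <unordered_map>
    #include <vector>

    int main()
    {
        const size_t shard_count = 16;                     // e.g. SHARDS 16 in the layout
        const std::vector<uint64_t> keys = {1, 2, 3, 42};  // stand-in for the source table

        // One hash table per shard: a single table cannot be filled from many
        // threads, but N independent tables can be filled by N threads.
        std::vector<std::unordered_map<uint64_t, uint16_t>> shards(shard_count);

        std::vector<std::thread> loaders;
        for (size_t shard = 0; shard < shard_count; ++shard)
            loaders.emplace_back([&, shard]
            {
                for (uint64_t key : keys)
                    if (key % shard_count == shard)        // each thread takes only its own keys
                        shards[shard].emplace(key, static_cast<uint16_t>(key));
            });

        for (auto & loader : loaders)
            loader.join();
    }
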
serxa
693489a8ad review fixes 2023-01-12 15:51:04 +00:00
Sergei Trifonov
12d8543578
Merge branch 'master' into cancellable-mutex-integration 2023-01-12 16:03:49 +01:00
Han Fei
6ed4570f73
Merge branch 'master' into regexp-tree-dictionary 2023-01-10 15:36:30 +01:00
Maksim Kita
83a8d3ed25 RangeHashedDictionary update field primary key fix 2023-01-09 13:52:15 +01:00
Sergei Trifonov
81d2ea30ba
Merge branch 'master' into cancellable-mutex-integration 2023-01-07 19:37:46 +01:00
Anton Popov
1f32ffedf8
Merge pull request #43221 from ClickHouse/refactoring-ip-types
Replace domain IP types (IPv4, IPv6) with native
2023-01-07 12:01:21 +01:00
serxa
15bb127b01 replace every std::shared_mutex with DB::FastSharedMutex 2023-01-06 23:35:38 +00:00
Han Fei
a4427a05c2 fix build 2023-01-06 14:30:00 +01:00
Kseniia Sumarokova
573d3283b0
Merge pull request #44327 from kssenii/use-new-named-collections-code-2
Replace old named collections code with new (from #43147) part 2
2023-01-06 13:06:26 +01:00
Han Fei
cac7f65b40 fix build 2023-01-06 11:49:34 +01:00
Han Fei
744084375c fix build 2023-01-05 22:27:45 +01:00
Han Fei
ae5ee8194b fix check style 2023-01-05 17:52:05 +01:00
Han Fei
f2a9eea995 write docs and optimize regex compile 2023-01-05 17:38:01 +01:00
Yakov Olkhovskiy
7a5a36cbed
Merge branch 'master' into refactoring-ip-types 2023-01-04 11:11:06 -05:00
Han Fei
65ef7b4adc fix build 2023-01-04 12:45:12 +01:00
Nikolay Degterinsky
aa41e9b775
Merge pull request #44857 from evillique/fix-msan-build
Try to fix MSan build
2023-01-04 04:31:28 +01:00
Han Fei
00e717d7ce some improvement 2023-01-03 21:41:51 +01:00
kssenii
67509aa2d5 Merge remote-tracking branch 'upstream/master' into use-new-named-collections-code-2 2023-01-03 16:41:30 +01:00
Nikolay Degterinsky
c4431e9931 Fix MSan build 2023-01-03 02:21:26 +00:00
Alexey Milovidov
e855d3519a
Merge branch 'master' into refactoring-ip-types 2023-01-02 21:58:53 +03:00