Azat Khuzhin
2fd1a73812
Fix element_count for HASHED/SPARSE_HASHED with multiple attributes
...
Previously element_count was multiplied by the number of attributes.
Fixes: #5440
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-01 12:44:41 +02:00
Azat Khuzhin
93201f21d9
Fix load_factor for HASHED/SPARSE_HASHED dictionaries with SHARDS
...
Previously, bucket_count was set only for one shard, and hence
load_factor was > 1.
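As a rough illustration of the arithmetic only (this is not the HashedDictionary code; `ShardStats`, `loadFactor` and `loadFactorFirstShardOnly` are hypothetical names), dividing the elements of all shards by the buckets of a single shard inflates the ratio:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-shard statistics; not ClickHouse's real types.
struct ShardStats
{
    std::size_t element_count;
    std::size_t bucket_count;
};

// Load factor over all shards: total elements divided by total buckets.
double loadFactor(const std::vector<ShardStats> & shards)
{
    std::size_t elements = 0;
    std::size_t buckets = 0;
    for (const auto & shard : shards)
    {
        elements += shard.element_count;
        buckets += shard.bucket_count;
    }
    return buckets == 0 ? 0.0 : static_cast<double>(elements) / static_cast<double>(buckets);
}

// The kind of mistake described above: elements are summed over all shards,
// but buckets are taken from the first shard only.
double loadFactorFirstShardOnly(const std::vector<ShardStats> & shards)
{
    assert(!shards.empty());
    std::size_t elements = 0;
    for (const auto & shard : shards)
        elements += shard.element_count;
    return static_cast<double>(elements) / static_cast<double>(shards.front().bucket_count);
}
```

With four shards of 50 elements and 100 buckets each, the first function gives 0.5 while the second gives 2.0.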
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-01 12:44:41 +02:00
MikhailBurdukov
5c9959af49
Resolve conversation
2023-04-28 12:40:47 +00:00
MikhailBurdukov
40ad8499a0
fix
2023-04-26 21:03:27 +00:00
MikhailBurdukov
b229a28e94
Merge branch 'master' into mongo_dict_tls
2023-04-26 23:39:27 +03:00
MikhailBurdukov
389c0af922
Fix style
2023-04-26 19:36:34 +00:00
MikhailBurdukov
baaee66e85
Missing files
2023-04-26 19:29:29 +00:00
MikhailBurdukov
d76430fe90
Added options handling for mongo dict
2023-04-26 19:19:10 +00:00
Raúl Marín
f0e045bb3d
Merge remote-tracking branch 'blessed/master' into arenita
2023-04-24 10:42:56 +02:00
Rory Crispin
99175aefae
trailing whitespace
2023-04-22 17:02:05 +01:00
Rory Crispin
66300402ee
Bracket on newline
2023-04-22 15:42:57 +01:00
Rory Crispin
5e80e9263e
Remove spaces around ->
2023-04-22 15:29:20 +01:00
Rory Crispin
32dcc2e37b
Merge branch 'master' into dict-lifetime-validation
2023-04-22 14:55:11 +01:00
Rory Crispin
006af1dfa1
validate direct dictionary lifetime is unset during creation
2023-04-22 15:49:04 +02:00
Kruglov Pavel
2ad161d2b7
Merge branch 'master' into non-blocking-connect
2023-04-19 13:39:40 +02:00
robot-clickhouse-ci-2
45f4a5f74c
Merge pull request #47964 from ClickHouse/fast-parquet
...
Read Parquet files faster
2023-04-17 19:27:38 +02:00
Raúl Marín
39f8c43a60
Merge remote-tracking branch 'blessed/master' into arenita
2023-04-17 10:33:38 +02:00
Michael Kolupaev
2d4fe85513
Something
2023-04-17 04:58:32 +00:00
Kseniia Sumarokova
6a0d9a37ce
Merge branch 'master' into fix-mysql-named-collection
2023-04-14 12:03:51 +02:00
kssenii
ad48e1d010
Fox
2023-04-13 19:36:25 +02:00
Raúl Marín
2b70e08f23
Don't count unreserved bytes in Arenas as read_bytes
2023-04-13 12:43:24 +02:00
Robert Schulze
7a21d5888c
Remove -Wshadow suppression which leaked into global namespace
2023-04-13 08:46:40 +00:00
Raúl Marín
da9a539cf7
Reduce the usage of Arena.h
2023-04-13 10:31:32 +02:00
Robert Schulze
f41354ccd6
Merge pull request #48671 from ClickHouse/rs/gcc-removal
...
Remove GCC remainders
2023-04-13 10:15:35 +02:00
Alexander Tokmakov
75f18b1198
Revert "Check simple dictionary key is native unsigned integer"
2023-04-13 01:32:19 +03:00
Robert Schulze
3f7ce60e03
Merge branch 'master' into rs/gcc-removal
2023-04-12 22:17:04 +02:00
Anton Popov
1520f3e924
Merge pull request #48335 from lzydmxy/check_sample_dict_key_is_correct
...
Check simple dictionary key is native unsigned integer
2023-04-12 14:27:39 +02:00
Robert Schulze
05606a8835
Clean up GCC warning pragmas
2023-04-11 18:21:08 +00:00
Han Fei
bf28be8837
fix 02504_regexp_dictionary_table_source
2023-04-11 17:07:44 +02:00
Han Fei
6c33180ac8
Merge branch 'master' into hanfei/refine-expmsg
2023-04-11 13:19:38 +02:00
Han Fei
363b97fab8
refine some messages of exception in regexp tree
2023-04-11 11:45:29 +02:00
Alexey Milovidov
d259217cf3
Merge pull request #48570 from azat/build/logger_useful
...
Remove superfluous includes of logger_useful.h from headers
2023-04-11 03:56:39 +03:00
Alexey Milovidov
93b8fc74ef
Merge pull request #48571 from azat/dict/hashed/uncaught-exception-fix
...
Fix uncaught exception in case of parallel loader for hashed dictionaries
2023-04-10 23:11:01 +03:00
Azat Khuzhin
79b83c4fd2
Remove superfluous includes of logger_useful.h from headers
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-10 17:59:30 +02:00
Azat Khuzhin
211cea5e7c
Fix uncaught exception in case of parallel loader for hashed dictionaries
...
Since ThreadPool::wait() rethrows the first exception (if any):
<details>
<summary>stacktrace</summary>
2023.04.09 12:53:33.629333 [ 22361 ] {} <Fatal> BaseDaemon: (version 22.13.1.1, build id: 5FB01DCAAFFF19F0A9A61E253567F90685989D2F) (from thread 23032) Terminate called for uncaught exception:
2023.04.09 12:53:33.630179 [ 23645 ] {} <Fatal> BaseDaemon:
2023.04.09 12:53:33.630213 [ 23645 ] {} <Fatal> BaseDaemon: Stack trace: 0x7f68b00baccc 0x7f68b006bef2 0x7f68b0056472 0x112a42fe 0x1c17f2a3 0x1c17f238 0xbf4bc3b 0x13961c6d 0x138ee529 0x138ed6bc 0x138dd2f0 0x138dd9c6 0x1571d0dd 0x16197c1f 0x161a231e 0x1619fc93 0x161a51b9 0x11151759 0x1115454e 0x7f68b00b8fd4 0x7f68b013966c
2023.04.09 12:53:33.630247 [ 23645 ] {} <Fatal> BaseDaemon: 3. ? @ 0x7f68b00baccc in ?
2023.04.09 12:53:33.630263 [ 23645 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x7f68b006bef2 in ?
2023.04.09 12:53:33.630273 [ 23645 ] {} <Fatal> BaseDaemon: 5. abort @ 0x7f68b0056472 in ?
2023.04.09 12:53:33.648815 [ 23645 ] {} <Fatal> BaseDaemon: 6. ./.build/./src/Daemon/BaseDaemon.cpp:456: terminate_handler() @ 0x112a42fe in /usr/lib/debug/usr/bin/clickhouse.debug
2023.04.09 12:53:33.651484 [ 23645 ] {} <Fatal> BaseDaemon: 7. ./.build/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:61: std::__terminate(void (*)()) @ 0x1c17f2a3 in /usr/lib/debug/usr/bin/clickhouse.debug
2023.04.09 12:53:33.654080 [ 23645 ] {} <Fatal> BaseDaemon: 8. ./.build/./contrib/llvm-project/libcxxabi/src/cxa_handlers.cpp:79: std::terminate() @ 0x1c17f238 in /usr/lib/debug/usr/bin/clickhouse.debug
2023.04.09 12:53:35.025565 [ 23645 ] {} <Fatal> BaseDaemon: 9. ? @ 0xbf4bc3b in /usr/lib/debug/usr/bin/clickhouse.debug
2023.04.09 12:53:36.495557 [ 23645 ] {} <Fatal> BaseDaemon: 10. DB::ParallelDictionaryLoader<(DB::DictionaryKeyType)0, true, true>::~ParallelDictionaryLoader() @ 0x13961c6d in /usr/lib/debug/usr/bin/clickhouse.debug
2023.04.09 12:53:37.833142 [ 23645 ] {} <Fatal> BaseDaemon: 11. DB::HashedDictionary<(DB::DictionaryKeyType)0, true, true>::loadData() @ 0x138ee529 in /usr/lib/debug/usr/bin/clickhouse.debug
2023.04.09 12:53:39.124989 [ 23645 ] {} <Fatal> BaseDaemon: 12. DB::HashedDictionary<(DB::DictionaryKeyType)0, true, true>::HashedDictionary(DB::StorageID const&, DB::DictionaryStructure const&, std::__1::shared_ptr<DB::IDictionarySource>, DB::HashedDictionaryStorageConfiguration const&, std::__1::shared_ptr<DB::Block>) @ 0x138ed6bc in /usr/lib/debug/usr/bin/clickhouse.debug
</details>
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-04-09 22:52:51 +02:00
Alexey Milovidov
09ea79aaf7
Add support for {server_uuid} macro
2023-04-09 03:04:26 +02:00
ltrk2
4544abc7d6
Remove dead code and unused dependencies
2023-04-06 11:37:12 -07:00
lzydmxy
529e1466df
use check_dictionary_primary_key instead of check_sample_dict_key_is_correct
2023-04-04 12:04:17 +08:00
lzydmxy
368c120f42
check sample dictionary key is native unsigned integer
2023-04-03 15:48:40 +08:00
kssenii
f96e7b59a2
Better
2023-04-01 13:19:07 +02:00
kssenii
1721b70070
Merge remote-tracking branch 'upstream/master' into ilejn-dict-named-collection
2023-04-01 13:18:26 +02:00
Alexey Milovidov
e982fb9f1c
Merge pull request #47880 from azat/threadpool-introspection
...
ThreadPool metrics introspection
2023-03-30 01:27:31 +03:00
Azat Khuzhin
f38a7aeabe
ThreadPool metrics introspection
...
There are lots of thread pools and simple local-vs-global is not enough
already, it is good to know which one in particular uses threads.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-03-29 10:46:59 +02:00
Vladimir C
d32c285d17
Merge branch 'master' into vdimir/direct-dict-async-read
2023-03-28 12:41:20 +02:00
Han Fei
e3afa5090f
Merge pull request #47218 from hanfei1991/hanfei/optimize-regexp-tree-1
...
Refine OptimizeRegularExpression Function and RegexpTreeDict
2023-03-27 15:23:01 +02:00
vdimir
f6de216041
PullingAsyncPipelineExecutor for Direct dictionary with ClickHouse source
2023-03-27 09:52:26 +00:00
Kruglov Pavel
3ee12e21fb
Merge branch 'master' into non-blocking-connect
2023-03-23 20:53:44 +01:00
Han Fei
575c4263a3
address comments
2023-03-22 17:47:25 +01:00
avogar
38e44861ae
Fix possible race conditions
2023-03-21 16:01:54 +00:00
Kseniia Sumarokova
3c550b4314
Merge pull request #46647 from kssenii/named-collections-finish
...
Named collections: finish replacing old code for storages
2023-03-21 12:36:46 +01:00
Robert Schulze
5b036a1a3b
More preparation for libcxx(abi), llvm, clang-tidy 16 (follow-up to #47722)
2023-03-20 12:55:03 +00:00
kssenii
cae3b335d6
Merge remote-tracking branch 'upstream/master' into named-collections-finish
2023-03-20 11:23:22 +01:00
kssenii
bb0beb7449
Merge remote-tracking branch 'upstream/master' into named-collections-finish
2023-03-17 13:02:36 +01:00
Sema Checherinda
3c6deddd1d
work with comments on PR
2023-03-16 19:55:58 +01:00
Han Fei
e0954ce7be
fix compile
2023-03-16 00:22:05 +01:00
Han Fei
a532503466
Merge branch 'master' into hanfei/optimize-regexp-tree-1
2023-03-15 17:56:01 +01:00
Han Fei
424e8df9ad
fix style
2023-03-15 16:01:12 +01:00
Han Fei
d78a9e03ad
refine
2023-03-15 15:38:11 +01:00
Kseniia Sumarokova
a9a0d2f5c4
Merge pull request #46524 from artem-yadr/master
...
Support for MongoDB Replica Set URI with readPreference and host:port enum in MongoDB dictionaries
2023-03-07 11:40:33 +01:00
Kseniia Sumarokova
386663953c
Merge branch 'master' into named-collections-finish
2023-03-03 12:23:38 +01:00
artem-yadr
e1352adced
Update MongoDBDictionarySource.cpp
2023-03-01 12:50:03 +03:00
Han Fei
e77dd81036
fix
2023-02-24 19:48:46 +01:00
Han Fei
e8527e720b
refine regexp tree dictionary
2023-02-24 13:08:27 +01:00
kssenii
a54b011670
Finish for mysql
2023-02-20 21:37:38 +01:00
Alexey Milovidov
d8cda3dbb8
Remove PVS-Studio
2023-02-19 23:30:05 +01:00
artem-yadr
08734d4dc0
poco changes are now used in MongoDBDictionarySource
2023-02-17 14:56:21 +03:00
Maksim Kita
c469e10092
Dictionaries DictionaryStorageFetchRequest fix
2023-02-16 12:17:02 +01:00
Nikolay Degterinsky
6e4b660033
Move MongoDB and PostgreSQL sources to Sources folder
2023-02-14 22:35:10 +00:00
Ilya Golshtein
38ea27489c
secure in named collections - small cleanup
2023-02-13 01:04:38 +03:00
Alexey Milovidov
44bd95a410
Merge pull request #46167 from ClickHouse/rs/reject-dos-patterns
...
Reject hyperscan regexes which are prone to ReDoS
2023-02-11 06:04:03 +03:00
Ilya Golshtein
3b72b3f13b
secure in named collection - switched to specific_args, tests added
2023-02-10 13:42:11 +03:00
Robert Schulze
74937cf27b
Reject DoS-prone hyperscan regexes
2023-02-09 17:17:35 +00:00
Robert Schulze
e490ec91d9
Merge branch 'master' into rs/fix-fragile-linking
2023-02-09 11:33:59 +01:00
Alexander Tokmakov
8101b044fa
Merge pull request #46091 from azat/sanity-assertions
...
Sanity assertions for closing file descriptors
2023-02-09 01:02:03 +03:00
Ilya Golshtein
f9e81ca7de
secure in named collections - initial
2023-02-08 23:30:16 +03:00
Maksim Kita
77ec255d7c
Merge pull request #45396 from kitaisreal/hashed-dictionary-sharded-nullable-fix
...
HashedDictionary sharded fix nullable values
2023-02-08 17:15:10 +03:00
Robert Schulze
6ff232d782
Merge branch 'master' into rs/fix-fragile-linking
2023-02-08 12:51:12 +01:00
Alexey Milovidov
55c3bbb739
Fix assertion in statistical functions
2023-02-08 00:09:41 +01:00
Robert Schulze
10af0b3e49
Reduce redundancies
2023-02-07 12:27:23 +00:00
Azat Khuzhin
8cc41b7f41
Check return value of ::close()
...
Note that, according to close(2), close() should not be retried on EINTR
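A minimal sketch of checking the result without retrying, assuming a POSIX system; `closeOnce` is a hypothetical helper and not the code from this commit:

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical helper: closes fd exactly once and returns 0 on success
// or the errno value on failure.
// It deliberately does not loop on EINTR: per close(2) on Linux, retrying
// after a failed close() may close a descriptor that another thread has
// since reused.
int closeOnce(int fd)
{
    assert(fd >= 0);
    if (::close(fd) == 0)
        return 0;
    return errno;
}
```

A caller can then decide whether a non-zero result should be logged or turned into an error.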
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-02-07 11:28:22 +01:00
Han Fei
d1d893275a
fix
2023-02-06 18:46:23 +01:00
Han Fei
eb76041312
address comments and add one more test
2023-02-06 17:26:20 +01:00
Maksim Kita
e8d66fb1a2
HashedDictionary sharded fix nullable values
2023-02-06 10:50:58 +01:00
Robert Schulze
84b9ff450f
Fix terribly broken, fragile and potentially cyclic linking
...
Sorry for the clickbaity title. This is about static method
ConnectionTimeouts::getHTTPTimeouts(). It was declared in header
IO/ConnectionTimeouts.h, and defined in header
IO/ConnectionTimeoutsContext.h (!). This is weird and caused issues with
linking on s390x (#45520). There was an attempt to fix some
inconsistencies (#45848), but at first neither @Algunenano nor I
really understood why the definition is in the header.
Turns out that ConnectionTimeoutsContext.h is only #include'd from
source files which are part of the normal server build BUT NOT part of
the keeper standalone build (which must be enabled via CMake
-DBUILD_STANDALONE_KEEPER=1). This dependency was not documented and as
a result, some misguided workarounds were introduced earlier, e.g.
0341c6c54b
The deeper cause was that getHTTPTimeouts() is passed a "Context". This
class is part of the "dbms" library which is deliberately not linked by
the standalone build of clickhouse-keeper. The context is only used to
read the settings and the "Settings" class is part of the
clickhouse_common library which is linked by clickhouse-keeper already.
To resolve this mess, this PR
- creates source file IO/ConnectionTimeouts.cpp and moves all
ConnectionTimeouts definitions into it, including getHTTPTimeouts().
- breaks the wrong dependency by passing "Settings" instead of "Context"
into getHTTPTimeouts().
- resolves the previous hacks
2023-02-05 20:49:34 +00:00
Han Fei
baa345fa64
remove logs
2023-02-05 18:06:06 +01:00
Han Fei
9ea3de14ce
use re2 by default
2023-02-04 10:53:54 +01:00
Han Fei
2656027c9f
make it work if we dont define use_vectorscan macro
2023-02-03 14:25:53 +01:00
Han Fei
a2e43bc333
log for ci
2023-02-02 14:40:09 +01:00
Han Fei
90153c11fc
fix matching priority
2023-02-01 15:09:04 +01:00
Han Fei
24b8322bc9
Merge branch 'master' into hanfei/regexp-refine
2023-01-31 17:03:51 +01:00
Bharat Nallan
f1d6e3b908
Merge branch 'master' into ncb/odbc-connection-pool-fixes
2023-01-30 15:49:04 -08:00
Alexander Tokmakov
d7c697ee38
fix
2023-01-26 15:24:39 +01:00
Alexander Tokmakov
14db798191
fix
2023-01-26 13:56:16 +01:00
Alexander Tokmakov
9b670946db
Merge branch 'master' into exception_message_patterns5
2023-01-26 00:41:32 +01:00
Han Fei
7df30b1d82
remove trivial logs
2023-01-25 23:57:20 +01:00
Han Fei
f4d38b82e6
RegExpTreeDict use re2 engines when processing heavy regexps
2023-01-25 23:49:00 +01:00
Nikolay Degterinsky
6b2f3de293
Merge pull request #45512 from evillique/fix-msan-build
...
Fix MSan build once again (too heavy translation units)
2023-01-25 16:11:21 +01:00
Alexander Tokmakov
6eb557b2ba
Merge branch 'master' into exception_message_patterns4
2023-01-25 13:49:17 +01:00
Sergei Trifonov
0d1ea05ff6
Merge pull request #45007 from ClickHouse/cancellable-mutex-integration
...
Fast shared mutex integration
2023-01-25 11:15:46 +01:00
Bharat Nallan
2ef8fcb318
Merge branch 'master' into ncb/odbc-connection-pool-fixes
2023-01-24 21:27:20 -08:00
Nikolay Degterinsky
97aef55a7f
Fix typo
2023-01-24 23:00:02 +00:00
Nikolay Degterinsky
d8d85d9bbd
Merge remote-tracking branch 'upstream/master' into fix-msan-build
2023-01-24 22:57:47 +00:00
Nikolay Degterinsky
fb6838b043
Review suggestions
2023-01-24 22:54:01 +00:00
Alexander Tokmakov
3f6594f4c6
forbid old ctor of Exception
2023-01-23 22:18:05 +01:00
Alexander Tokmakov
70d1adfe4b
Better formatting for exception messages (#45449)
...
* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
2023-01-24 00:13:58 +03:00
Nikolay Degterinsky
f9960361db
Fix MSan build
2023-01-23 14:38:07 +00:00
Sergei Trifonov
0fbfa17863
Merge branch 'master' into cancellable-mutex-integration
2023-01-23 12:44:09 +01:00
Maksim Kita
758c8f2776
Merge branch 'master' into dict/remove-preallocate
2023-01-20 13:15:37 +03:00
Han Fei
94336a9b66
fix typo
2023-01-19 13:55:29 +01:00
Han Fei
2884b8837b
fix regexp logical error in stress tests
2023-01-19 12:03:54 +01:00
Bharat Nallan Chakravarthy
16a2585d55
fixes to odbc connection pooling
2023-01-18 16:38:56 -08:00
Azat Khuzhin
4366f7fb3b
Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries
...
It does not give a significant benefit, and hashed/sparse_hashed
dictionaries can now be filled in parallel (#40003) using sharded
dictionaries, which should be used instead of PREALLOCATE.
Note that dictionaries that were created with PREALLOCATE will still
work, but will simply ignore this attribute.
Fixes: #41985 (cc @alexey-milovidov)
Reverts: #23979 (cc @kitaisreal)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-18 20:18:37 +01:00
Azat Khuzhin
64e3677961
Avoid double hash calculation in HashedDictionary::getShard(StringRef)
...
Previously it was written this way because getShard() was a simple
modulo operation.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
2783850f08
Minor review fixes in HashedDictionary
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
6e0a7add93
Completely exception safe HashedDictionary dtor
...
Previously there was one (even though very unlikely) case when the dtor
could throw - logging code or ThreadPool::wait.
Just guard the dtor with try/catch and be done with it.
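The guard pattern can be sketched as follows. This is a generic illustration, not the HashedDictionary destructor; `GuardedOwner` and its members are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical owner whose cleanup step may throw (think of logging or
// waiting on a thread pool). The destructor catches everything so that
// no exception can escape it.
class GuardedOwner
{
public:
    GuardedOwner(std::function<void()> cleanup_, bool & cleanup_failed_)
        : cleanup(std::move(cleanup_)), cleanup_failed(cleanup_failed_)
    {
        assert(cleanup);
    }

    ~GuardedOwner()
    {
        try
        {
            cleanup();
        }
        catch (...)
        {
            // Real code would log the exception here; the sketch only records it.
            cleanup_failed = true;
        }
    }

private:
    std::function<void()> cleanup;
    bool & cleanup_failed;
};
```

Destructors are implicitly noexcept, so without the guard an exception thrown by the cleanup would end in std::terminate().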
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
74def83c5d
Destroy hashtables for hashed dictionary in parallel only for sharded dict
...
Since there can be multiple hashtables, as each attribute uses its
own hashtable.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
1c0e0ea1e4
Disable sharded dictionaries with updatable sources
...
Support of sharded dictionaries for updatable sources is questionable
since:
- sharded dictionaries were developed for hashed dictionaries with a huge
number of keys
- an updatable source requires storing the whole table in memory (due to
how reload works)
- it is also an open question whether sharding would benefit from an
updatable source at all: using an updatable source with a huge
number of changes in the source does not look optimal, and on the
other hand, if there is only a small number of changes then you don't
need a sharded dictionary at all
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
c97991fce1
Use shared arena for HashedDictionary::blockToAttributes()
...
This should decrease number of allocations.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
01b100da61
Use shared arena in ParallelDictionaryLoader::createShardSelector() (and add missing rollback)
...
This should decrease number of allocations.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
64874824b4
Minor review fixes in HashedDictionary
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
77c1f07636
Make HashedDictionary::~HashedDictionary exception safe
...
Previously the destructor could throw if thread allocation
failed; rewrite it to use trySchedule() and fall back to sequential
destruction in that case.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
a3f189e191
Optimize sharded dictionaries with skewed distribution
...
With a skewed key distribution, a simple modulo will not give
you a good distribution between shards, and eventually this can lead to
the same performance as a non-sharded dictionary (except that it will
occupy +1 thread for Block::scatter).
But if HashedDictionary::blockToAttributes() does not call
HashedDictionary::getShard(), this can be fixed by using a more complex
key-to-shard (getShard()) mapping. And actually you do not need to call
getShard() in blockToAttributes(): you can simply use the passed shard,
and that's it.
And by wrapping the key with intHash64() in getShard(), the skewed
distribution can be fixed.
Note that previously I tried a similar approach but did not remove
getShard() from blockToAttributes(), which is why it failed.
And now it works almost as fast as with simple createBlockSelector(),
just 13.6% slower (18.75min vs 16.5min, with 16 threads).
Note that I've also tried to add libdivide for this, but it does not
improve the performance.
I've also tried the approach without scatter, and it works 20% slower
than this one (22.5min vs 18.75min, with 16 threads).
v2: Use intHashCRC32() over intHash64() for HashedDictionary::getShard()
(with intHash64() it works much slower, almost 2x slower, there was
18min with 32 threads)
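A small sketch of why hashing the key before taking the modulo helps with skewed keys. The function names are hypothetical, and `mix64` is the MurmurHash3 64-bit finalizer used here as a stand-in; it is not ClickHouse's intHash64() or intHashCRC32():

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain modulo: keys that are all multiples of the shard count
// end up in the same shard.
std::size_t shardByModulo(std::uint64_t key, std::size_t shards)
{
    assert(shards > 0);
    return static_cast<std::size_t>(key % shards);
}

// Stand-in bit mixer (MurmurHash3 64-bit finalizer).
std::uint64_t mix64(std::uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

// Hash first, then take the modulo: the shard no longer depends
// only on the low bits of the key.
std::size_t shardByHash(std::uint64_t key, std::size_t shards)
{
    assert(shards > 0);
    return static_cast<std::size_t>(mix64(key) % shards);
}
```

For keys 0, 16, 32, ... and 16 shards, `shardByModulo` always returns 0, while `shardByHash` spreads the same keys over several shards.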
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
655a564280
Parallel hash tables destroy for hashed dictionaries
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
99063b152f
Allow to configure queue backlog of the parallel hashed dictionary loader
...
v2: Decrease default parallel_queue_backlog to 10000 (same speed)
v3: Rename parallel_queue_backlog to per_shard_load_backlog
v3: Rename per_shard_load_backlog to shard_load_queue_backlog
v4: Fix documentation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
79ad81dfdf
Implement separate queue for parallel loader of hashed dictionaries
...
Previous patches in this series have a bottleneck in rehash(). This is
the slowest operation when inserting lots of rows into the hashtable,
and eventually the whole thread pool sometimes works at the speed of the
slowest thread, since we did not have any queue of blocks.
This patch adds such a queue and now it scales linearly: initially with
1 thread I had ~4 hours for 10e9 elements (UInt64 key, UInt16 value),
after this patch it works in 16 minutes with 16 threads (well actually I
have to use 32 threads because of distribution of data in the source
table).
And now with 16 threads it works 16 times faster.
Also this patch adds more optimal block splitting for the non-complex
dictionaries, and usual block splitting for complex dictionaries.
But anyway this moves the overhead from the threads that load into the
hashtable out to the reader thread, and this is better, since the reader
does not use that much CPU.
v2: fix use-after-free on failed load (add missing wait in dtor)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
5d0fd3cdc4
Remove sharded overhead for non-sharded hashed dictionaries
...
By adding one more template parameter - HashedDictionary<sharded> (yes,
there are already too many of them for a template class that has explicit
instantiation).
Since perf tests [1] show a 20% slowdown.
[1]: https://s3.amazonaws.com/clickhouse-test-reports/40003/8f0cf2d6b8a7df511afe901331d5e2c7b06c0b4d/performance_comparison_[1/4]/report.html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
345c422e28
Add ability to load hashed dictionaries using multiple threads
...
Right now dictionaries (here I will talk only about
HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED)
can load data only in one thread, since they use one hash table that
cannot be filled from multiple threads.
And in case you have a very big dictionary (i.e. 10e9 elements), it can
take a while to load it, especially for SPARSE_HASHED variants (and
if you have that many elements, you are likely to use
SPARSE_HASHED, since it requires less memory); in my env it takes ~4
hours, which is an enormous amount of time.
So this patch adds support for shards in dictionaries: the number of
shards determines how many hash tables the dictionary will use and also,
which is more important, how many threads it can use to load the data.
And with 16 threads this works 2x faster, not perfect though, see the
follow-up patches in this series.
v0: PARTITION BY
v1: SHARDS 1
v2: SHARDS(1)
v3: tried optimized mod - logical and, but it does not gain even 10%
v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either
v5: move SHARDS into layout parameters (unknown simply ignored)
v6: tune params for perf tests (to avoid too long queries)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:25 +01:00
serxa
693489a8ad
review fixes
2023-01-12 15:51:04 +00:00
Sergei Trifonov
12d8543578
Merge branch 'master' into cancellable-mutex-integration
2023-01-12 16:03:49 +01:00
Han Fei
6ed4570f73
Merge branch 'master' into regexp-tree-dictionary
2023-01-10 15:36:30 +01:00
Maksim Kita
83a8d3ed25
RangeHashedDictionary update field primary key fix
2023-01-09 13:52:15 +01:00
Sergei Trifonov
81d2ea30ba
Merge branch 'master' into cancellable-mutex-integration
2023-01-07 19:37:46 +01:00
Anton Popov
1f32ffedf8
Merge pull request #43221 from ClickHouse/refactoring-ip-types
...
Replace domain IP types (IPv4, IPv6) with native
2023-01-07 12:01:21 +01:00
serxa
15bb127b01
replace every std::shared_mutex
with DB::FastSharedMutex
2023-01-06 23:35:38 +00:00
Han Fei
a4427a05c2
fix build
2023-01-06 14:30:00 +01:00
Kseniia Sumarokova
573d3283b0
Merge pull request #44327 from kssenii/use-new-named-collections-code-2
...
Replace old named collections code with new (from #43147) part 2
2023-01-06 13:06:26 +01:00
Han Fei
cac7f65b40
fix build
2023-01-06 11:49:34 +01:00
Han Fei
744084375c
fix build
2023-01-05 22:27:45 +01:00
Han Fei
ae5ee8194b
fix check style
2023-01-05 17:52:05 +01:00
Han Fei
f2a9eea995
write docs and optimize regex compile
2023-01-05 17:38:01 +01:00
Yakov Olkhovskiy
7a5a36cbed
Merge branch 'master' into refactoring-ip-types
2023-01-04 11:11:06 -05:00
Han Fei
65ef7b4adc
fix build
2023-01-04 12:45:12 +01:00
Nikolay Degterinsky
aa41e9b775
Merge pull request #44857 from evillique/fix-msan-build
...
Try to fix MSan build
2023-01-04 04:31:28 +01:00
Han Fei
00e717d7ce
some improvement
2023-01-03 21:41:51 +01:00
kssenii
67509aa2d5
Merge remote-tracking branch 'upstream/master' into use-new-named-collections-code-2
2023-01-03 16:41:30 +01:00
Nikolay Degterinsky
c4431e9931
Fix MSan build
2023-01-03 02:21:26 +00:00
Alexey Milovidov
e855d3519a
Merge branch 'master' into refactoring-ip-types
2023-01-02 21:58:53 +03:00
Han Fei
97cdfdceea
fix style check
2022-12-31 20:36:23 +01:00
Han Fei
c25207fc21
Merge branch 'master' into regexp-tree-dictionary
2022-12-30 17:31:44 +01:00
Han Fei
83c6517fcf
try to fix flaky tests
2022-12-30 17:31:28 +01:00
Nikolay Degterinsky
dfe93b5d82
Merge pull request #42284 from Algunenano/perf_experiment
...
Performance experiment
2022-12-30 03:14:22 +01:00
Han Fei
fa1baef448
add check sanitizer
2022-12-29 23:00:55 +01:00
Han Fei
50905e2005
address comments
2022-12-29 20:13:46 +01:00
Alexey Milovidov
33bcd07be5
Remove old code
2022-12-28 19:02:06 +01:00
Raúl Marín
fc1fa82a39
Merge branch 'master' into perf_experiment
2022-12-27 10:51:58 +01:00
mayamika
f66a0c01ad
Add null dictionary source
2022-12-24 17:11:30 +03:00
Han Fei
4859197c34
fix build
2022-12-22 23:59:04 +01:00
Han Fei
2bb952a796
fix build
2022-12-22 23:27:10 +01:00
Han Fei
efa963fb0e
support regex tree dictionary
2022-12-22 22:42:11 +01:00
Yakov Olkhovskiy
a8cb29da4b
Merge branch 'master' into refactoring-ip-types
2022-12-21 23:56:24 -05:00
Raúl Marín
45d27f461b
Merge branch 'master' into perf_experiment
2022-12-20 09:07:48 +00:00
kssenii
30547d2dcd
Replace old named collections code for url
2022-12-17 00:24:05 +01:00
Vitaly Baranov
fb8aca8319
Merge pull request #44158 from vitlibar/improve-referential-deps
...
Improve referential dependencies
2022-12-14 21:17:02 +01:00
Han Fei
d3f8bb3f52
Merge branch 'master' into regexp-tree-dictionary
2022-12-14 16:29:17 +01:00
Han Fei
2272d712e2
reimplement
2022-12-14 16:28:57 +01:00
Nikolay Degterinsky
9b6d31b95d
Merge branch 'master' into perf_experiment
2022-12-13 17:15:07 +01:00
Vitaly Baranov
5aaff60650
Fix referential dependencies when host & port in a ClickHouse dictionary source are set by default.
2022-12-12 18:22:14 +01:00
Alexander Tokmakov
db7f2ed42b
Update getDictionaryConfigurationFromAST.cpp
2022-12-12 16:46:04 +03:00
Han Fei
ae91d0c78e
fix check-style
2022-12-01 13:29:55 +01:00
Han Fei
e28c98d4ba
fix macro flag
2022-12-01 11:50:18 +01:00
Han Fei
41f726ce83
fix compile
2022-11-30 11:21:45 +01:00
Yakov Olkhovskiy
77266ea754
cleanup
2022-11-29 17:34:16 +00:00
Han Fei
27ec6bc42a
Merge branch 'master' into regexp-tree-dictionary
2022-11-29 16:33:30 +01:00
Yakov Olkhovskiy
770b520ded
Merge branch 'master' into refactoring-ip-types
2022-11-28 08:50:19 -05:00
Yakov Olkhovskiy
fd58719271
IPAddressDictionary fixed, test fixed
2022-11-24 20:42:36 +00:00
Yakov Olkhovskiy
de20d58f6b
style fix
2022-11-22 21:55:07 +00:00
Yakov Olkhovskiy
2cbe748e68
functions fixed, test fixed
2022-11-22 21:34:32 +00:00
Raúl Marín
4aa29b6a63
Merge remote-tracking branch 'blessed/master' into perf_experiment
2022-11-22 19:09:00 +01:00
Yakov Olkhovskiy
4d144be39c
replace domain IP types (IPv4, IPv6) with native
2022-11-14 14:17:17 +00:00
avogar
9e89af28c6
Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference
2022-11-10 20:15:14 +00:00
Raúl Marín
6e0a9452e7
Merge remote-tracking branch 'blessed/master' into perf_experiment
2022-10-25 15:25:06 +02:00
Azat Khuzhin
4e76629aaf
Fixes for -Wshorten-64-to-32
...
- lots of static_cast
- add safe_cast
- types adjustments
- config
- IStorage::read/watch
- ...
- some TODO's (to convert types in future)
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Raúl Marín
e60415d07d
Make clang-tidy happy
2022-10-18 11:40:12 +02:00
Alexander Tokmakov
fa1134f299
Merge branch 'master' into fix_loading_dependencies
2022-10-10 16:30:52 +02:00
Alexander Tokmakov
4175f8cde6
abort instead of __builtin_unreachable in debug builds
2022-10-07 21:49:08 +02:00
Alexander Tokmakov
014784a9ca
Merge branch 'master' into fix_loading_dependencies
2022-10-07 18:58:11 +02:00
Alexander Tokmakov
690ec74bf2
better handling for expressions in dictGet
2022-10-05 20:58:27 +02:00
Robert Schulze
fd86829824
Consolidate config_core.h into config.h
...
Less duplication, less confusion ...
2022-09-28 13:31:57 +00:00
Robert Schulze
78fc36ca49
Generate config.h into ${CONFIG_INCLUDE_PATH}
...
This makes the target location consistent with other auto-generated
files like config_formats.h, config_core.h, and config_functions.h and
simplifies the build of clickhouse_common.
2022-09-28 12:48:26 +00:00
Kseniia Sumarokova
ea43cb5648
Merge pull request #41261 from kssenii/s3-header-auth
...
Support s3 authorisation headers from ast arguments
2022-09-23 12:48:08 +02:00
Alexey Milovidov
bb6f1bfce2
Fix 9/10 of trash
2022-09-19 08:53:20 +02:00
Alexey Milovidov
730655d4fd
Fix 8/9 of trash
2022-09-19 08:53:20 +02:00
Alexey Milovidov
42b0d444da
Fix 7/8 of trash
2022-09-19 08:53:20 +02:00
Alexey Milovidov
91baedf03a
Fix 6/7 of trash
2022-09-19 08:53:20 +02:00
Alexey Milovidov
84f42e0874
Fix 3/4 of trash
2022-09-19 08:50:53 +02:00
kssenii
420ac4eb43
s3 header auth in ast
2022-09-13 15:13:28 +02:00
Alexey Milovidov
fd235919aa
Remove some methods
2022-09-10 05:04:40 +02:00
kssenii
83514fa2ef
Refactor
2022-09-05 20:08:22 +02:00
Antonio Andelic
e64436fef3
Fix typos with new codespell
2022-09-02 08:54:48 +00:00
Vage Ogannisian
9b2326cc6c
Add RegExpTree dictionary
2022-08-31 19:34:50 +00:00
Alexey Milovidov
1ff535a128
One step back
2022-08-28 02:00:09 +02:00
Vage Ogannisian
540fa7fe5b
Add YAML regular expressions tree dictionary source
2022-08-24 14:01:56 +00:00
vdimir
5fea2091ac
Embed IKeyValue impl into IDictionary.h
2022-08-10 15:58:15 +00:00
vdimir
90fa2ed8c1
better code for join with dict
2022-08-10 14:20:29 +00:00
vdimir
7073067d40
check attributes for join with dict
2022-08-10 14:20:26 +00:00
Yakov Olkhovskiy
d39e9f65de
Merge branch 'master' into fix-quota-key
2022-08-08 11:54:21 -04:00
Robert Schulze
ad0d060dc1
Merge pull request #39904 from ClickHouse/library-bridge-refactoring
...
Prepare library-bridge for catboost integration
2022-08-08 12:15:01 +02:00
Nikolai Kochetov
c6e3e14bcc
Merge pull request #39184 from ClickHouse/respect-remote_url_allow_hosts-for-other-dict-sources
...
Respect remote_url_allow_hosts in relevant dictionary sources.
2022-08-05 17:25:53 +02:00
Robert Schulze
cb146ee6f1
Rename LibraryBridgeHelper to ExternalDictionaryLibraryBridgeHelper
...
- ExternalDictionaryLibraryBridgeHelper provides the server-side
interface to access the dictionary part of the library bridge.
- In a later commit, CatBoostLibraryBridgeHelper will be added to access
the catboost part of the library bridge from the server.
2022-08-04 19:58:37 +00:00
Robert Schulze
ea73b98fb9
Prepare library-bridge for catboost integration
...
- Rename generic file and identifier names in library-bridge to
something more dictionary-specific. This is needed because later on,
catboost will be integrated into library-bridge.
- Also: Some smaller fixes like typos and un-inlining non-performance
critical code.
- The logic remains unchanged in this commit.
2022-08-04 19:26:51 +00:00
Yakov Olkhovskiy
23037daf17
Merge branch 'master' into fix-quota-key
2022-08-04 12:14:49 -04:00
Yakov Olkhovskiy
2e34b384c1
update tcp protocol, add quota_key
2022-08-03 15:44:08 -04:00
Alexander Tokmakov
82b50e79cf
Merge branch 'master' into tsan_clang_15
2022-08-02 13:00:55 +03:00
Alexander Tokmakov
0d68b1c67f
fix build with clang-15
2022-08-01 18:00:54 +02:00
Azat Khuzhin
3e627e2861
Add profile events for fsync
...
The following new profile events have been added:
- FileSync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for files.
- DirectorySync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for directories.
- FileSyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for files.
- DirectorySyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for directories.
v2: rewrite test to sh with retries
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-31 23:19:30 +03:00
mergify[bot]
e53cf7fd9f
Merge branch 'master' into direct-dictionary-dict-has-multiple-same-keys-fix
2022-07-25 11:41:58 +00:00
Alexander Tokmakov
c77117eadf
Update SSDCacheDictionaryStorage.h
2022-07-22 12:54:20 +03:00
Maksim Kita
6443116e80
DirectDictionary improve performance of dictHas with duplicate keys
2022-07-21 16:29:28 +02:00
Alexander Tokmakov
9a55f84885
Revert "Remove broken optimisation in Direct dictionary dictHas implementation"
2022-07-21 16:24:18 +03:00
Alexander Tokmakov
a8da5d96fc
remove some dead and commented code
2022-07-21 15:05:48 +02:00
James Morrison
d60a02829f
Remove broken optimisation in Direct dictionary dictHas implementation
...
I noticed this while working on another feature - if a set of keys being
passed to `hasKeys` contains duplicates, then only one of the result
slots for these keys will be populated.
My fix uses a simpler implementation which is likely slower, but is
correct, which seems more important. No doubt faster approaches exist
which are also correct.
2022-07-20 14:45:32 +00:00
Nikolai Kochetov
8e3bc8c66d
Merge branch 'master' into respect-remote_url_allow_hosts-for-other-dict-sources
2022-07-19 12:11:17 +02:00
Robert Schulze
13482af4ee
First try at reducing the use of StringRef
...
- to be replaced by std::string_view
- suggested in #39262
2022-07-17 17:26:02 +00:00
Robert Schulze
deda29b46b
Pass const StringRef by value, not by reference
...
See #39224
2022-07-15 11:34:56 +00:00
Nikolai Kochetov
772c009f2f
Fixing build.
2022-07-14 18:44:43 +00:00
Nikolai Kochetov
5005700eee
Fixing build
2022-07-13 15:33:18 +00:00
Nikolai Kochetov
937a9e9d9f
Fixing build/
2022-07-13 15:05:49 +00:00
Nikolai Kochetov
36e34d8cc6
Respect remote_url_allow_hosts in relevant dictionary sources.
2022-07-13 14:53:23 +00:00
Nikolai Kochetov
020c99a269
Merge pull request #38617 from azat/contrib-debug-symbols
...
Add separate option to omit symbols from heavy contrib
2022-07-06 14:40:24 +02:00
kssenii
cfff7c4c28
Merge master
2022-07-04 14:13:26 +02:00
Azat Khuzhin
e8f5cd3c68
Add separate option to omit symbols from heavy contrib
...
Sometimes it is useful to build contrib with debug symbols for further
debugging.
With everything turned ON (i.e. debug build) I got 3.3GB vs 3.0GB w/o
this patch, a 9% bloat. Is this OK for you? If not,
STRIP_DEBUG_SYMBOLS_HEAVY_CONTRIB can be OFF by default (regardless
of build type).
P.S. aws debug symbols add just 1.7%.
v2: rename STRIP_HEAVY_DEBUG_SYMBOLS
v3: OMIT_HEAVY_DEBUG_SYMBOLS
v4: documentation had been removed
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-02 06:32:03 +03:00
Maksim Kita
95687f2d01
CacheDictionaryUpdateUnit make update state non atomic
2022-06-30 13:15:35 +02:00
Maksim Kita
f443cf66f0
CacheDictionary simplify update queue
2022-06-30 12:53:56 +02:00
Maksim Kita
09be594c81
Dictionaries added TSA annotations
2022-06-29 19:15:47 +02:00
Robert Schulze
f692ead6ad
Don't use std::unique_lock unless we have to
...
Replace where possible by std::lock_guard which is more light-weight.
2022-06-28 19:19:06 +00:00
Igor Nikonov
e41d612b1d
Cleanup: local clang-tidy warnings found during review
2022-06-27 20:57:18 +00:00
kssenii
2c5aeaaa1a
Add auto close for postgres connection
2022-06-27 13:46:52 +02:00
Robert Schulze
5f5732a2c4
Merge pull request #37969 from ClickHouse/consistent-macro-usage
...
More consistent use of platform macros
2022-06-10 14:10:01 +02:00
Robert Schulze
1a0b5f33b3
More consistent use of platform macros
...
cmake/target.cmake defines macros for the supported platforms, this
commit changes predefined system macros to our own macros.
__linux__ --> OS_LINUX
__APPLE__ --> OS_DARWIN
__FreeBSD__ --> OS_FREEBSD
2022-06-10 10:22:31 +02:00
Maksim Kita
a9d568c63c
Dictionaries custom query with update field fix
2022-06-09 11:51:07 +02:00
Maksim Kita
4e160105b9
Merge pull request #37805 from kitaisreal/dictionaries-hierarchy-nullable-key-support
...
Hierarchical dictionaries support nullable parent key
2022-06-08 12:36:09 +02:00
Maksim Kita
3fd8294907
Fixed style check
2022-06-03 18:05:09 +02:00
Maksim Kita
6db5c08fde
Functions dictGetChildren, dictGetDescendants added support for nullable parent key
2022-06-03 17:36:16 +02:00
Maksim Kita
a0cbbd9edc
Hierarchical Cache, Direct dictionaries added support for nullable parent key
2022-06-03 17:21:55 +02:00
mergify[bot]
906182548b
Merge branch 'master' into refactor-read-metrics-and-callbacks
2022-06-02 18:30:21 +00:00
Maksim Kita
20b55a45b2
Hierarchical dictionaries support nullable parent key
2022-06-02 19:24:23 +02:00
Maksim Kita
0b40e05ffc
Merge pull request #37778 from kitaisreal/dictionaries-improve-logging-during-exception
...
Dictionaries improve exception messages
2022-06-02 19:16:07 +02:00
Nikolai Kochetov
00395e752e
Cleanup
2022-06-02 16:59:14 +00:00
Maksim Kita
cf00e5110d
Dictionaries improve exception messages
2022-06-02 11:37:28 +02:00
Nikolai Kochetov
edac3d6714
Merge branch 'master' into refactor-read-metrics-and-callbacks
2022-06-02 09:36:20 +00:00