1472 Commits

Author SHA1 Message Date
Azat Khuzhin
2783850f08 Minor review fixes in HashedDictionary
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
6e0a7add93 Completely exception safe HashedDictionary dtor
Previously there was one (even though very unlikely) case when the dtor
could throw - logging code or ThreadPool::wait.

Just guard the dtor with try/catch and be done with it.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
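The try/catch guard described in the commit above can be illustrated with a minimal sketch. This is not the actual HashedDictionary code: `GuardedResource`, its `cleanup` callback and the `errors` sink are hypothetical names chosen for the example, and the real destructor logs instead of collecting strings.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object whose cleanup steps may throw.
class GuardedResource
{
public:
    GuardedResource(std::function<void()> cleanup_, std::vector<std::string> & errors_)
        : cleanup(std::move(cleanup_)), errors(errors_)
    {
        assert(static_cast<bool>(cleanup));
    }

    // Destructors are implicitly noexcept, so an escaping exception would
    // end in std::terminate. Everything that may throw is guarded.
    ~GuardedResource()
    {
        try
        {
            cleanup(); // may throw, e.g. from logging or from waiting on a pool
        }
        catch (const std::exception & e)
        {
            // Recording the error may itself throw (allocation), so guard it too.
            try { errors.push_back(e.what()); } catch (...) {}
        }
        catch (...)
        {
            try { errors.emplace_back("unknown error"); } catch (...) {}
        }
    }

private:
    std::function<void()> cleanup;
    std::vector<std::string> & errors;
};
```

The nested guards matter because the error handling path can fail as well, and the destructor must not let that second failure escape either.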
Azat Khuzhin
74def83c5d Destroy hashtables for hashed dictionary in parallel only for sharded dict
Since there can be multiple hashtables, because each attribute uses its
own hashtable.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
1c0e0ea1e4 Disable sharded dictionaries with updatable sources
Support of sharded dictionary for updatable sources is questionable
since:
- sharded dictionary developed for hashed dictionary with a huge number
  of keys
- updatable source requires storing the whole table in memory (due to
  how reload works)
- also it is an open question whether it will benefit from the
  updatable source or not, since using an updatable source with a huge
  number of changes in the source does not look optimal, and on the
  other hand if there is a small number of changes then you don't need
  a sharded dictionary at all

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
c97991fce1 Use shared arena for HashedDictionary::blockToAttributes()
This should decrease number of allocations.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
01b100da61 Use shared arena in ParallelDictionaryLoader::createShardSelector() (and add missing rollback)
This should decrease number of allocations.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
64874824b4 Minor review fixes in HashedDictionary
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
77c1f07636 Make HashedDictionary::~HashedDictionary exception safe
Before, it was possible for the destructor to throw in case thread
allocation fails; rewrite it to use trySchedule() and do a sequential
destroy in this case.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
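A rough sketch of the fallback idea from the commit above, assuming plain `std::thread` instead of ClickHouse's ThreadPool and its trySchedule(): if a thread cannot be started, the job runs on the calling thread. `runAllBestEffort` is a hypothetical name, and the jobs are assumed not to throw.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <system_error>
#include <thread>
#include <vector>

// Hypothetical helper: run every job, in parallel where a thread can be
// started, otherwise inline on the calling thread.
// Returns how many jobs had to run inline. Jobs themselves must not throw.
inline std::size_t runAllBestEffort(const std::vector<std::function<void()>> & jobs) noexcept
{
    std::vector<std::thread> threads;
    std::size_t ran_inline = 0;

    // Reserving may fail; in that case emplace_back below fails too and
    // every job simply runs sequentially.
    try { threads.reserve(jobs.size()); } catch (...) {}

    for (const auto & job : jobs)
    {
        assert(static_cast<bool>(job));
        bool scheduled = false;
        try
        {
            threads.emplace_back(job); // may throw std::system_error or std::bad_alloc
            scheduled = true;
        }
        catch (...)
        {
            // Could not start a thread: fall through to the sequential path.
        }

        if (!scheduled)
        {
            job();
            ++ran_inline;
        }
    }

    for (auto & thread : threads)
        thread.join();

    return ran_inline;
}
```

Called from a destructor, a helper like this keeps the parallel fast path while guaranteeing that all cleanup still happens when the system is out of threads.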
Azat Khuzhin
a3f189e191 Optimize sharded dictionaries with skewed distribution
In case of skewed distribution, a simple modulo will not give
you a good distribution between shards, and eventually this can lead to
the same performance as a non-sharded dictionary (except that it will
occupy +1 thread for Block::scatter).

But if HashedDictionary::blockToAttributes() does not call
HashedDictionary::getShard(), this can be fixed by using a more complex
key-to-shard (getShard()) mapping. And actually you do not need to call
getShard() in blockToAttributes(): you can simply use the passed shard,
and that's it.

And by wrapping the key with intHash64() in getShard(), the skewed
distribution can be fixed.

Note that previously I tried a similar approach but did not remove
getShard() from blockToAttributes(), which is why it failed.

And now it works almost as fast as with simple createBlockSelector(),
just 13.6% slower (18.75min vs 16.5min, with 16 threads).

Note that I've also tried to add libdivide for this, but it does not
improve the performance.

I've also tried the approach without scatter, and it works 20% slower
than this one (22.5min vs 18.75min, with 16 threads).

v2: Use intHashCRC32() over intHash64() for HashedDictionary::getShard()
    (with intHash64() it works much slower, almost 2x slower; it took
    18min with 32 threads)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
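To make the skew problem in the commit above concrete, here is a small sketch comparing a plain modulo with a hash-then-modulo shard mapping. The mixer is the well-known 64-bit Murmur3 finalizer, used here only as a stand-in; the commit itself ends up with intHashCRC32(), and the function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64-bit Murmur3 finalizer, used as a stand-in integer hash (an assumption
// for this sketch; the commit settles on intHashCRC32()).
inline std::uint64_t mixBits(std::uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

// Plain modulo: keys that share a common factor with `shards` pile up in
// a few shards (e.g. keys that are all multiples of 16 with 16 shards).
inline std::size_t shardByModulo(std::uint64_t key, std::size_t shards)
{
    assert(shards > 0);
    return static_cast<std::size_t>(key % shards);
}

// Hash first, then reduce: regularity in the raw keys no longer maps
// directly onto the shard index.
inline std::size_t shardByHash(std::uint64_t key, std::size_t shards)
{
    assert(shards > 0);
    return static_cast<std::size_t>(mixBits(key) % shards);
}
```

With keys 0, 16, 32, ... and 16 shards, `shardByModulo` sends every key to shard 0, while `shardByHash` spreads them across the shards at the cost of one extra hash per key.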
Azat Khuzhin
655a564280 Parallel hash tables destroy for hashed dictionaries
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
99063b152f Allow to configure queue backlog of the parallel hashed dictionary loader
v2: Decrease default parallel_queue_backlog to 10000 (same speed)
v3: Rename parallel_queue_backlog to per_shard_load_backlog
v3: Rename per_shard_load_backlog to shard_load_queue_backlog
v4: Fix documentation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
79ad81dfdf Implement separate queue for parallel loader of hashed dictionaries
Previous patches in this series have a bottleneck in rehash(). This is
the slowest operation when inserting lots of rows into the hashtable,
and eventually the whole thread pool sometimes works only as fast as the
slowest thread, since we did not have any queue of blocks.

This patch adds such a queue and now it scales linearly, so initially with
1 thread I had ~4 hours for 10e9 elements (UInt64 key, UInt16 value),
after this patch it works in 16 minutes with 16 threads (well actually I
have to use 32 threads because of distribution of data in the source
table).

And now with 16 threads it works 16 times faster.

Also this patch adds more optimal block splitting for the non-complex
dictionaries, and usual block splitting for complex dictionaries.
But anyway this moves the overhead out of the threads that load into the
hashtable and into the reader thread, and this is better, since the
reader does not use that much CPU.

v2: fix use-after-free on failed load (add missing wait in dtor)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
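The queue-per-shard design from the commit above can be sketched as follows. This is a simplified model, not the ParallelDictionaryLoader itself: `BoundedQueue`, `loadSharded`, the `Block` alias and the use of `std::unordered_set` as the "hash table" are all assumptions made for the example, and error handling on thread start is omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <utility>
#include <vector>

using Block = std::vector<std::uint64_t>;

// Hypothetical bounded queue: push() blocks while `backlog` blocks are pending.
class BoundedQueue
{
public:
    explicit BoundedQueue(std::size_t backlog_) : backlog(backlog_) { assert(backlog > 0); }

    void push(Block block)
    {
        std::unique_lock<std::mutex> lock(mutex);
        not_full.wait(lock, [this] { return blocks.size() < backlog; });
        blocks.push_back(std::move(block));
        not_empty.notify_one();
    }

    void finish()
    {
        std::lock_guard<std::mutex> lock(mutex);
        finished = true;
        not_empty.notify_all();
    }

    // Returns false once the queue is finished and drained.
    bool pop(Block & out)
    {
        std::unique_lock<std::mutex> lock(mutex);
        not_empty.wait(lock, [this] { return finished || !blocks.empty(); });
        if (blocks.empty())
            return false;
        out = std::move(blocks.front());
        blocks.pop_front();
        not_full.notify_one();
        return true;
    }

private:
    const std::size_t backlog;
    std::mutex mutex;
    std::condition_variable not_full;
    std::condition_variable not_empty;
    std::deque<Block> blocks;
    bool finished = false;
};

// The reader (calling thread) splits each block by shard; one worker per
// shard drains its own queue into its own table. Returns per-shard sizes.
inline std::vector<std::size_t> loadSharded(const std::vector<Block> & source, std::size_t shards, std::size_t backlog)
{
    assert(shards > 0);
    std::vector<std::unique_ptr<BoundedQueue>> queues;
    for (std::size_t i = 0; i < shards; ++i)
        queues.push_back(std::unique_ptr<BoundedQueue>(new BoundedQueue(backlog)));

    std::vector<std::unordered_set<std::uint64_t>> tables(shards);
    std::vector<std::thread> workers;
    for (std::size_t shard = 0; shard < shards; ++shard)
        workers.emplace_back([&queues, &tables, shard]
        {
            Block block;
            while (queues[shard]->pop(block))
                tables[shard].insert(block.begin(), block.end());
        });

    for (const Block & block : source)
    {
        std::vector<Block> parts(shards);
        for (std::uint64_t key : block)
            parts[key % shards].push_back(key);
        for (std::size_t shard = 0; shard < shards; ++shard)
            if (!parts[shard].empty())
                queues[shard]->push(std::move(parts[shard]));
    }

    for (auto & queue : queues)
        queue->finish();
    for (auto & worker : workers)
        worker.join();

    std::vector<std::size_t> sizes;
    for (const auto & table : tables)
        sizes.push_back(table.size());
    return sizes;
}
```

Because each shard has its own backlog, a worker stalled in a slow insert only delays the reader once its own queue is full, instead of stalling every other worker immediately.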
Azat Khuzhin
5d0fd3cdc4 Remove sharded overhead for non-sharded hashed dictionaries
By adding one more template parameter - HashedDictionary<sharded> (yes,
there are already too many of them for a template class that has
explicit instantiation).

Since perf tests [1] show a 20% slowdown.

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/40003/8f0cf2d6b8a7df511afe901331d5e2c7b06c0b4d/performance_comparison_[1/4]/report.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
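As an illustration of the compile-time flag described in the commit above, the sketch below shows how a `sharded` template parameter can remove shard arithmetic from the non-sharded instantiation. `ShardSelector` is a hypothetical reduced model, not the real HashedDictionary, and it assumes a C++17 compiler for `if constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reduced model: `sharded` is a compile-time flag, so the
// non-sharded instantiation contains no shard computation at all.
template <bool sharded>
class ShardSelector
{
public:
    explicit ShardSelector(std::size_t shards_) : shards(shards_)
    {
        assert(shards > 0);
        assert(sharded || shards == 1);
    }

    std::size_t getShard(std::uint64_t key) const
    {
        if constexpr (sharded)
            return static_cast<std::size_t>(key % shards); // only compiled for sharded = true
        else
            return 0; // constant: the compiler can fold this away
    }

private:
    std::size_t shards;
};
```

The trade-off named in the commit message applies here too: every extra boolean parameter doubles the number of explicit instantiations.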
Azat Khuzhin
345c422e28 Add ability to load hashed dictionaries using multiple threads
Right now dictionaries (here I will talk only about
HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED)
can load data only in one thread, since they use one hash table that
cannot be filled from multiple threads.

And in case you have a very big dictionary (i.e. 10e9 elements), it can
take a while to load them, especially for SPARSE_HASHED variants (and
if you have such a number of elements there, you are likely to use
SPARSE_HASHED, since it requires less memory); in my env it takes ~4
hours, which is an enormous amount of time.

So this patch adds support for shards for dictionaries. The number of
shards determines how many hash tables the dictionary will use and also,
which is more important, how many threads it can use to load the data.

And with 16 threads this works 2x faster, not perfect though, see the
follow up patches in this series.

v0: PARTITION BY
v1: SHARDS 1
v2: SHARDS(1)
v3: tried optimized mod - logical and, but it does not gain even 10%
v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either
v5: move SHARDS into layout parameters (unknown simply ignored)
v6: tune params for perf tests (to avoid too long queries)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:25 +01:00
serxa
693489a8ad review fixes 2023-01-12 15:51:04 +00:00
Sergei Trifonov
12d8543578
Merge branch 'master' into cancellable-mutex-integration 2023-01-12 16:03:49 +01:00
Han Fei
6ed4570f73
Merge branch 'master' into regexp-tree-dictionary 2023-01-10 15:36:30 +01:00
Maksim Kita
83a8d3ed25 RangeHashedDictionary update field primary key fix 2023-01-09 13:52:15 +01:00
Sergei Trifonov
81d2ea30ba
Merge branch 'master' into cancellable-mutex-integration 2023-01-07 19:37:46 +01:00
Anton Popov
1f32ffedf8
Merge pull request #43221 from ClickHouse/refactoring-ip-types
Replace domain IP types (IPv4, IPv6) with native
2023-01-07 12:01:21 +01:00
serxa
15bb127b01 replace every std::shared_mutex with DB::FastSharedMutex 2023-01-06 23:35:38 +00:00
Han Fei
a4427a05c2 fix build 2023-01-06 14:30:00 +01:00
Kseniia Sumarokova
573d3283b0
Merge pull request #44327 from kssenii/use-new-named-collections-code-2
Replace old named collections code with new (from #43147) part 2
2023-01-06 13:06:26 +01:00
Han Fei
cac7f65b40 fix build 2023-01-06 11:49:34 +01:00
Han Fei
744084375c fix build 2023-01-05 22:27:45 +01:00
Han Fei
ae5ee8194b fix check style 2023-01-05 17:52:05 +01:00
Han Fei
f2a9eea995 write docs and optimize regex compile 2023-01-05 17:38:01 +01:00
Yakov Olkhovskiy
7a5a36cbed
Merge branch 'master' into refactoring-ip-types 2023-01-04 11:11:06 -05:00
Han Fei
65ef7b4adc fix build 2023-01-04 12:45:12 +01:00
Nikolay Degterinsky
aa41e9b775
Merge pull request #44857 from evillique/fix-msan-build
Try to fix MSan build
2023-01-04 04:31:28 +01:00
Han Fei
00e717d7ce some improvement 2023-01-03 21:41:51 +01:00
kssenii
67509aa2d5 Merge remote-tracking branch 'upstream/master' into use-new-named-collections-code-2 2023-01-03 16:41:30 +01:00
Nikolay Degterinsky
c4431e9931 Fix MSan build 2023-01-03 02:21:26 +00:00
Alexey Milovidov
e855d3519a
Merge branch 'master' into refactoring-ip-types 2023-01-02 21:58:53 +03:00
Han Fei
97cdfdceea fix style check 2022-12-31 20:36:23 +01:00
Han Fei
c25207fc21
Merge branch 'master' into regexp-tree-dictionary 2022-12-30 17:31:44 +01:00
Han Fei
83c6517fcf try to fix flaky tests 2022-12-30 17:31:28 +01:00
Nikolay Degterinsky
dfe93b5d82
Merge pull request #42284 from Algunenano/perf_experiment
Performance experiment
2022-12-30 03:14:22 +01:00
Han Fei
fa1baef448 add check sanitizer 2022-12-29 23:00:55 +01:00
Han Fei
50905e2005 address comments 2022-12-29 20:13:46 +01:00
Alexey Milovidov
33bcd07be5 Remove old code 2022-12-28 19:02:06 +01:00
Raúl Marín
fc1fa82a39
Merge branch 'master' into perf_experiment 2022-12-27 10:51:58 +01:00
mayamika
f66a0c01ad Add null dictionary source 2022-12-24 17:11:30 +03:00
Han Fei
4859197c34 fix build 2022-12-22 23:59:04 +01:00
Han Fei
2bb952a796 fix build 2022-12-22 23:27:10 +01:00
Han Fei
efa963fb0e support regex tree dictionary 2022-12-22 22:42:11 +01:00
Yakov Olkhovskiy
a8cb29da4b
Merge branch 'master' into refactoring-ip-types 2022-12-21 23:56:24 -05:00
Raúl Marín
45d27f461b
Merge branch 'master' into perf_experiment 2022-12-20 09:07:48 +00:00
kssenii
30547d2dcd Replace old named collections code for url 2022-12-17 00:24:05 +01:00
Vitaly Baranov
fb8aca8319
Merge pull request #44158 from vitlibar/improve-referential-deps
Improve referential dependencies
2022-12-14 21:17:02 +01:00
Han Fei
d3f8bb3f52 Merge branch 'master' into regexp-tree-dictionary 2022-12-14 16:29:17 +01:00
Han Fei
2272d712e2 reimplement 2022-12-14 16:28:57 +01:00
Nikolay Degterinsky
9b6d31b95d
Merge branch 'master' into perf_experiment 2022-12-13 17:15:07 +01:00
Vitaly Baranov
5aaff60650 Fix referential dependencies when host & port in a ClickHouse dictionary source are set by default. 2022-12-12 18:22:14 +01:00
Alexander Tokmakov
db7f2ed42b
Update getDictionaryConfigurationFromAST.cpp 2022-12-12 16:46:04 +03:00
Han Fei
ae91d0c78e fix check-style 2022-12-01 13:29:55 +01:00
Han Fei
e28c98d4ba fix macro flag 2022-12-01 11:50:18 +01:00
Han Fei
41f726ce83 fix compile 2022-11-30 11:21:45 +01:00
Yakov Olkhovskiy
77266ea754 cleanup 2022-11-29 17:34:16 +00:00
Han Fei
27ec6bc42a
Merge branch 'master' into regexp-tree-dictionary 2022-11-29 16:33:30 +01:00
Yakov Olkhovskiy
770b520ded
Merge branch 'master' into refactoring-ip-types 2022-11-28 08:50:19 -05:00
Yakov Olkhovskiy
fd58719271 IPAddressDictionary fixed, test fixed 2022-11-24 20:42:36 +00:00
Yakov Olkhovskiy
de20d58f6b style fix 2022-11-22 21:55:07 +00:00
Yakov Olkhovskiy
2cbe748e68 functions fixed, test fixed 2022-11-22 21:34:32 +00:00
Raúl Marín
4aa29b6a63 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-11-22 19:09:00 +01:00
Yakov Olkhovskiy
4d144be39c replace domain IP types (IPv4, IPv6) with native 2022-11-14 14:17:17 +00:00
avogar
9e89af28c6 Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference 2022-11-10 20:15:14 +00:00
Raúl Marín
6e0a9452e7 Merge remote-tracking branch 'blessed/master' into perf_experiment 2022-10-25 15:25:06 +02:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Raúl Marín
e60415d07d Make clang-tidy happy 2022-10-18 11:40:12 +02:00
Alexander Tokmakov
fa1134f299 Merge branch 'master' into fix_loading_dependencies 2022-10-10 16:30:52 +02:00
Alexander Tokmakov
4175f8cde6 abort instead of __builtin_unreachable in debug builds 2022-10-07 21:49:08 +02:00
Alexander Tokmakov
014784a9ca Merge branch 'master' into fix_loading_dependencies 2022-10-07 18:58:11 +02:00
Alexander Tokmakov
690ec74bf2 better handling for expressions in dictGet 2022-10-05 20:58:27 +02:00
Robert Schulze
fd86829824
Consolidate config_core.h into config.h
Less duplication, less confusion ...
2022-09-28 13:31:57 +00:00
Robert Schulze
78fc36ca49
Generate config.h into ${CONFIG_INCLUDE_PATH}
This makes the target location consistent with other auto-generated
files like config_formats.h, config_core.h, and config_functions.h and
simplifies the build of clickhouse_common.
2022-09-28 12:48:26 +00:00
Kseniia Sumarokova
ea43cb5648
Merge pull request #41261 from kssenii/s3-header-auth
Support s3 authorisation headers from ast arguments
2022-09-23 12:48:08 +02:00
Alexey Milovidov
bb6f1bfce2 Fix 9/10 of trash 2022-09-19 08:53:20 +02:00
Alexey Milovidov
730655d4fd Fix 8/9 of trash 2022-09-19 08:53:20 +02:00
Alexey Milovidov
42b0d444da Fix 7/8 of trash 2022-09-19 08:53:20 +02:00
Alexey Milovidov
91baedf03a Fix 6/7 of trash 2022-09-19 08:53:20 +02:00
Alexey Milovidov
84f42e0874 Fix 3/4 of trash 2022-09-19 08:50:53 +02:00
kssenii
420ac4eb43 s3 header auth in ast 2022-09-13 15:13:28 +02:00
Alexey Milovidov
fd235919aa Remove some methods 2022-09-10 05:04:40 +02:00
kssenii
83514fa2ef Refactor 2022-09-05 20:08:22 +02:00
Antonio Andelic
e64436fef3 Fix typos with new codespell 2022-09-02 08:54:48 +00:00
Vage Ogannisian
9b2326cc6c Add RegExpTree dictionary 2022-08-31 19:34:50 +00:00
Alexey Milovidov
1ff535a128 One step back 2022-08-28 02:00:09 +02:00
Vage Ogannisian
540fa7fe5b Add YAML regular expressions tree dictionary source 2022-08-24 14:01:56 +00:00
vdimir
5fea2091ac
Embed IKeyValue impl into IDictionary.h 2022-08-10 15:58:15 +00:00
vdimir
90fa2ed8c1
better code for join with dict 2022-08-10 14:20:29 +00:00
vdimir
7073067d40
check attributes for join with dict 2022-08-10 14:20:26 +00:00
Yakov Olkhovskiy
d39e9f65de
Merge branch 'master' into fix-quota-key 2022-08-08 11:54:21 -04:00
Robert Schulze
ad0d060dc1
Merge pull request #39904 from ClickHouse/library-bridge-refactoring
Prepare library-bridge for catboost integration
2022-08-08 12:15:01 +02:00
Nikolai Kochetov
c6e3e14bcc
Merge pull request #39184 from ClickHouse/respect-remote_url_allow_hosts-for-other-dict-sources
Respect remote_url_allow_hosts in relevant dictionary sources.
2022-08-05 17:25:53 +02:00
Robert Schulze
cb146ee6f1
Rename LibraryBridgeHelper to ExternalDictionaryLibraryBridgeHelper
- ExternalDictionaryLibraryBridgeHelper provides the server-side
  interface to access the dictionary part of the library bridge.

- In a later commit, CatBoostLibraryBridgeHelper will be added to access
  the catboost part of the library bridge from the server.
2022-08-04 19:58:37 +00:00
Robert Schulze
ea73b98fb9
Prepare library-bridge for catboost integration
- Rename generic file and identifier names in library-bridge to
  something more dictionary-specific. This is needed because later on,
  catboost will be integrated into library-bridge.

- Also: Some smaller fixes like typos and un-inlining non-performance
  critical code.

- The logic remains unchanged in this commit.
2022-08-04 19:26:51 +00:00
Yakov Olkhovskiy
23037daf17
Merge branch 'master' into fix-quota-key 2022-08-04 12:14:49 -04:00
Yakov Olkhovskiy
2e34b384c1 update tcp protocol, add quota_key 2022-08-03 15:44:08 -04:00
Alexander Tokmakov
82b50e79cf
Merge branch 'master' into tsan_clang_15 2022-08-02 13:00:55 +03:00
Alexander Tokmakov
0d68b1c67f fix build with clang-15 2022-08-01 18:00:54 +02:00
Azat Khuzhin
3e627e2861 Add profile events for fsync
The following new profile events have been added:

- FileSync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for files.
- DirectorySync - Number of times the F_FULLFSYNC/fsync/fdatasync function was called for directories.
- FileSyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for files.
- DirectorySyncElapsedMicroseconds - Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for directories.

v2: rewrite test to sh with retries
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-31 23:19:30 +03:00
mergify[bot]
e53cf7fd9f
Merge branch 'master' into direct-dictionary-dict-has-multiple-same-keys-fix 2022-07-25 11:41:58 +00:00
Alexander Tokmakov
c77117eadf
Update SSDCacheDictionaryStorage.h 2022-07-22 12:54:20 +03:00
Maksim Kita
6443116e80 DirectDictionary improve performance of dictHas with duplicate keys 2022-07-21 16:29:28 +02:00
Alexander Tokmakov
9a55f84885
Revert "Remove broken optimisation in Direct dictionary dictHas implementation" 2022-07-21 16:24:18 +03:00
Alexander Tokmakov
a8da5d96fc remove some dead and commented code 2022-07-21 15:05:48 +02:00
James Morrison
d60a02829f Remove broken optimisation in Direct dictionary dictHas implementation
I noticed this while working on another feature - if a set of keys being
passed to `hasKeys` contains duplicates, then only one of the result
slots for these keys will be populated.

My fix switches to a simpler implementation which is likely slower, but
is correct, which seems more important. No doubt faster approaches exist
which are also correct.
2022-07-20 14:45:32 +00:00
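The duplicate-key issue described in the commit above can be shown with a minimal sketch of the simple, correct approach: answer every requested position independently. `hasKeys` here is a hypothetical free function over plain vectors, not the DirectDictionary method, and `found_keys` stands for whatever the source returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper: `found_keys` is what the source returned for the
// requested keys. Every requested position gets its own answer, so all
// duplicates of a key are filled in, not just one of them.
inline std::vector<std::uint8_t> hasKeys(
    const std::vector<std::uint64_t> & requested_keys,
    const std::vector<std::uint64_t> & found_keys)
{
    const std::unordered_set<std::uint64_t> found(found_keys.begin(), found_keys.end());

    std::vector<std::uint8_t> result(requested_keys.size(), 0);
    for (std::size_t i = 0; i < requested_keys.size(); ++i)
        result[i] = found.count(requested_keys[i]) ? 1 : 0;

    assert(result.size() == requested_keys.size());
    return result;
}
```

A map from key to a single result index, by contrast, keeps only one slot per distinct key, which is the kind of bug the commit describes.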
Nikolai Kochetov
8e3bc8c66d
Merge branch 'master' into respect-remote_url_allow_hosts-for-other-dict-sources 2022-07-19 12:11:17 +02:00
Robert Schulze
13482af4ee
First try at reducing the use of StringRef
- to be replaced by std::string_view
- suggested in #39262
2022-07-17 17:26:02 +00:00
Robert Schulze
deda29b46b
Pass const StringRef by value, not by reference
See #39224
2022-07-15 11:34:56 +00:00
Nikolai Kochetov
772c009f2f Fixing build. 2022-07-14 18:44:43 +00:00
Nikolai Kochetov
5005700eee Fixing build 2022-07-13 15:33:18 +00:00
Nikolai Kochetov
937a9e9d9f Fixing build/ 2022-07-13 15:05:49 +00:00
Nikolai Kochetov
36e34d8cc6 Respect remote_url_allow_hosts in relevant dictionary sources. 2022-07-13 14:53:23 +00:00
Nikolai Kochetov
020c99a269
Merge pull request #38617 from azat/contrib-debug-symbols
Add separate option to omit symbols from heavy contrib
2022-07-06 14:40:24 +02:00
kssenii
cfff7c4c28 Merge master 2022-07-04 14:13:26 +02:00
Azat Khuzhin
e8f5cd3c68 Add separate option to omit symbols from heavy contrib
Sometimes it is useful to build contrib with debug symbols for further
debugging.

With everything turned ON (i.e. debug build) I got 3.3GB vs 3.0GB w/o
this patch, 9% bloat. Thoughts on whether this is OK for you or not? If
not, STRIP_DEBUG_SYMBOLS_HEAVY_CONTRIB can be OFF by default (regardless
of build type).

P.S. aws debug symbols add just 1.7%.
v2: rename STRIP_HEAVY_DEBUG_SYMBOLS
v3: OMIT_HEAVY_DEBUG_SYMBOLS
v4: documentation had been removed
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-07-02 06:32:03 +03:00
Maksim Kita
95687f2d01 CacheDictionaryUpdateUnit make update state non atomic 2022-06-30 13:15:35 +02:00
Maksim Kita
f443cf66f0 CacheDictionary simplify update queue 2022-06-30 12:53:56 +02:00
Maksim Kita
09be594c81 Dictionaries added TSA annotations 2022-06-29 19:15:47 +02:00
Robert Schulze
f692ead6ad
Don't use std::unique_lock unless we have to
Replace where possible by std::lock_guard which is more light-weight.
2022-06-28 19:19:06 +00:00
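A short sketch of the distinction drawn in the commit above: `std::lock_guard` is enough for plain scoped locking, while `std::unique_lock` is needed where the lock must be released and reacquired, as `std::condition_variable::wait` does. `TaskQueue` is a hypothetical class for illustration only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

class TaskQueue
{
public:
    void add(int task)
    {
        {
            // Plain scoped locking: lock_guard is sufficient.
            std::lock_guard<std::mutex> lock(mutex);
            tasks.push(task);
        }
        has_tasks.notify_one();
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return tasks.size();
    }

    int take()
    {
        // condition_variable::wait must unlock and relock the mutex,
        // which requires unique_lock.
        std::unique_lock<std::mutex> lock(mutex);
        has_tasks.wait(lock, [this] { return !tasks.empty(); });
        assert(!tasks.empty());
        const int task = tasks.front();
        tasks.pop();
        return task;
    }

private:
    mutable std::mutex mutex;
    std::condition_variable has_tasks;
    std::queue<int> tasks;
};
```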
Igor Nikonov
e41d612b1d Cleanup: local clang-tidy warnings found during review 2022-06-27 20:57:18 +00:00
kssenii
2c5aeaaa1a Add auto close for postgres connection 2022-06-27 13:46:52 +02:00
Robert Schulze
5f5732a2c4
Merge pull request #37969 from ClickHouse/consistent-macro-usage
More consistent use of platform macros
2022-06-10 14:10:01 +02:00
Robert Schulze
1a0b5f33b3
More consistent use of platform macros
cmake/target.cmake defines macros for the supported platforms, this
commit changes predefined system macros to our own macros.

__linux__ --> OS_LINUX
__APPLE__ --> OS_DARWIN
__FreeBSD__ --> OS_FREEBSD
2022-06-10 10:22:31 +02:00
Maksim Kita
a9d568c63c Dictionaries custom query with update field fix 2022-06-09 11:51:07 +02:00
Maksim Kita
4e160105b9
Merge pull request #37805 from kitaisreal/dictionaries-hierarchy-nullable-key-support
Hierarchical dictionaries support nullable parent key
2022-06-08 12:36:09 +02:00
Maksim Kita
3fd8294907 Fixed style check 2022-06-03 18:05:09 +02:00
Maksim Kita
6db5c08fde Functions dictGetChildren, dictGetDescendants added support for nullable parent key 2022-06-03 17:36:16 +02:00
Maksim Kita
a0cbbd9edc Hierarchical Cache, Direct dictionaries added support for nullable parent key 2022-06-03 17:21:55 +02:00
mergify[bot]
906182548b
Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-06-02 18:30:21 +00:00
Maksim Kita
20b55a45b2 Hierarchical dictionaries support nullable parent key 2022-06-02 19:24:23 +02:00
Maksim Kita
0b40e05ffc
Merge pull request #37778 from kitaisreal/dictionaries-improve-logging-during-exception
Dictionaries improve exception messages
2022-06-02 19:16:07 +02:00
Nikolai Kochetov
00395e752e Cleanup 2022-06-02 16:59:14 +00:00
Maksim Kita
cf00e5110d Dictionaries improve exception messages 2022-06-02 11:37:28 +02:00
Nikolai Kochetov
edac3d6714 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-06-02 09:36:20 +00:00
mergify[bot]
d4e722bbfa
Merge branch 'master' into http-named-collection 2022-05-30 16:40:18 +00:00
Nikolai Kochetov
5ef51ed27b Fix more tests. 2022-05-30 13:10:30 +00:00
Nikolai Kochetov
5b4658aa5e Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-30 09:47:35 +00:00
Alexey Milovidov
73e2e63414
Merge pull request #37612 from ClickHouse/clang-tidy-14
Fix clang-tidy-14, part 1
2022-05-29 03:16:32 +03:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
HeenaBansal2009
a061acadbe Remove std::move from trivially-copyable object 2022-05-27 11:04:29 -07:00
Yakov Olkhovskiy
41ef0044f0 endpoint is added 2022-05-27 13:43:34 -04:00
Yakov Olkhovskiy
25884c68f1 http named collection source implemented for dictionary 2022-05-26 20:46:26 -04:00
Nikolai Kochetov
84f97b53de Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-26 11:07:45 +00:00
Maksim Kita
b12b363158 Fixed build of hierarchical index for HashedArrayDictionary 2022-05-25 22:40:19 +02:00
Nikolai Kochetov
1b85f2c1d6 Merge branch 'master' into refactor-read-metrics-and-callbacks 2022-05-25 16:27:40 +02:00
Maksim Kita
585b86446e Added hierarchical_index_bytes_allocated column in system.dictionaries 2022-05-23 12:42:00 +02:00
Maksim Kita
f76e3801de Fixed tests 2022-05-23 12:42:00 +02:00
Maksim Kita
7e4c950bd9 Fixed style check 2022-05-23 12:42:00 +02:00
Maksim Kita
25d6bd1f34 Dictionaries optimize hierarchical index structure 2022-05-23 12:42:00 +02:00
Maksim Kita
1142e05683 Dictionaries allow to specify bidirectional for hierarchical attribute 2022-05-23 12:42:00 +02:00
Maksim Kita
100afa8bcf Dictionary getDescendants performance improvement 2022-05-23 12:42:00 +02:00
Nikolai Kochetov
56feef01e7 Move some resources 2022-05-20 19:49:31 +00:00
Robert Schulze
43945cea1b
Fixing some warnings 2022-05-16 20:59:27 +02:00
Anton Popov
e911900054 remove last mentions of data streams 2022-05-09 19:15:24 +00:00
Robert Schulze
330212e0f4
Remove inherited create() method + disallow copying
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
   performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
   previously allowed.

Hence, this change

- removes shared_ptr_helper and as a result all inherited create() methods,

- instead, Storage objects are now created using make_shared<>() by the
  caller (for that to work, many constructors had to be made public), and

- all Storage classes were marked as noncopyable using boost::noncopyable.

In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
2022-05-02 08:46:52 +02:00
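The pattern the commit above moves to can be sketched roughly like this: a public constructor, creation through `std::make_shared` by the caller, and copying disabled. The sketch uses deleted copy operations instead of boost::noncopyable to stay within the standard library, and `StorageSketch` and `createStorage` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical storage-like class: public constructor, no copying.
class StorageSketch
{
public:
    explicit StorageSketch(std::string name_) : name(std::move(name_)) {}

    // Stand-in for boost::noncopyable (an assumption made to avoid the
    // Boost dependency in this example).
    StorageSketch(const StorageSketch &) = delete;
    StorageSketch & operator=(const StorageSketch &) = delete;

    const std::string & getName() const { return name; }

private:
    std::string name;
};

static_assert(!std::is_copy_constructible<StorageSketch>::value, "copying must be rejected");
static_assert(!std::is_copy_assignable<StorageSketch>::value, "copy assignment must be rejected");

// The caller creates the object directly; make_shared typically places the
// object and its control block in a single heap allocation.
inline std::shared_ptr<StorageSketch> createStorage(std::string name)
{
    assert(!name.empty());
    return std::make_shared<StorageSketch>(std::move(name));
}
```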
Alexey Milovidov
1ddb04b992
Merge pull request #36715 from amosbird/refactorbase
Reorganize source files so that base won't depend on Common
2022-04-30 09:40:58 +03:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Maksim Kita
c4d98aaa4c ClickHouseDictionarySource context copy 2022-04-28 14:16:49 +02:00
Maksim Kita
fa1579cdd5
Merge pull request #36390 from lthaooo/fix_dictionary_reload_bug
fix dictionary reload bug
2022-04-28 13:37:34 +02:00
alesapin
92296484e7
Merge pull request #36348 from rschu1ze/erase_if3
Replace remove-erase idiom by C++20 erase()/erase_if()
2022-04-25 23:34:18 +02:00
tavplubix
7f50bebba1
Merge pull request #36463 from ClickHouse/fix_36451
Ignore DNS errors when checking if dictionary source is local
2022-04-22 15:28:48 +03:00
Alexander Tokmakov
5d129e13ee ignore DNS errors when checking if dictionary source is local 2022-04-20 18:40:10 +02:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
i.e. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Robert Schulze
b6d7367538
Merge remote-tracking branch 'origin/master' into erase_if3
Conflicts:
- Interpreters/ActionsDAG.cpp
2022-04-20 10:02:59 +02:00
lthaooo
1b533aa583 fix dictionary reload bug 2022-04-18 21:13:08 +08:00
Alexey Milovidov
36595e4206
Merge pull request #36320 from ClickHouse/fix-clang-tidy-14
Fix clang-tidy-14 (part 1)
2022-04-18 07:02:10 +03:00
Alexey Milovidov
242919eddd Remove abbreviation 2022-04-18 01:02:49 +02:00
Robert Schulze
1e1df8e101
Replace remove-erase idiom by C++20 erase()/erase_if()
- makes the code less verbose while being as efficient
2022-04-17 12:04:47 +02:00
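For readers unfamiliar with the two forms named in the commit above, here is a small side-by-side sketch. The function names are hypothetical; the feature-test macro check is only there so the example also compiles with a pre-C++20 standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Classic remove-erase idiom (pre-C++20): remove_if compacts the kept
// elements to the front, erase drops the leftover tail.
inline void dropNegativesClassic(std::vector<int> & values)
{
    values.erase(
        std::remove_if(values.begin(), values.end(), [](int v) { return v < 0; }),
        values.end());
}

// C++20 form: one call, same effect. Falls back to the classic idiom when
// the standard library does not provide std::erase_if.
inline void dropNegatives(std::vector<int> & values)
{
#if defined(__cpp_lib_erase_if)
    std::erase_if(values, [](int v) { return v < 0; });
#else
    dropNegativesClassic(values);
#endif
    assert(std::none_of(values.begin(), values.end(), [](int v) { return v < 0; }));
}
```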
Alexey Milovidov
294efeccfe Fix clang-tidy-14 (part 1) 2022-04-16 04:54:04 +02:00
Julian Gilyadov
a4f56f3330
Throw exception when file can't be executed instead of displaying success 2022-04-12 17:52:44 +02:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
Kseniia Sumarokova
4a464d18be
Update CassandraDictionarySource.cpp 2022-03-18 14:04:14 +01:00
Kseniia Sumarokova
5f0fdd64fe
Update src/Dictionaries/CassandraDictionarySource.cpp
Co-authored-by: Nikolay Degterinsky <43110995+evillique@users.noreply.github.com>
2022-03-18 10:41:39 +01:00
kssenii
a2cd165d38 Add remote host filter 2022-03-17 11:48:42 +01:00
Nikolai Kochetov
a380aa6b8a
Merge pull request #35294 from ClickHouse/reload-remote_url_allow_hosts
Reload remote_url_allow_hosts after config update.
2022-03-15 22:07:16 +01:00
Nikolai Kochetov
97aa6c82ce Reload remote_url_allow_hosts after config update. 2022-03-15 13:00:31 +00:00
Maksim Kita
1d674123a9 Fix clang-tidy warnings in Databases, DataTypes, Dictionaries folders 2022-03-14 18:17:35 +00:00
Alexey Milovidov
df906dfbd4 Change comments 2022-03-11 23:46:02 +01:00
1lann
5423c5a45c Fix typo of update_lag
In external dictionary providers, the allowed keys for configuration seemed to have a typo
of "update_lag" as "update_tag", preventing the use of "update_lag". This change fixes that.
2022-03-07 18:31:20 +08:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
Maksim Kita
1f5837359e clang-tidy check performance-noexcept-move-constructor fix 2022-03-02 18:15:27 +00:00
mreddy017
f893002b69 Fix vulnerable code related to std::move and noexcept
This commit fixes the vulnerable code related to std::move and noexcept identified by the clang-tidy tool.
2022-03-02 18:15:27 +00:00
Vitaly Baranov
aee67a6693
Merge pull request #31484 from eungenue/Implement-SSL-X509-certificate-authentication
Implement ssl x509 certificate authentication
2022-02-21 11:30:52 +03:00
Maksim Kita
13cbf79ecb Improve performance of insert into table functions URL, S3, File, HDFS 2022-02-10 20:06:23 +00:00
Anton Popov
298838f891 avoid unnecessary copying of Settings 2022-02-10 12:13:51 +03:00
Azat Khuzhin
bedf208cbd Use fmt::runtime() for LOG_* for non constexpr
Here is the one-liner:

    $ gg 'LOG_\(DEBUG\|TRACE\|INFO\|TEST\|WARNING\|ERROR\|FATAL\)([^,]*, [a-zA-Z]' -- :*.cpp :*.h | cut -d: -f1 | sort -u | xargs -r sed -E -i 's#(LOG_[A-Z]*)\(([^,]*), ([A-Za-z][^,)]*)#\1(\2, fmt::runtime(\3)#'

Note that I tried to do this with coccinelle (a tool for semantic
patching), but it cannot parse C++:

    $ cat fmt.cocci
    @@
    expression log;
    expression var;
    @@

    -LOG_DEBUG(log, var)
    +LOG_DEBUG(log, fmt::runtime(var))

I've also tried to use some macros/templates magic to do this implicitly
in logger_useful.h, but I failed to do so, and apparently it is not
possible for now.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>

v2: manual fixes
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-02-01 14:30:03 +03:00
Maksim Kita
5ef83deaa6 Update sort to pdqsort 2022-01-30 19:49:48 +00:00
Maksim Kita
43604a2e8f Fixed tests 2022-01-25 21:56:29 +00:00
Maksim Kita
ca77f652e2 Fixed style check 2022-01-25 11:13:37 +00:00
Maksim Kita
e27332ce10 RangeHashedDictionary added options range_lookup_strategy, convert_null_range_bound_to_open 2022-01-25 11:13:37 +00:00
Maksim Kita
4e7e67e330 Fixed tests 2022-01-25 11:13:37 +00:00
Maksim Kita
c72f7f2147 RangeHashedDictionary added support for range values that extend Int64 type 2022-01-25 11:13:37 +00:00
Maksim Kita
bcbd956b83 RangeHashedDictionary change layout structure 2022-01-25 11:13:37 +00:00
Maksim Kita
f76536c079
Merge pull request #33914 from kitaisreal/dictionaries-added-support-for-date-time-64
Dictionaries added support for DateTime64
2022-01-23 13:21:22 +01:00
Maksim Kita
bceb2d598f Fixed style check 2022-01-22 20:34:42 +00:00
Maksim Kita
3e30641fc8 Dictionaries updated support for empty attributes 2022-01-22 20:01:45 +00:00
Maksim Kita
fd87d81108 Dictionaries added support for DateTime64 2022-01-22 18:03:45 +00:00
alexey-milovidov
2e7a1fe229
Merge pull request #33871 from kitaisreal/flat-dictionary-improve-data-load-performance
FlatDictionary improve data load performance
2022-01-22 13:07:05 +03:00
alexey-milovidov
eb6849f7c7
Merge pull request #33842 from azat/cmake-contrib-fixes
More cmake external modules cleanups
2022-01-22 10:34:54 +03:00
Maksim Kita
b130a2b1ed Fixed tests 2022-01-21 19:22:55 +00:00
Maksim Kita
2d712b1d26 FlatDictionary improve data load performance 2022-01-21 15:01:55 +00:00
Maksim Kita
548a7bccee
Merge pull request #33804 from CurtizJ/redis-pool
Use connection pool for redis dictionary
2022-01-21 11:40:35 +01:00
Maksim Kita
c68fe35b2f
Merge pull request #33832 from kitaisreal/type-id-name-fix
TypeId better naming
2022-01-21 11:36:02 +01:00
Azat Khuzhin
d25b59803e contrib/abseil: add cmake ALIAS library
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-01-21 10:11:06 +03:00
Maksim Kita
32ecf374bf Fixed style check 2022-01-20 22:36:42 +00:00
Maksim Kita
b2d95f6750
Merge pull request #33831 from kitaisreal/dictionaries-added-date32-type-support
Dictionaries added Date32 type support
2022-01-20 19:04:46 +01:00
Maksim Kita
4d41c6a2ac Fixed build 2022-01-20 17:08:55 +00:00
Maksim Kita
502c1637d5
Merge pull request #33827 from kitaisreal/range-hashed-dictionary-handle-invalid-intervals
RangeHashedDictionary handle invalid intervals
2022-01-20 17:16:49 +01:00
Anton Popov
84da87e4f1 minor fixes in redis dictionary 2022-01-20 17:55:56 +03:00
Maksim Kita
115ecb09a8 Dictionaries added Date32 type support 2022-01-20 13:48:33 +00:00
Maksim Kita
97605b7c9c Fixed tests 2022-01-20 13:36:12 +00:00
Anton Popov
6015d86dee handle exception while getting connection from pool 2022-01-20 16:31:00 +03:00
Maksim Kita
844eb4ccdc RangeHashedDictionary handle invalid intervals 2022-01-20 11:16:18 +00:00
Kruglov Pavel
dd2971791c
Merge pull request #33791 from kitaisreal/dictionaries-read-keys-array-copy-fix
Dictionaries remove unnecessary copy of keys during read
2022-01-20 13:59:41 +03:00
Azat Khuzhin
e0e81b340d Fix w/o ODBC build 2022-01-20 10:02:02 +03:00
Azat Khuzhin
a75b748fee Remove unbundled mysql support 2022-01-20 10:02:01 +03:00
Azat Khuzhin
16adb8c4d6 Remove unbundled cassandra support 2022-01-20 10:01:13 +03:00
Azat Khuzhin
8ede97925e Remove unbundled sparsehash support 2022-01-20 10:01:11 +03:00
Anton Popov
4810d13aff use connection pool for redis dictionary 2022-01-20 03:12:28 +03:00
Maksim Kita
fd6a728953 Dictionaries read keys array copy fix 2022-01-19 16:08:56 +00:00
Maksim Kita
41a6cd54aa
Merge pull request #33516 from kitaisreal/range-hashed-dictionary-interval-tree
RangeHashedDictionary use interval tree
2022-01-19 16:30:31 +01:00
Kruglov Pavel
2295a07066
Merge pull request #33534 from azat/fwd-decl
RFC: Split headers, move SystemLog into module, more forward declarations
2022-01-18 17:22:49 +03:00
Maksim Kita
60bcf88228 Added IntervalTree documentation 2022-01-18 13:20:43 +00:00
Maksim Kita
30dab61f97
Merge pull request #33526 from kitaisreal/dictionary-rename-fix
Dictionary rename fix
2022-01-18 13:11:34 +01:00
Maksim Kita
3ca17afa00 Fixed build 2022-01-17 20:35:52 +00:00
Maksim Kita
42ce3f2ae8 Fixed tests 2022-01-17 14:37:23 +00:00
Eugene Galkin
f46dca4793 support x509 ssl certificate authentication 2022-01-17 15:01:38 +03:00
alexey-milovidov
16feccbb34
Merge pull request #33672 from kitaisreal/functions-dict-get-has-implicit-key-cast
Functions dictGet, dictHas implicit key cast
2022-01-17 07:38:36 +03:00
Maksim Kita
ea78c0b33c Fixed style 2022-01-16 16:45:33 +00:00
Maksim Kita
39ddd48435 Fix style check 2022-01-16 15:33:03 +00:00
Maksim Kita
746c2a1306 Fix style check 2022-01-16 15:29:46 +00:00
Maksim Kita
0e6b90f513 Fix tests 2022-01-16 12:31:51 +00:00
Maksim Kita
12f352305a Fix tests 2022-01-16 12:23:40 +00:00
Maksim Kita
9c3cc7adab RangeHashedDictionary use IntervalTree 2022-01-16 12:23:40 +00:00
Maksim Kita
dd62c3c93e DictionarySourceCoordinator update interface 2022-01-16 12:22:22 +00:00
Maksim Kita
ff15e5af1d Fixed tests 2022-01-16 11:45:36 +00:00
Maksim Kita
a0ad7a1014 Update IExternalLoadable interface 2022-01-16 00:06:10 +00:00
Maksim Kita
0df98140af Functions dictGet, dictHas implicit key cast 2022-01-15 23:25:05 +00:00
Maksim Kita
47d8b07681 Dictionary rename fix 2022-01-11 18:53:45 +03:00
Kseniia Sumarokova
6587cd0ec6
Merge pull request #33231 from kssenii/settings-changes-with-named-conf
Pass settings as key value or config for storage with settings
2022-01-11 12:30:41 +03:00
Azat Khuzhin
aee034a597 Use explicit template instantiation for SystemLog
- Move some code into module part to avoid dependency on IStorage in SystemLog
- Remove extra headers from SystemLog.h
- Rewrite some code that was relying on headers that were included by SystemLog.h

v2: rebase
v3: squash move into module part with explicit template instantiation
    (to make each commit self compilable after rebase)
2022-01-10 22:01:41 +03:00
kssenii
21c34ad59b Add support for dictionary source 2022-01-10 14:00:03 +03:00
Maksim Kita
ff53466db6 RangeHashedDictionary ddl expression fix 2022-01-09 00:15:29 +03:00
Maksim Kita
4b4468b34a Dictionaries use single arena for multiple string attributes 2022-01-08 13:26:11 +03:00
alesapin
b1d2bdf569
Merge pull request #33130 from kssenii/validate-config
Validate config keys for external dictionaries
2022-01-07 12:59:02 +03:00
Maksim Kita
c84193ac67 DictionaryStructure fixes 2022-01-04 14:02:46 +03:00
kssenii
bfc705c098 Better 2021-12-30 15:19:17 +03:00