Commit Graph

41573 Commits

Author SHA1 Message Date
Azat Khuzhin
fb6f7631c2 Add ability to pass grower for HashTable during creation
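Below is a minimal C++ sketch of the idea, under stated assumptions. A hash table that is templated on a growth policy can export that policy as `grower_type` and accept a preconfigured grower instance in its constructor, so a caller can choose the initial size without a separate resize step. `PowerOfTwoGrower`, `ToyTable` and their members are hypothetical names used only for illustration. They are not ClickHouse's actual `HashTable` API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical growth policy: 2^size_degree cells, doubling when more than half full.
struct PowerOfTwoGrower
{
    size_t size_degree = 8;

    size_t bufSize() const { return size_t(1) << size_degree; }
    bool overflow(size_t elems) const { return elems > bufSize() / 2; }
    void increaseSize()
    {
        assert(size_degree < 63);  // guard against shifting past the width of size_t
        ++size_degree;
    }
};

// Toy table that only tracks its element count, to show how a grower
// instance can be injected at construction time.
template <typename Grower>
class ToyTable
{
public:
    using grower_type = Grower;  // exported so callers can name and configure the policy

    explicit ToyTable(const Grower & grower_ = Grower()) : grower(grower_) {}

    void insertOne()
    {
        ++count;
        while (grower.overflow(count))
            grower.increaseSize();
    }

    size_t capacity() const { return grower.bufSize(); }
    size_t size() const { return count; }

private:
    Grower grower;
    size_t count = 0;
};
```

For example, a `ToyTable<PowerOfTwoGrower>` built from a grower whose `size_degree` is 4 starts with 16 cells and doubles to 32 on the ninth insert.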
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
7b5d156cc5 Optimize SPARSE_HASHED layout (by using PackedHashMap)
If you want a dictionary optimized for memory, SPARSE_HASHED does not
always give you what you need.

Consider <UInt64, UInt16> as <Key, Value>: this pair also carries 6 bytes
of padding (on amd64), so almost 40% of the space is wasted.

Because of this padding, even google::sparse_hash_map does not help much;
in fact, sparse_hash_map is not very friendly to memory allocators
(especially jemalloc).
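The padding arithmetic can be checked with a short C++ sketch. It assumes a 64-bit target such as amd64, where `uint64_t` is 8-byte aligned. The struct names are hypothetical. `#pragma pack` (accepted by GCC, Clang and MSVC) is one common way to remove padding and is not necessarily what the patch uses.

```cpp
#include <cassert>
#include <cstdint>

// Naturally aligned pair: 10 bytes of payload rounded up to the
// 8-byte alignment of uint64_t, i.e. 16 bytes on amd64.
struct PaddedPair
{
    uint64_t key;
    uint16_t value;
};

// Same fields with padding disabled.
#pragma pack(push, 1)
struct PackedPair
{
    uint64_t key;
    uint16_t value;
};
#pragma pack(pop)

static_assert(sizeof(PackedPair) == 10, "packed pair holds only the payload");

// Fraction of each padded element that is padding rather than payload.
constexpr double paddingFraction()
{
    return double(sizeof(PaddedPair) - sizeof(PackedPair)) / double(sizeof(PaddedPair));
}
```

On such a target, 6 of every 16 bytes (37.5%) are padding, which matches the "almost 40%" above.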

Here are some numbers for a dictionary with 1e9 elements, UInt64 keys,
and UInt16 values:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap                    | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB

As you can see, in terms of RSS PackedHashMap looks much better than
HASHED and even better than SPARSE_HASHED, but slightly worse than
sparse_hash_map with the packed allocator (which was done with a custom
patch to google sparse_hash_map).
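The read throughput column appears to be the 1e9 rows divided by the read time. This is our own reading of the table, not something the commit states, but it reproduces both listed values:

```latex
\text{throughput} = \frac{10^9\ \text{rows}}{t_{\text{read}}}, \qquad
\frac{10^9}{42.35\ \text{s}} \approx 23.61 \times 10^6\ \text{rows/s}, \qquad
\frac{10^9}{231.48\ \text{s}} \approx 4.32 \times 10^6\ \text{rows/s}
```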

v2: rebase on top of bucket_count fix
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
b44497fd4c Introduce PackedHashMap (HashMap with structure without padding)
If you have a HashMap with <UInt64, UInt16> as <Key, Value>, the 38%
padding overhead can be crucial, especially if you have tons of keys.
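The following minimal C++ sketch shows what a hash map over padding-free cells might look like. It assumes open addressing with linear probing, a fixed power-of-two capacity, and key 0 reserved as the empty marker. `PackedCell` and `PackedMapSketch` are hypothetical names, and this is not the actual PackedHashMap implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

#pragma pack(push, 1)
struct PackedCell
{
    uint64_t key;    // 0 means "empty" in this sketch, so key 0 cannot be stored
    uint16_t value;
};
#pragma pack(pop)
static_assert(sizeof(PackedCell) == 10, "cell must not contain padding");

// Fixed-capacity linear-probing map; no resizing or deletion.
class PackedMapSketch
{
public:
    explicit PackedMapSketch(size_t capacity_pow2)
        : cells(capacity_pow2, PackedCell{0, 0}), mask(capacity_pow2 - 1)
    {
        assert(capacity_pow2 != 0 && (capacity_pow2 & mask) == 0);  // power of two
    }

    // Inserts or overwrites; returns false if the table is full.
    bool insert(uint64_t key, uint16_t value)
    {
        assert(key != 0);
        size_t i = hash(key) & mask;
        for (size_t probes = 0; probes <= mask; ++probes, i = (i + 1) & mask)
        {
            const uint64_t cell_key = cells[i].key;  // copy: no references into packed fields
            if (cell_key == 0 || cell_key == key)
            {
                cells[i].key = key;
                cells[i].value = value;
                return true;
            }
        }
        return false;
    }

    std::optional<uint16_t> find(uint64_t key) const
    {
        if (key == 0)
            return std::nullopt;
        size_t i = hash(key) & mask;
        for (size_t probes = 0; probes <= mask; ++probes, i = (i + 1) & mask)
        {
            const uint64_t cell_key = cells[i].key;
            if (cell_key == 0)
                return std::nullopt;  // reached an empty cell: key is absent
            if (cell_key == key)
            {
                const uint16_t v = cells[i].value;
                return v;
            }
        }
        return std::nullopt;
    }

    // Memory used by the cell array: 10 bytes per cell instead of 16 on amd64.
    size_t bytesForCells() const { return cells.size() * sizeof(PackedCell); }

private:
    static size_t hash(uint64_t key)
    {
        // Multiplicative (Fibonacci) hashing; the exact mixer is arbitrary here.
        return static_cast<size_t>((key * 0x9E3779B97F4A7C15ULL) >> 32);
    }

    std::vector<PackedCell> cells;
    size_t mask;
};
```

Packed fields are copied into locals before use. This avoids binding references to possibly misaligned members, which compilers may reject or warn about.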

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
c4f23e87f1 Export grower_type in HashTable
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Michael Kolupaev
e84f0895e7 Support hardlinking parts transactionally 2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy
a2c3de5082
Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization
Fix IPv6 encoding in protobuf
2023-05-18 23:02:15 -04:00
Dmitry Novik
aea71cf1bb
Merge branch 'master' into group-by-constant-fix 2023-05-19 01:29:56 +02:00
Michael Kolupaev
8dc59c1efe Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails 2023-05-18 21:40:24 +00:00
Amos Bird
6b4dcbd3ed
Use PROJECT_*_DIR instead of CMAKE_*_DIR. 2023-05-18 23:23:39 +08:00
Sergei Trifonov
f98c337d2f
Fix stack-use-after-scope in resource manager test (#49908)
* Fix stack-use-after-scope in resource manager test

* fix
2023-05-18 14:53:46 +02:00
Kseniia Sumarokova
adebac1a92
Merge branch 'master' into fix-assertion-in-do-cleanup 2023-05-18 12:22:02 +02:00
FFFFFFFHHHHHHH
d31371adac
Merge branch 'master' into dot_product 2023-05-18 15:31:25 +08:00
Alexey Milovidov
86e14547d4
Merge pull request #49964 from ClickHouse/kssenii-patch-7
Follow up to #49429
2023-05-18 09:20:00 +03:00
Kseniia Sumarokova
855c95f626
Update src/Interpreters/Cache/Metadata.cpp
Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>
2023-05-17 22:46:09 +02:00
Azat Khuzhin
e2e3a03dbe
Revert "groupArray returns cannot be nullable" 2023-05-17 22:33:30 +02:00
Timur Solodovnikov
c7ab59302f
Set allow_experimental_query_cache setting as obsolete (#49934)
* set allow_experimental_query_cache as obsolete

* add tsolodov to trusted contributors

* CI linter

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-05-17 20:03:42 +02:00
Kseniia Sumarokova
1c04085e8f
Update MergeTreeWriteAheadLog.h 2023-05-17 18:15:51 +02:00
kssenii
f2dbcb5146 Better fix 2023-05-17 16:27:06 +02:00
Han Fei
ed1d036151
Merge pull request #49884 from azat/dist-fix-async-block-processing
Fix processing pending batch for Distributed async INSERT after restart
2023-05-17 15:19:42 +02:00
Alexander Tokmakov
36c31e1d79
Improve concurrent parts removal with zero copy replication (#49630)
* improve concurrent parts removal

* fix

* fix
2023-05-17 14:07:34 +03:00
Alexander Tokmakov
1e529263d0
Merge branch 'master' into Follow_up_Backup_Restore_concurrency_check_node_2 2023-05-17 13:57:50 +03:00
Vitaly Baranov
6c8a923c9d
Merge branch 'master' into write-encrypted-to-backup 2023-05-17 12:37:05 +02:00
Kseniia Sumarokova
edceda494d
Merge branch 'master' into add-more-logging-for-cache 2023-05-17 12:24:59 +02:00
Kseniia Sumarokova
3787b7f127
Update Metadata.cpp 2023-05-17 12:16:18 +02:00
Azat Khuzhin
fdfb1eda55 Fix {Local,Remote}ReadThrottlerSleepMicroseconds metric values
Also update the test, since sleep intervals can now be slightly shorter
if the query spends some time in other places.

What is important is that query_duration_ms does not exceed the
calculated delay.
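As a hedged illustration of why sleep intervals can shrink, here is a C++ sketch that assumes a simple rate-based throttler. It computes the required duration for `total_bytes` at `max_bytes_per_sec` and subtracts the time already spent elsewhere from the sleep. `throttleSleepMicroseconds` is a hypothetical helper, not ClickHouse's Throttler interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how long to sleep so that total_bytes transferred since
// the start average out to at most max_bytes_per_sec. Time the query already
// spent elsewhere (elapsed_us) is subtracted, so the accumulated sleep can be
// less than the full calculated delay.
uint64_t throttleSleepMicroseconds(uint64_t total_bytes, uint64_t max_bytes_per_sec, uint64_t elapsed_us)
{
    assert(max_bytes_per_sec > 0);
    // Minimal duration at the limit (total_bytes / max_bytes_per_sec seconds),
    // split into quotient and remainder to avoid overflowing total_bytes * 1e6.
    const uint64_t whole_seconds = total_bytes / max_bytes_per_sec;
    const uint64_t rest_bytes = total_bytes % max_bytes_per_sec;
    const uint64_t required_us = whole_seconds * 1'000'000 + rest_bytes * 1'000'000 / max_bytes_per_sec;
    return required_us > elapsed_us ? required_us - elapsed_us : 0;
}
```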

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-17 12:12:39 +02:00
Azat Khuzhin
7383da0c52 Fix per-query remote throttler
For some reason, the remote throttler was overwritten by the global one
during reloads. This was likely done to allow graceful reloading of this
option, but it breaks per-query throttling, so remove this logic.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-17 12:12:39 +02:00
Azat Khuzhin
3c80e30f02 Fix per-query IO/BACKUPs throttling settings (when default profile has them)
When some of these settings were set in the default profile (in
users.xml/users.yml), they were always used regardless of what the
user passed.

Fix this by not inheriting per-query throttlers: they are reset before
the query context is made, and they are no longer initialized in
Context::makeQueryContext(), since makeQueryContext() is called too
early, before the user settings have been read.

However, makeQueryContext() also initialized the per-server throttling,
so move that into ContextSharedPart::configureServerWideThrottling() and
call it once ServerSettings are set.

Also note that this patch turns the following settings into server
settings:
- max_replicated_fetches_network_bandwidth_for_server
- max_replicated_sends_network_bandwidth_for_server
This change should not affect anybody, since it is backward compatible
(i.e. if one of these settings is set in a user profile, it is still read
from there as a fallback).
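The compatibility fallback can be sketched in C++. The sketch assumes that the server configuration takes precedence and that the default user profile is consulted only when the server value is absent. `SettingsMap` and `resolveServerSetting` are hypothetical, and the actual lookup in ClickHouse may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using SettingsMap = std::map<std::string, uint64_t>;

// Hypothetical resolution order for a setting that moved from user profiles
// to the server config: server value first, then the profile value as a
// backward-compatible fallback, then the built-in default.
uint64_t resolveServerSetting(const SettingsMap & server_config, const SettingsMap & default_profile,
                              const std::string & name, uint64_t default_value)
{
    if (auto it = server_config.find(name); it != server_config.end())
        return it->second;
    if (auto it = default_profile.find(name); it != default_profile.end())
        return it->second;
    return default_value;
}
```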

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-17 12:12:39 +02:00
Igor Nikonov
7d647c50c7
Merge branch 'master' into clearable_hash_set_without_zero_storage 2023-05-17 11:29:01 +02:00
FFFFFFFHHHHHHH
fd1e6557e1
Merge branch 'master' into dot_product 2023-05-17 14:40:06 +08:00
fhbai
c104354894 fix 2023-05-17 14:39:30 +08:00
Vitaly Baranov
f4ac4c3f9d Corrections after review. 2023-05-17 03:23:16 +02:00
Yakov Olkhovskiy
0a44a69dc8 remove unnecessary header 2023-05-17 00:22:13 +00:00
Yakov Olkhovskiy
282297b677 binary encoding of IPv6 in protobuf 2023-05-16 23:46:01 +00:00
serxa
abacf1f990 add missing quota_key in operator== for connections 2023-05-16 19:14:54 +00:00
serxa
b12eefc694 fix timeout units and log message 2023-05-16 18:57:04 +00:00
Alexander Tokmakov
0da82945ac fix 2023-05-16 18:18:48 +02:00
Alexander Tokmakov
3d26232cc0
Merge pull request #49918 from ClickHouse/remove_unused_code
Remove unused code
2023-05-16 18:53:49 +03:00
kssenii
724949927b Add logging 2023-05-16 17:36:48 +02:00
Antonio Andelic
4bc5a76fa7
Add Compose request for GCS (#49693)
* Add compose request

* Check if outcome is successful

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-05-16 17:20:06 +02:00
Dmitry Novik
2287dd8633
Merge pull request #49800 from ClickHouse/fix-adding-cast
Analyzer: apply _CAST to constants only once
2023-05-16 17:05:02 +02:00
Igor Nikonov
dea5cbcf4e
Slightly update comment 2023-05-16 16:39:00 +02:00
vdimir
1f55c320b4 Fix style 2023-05-16 16:23:53 +02:00
vdimir
ca005ecea1 Update comment about filtering nulls in asof join 2023-05-16 16:23:53 +02:00
vdimir
a7bb8f412f Allow ASOF JOIN over nullable right column 2023-05-16 16:23:53 +02:00
alesapin
50a536bba8 Remove unused code 2023-05-16 15:26:24 +02:00
Alexander Tokmakov
b6716a8f0f Merge branch 'master' into fix_some_tests4 2023-05-16 14:46:27 +02:00
Vitaly Baranov
b068f0b619 Fix build. 2023-05-16 14:27:27 +02:00
Vitaly Baranov
2ec94a42b7 Remove default parameters from virtual functions. 2023-05-16 14:27:27 +02:00
Vitaly Baranov
943707963f Add backup setting "decrypt_files_from_encrypted_disks" 2023-05-16 14:27:27 +02:00
Vitaly Baranov
019493efa3 Fix throttling in backups. 2023-05-16 14:27:27 +02:00
Vitaly Baranov
5198997fd8 Remove ReadSettings from backup entries. 2023-05-16 14:27:27 +02:00
Vitaly Baranov
7cea264230 Fix whitespaces. 2023-05-16 14:27:27 +02:00
Vitaly Baranov
c48c20fac8 Use combined checksums for encrypted immutable files. 2023-05-16 14:27:27 +02:00
Vitaly Baranov
517e119e03 Move checksum calculation to IBackupEntry. 2023-05-16 14:27:27 +02:00
Vitaly Baranov
002fd19cb7 Move the common part of BackupIO_* to BackupIO_Default. 2023-05-16 14:27:23 +02:00
Vitaly Baranov
c92219f01b BACKUP now writes encrypted data for tables on encrypted disks. 2023-05-16 14:26:33 +02:00
Vitaly Baranov
cc50fcc60a Remove the 'temporary_file_' argument from BackupEntryFromImmutableFile's constructor. 2023-05-16 14:25:37 +02:00
Vitaly Baranov
bc880db5d9 Add functions to read/write encrypted files from IDisk. 2023-05-16 14:25:37 +02:00
Vitaly Baranov
101aa6eff0 Add function copyS3FileFromDisk(). 2023-05-16 14:25:37 +02:00
Vitaly Baranov
69114cb550 Add function getBlobPath() to IDisk interface to allow copying to/from disks which are not built on top of IObjectStorage. 2023-05-16 14:25:36 +02:00
Vitaly Baranov
fd2731845c Simplify interface of IBackupWriter: Remove supportNativeCopy() function. 2023-05-16 14:25:36 +02:00
Smita Kulkarni
9a2645a729 Fixed clang build 2023-05-16 14:09:38 +02:00
kssenii
d4ea3ea045 Fix 2023-05-16 13:54:13 +02:00
alesapin
93bd09ddd6
Merge branch 'master' into fix_another_zero_copy_bug 2023-05-16 12:24:52 +02:00
Kruglov Pavel
b414760d43
Merge pull request #49673 from Avogar/fiber-local-var
Fix assert in SpanHolder::finish() with fibers
2023-05-16 11:59:33 +02:00
alesapin
0b4ab70dd9
Merge pull request #49891 from hanfei1991/hanfei/chassert-1
use chassert in MergeTreeDeduplicationLog to have better log info
2023-05-16 11:50:11 +02:00
Sema Checherinda
03c51208d1
Merge pull request #44869 from CheSema/multi_part_upload
rework WriteBufferFromS3, add tests, add abort support
2023-05-16 10:52:01 +02:00
Robert Schulze
59bc3e25be
Merge pull request #49824 from AVMusorin/allow-alias-column-kafka
KafkaEngine: Allow usage of Alias column type
2023-05-15 23:40:03 +02:00
FFFFFFFHHHHHHH
11b94a626a
Fix aggregate function kolmogorovSmirnovTest (#49768) 2023-05-15 23:20:29 +02:00
Sergei Trifonov
cbc15bf35a
Add DynamicResourceManager and FairPolicy into scheduling subsystem (#49671)
* Add `DynamicResourceManager` and `FairPolicy` into scheduling subsystem

* fix test

* fix tidy build
2023-05-15 23:13:17 +02:00
Alexander Tokmakov
c9d6ee3c98
Merge pull request #49874 from azat/build/fix
Fix "reference to local binding" after fixes for clang-17
2023-05-15 23:25:18 +03:00
Vitaly Baranov
801cacc13f
Merge pull request #49831 from vitlibar/fix-setting-null-in-profile-def
Fix setting NULL in profile definition
2023-05-15 22:24:49 +02:00
Vitaly Baranov
bf3336a84e
Merge pull request #47640 from ilejn/row_policy_template
Row policy for database
2023-05-15 20:05:15 +02:00
Alexander Tokmakov
65bc702b0b fix 2023-05-15 20:02:30 +02:00
Michael Kolupaev
91db148513 Fix AsynchronousReadIndirectBufferFromRemoteFS breaking on short seeks 2023-05-15 11:02:24 -07:00
Han Fei
4137a5e058 use chassert in MergeTreeDeduplicationLog to have better log info 2023-05-15 18:51:16 +02:00
Kruglov Pavel
900aca5f0a
Delete unneeded files 2023-05-15 18:33:09 +02:00
Kruglov Pavel
bfcaf95aed
Delete unneeded files 2023-05-15 18:32:54 +02:00
Alexander Tokmakov
05ae7b2c2d fix some tests 2023-05-15 18:28:12 +02:00
avogar
78064d0622 Better comments 2023-05-15 15:52:14 +00:00
avogar
b23afdc533 Fix build for aarch64-darwin 2023-05-15 15:48:00 +00:00
Igor Nikonov
97e1513b22
Merge branch 'master' into clearable_hash_set_without_zero_storage 2023-05-15 17:42:10 +02:00
vdimir
07de815d96
Merge pull request #49836 from arthurpassos/add_extract_kv_max_number_of_pairs_safeguard 2023-05-15 16:31:01 +02:00
Anton Popov
512b27ef27
Merge pull request #49873 from amosbird/fix_49839
Fix a bug with projections and the aggregate_functions_null_for_empty setting (for query_plan_optimize_projection)
2023-05-15 15:58:42 +02:00
Azat Khuzhin
f2a023140e Fix processing pending batch for Distributed async INSERT after restart
After an abnormal server restart, some of the files listed in
current_batch.txt (the list of files to send to the remote shard) may no
longer exist, if the server was terminated between unlinking the .bin
files and truncating current_batch.txt.

This should be fixed in a more reliable way, but I kept the patch simple
so that it can be backported.
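A simple recovery step consistent with this description might skip entries whose files are already gone. The C++ sketch below assumes the batch file holds one path per line. `existingFilesFromBatch` is a hypothetical helper, not the code from this patch.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical recovery step: read the paths recorded in the batch file
// and keep only those that still exist on disk.
std::vector<fs::path> existingFilesFromBatch(const fs::path & batch_file)
{
    std::vector<fs::path> result;
    std::ifstream in(batch_file);
    std::string line;
    while (std::getline(in, line))
    {
        if (line.empty())
            continue;
        fs::path file(line);
        if (fs::exists(file))
            result.push_back(file);
        // Otherwise the file was presumably sent and unlinked before the crash.
    }
    return result;
}
```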

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-15 15:57:30 +02:00
AVMusorin
418a61a68c
Allow using Alias column type for KafkaEngine
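```
create table kafka
(
  a UInt32,
  a_str String Alias toString(a)
) engine = Kafka
settings -- placeholder values, adjust to your environment
  kafka_broker_list = 'localhost:9092',
  kafka_topic_list = 'topic',
  kafka_group_name = 'group',
  kafka_format = 'JSONEachRow';

create table data
(
  a UInt32,
  a_str String
) engine = MergeTree
order by tuple();

create materialized view data_mv to data
(
  a UInt32,
  a_str String
) as
select a, a_str from kafka;
```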
ALIAS columns work as expected, in contrast to MATERIALIZED/EPHEMERAL
columns or columns with a default expression.

Ref: https://github.com/ClickHouse/ClickHouse/pull/47138

Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
2023-05-15 15:39:58 +02:00
Sema Checherinda
dccdb3e678 work with comments on PR 2023-05-15 14:41:51 +02:00
Arthur Passos
e8f971aa2b use LIMIT_EXCEEDED instead of TOO_LARGE_MAP_SIZE 2023-05-15 09:25:10 -03:00
Arthur Passos
b06e34a77f Accept key value delimiter as part of value 2023-05-15 13:52:47 +02:00
Azat Khuzhin
665545ec45 Fix "reference to local binding" after fixes for clang-17
Follow-up for: #49851 (cc @alexey-milovidov)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-15 12:45:20 +02:00
Alexander Tokmakov
25912a2673
Merge pull request #49876 from JackyWoo/fix_typo
fix typo
2023-05-15 13:32:58 +03:00
Kruglov Pavel
558eda4146
Merge pull request #49412 from azat/block-use-dense-hash-map
Switch Block::NameMap to google::dense_hash_map over HashMap
2023-05-15 12:22:55 +02:00
JackyWoo
8d1bcb5c2f fix typo 2023-05-15 16:51:20 +08:00
Amos Bird
4764259f60
Fix a bug with projections and the aggregate_functions_null_for_empty
setting (for query_plan_optimize_projection)

Fix a bug with projections and the aggregate_functions_null_for_empty
setting. This was already fixed in PR #42198, but the fix was not
carried over to query_plan_optimize_projection.
2023-05-15 14:17:16 +08:00
Alexey Milovidov
1db35384d9 Support bitCount for big integers 2023-05-15 03:30:03 +02:00
alekar
528e68bfc4
Merge branch 'master' into fix-osx-setsockopt-errors 2023-05-14 15:35:55 -07:00
Sergei Trifonov
8f20085d9a
Merge pull request #48923 from ClickHouse/async-loader
Add AsyncLoader with dependency tracking and runtime prioritization
2023-05-14 15:12:39 +02:00
robot-clickhouse
33ca77b4ca
Merge pull request #49843 from azat/joinGet-non-deterministic
[RFC] Mark joinGet() as non-deterministic (like dictGet)
2023-05-14 11:12:12 +02:00
alekar
2631d3db20
Merge branch 'master' into fix-osx-setsockopt-errors 2023-05-13 23:03:17 -07:00
Manas Alekar
c87b33a24d Fix error on OS X regarding resetting timeouts.
This happens when the remote side disconnects due to inactivity. It
seems to work on Linux, likely due to a difference in SO_LINGER
behavior, or maybe a different default timeout on Darwin.

Verified manually against ClickHouse Cloud using the following process:

1. Connect to instance.
2. Run `show tables`.
3. Wait 6 minutes.
4. Run `show tables`.

With this fix, EINVAL is no longer reported, and the client simply
reconnects.
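Here is a hedged C++ sketch of the described behavior. It assumes the fix amounts to treating `EINVAL` from `setsockopt()` as a dropped connection; POSIX lists `EINVAL` for a socket that has been shut down. `setSocketTimeouts` and `TimeoutResult` are hypothetical names, and the actual patch may handle this elsewhere.

```cpp
#include <cassert>
#include <cerrno>
#include <initializer_list>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

enum class TimeoutResult { Ok, NeedReconnect, Error };

// Hypothetical helper: set receive/send timeouts on a socket. If setsockopt()
// fails with EINVAL (e.g. the peer already closed an idle connection on Darwin),
// report "reconnect" instead of a hard error.
TimeoutResult setSocketTimeouts(int fd, long seconds)
{
    timeval tv{};
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    for (int opt : {SO_RCVTIMEO, SO_SNDTIMEO})
    {
        if (setsockopt(fd, SOL_SOCKET, opt, &tv, sizeof(tv)) != 0)
            return errno == EINVAL ? TimeoutResult::NeedReconnect : TimeoutResult::Error;
    }
    return TimeoutResult::Ok;
}
```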
2023-05-13 22:55:27 -07:00