ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-12-16 11:22:12 +00:00

Author	SHA1	Message	Date
lgbo-ustc	7772fed161	update 1. fixed the memory overflow problem when handling all delayed buckets in parallel 2. reuse existing tests	2023-05-22 10:17:40 +08:00
lgbo-ustc	39db0f84d9	add comment	2023-05-22 10:17:40 +08:00
lgbo-ustc	39ff030a6e	grace hash join supports right/full join	2023-05-22 10:17:40 +08:00
Igor Nikonov	f5dc07d052	tryReserve() cleanup simplify removing eviction candidates	2023-05-21 22:01:28 +00:00
Azat Khuzhin	ef06bb8f14	Fix crashing in case of Replicated database without arguments Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-21 23:12:39 +02:00
Azat Khuzhin	66cf16410d	Preserve initial_query_id for ON CLUSTER queries v2: add proper escaping v3: set distributed_ddl_output_mode=none for test to fix replicated database build Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-21 23:04:54 +02:00
Azat Khuzhin	b6cc504717	Remove Common/OpenTelemetryTraceContext.h from Context.h Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-21 23:04:33 +02:00
Azat Khuzhin	0586a27432	Charge only server memory for dictionaries Right now the memory will be counted for query/user for dictionary, but only if it is loaded by the user (via SYSTEM RELOAD QUERY or via dictGet()), but it could also be loaded in the background (due to lifetime or update_field), so it is like Buffer: only server memory should be charged. v2: mark test as long Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com> Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>	2023-05-21 22:53:52 +02:00
Robert Schulze	2a9ff30a7f	Merge pull request #49380 from azat/dict/hashed-memory Improve memory usage and speed of SPARSE_HASHED/HASHED dictionaries	2023-05-21 15:46:41 +02:00
Amos Bird	0a3d986e42	Fix reporting projection broken part	2023-05-21 20:58:58 +08:00
serxa	44b1754ccf	more profile events	2023-05-21 12:43:47 +00:00
serxa	c56e6a8b80	Add more profile events for distributed connections	2023-05-21 12:15:06 +00:00
Sergei Trifonov	3c002755e2	Merge pull request #50036 from ClickHouse/fix-load-balancing Load balancing bugfixes	2023-05-21 11:21:55 +02:00
kssenii	8924c17575	Fix build	2023-05-20 13:31:27 +02:00
vdimir	8b77e2096c	Merge pull request #49760 from arthurpassos/extract_kv_ignore_kv_delimiter_when_reading_value	2023-05-20 13:27:59 +02:00
Azat Khuzhin	82054d40a5	Add proper escaping for DDL OpenTelemetry context serialization Previously you were able to break the format by using "\n" or "\t", which would simply lead to a DDL hang, because DDLWorker would simply log the error and do nothing more: <Error> DDLWorker: Cannot parse DDL task query-0000000056: Incorrect task format. Will try to send error status: Code: 27. DB::ParsingException: Cannot parse input: expected '\n' before: 'bar\n1\n'. (CANNOT_PARSE_INPUT_ASSERTION_FAILED) (version 23.5.1.1) Fix this by adding proper escaping. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-20 13:15:49 +02:00
Azat Khuzhin	52c5fd5cb9	Rewrite OpenTelemetry context serialization for DDL without IO/Operators.h This is required to switch to escaped versions. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-20 13:14:40 +02:00
Kseniia Sumarokova	a0480daef3	Update waitServersToFinish.h	2023-05-20 12:43:24 +02:00
Igor Nikonov	fbcbd3ab90	Merge pull request #49846 from ClickHouse/clearable_hash_set_without_zero_storage Clearable hash table and zero values	2023-05-20 11:19:44 +02:00
Alexey Milovidov	4e3188126f	Merge pull request #49050 from FFFFFFFHHHHHHH/dot_product Add Function dotProduct for array	2023-05-20 03:07:13 +03:00
Alexey Milovidov	2323542e47	Merge pull request #50022 from ClickHouse/geo-types-production-ready Geo types are production ready	2023-05-20 02:02:23 +03:00
Alexey Milovidov	54f7b8e6ab	Merge pull request #50030 from kssenii/aws-client-save-provider Add method getCredentials() to S3::Client	2023-05-20 01:59:58 +03:00
Igor Nikonov	af80e29519	Merge branch 'master' into clearable_hash_set_without_zero_storage	2023-05-19 23:36:30 +02:00
Sergei Trifonov	14e8132ac4	Merge branch 'master' into fix-load-balancing	2023-05-19 23:05:27 +02:00
Michael Kolupaev	6fd5d8e8ba	Add setting output_format_parquet_compliant_nested_types to produce more compatible Parquet files	2023-05-19 18:39:50 +00:00
alekar	de710209a7	Merge branch 'master' into fix-osx-setsockopt-errors	2023-05-19 11:15:01 -07:00
serxa	052d8aca71	limit `max_tries` value by `max_error_cap` to avoid unlimited number of retries	2023-05-19 18:13:29 +00:00
serxa	d69c35fcdd	fix PoolWithFailover `error_count` integer overflow	2023-05-19 17:57:00 +00:00
serxa	086888b285	fix ConnectionPoolWithFailover::getPriority	2023-05-19 17:54:29 +00:00
serxa	35e77f8e2a	fix comment	2023-05-19 17:53:22 +00:00
kssenii	791bb6cd4c	Fix style check	2023-05-19 17:35:01 +02:00
kssenii	3e42ee7f2b	Get rid of finalize callback in object storages	2023-05-19 17:29:37 +02:00
Antonio Andelic	4af8187464	Activate restarting thread in both cases	2023-05-19 15:06:02 +00:00
kssenii	b29edc4737	Add method	2023-05-19 16:38:14 +02:00
kssenii	0eab528f9f	Move common code	2023-05-19 16:23:56 +02:00
mateng915	5237dd0245	New system table zookeeper connection (#45245 ) * Feature: Support new system table to show which zookeeper node is connected Description: ============ Currently there is no way to check which zk node is connected other than using the lsof command. This is not convenient Solution: ========= Implemented a new system table, system.zookeeper_host; when the CK Server has zk, this table will show the zk node dir which is connected by the current CK server Noted: This table can support multi-zookeeper cluster scenario. * fixed review comments * added test case * update test cases * remove unused code * fixed review comments and removed unused code * updated test cases for print host, port and is_expired * modify the code comments * fixed CI Failed * fixed code style check failure * updated test cases by added Tags * update test reference * update test cases * added system.zookeeper_connection doc * Update docs/en/operations/system-tables/zookeeper_connection.md * Update docs/en/operations/system-tables/zookeeper_connection.md * Update docs/en/operations/system-tables/zookeeper_connection.md --------- Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>	2023-05-19 17:06:43 +03:00
Sergei Trifonov	67bf9ac539	Merge pull request #49797 from azat/fix-throttlers Fix per-query IO/BACKUPs throttling settings	2023-05-19 15:51:57 +02:00
Antonio Andelic	acf71c5b9a	Fix typo	2023-05-19 15:48:31 +02:00
Antonio Andelic	3107070e76	Avoid deadlock when starting table in attach thread	2023-05-19 12:48:19 +00:00
Dmitry Novik	d705e5102b	Merge pull request #49838 from ClickHouse/group-by-constant-fix Analyzer: do not optimize GROUP BY keys with ROLLUP and CUBE	2023-05-19 14:27:34 +02:00
Sergei Trifonov	5db5f6e44b	Merge branch 'master' into fix-throttlers	2023-05-19 14:08:36 +02:00
Alexey Milovidov	ab162756ba	Merge branch 'master' into dot_product	2023-05-19 14:46:53 +03:00
Antonio Andelic	9c3b17fa18	Remove whitespace	2023-05-19 13:00:51 +02:00
alesapin	632ab8a3d1	Merge pull request #49996 from ClickHouse/az Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails	2023-05-19 12:58:47 +02:00
Antonio Andelic	e46476dba2	Update src/Coordination/Changelog.cpp Co-authored-by: alesapin <alesapin@clickhouse.com>	2023-05-19 12:44:20 +02:00
Alexey Milovidov	f5506210d6	Geo types are production ready	2023-05-19 12:43:55 +02:00
alesapin	e741450b88	Merge branch 'master' into fix_another_zero_copy_bug	2023-05-19 12:40:48 +02:00
alesapin	e5b001abda	Merge branch 'master' into fix_some_tests4	2023-05-19 12:34:03 +02:00
Antonio Andelic	6e468b29e8	Check return value of ftruncate	2023-05-19 10:15:06 +00:00
Alexey Milovidov	70c83f5133	Merge pull request #49991 from amosbird/clickhouse_as_library Use PROJECT__DIR instead of CMAKE__DIR.	2023-05-19 12:37:18 +03:00
Azat Khuzhin	e1e2a83a9e	Print type of the structure that will be used for HASHED/SPARSE_HASHED Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	f8e7d2cb1f	Remove part of the HashTableGrowerWithPrecalculationAndMaxLoadFactor comment Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	c9cde110cd	Add initial degree as parameter for HashTableGrowerWithPrecalculationAndMaxLoadFactor Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	01bf041cca	Rewrite HashTableGrower{,WithPrecalculation}::set w/o ternary operators Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	634f168a74	Introduce max_size_degree for HashTableGrower{,WithPrecalculation} Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	42eac6bfbc	Wrap implementation helpers into HashedDictionaryImpl namespace Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	6f351851ad	Rename grower to HashTableGrowerWithPrecalculationAndMaxLoadFactor Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	1ab130132c	Add more comments into HashedDictionaryCollectionType.h Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7eba6def94	Add a comment for HashTableGrowerWithPrecalculation about load factor Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	422cbe08fe	Do not use PackedHashMap for non-POD for the purposes of layout In clang-16 the behaviour for POD types had been changed in [1], this does not allow us to use PackedHashMap for some types. [1]: `277123376c` Note, that I tried to come up with a more generic solution than enumerating types, but failed. Though now I think that this is good, since this shows which types are not allowed for PackedHashMap Another option is to use -fclang-abi-compat=13.0 but I doubt it is a good idea. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	fc19e79f50	Change coding style of declaring packed attribute in PackedHashMap Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	65dd87d0da	Fix "reference binding to misaligned address" in PackedHashMap Use separate helpers that accept/return values, instead of reference, anyway PackedHashMap is developed for small structure. v0: fix for keys v2: fix for values v3: fix bitEquals v4: fix for iterating over HashMap Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7c8d8eeb56	Use Cell::setMapped() over separate helper insertSetMapped() Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	2996b38606	Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout As it turns out, HashMap/PackedHashMap works great even with max load factor of 0.99. By "great" I mean at least it works faster than google sparsehash, not to mention its friendliness to the memory allocator (it has zero fragmentation since it works with a contiguous memory region, in comparison to the sparsehash that does lots of realloc, which jemalloc does not like, due to its slabs). Here is a table of different setups: settings \| load (sec) \| read (sec) \| read (million rows/s) \| bytes_allocated \| RSS - \| - \| - \| - \| - \| - HASHED upstream \| - \| - \| - \| - \| 35GiB SPARSE_HASHED upstream \| - \| - \| - \| - \| 26GiB - \| - \| - \| - \| - \| - sparse_hash_map glibc hashbench \| - \| - \| - \| - \| 17.5GiB sparse_hash_map packed allocator \| 101.878 \| 231.48 \| 4.32 \| - \| 17.7GiB PackedHashMap 0.5 \| 15.514 \| 42.35 \| 23.61 \| 20GiB \| 22GiB hashed 0.95 \| 34.903 \| 115.615 \| 8.65 \| 16GiB \| 18.7GiB PackedHashMap 0.95 \| 19.883 \| 93.6 \| 10.68 \| 10GiB \| 12.8GiB PackedHashMap 0.99 \| 26.113 \| 83.6 \| 11.96 \| 10GiB \| 12.3GiB As it shows, PackedHashMap with 0.95 max_load_factor, eats 2.6x less memory than SPARSE_HASHED in upstream, and it is also 2x faster for read! v2: fix grower Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	3698302ddb	Accept float values for dictionary layouts configurations Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	8c6d691f52	Use HashTable constructor in HashSet Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	fb6f7631c2	Add ability to pass grower for HashTable during creation Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	7b5d156cc5	Optimize SPARSE_HASHED layout (by using PackedHashMap) In case you want a dictionary optimized for memory, SPARSE_HASHED does not always give you what you need. Consider the following example <UInt64, UInt16> as <Key, Value>, but this pair will also have a 6 byte padding (on amd64), so this is almost 40% of space wastage. And because of this padding, even google::sparse_hash_map does not make the picture better, in fact, sparse_hash_map is not very friendly to memory allocators (especially jemalloc). Here are some numbers for dictionary with 1e9 elements and UInt64 as key, and UInt16 as value: settings \| load (sec) \| read (sec) \| read (million rows/s) \| bytes_allocated \| RSS HASHED upstream \| - \| - \| - \| - \| 35GiB SPARSE_HASHED upstream \| - \| - \| - \| - \| 26GiB - \| - \| - \| - \| - \| - sparse_hash_map glibc hashbench \| - \| - \| - \| - \| 17.5GiB sparse_hash_map packed allocator \| 101.878 \| 231.48 \| 4.32 \| - \| 17.7GiB PackedHashMap \| 15.514 \| 42.35 \| 23.61 \| 20GiB \| 22GiB As you can see PackedHashMap looks far better than HASHED, and even better than SPARSE_HASHED, but slightly worse than sparse_hash_map with packed allocator (it is done with a custom patch to google sparse_hash_map). v2: rebase on top of bucket_count fix Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	b44497fd4c	Introduce PackedHashMap (HashMap with structure without padding) If you have a HashMap with <UInt64, UInt16> as <Key, Value>, the overhead of 38% can be crucial, especially if you have tons of keys. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	c4f23e87f1	Export grower_type in HashTable Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Michael Kolupaev	e84f0895e7	Support hardlinking parts transactionally	2023-05-18 21:05:56 -07:00
Yakov Olkhovskiy	a2c3de5082	Merge pull request #49933 from ClickHouse/fix-ipv6-proto-serialization Fix IPv6 encoding in protobuf	2023-05-18 23:02:15 -04:00
Nikolay Degterinsky	ef45956713	Fix style	2023-05-19 01:31:45 +00:00
Nikolay Degterinsky	b8be714830	Add schema inference to more table engines	2023-05-19 00:44:27 +00:00
Dmitry Novik	aea71cf1bb	Merge branch 'master' into group-by-constant-fix	2023-05-19 01:29:56 +02:00
Michael Kolupaev	8dc59c1efe	Fix test_insert_same_partition_and_merge failing if one Azure request attempt fails	2023-05-18 21:40:24 +00:00
Amos Bird	6b4dcbd3ed	Use PROJECT__DIR instead of CMAKE__DIR.	2023-05-18 23:23:39 +08:00
Sergei Trifonov	f98c337d2f	Fix stack-use-after-scope in resource manager test (#49908 ) * Fix stack-use-after-scope in resource manager test * fix	2023-05-18 14:53:46 +02:00
Kseniia Sumarokova	adebac1a92	Merge branch 'master' into fix-assertion-in-do-cleanup	2023-05-18 12:22:02 +02:00
Victor Krasnov	83d066e5cf	Re-enable Date and Date32 as parameters of toUnixTimestamp function	2023-05-18 09:07:27 +00:00
FFFFFFFHHHHHHH	d31371adac	Merge branch 'master' into dot_product	2023-05-18 15:31:25 +08:00
Alexey Milovidov	86e14547d4	Merge pull request #49964 from ClickHouse/kssenii-patch-7 Follow up to #49429	2023-05-18 09:20:00 +03:00
Kseniia Sumarokova	855c95f626	Update src/Interpreters/Cache/Metadata.cpp Co-authored-by: Igor Nikonov <954088+devcrafter@users.noreply.github.com>	2023-05-17 22:46:09 +02:00
Azat Khuzhin	e2e3a03dbe	Revert "`groupArray` returns cannot be nullable"	2023-05-17 22:33:30 +02:00
Timur Solodovnikov	c7ab59302f	Set allow_experimental_query_cache setting as obsolete (#49934 ) * set allow_experimental_query_cache as obsolete * add tsolodov to trusted contributors * CI linter --------- Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>	2023-05-17 20:03:42 +02:00
Kseniia Sumarokova	1c04085e8f	Update MergeTreeWriteAheadLog.h	2023-05-17 18:15:51 +02:00
kssenii	f2dbcb5146	Better fix	2023-05-17 16:27:06 +02:00
Han Fei	ed1d036151	Merge pull request #49884 from azat/dist-fix-async-block-processing Fix processing pending batch for Distributed async INSERT after restart	2023-05-17 15:19:42 +02:00
avogar	7443dc925c	Fix possible Logical error on bad Nullable parsing for text formats	2023-05-17 13:12:00 +00:00
avogar	2ff3c8badd	Remove testing code	2023-05-17 11:41:00 +00:00
avogar	846804fed0	Add separate handshake_timeout for receiving Hello packet from replica	2023-05-17 11:39:04 +00:00
Alexander Tokmakov	36c31e1d79	Improve concurrent parts removal with zero copy replication (#49630 ) * improve concurrent parts removal * fix * fix	2023-05-17 14:07:34 +03:00
Alexander Tokmakov	1e529263d0	Merge branch 'master' into Follow_up_Backup_Restore_concurrency_check_node_2	2023-05-17 13:57:50 +03:00
Vitaly Baranov	6c8a923c9d	Merge branch 'master' into write-encrypted-to-backup	2023-05-17 12:37:05 +02:00
Kseniia Sumarokova	edceda494d	Merge branch 'master' into add-more-logging-for-cache	2023-05-17 12:24:59 +02:00
Kseniia Sumarokova	3787b7f127	Update Metadata.cpp	2023-05-17 12:16:18 +02:00
Azat Khuzhin	fdfb1eda55	Fix {Local,Remote}ReadThrottlerSleepMicroseconds metric values And also update the test, since now you could have slightly fewer sleep intervals, if the query spends some time in other places. But what is important is that query_duration_ms does not exceed the calculated delay. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-17 12:12:39 +02:00
Azat Khuzhin	7383da0c52	Fix per-query remote throttler The remote throttler for some reason had been overwritten by the global one during reloads; likely this is for graceful reload of this option, but it breaks per-query throttling, so remove this logic. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-17 12:12:39 +02:00
Azat Khuzhin	3c80e30f02	Fix per-query IO/BACKUPs throttling settings (when default profile has them) When some of these settings were set for the default profile (in users.xml/users.yml), they would always be used regardless of what the user passed. Fix this by not inheriting per-query throttlers; for this they should be reset before making the query context and they should not be initialized as before in Context::makeQueryContext(), since makeQueryContext() is called too early, when user settings have not been read yet. But there we also had initialization of per-server throttling, move this into the ContextSharedPart::configureServerWideThrottling(), and call it once we have ServerSettings set. Also note, that this patch makes the following settings - server settings: - max_replicated_fetches_network_bandwidth_for_server - max_replicated_sends_network_bandwidth_for_server But this change should not affect anybody, since it is done with compatibility (i.e. if this setting is set in the user's profile it will be read from it as well as a fallback). Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-17 12:12:39 +02:00
Igor Nikonov	7d647c50c7	Merge branch 'master' into clearable_hash_set_without_zero_storage	2023-05-17 11:29:01 +02:00

41662 Commits