Merge branch 'master' into coverage

Alexey Milovidov 2023-11-04 21:49:59 +01:00
commit cd94d02455
511 changed files with 9548 additions and 2455 deletions

View File

@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v23.10, 2023-11-02](#2310)**<br/>
**[ClickHouse release v23.9, 2023-09-28](#239)**<br/>
**[ClickHouse release v23.8 LTS, 2023-08-31](#238)**<br/>
**[ClickHouse release v23.7, 2023-07-27](#237)**<br/>
@ -12,6 +13,184 @@
# 2023 Changelog
### ClickHouse release 23.10, 2023-11-02
#### Backward Incompatible Change
* There is no longer an option to automatically remove broken data parts. This closes [#55174](https://github.com/ClickHouse/ClickHouse/issues/55174). [#55184](https://github.com/ClickHouse/ClickHouse/pull/55184) ([Alexey Milovidov](https://github.com/alexey-milovidov)). [#55557](https://github.com/ClickHouse/ClickHouse/pull/55557) ([Jihyuk Bok](https://github.com/tomahawk28)).
* The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. [#55186](https://github.com/ClickHouse/ClickHouse/pull/55186) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to bring it back. [#55189](https://github.com/ClickHouse/ClickHouse/pull/55189) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rename the directory monitor concept into background INSERT. All settings `*directory_monitor*` have been renamed to `distributed_background_insert*`. *Backward compatibility should be preserved* (the old settings have been kept as aliases). [#55978](https://github.com/ClickHouse/ClickHouse/pull/55978) ([Azat Khuzhin](https://github.com/azat)).
* Do not interpret the `send_timeout` set on the client side as the `receive_timeout` on the server side and vice versa. [#56035](https://github.com/ClickHouse/ClickHouse/pull/56035) ([Azat Khuzhin](https://github.com/azat)).
* Comparison of time intervals with different units will throw an exception (a short sketch follows this list). This closes [#55942](https://github.com/ClickHouse/ClickHouse/issues/55942). You might have occasionally relied on the previous behavior, when the underlying numeric values were compared regardless of the units. [#56090](https://github.com/ClickHouse/ClickHouse/pull/56090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rewrote the experimental `S3Queue` table engine completely: changed the way we keep information in ZooKeeper, which allows making fewer ZooKeeper requests; added caching of the ZooKeeper state in cases when we know the state will not change; made the polling of S3 less aggressive; changed the way the TTL and the maximum set of tracked files are maintained, which is now a background process. Added `system.s3queue` and `system.s3queue_log` tables. Closes [#54998](https://github.com/ClickHouse/ClickHouse/issues/54998). [#54422](https://github.com/ClickHouse/ClickHouse/pull/54422) ([Kseniia Sumarokova](https://github.com/kssenii)).
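
A short sketch of the new interval-comparison behavior described above (the exact error text is an assumption):

```sql
-- Before 23.10 this compared the underlying numeric values (1 < 2) and returned 1,
-- silently ignoring that a day is longer than two hours.
-- Starting with 23.10 it is expected to throw an exception instead.
SELECT INTERVAL 1 DAY < INTERVAL 2 HOUR;

-- Comparing intervals of the same unit keeps working as before.
SELECT INTERVAL 1 DAY < INTERVAL 2 DAY; -- returns 1
```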
#### New Feature
* Add function `arrayFold(accumulator, x1, ..., xn -> expression, initial, array1, ..., arrayn)` which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. [#49794](https://github.com/ClickHouse/ClickHouse/pull/49794) ([Lirikl](https://github.com/Lirikl)).
* Support for `Npy` format. `SELECT * FROM file('example_array.npy', Npy)`. [#55982](https://github.com/ClickHouse/ClickHouse/pull/55982) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* If a table has a space-filling curve in its key, e.g., `ORDER BY mortonEncode(x, y)`, the conditions on its arguments, e.g., `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30`, can be used for indexing. A setting `analyze_index_with_space_filling_curves` is added to enable or disable this analysis. This closes [#41195](https://github.com/ClickHouse/ClickHouse/issues/41195). Continuation of [#4538](https://github.com/ClickHouse/ClickHouse/pull/4538). Continuation of [#6286](https://github.com/ClickHouse/ClickHouse/pull/6286). Continuation of [#28130](https://github.com/ClickHouse/ClickHouse/pull/28130). Continuation of [#41753](https://github.com/ClickHouse/ClickHouse/pull/41753). [#55642](https://github.com/ClickHouse/ClickHouse/pull/55642) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added a new setting `force_optimize_projection_name`; it takes the name of a projection as an argument. If its value is set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes [#55331](https://github.com/ClickHouse/ClickHouse/issues/55331). [#56134](https://github.com/ClickHouse/ClickHouse/pull/56134) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Support asynchronous inserts with external data via the native protocol. Previously it worked only if the data was inlined into the query. [#54730](https://github.com/ClickHouse/ClickHouse/pull/54730) ([Anton Popov](https://github.com/CurtizJ)).
* Added aggregation function `lttb` which uses the [Largest-Triangle-Three-Buckets](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm for downsampling data for visualization. [#53145](https://github.com/ClickHouse/ClickHouse/pull/53145) ([Sinan](https://github.com/sinsinan)).
* Query `CHECK TABLE` has better performance and usability (sends progress updates, cancellable). Support checking a particular part with `CHECK TABLE ... PART 'part_name'`. [#53404](https://github.com/ClickHouse/ClickHouse/pull/53404) ([vdimir](https://github.com/vdimir)).
* Added function `jsonMergePatch`. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object (see the sketch after this list). [#54364](https://github.com/ClickHouse/ClickHouse/pull/54364) ([Memo](https://github.com/Joeywzr)).
* The second part of Kusto Query Language dialect support. [Phase 1 implementation](https://github.com/ClickHouse/ClickHouse/pull/37961) has been merged. [#42510](https://github.com/ClickHouse/ClickHouse/pull/42510) ([larryluogit](https://github.com/larryluogit)).
* Added a new SQL function, `arrayRandomSample(arr, k)`, which returns a sample of k elements from the input array (a sketch follows this list). Similar functionality could previously be achieved only with less convenient syntax, e.g. `SELECT arrayReduce('groupArraySample(3)', range(10))`. [#54391](https://github.com/ClickHouse/ClickHouse/pull/54391) ([itayisraelov](https://github.com/itayisraelov)).
* Introduce `-ArgMin`/`-ArgMax` aggregate combinators which allow aggregating by min/max values only. One use case can be found in [#54818](https://github.com/ClickHouse/ClickHouse/issues/54818). This PR also reorganizes combinators into a dedicated folder. [#54947](https://github.com/ClickHouse/ClickHouse/pull/54947) ([Amos Bird](https://github.com/amosbird)).
* Allow to drop cache for Protobuf format with `SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]`. [#55064](https://github.com/ClickHouse/ClickHouse/pull/55064) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Add external HTTP Basic authenticator. [#55199](https://github.com/ClickHouse/ClickHouse/pull/55199) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Added function `byteSwap` which reverses the bytes of unsigned integers (illustrated after this list). This is particularly useful for reversing values of types which are represented as unsigned integers internally, such as IPv4. [#55211](https://github.com/ClickHouse/ClickHouse/pull/55211) ([Priyansh Agrawal](https://github.com/Priyansh121096)).
* Added function `formatQuery()` which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function `formatQuerySingleLine()` which does the same but the returned string will not contain line breaks (a sketch follows this list). [#55239](https://github.com/ClickHouse/ClickHouse/pull/55239) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Added `DWARF` input format that reads debug symbols from an ELF executable/library/object file. [#55450](https://github.com/ClickHouse/ClickHouse/pull/55450) ([Michael Kolupaev](https://github.com/al13n321)).
* Allow saving unparsed records and errors in RabbitMQ, NATS and FileLog engines. Add virtual columns `_error` and `_raw_message` (for NATS and RabbitMQ), `_raw_record` (for FileLog) that are filled when ClickHouse fails to parse a new record. The behaviour is controlled by the storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ, and `handle_error_mode` for FileLog, similar to `kafka_handle_error_mode`. If it's set to `default`, an exception will be thrown when ClickHouse fails to parse a record; if it's set to `stream`, the error and raw record will be saved into the virtual columns. Closes [#36035](https://github.com/ClickHouse/ClickHouse/issues/36035). [#55477](https://github.com/ClickHouse/ClickHouse/pull/55477) ([Kruglov Pavel](https://github.com/Avogar)).
* Keeper client improvement: add `get_all_children_number` command that returns the number of all children nodes under a specific path. [#55485](https://github.com/ClickHouse/ClickHouse/pull/55485) ([guoxiaolong](https://github.com/guoxiaolongzte)).
* Keeper client improvement: add `get_direct_children_number` command that returns the number of direct children nodes under a path. [#55898](https://github.com/ClickHouse/ClickHouse/pull/55898) ([xuzifu666](https://github.com/xuzifu666)).
* Add statement `SHOW SETTING setting_name` which is a simpler version of existing statement `SHOW SETTINGS`. [#55979](https://github.com/ClickHouse/ClickHouse/pull/55979) ([Maksim Kita](https://github.com/kitaisreal)).
* Added fields `substreams` and `filenames` to the `system.parts_columns` table. [#55108](https://github.com/ClickHouse/ClickHouse/pull/55108) ([Anton Popov](https://github.com/CurtizJ)).
* Add support for `SHOW MERGES` query. [#55815](https://github.com/ClickHouse/ClickHouse/pull/55815) ([megao](https://github.com/jetgm)).
* Introduce a setting `create_table_empty_primary_key_by_default` for default `ORDER BY ()`. [#55899](https://github.com/ClickHouse/ClickHouse/pull/55899) ([Srikanth Chekuri](https://github.com/srikanthccv)).
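
A sketch of `jsonMergePatch` from the list above, assuming the usual JSON merge-patch semantics where keys from later arguments override earlier ones (the exact output formatting is an assumption):

```sql
SELECT jsonMergePatch(
    '{"a": 1, "b": "old"}',
    '{"b": "new", "c": [1, 2]}'
) AS merged;
-- expected: {"a":1,"b":"new","c":[1,2]}
```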
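
A minimal sketch of `arrayRandomSample(arr, k)` from the list above; the elements are picked at random, so the shown result is only one possible outcome:

```sql
SELECT arrayRandomSample([10, 20, 30, 40, 50], 3) AS sample;
-- e.g. [40, 10, 30]: three elements sampled from the input array
```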
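
A sketch of `byteSwap` from the list above; the numbers are worked-out arithmetic for a 32-bit value (0xC7C7FBCD with its bytes reversed is 0xCDFBC7C7):

```sql
SELECT byteSwap(3351772109); -- 3351772109 = 0xC7C7FBCD, expected result 3455829959 = 0xCDFBC7C7

-- Reversing the byte order of an IPv4 address stored as UInt32:
SELECT IPv4NumToString(byteSwap(IPv4StringToNum('205.251.199.199'))); -- expected: '199.199.251.205'
```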
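
A sketch of `formatQuery`/`formatQuerySingleLine` from the list above (the exact whitespace of the formatted output is an assumption):

```sql
SELECT formatQuery('select a,    b FRom tab WHERE a > 3 and  b < 3');
-- expected: a pretty-printed, possibly multi-line version, e.g.
--   SELECT
--       a,
--       b
--   FROM tab
--   WHERE (a > 3) AND (b < 3)

SELECT formatQuerySingleLine('select a,    b FRom tab WHERE a > 3 and  b < 3');
-- expected: the same normalized query on a single line
```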
#### Performance Improvement
* Add option `query_plan_preserve_num_streams_after_window_functions` to preserve the number of streams after evaluating window functions to allow parallel stream processing. [#50771](https://github.com/ClickHouse/ClickHouse/pull/50771) ([frinkr](https://github.com/frinkr)).
* Release more streams if data is small. [#53867](https://github.com/ClickHouse/ClickHouse/pull/53867) ([Jiebin Sun](https://github.com/jiebinn)).
* RoaringBitmaps are now optimized before serialization. [#55044](https://github.com/ClickHouse/ClickHouse/pull/55044) ([UnamedRus](https://github.com/UnamedRus)).
* Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. [#55069](https://github.com/ClickHouse/ClickHouse/pull/55069) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fix contention on the Context lock; this significantly improves performance for a lot of short-running concurrent queries. [#55121](https://github.com/ClickHouse/ClickHouse/pull/55121) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved the performance of inverted index creation by 30%. This was achieved by replacing `std::unordered_map` with `absl::flat_hash_map`. [#55210](https://github.com/ClickHouse/ClickHouse/pull/55210) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support ORC filter push down (rowgroup level). [#55330](https://github.com/ClickHouse/ClickHouse/pull/55330) ([李扬](https://github.com/taiyang-li)).
* Improve performance of external aggregation with a lot of temporary files. [#55489](https://github.com/ClickHouse/ClickHouse/pull/55489) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. [#55654](https://github.com/ClickHouse/ClickHouse/pull/55654) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses [#55653](https://github.com/ClickHouse/ClickHouse/issues/55653#issuecomment-1763766009). [#55683](https://github.com/ClickHouse/ClickHouse/pull/55683) ([Amos Bird](https://github.com/amosbird)).
* Cache CAST function in set during execution to improve the performance of function `IN` when set element type doesn't exactly match column type. [#55712](https://github.com/ClickHouse/ClickHouse/pull/55712) ([Duc Canh Le](https://github.com/canhld94)).
* Performance improvement for `ColumnVector::insertMany` and `ColumnVector::insertManyFrom`. [#55714](https://github.com/ClickHouse/ClickHouse/pull/55714) ([frinkr](https://github.com/frinkr)).
* Optimized Map subscript operations by predicting the next row's key position and reducing the number of comparisons. [#55929](https://github.com/ClickHouse/ClickHouse/pull/55929) ([lgbo](https://github.com/lgbo-ustc)).
* Support struct fields pruning in Parquet (in previous versions it didn't work in some cases). [#56117](https://github.com/ClickHouse/ClickHouse/pull/56117) ([lgbo](https://github.com/lgbo-ustc)).
* Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. [#51692](https://github.com/ClickHouse/ClickHouse/pull/51692) ([Raúl Marín](https://github.com/Algunenano)).
* Optimized external aggregation memory consumption in case many temporary files were generated. [#54798](https://github.com/ClickHouse/ClickHouse/pull/54798) ([Nikita Taranov](https://github.com/nickitat)).
* Distributed queries executed in `async_socket_for_remote` mode (default) now respect `max_threads` limit. Previously, some queries could create excessive threads (up to `max_distributed_connections`), causing server performance issues. [#53504](https://github.com/ClickHouse/ClickHouse/pull/53504) ([filimonov](https://github.com/filimonov)).
* Cache skippable entries while executing DDL from the ZooKeeper distributed DDL queue. [#54828](https://github.com/ClickHouse/ClickHouse/pull/54828) ([Duc Canh Le](https://github.com/canhld94)).
* Experimental inverted indexes do not store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristic (the `density` parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristic based on the parameter `max_rows_per_postings_list` (default: 64k) is introduced, which directly controls the maximum allowed number of row ids in a postings list. [#55616](https://github.com/ClickHouse/ClickHouse/pull/55616) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Improve write performance to `EmbeddedRocksDB` tables. [#55732](https://github.com/ClickHouse/ClickHouse/pull/55732) ([Duc Canh Le](https://github.com/canhld94)).
* Improved overall resilience for ClickHouse in case of many parts within partition (more than 1000). It might reduce the number of `TOO_MANY_PARTS` errors. [#55526](https://github.com/ClickHouse/ClickHouse/pull/55526) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Reduced memory consumption during loading of hierarchical dictionaries. [#55838](https://github.com/ClickHouse/ClickHouse/pull/55838) ([Nikita Taranov](https://github.com/nickitat)).
* All dictionaries support setting `dictionary_use_async_executor`. [#55839](https://github.com/ClickHouse/ClickHouse/pull/55839) ([vdimir](https://github.com/vdimir)).
* Prevent excessive memory usage when deserializing AggregateFunctionTopKGenericData. [#55947](https://github.com/ClickHouse/ClickHouse/pull/55947) ([Raúl Marín](https://github.com/Algunenano)).
* On a Keeper with many watches, AsyncMetrics threads could consume 100% of CPU for a noticeable time in `DB::KeeperStorage::getSessionsWithWatchesCount()`. The fix is to avoid traversing the heavy `watches` and `list_watches` sets. [#56054](https://github.com/ClickHouse/ClickHouse/pull/56054) ([Alexander Gololobov](https://github.com/davenger)).
* Add setting `optimize_trivial_approximate_count_query` to use an approximate `count()` for the EmbeddedRocksDB storage (a sketch follows this list). Enable trivial count for StorageJoin. [#55806](https://github.com/ClickHouse/ClickHouse/pull/55806) ([Duc Canh Le](https://github.com/canhld94)).
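
A sketch of the approximate-count setting from the last entry above (`rocksdb_table` is a hypothetical EmbeddedRocksDB table name):

```sql
-- With the setting enabled, count() may use the engine's row estimate
-- instead of scanning all rows, so the result is approximate.
SELECT count()
FROM rocksdb_table
SETTINGS optimize_trivial_approximate_count_query = 1;
```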
#### Improvement
* Functions `toDayOfWeek()` (MySQL alias: `DAYOFWEEK()`), `toYearWeek()` (`YEARWEEK()`) and `toWeek()` (`WEEK()`) now support `String` arguments. This makes their behavior consistent with MySQL's behavior. [#55589](https://github.com/ClickHouse/ClickHouse/pull/55589) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduced setting `date_time_overflow_behavior` with possible values `ignore`, `throw`, `saturate` that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64 (a sketch follows this list). [#55696](https://github.com/ClickHouse/ClickHouse/pull/55696) ([Andrey Zvonov](https://github.com/zvonand)).
* Implement query parameters support for `ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}`. Merges [#49516](https://github.com/ClickHouse/ClickHouse/issues/49516). Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#55604](https://github.com/ClickHouse/ClickHouse/pull/55604) ([alesapin](https://github.com/alesapin)).
* Print processor ids in a prettier manner in EXPLAIN. [#48852](https://github.com/ClickHouse/ClickHouse/pull/48852) ([Vlad Seliverstov](https://github.com/behebot)).
* Creating a direct dictionary with a lifetime field will be rejected at create time (as the lifetime does not make sense for direct dictionaries). Fixes: [#27861](https://github.com/ClickHouse/ClickHouse/issues/27861). [#49043](https://github.com/ClickHouse/ClickHouse/pull/49043) ([Rory Crispin](https://github.com/RoryCrispin)).
* Allow parameters in queries with partitions, like `ALTER TABLE t DROP PARTITION` (see the sketch after this list). Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#49516](https://github.com/ClickHouse/ClickHouse/pull/49516) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add a new column `xid` for `system.zookeeper_connection`. [#50702](https://github.com/ClickHouse/ClickHouse/pull/50702) ([helifu](https://github.com/helifu)).
* Display the correct server settings in `system.server_settings` after configuration reload. [#53774](https://github.com/ClickHouse/ClickHouse/pull/53774) ([helifu](https://github.com/helifu)).
* Add support for the mathematical minus `−` character in queries, similar to `-`. [#54100](https://github.com/ClickHouse/ClickHouse/pull/54100) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add replica groups to the experimental `Replicated` database engine. Closes [#53620](https://github.com/ClickHouse/ClickHouse/issues/53620). [#54421](https://github.com/ClickHouse/ClickHouse/pull/54421) ([Nikolay Degterinsky](https://github.com/evillique)).
* It is better to retry retriable S3 errors than to fail the whole query. The default value of `s3_retry_attempts` was increased. [#54770](https://github.com/ClickHouse/ClickHouse/pull/54770) ([Sema Checherinda](https://github.com/CheSema)).
* Add load balancing mode `hostname_levenshtein_distance`. [#54826](https://github.com/ClickHouse/ClickHouse/pull/54826) ([JackyWoo](https://github.com/JackyWoo)).
* Improve hiding secrets in logs. [#55089](https://github.com/ClickHouse/ClickHouse/pull/55089) ([Vitaly Baranov](https://github.com/vitlibar)).
* From now on, projection analysis is performed only on top of the query plan. The setting `query_plan_optimize_projection` became obsolete (it had been enabled by default a long time ago). [#55112](https://github.com/ClickHouse/ClickHouse/pull/55112) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* When function `untuple` is called on a tuple with named elements and itself has an alias (e.g. `select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias`), the result column name is now generated from the untuple alias and the tuple element alias (in the example: `untuple_alias.element_alias`). [#55123](https://github.com/ClickHouse/ClickHouse/pull/55123) ([garcher22](https://github.com/garcher22)).
* Added setting `describe_include_virtual_columns`, which allows including virtual columns of a table in the result of a `DESCRIBE` query. Added setting `describe_compact_output`: if it is set to `true`, `DESCRIBE` returns only the names and types of columns without extra information (a sketch follows this list). [#55129](https://github.com/ClickHouse/ClickHouse/pull/55129) ([Anton Popov](https://github.com/CurtizJ)).
* Sometimes `OPTIMIZE` with `optimize_throw_if_noop=1` could fail with the error `unknown reason`, while the real cause was different projections in different parts. This behavior is fixed. [#55130](https://github.com/ClickHouse/ClickHouse/pull/55130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow to have several `MaterializedPostgreSQL` tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is a backward-incompatible change), but can be turned on with setting `materialized_postgresql_use_unique_replication_consumer_identifier`. Closes [#54918](https://github.com/ClickHouse/ClickHouse/issues/54918). [#55145](https://github.com/ClickHouse/ClickHouse/pull/55145) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow to parse negative `DateTime64` and `DateTime` with fractional part from short strings. [#55146](https://github.com/ClickHouse/ClickHouse/pull/55146) ([Andrey Zvonov](https://github.com/zvonand)).
* To improve compatibility with MySQL, 1. `information_schema.tables` now includes the new field `table_rows`, and 2. `information_schema.columns` now includes the new field `extra`. [#55215](https://github.com/ClickHouse/ClickHouse/pull/55215) ([Robert Schulze](https://github.com/rschu1ze)).
* Clickhouse-client won't show "0 rows in set" if the count is zero and an exception was thrown. [#55240](https://github.com/ClickHouse/ClickHouse/pull/55240) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Support renaming a table without the keyword `TABLE`, like `RENAME db.t1 TO db.t2`. [#55373](https://github.com/ClickHouse/ClickHouse/pull/55373) ([凌涛](https://github.com/lingtaolf)).
* Add `internal_replication` to `system.clusters`. [#55377](https://github.com/ClickHouse/ClickHouse/pull/55377) ([Konstantin Morozov](https://github.com/k-morozov)).
* Select remote proxy resolver based on request protocol, add proxy feature docs and remove `DB::ProxyConfiguration::Protocol::ANY`. [#55430](https://github.com/ClickHouse/ClickHouse/pull/55430) ([Arthur Passos](https://github.com/arthurpassos)).
* Avoid retrying keeper operations on INSERT after table shutdown. [#55519](https://github.com/ClickHouse/ClickHouse/pull/55519) ([Azat Khuzhin](https://github.com/azat)).
* `SHOW COLUMNS` now correctly reports type `FixedString` as `BLOB` if setting `use_mysql_types_in_show_columns` is on. Also added two new settings, `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` to switch the output for types `String` and `FixedString` as `TEXT` or `BLOB`. [#55617](https://github.com/ClickHouse/ClickHouse/pull/55617) ([Serge Klochkov](https://github.com/slvrtrn)).
* During startup of ReplicatedMergeTree tables, the ClickHouse server checks the set of parts for unexpected parts (which exist locally but not in ZooKeeper). All unexpected parts are moved to the detached directory, and instead of them the server tries to restore ancestor (covered) parts. Now the server tries to restore the closest ancestors instead of random covered parts. [#55645](https://github.com/ClickHouse/ClickHouse/pull/55645) ([alesapin](https://github.com/alesapin)).
* The advanced dashboard now supports draggable charts on touch devices. This closes [#54206](https://github.com/ClickHouse/ClickHouse/issues/54206). [#55649](https://github.com/ClickHouse/ClickHouse/pull/55649) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Use the default query format if declared when outputting exception with `http_write_exception_in_output_format`. [#55739](https://github.com/ClickHouse/ClickHouse/pull/55739) ([Raúl Marín](https://github.com/Algunenano)).
* Provide a better message for common MATERIALIZED VIEW pitfalls. [#55826](https://github.com/ClickHouse/ClickHouse/pull/55826) ([Raúl Marín](https://github.com/Algunenano)).
* If you dropped the current database, you will still be able to run some queries in `clickhouse-local` and switch to another database. This makes the behavior consistent with `clickhouse-client`. This closes [#55834](https://github.com/ClickHouse/ClickHouse/issues/55834). [#55853](https://github.com/ClickHouse/ClickHouse/pull/55853) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Functions `(add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond)` now support string-encoded date arguments, e.g. `SELECT addDays('2023-10-22', 1)`. This increases compatibility with MySQL and is needed by Tableau Online. [#55869](https://github.com/ClickHouse/ClickHouse/pull/55869) ([Robert Schulze](https://github.com/rschu1ze)).
* The setting `apply_deleted_mask`, when disabled, allows reading rows that were marked as deleted by lightweight DELETE queries. This is useful for debugging. [#55952](https://github.com/ClickHouse/ClickHouse/pull/55952) ([Alexander Gololobov](https://github.com/davenger)).
* Allow skipping `null` values when serializing a Tuple to JSON objects, which makes it possible to keep compatibility with Spark's `to_json` function. This is also useful for Gluten. [#55956](https://github.com/ClickHouse/ClickHouse/pull/55956) ([李扬](https://github.com/taiyang-li)).
* Functions `(add|sub)Date()` now support string-encoded date arguments, e.g. `SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE)`. The same support for string-encoded date arguments is added to the plus and minus operators, e.g. `SELECT '2023-10-23' + INTERVAL 1 DAY`. This increases compatibility with MySQL and is needed by Tableau Online. [#55960](https://github.com/ClickHouse/ClickHouse/pull/55960) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow unquoted strings with CR (`\r`) in CSV format. Closes [#39930](https://github.com/ClickHouse/ClickHouse/issues/39930). [#56046](https://github.com/ClickHouse/ClickHouse/pull/56046) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow to run `clickhouse-keeper` using embedded config. [#56086](https://github.com/ClickHouse/ClickHouse/pull/56086) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a limit on the maximum configuration value for `queued.min.messages` to avoid problems with starting to fetch data from Kafka. [#56121](https://github.com/ClickHouse/ClickHouse/pull/56121) ([Stas Morozov](https://github.com/r3b-fish)).
* Fixed a typo in the SQL function `minSampleSizeContinous` (renamed to `minSampleSizeContinuous`). The old name is preserved for backward compatibility. This closes: [#56139](https://github.com/ClickHouse/ClickHouse/issues/56139). [#56143](https://github.com/ClickHouse/ClickHouse/pull/56143) ([Dorota Szeremeta](https://github.com/orotaday)).
* Print the path of broken parts on disk before shutting down the server. Before this change, if a part was corrupted on disk and the server could not start, it was almost impossible to understand which part was broken. This is fixed. [#56181](https://github.com/ClickHouse/ClickHouse/pull/56181) ([Duc Canh Le](https://github.com/canhld94)).
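
A sketch of `date_time_overflow_behavior` from the Improvement list above; the clamped value and error behavior are assumptions based on the `Date` range ending at 2149-06-06:

```sql
-- '2200-01-01' fits into Date32 but not into Date.
SET date_time_overflow_behavior = 'saturate';
SELECT toDate(toDate32('2200-01-01')); -- expected to clamp to the maximum representable Date

SET date_time_overflow_behavior = 'throw';
SELECT toDate(toDate32('2200-01-01')); -- expected to throw instead of silently wrapping around
```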
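
A sketch of query parameters in partition operations, referenced in the Improvement list above (`t` and the partition value are hypothetical; the `{name:Type}` placeholder and `--param_<name>` client option are the standard query-parameter mechanism):

```sql
-- Passed from clickhouse-client, e.g.:
--   clickhouse-client --param_part_id="202310" \
--     --query "ALTER TABLE t DROP PARTITION {part_id:String}"
ALTER TABLE t DROP PARTITION {part_id:String};
```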
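
A sketch of the new `DESCRIBE` settings from the list above (`t` is a hypothetical MergeTree table; `_part` and `_partition_id` are examples of its virtual columns):

```sql
-- Only column names and types, without defaults, codecs or comments:
DESCRIBE TABLE t SETTINGS describe_compact_output = 1;

-- Also list virtual columns such as _part and _partition_id:
DESCRIBE TABLE t SETTINGS describe_include_virtual_columns = 1;
```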
#### Build/Testing/Packaging Improvement
* If the database in Docker is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* A resource with the source code including submodules is built in the Darwin special build task. It may be used to build ClickHouse without checking out the submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* An error was occurring when building ClickHouse with the AVX series of instructions enabled globally (which isn't recommended). The reason is that snappy does not enable `SNAPPY_HAVE_X86_CRC32`. [#55049](https://github.com/ClickHouse/ClickHouse/pull/55049) ([monchickey](https://github.com/monchickey)).
* Solve issue with launching standalone `clickhouse-keeper` from `clickhouse-server` package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* In the tests, RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. [#55424](https://github.com/ClickHouse/ClickHouse/pull/55424) ([Ilya Yatsishin](https://github.com/qoega)).
* Modified the error message difference between OpenSSL and BoringSSL to fix the functional test. [#55975](https://github.com/ClickHouse/ClickHouse/pull/55975) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Use the upstream repo for Apache DataSketches. [#55787](https://github.com/ClickHouse/ClickHouse/pull/55787) ([Nikita Taranov](https://github.com/nickitat)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Skip hardlinking inverted index files in mutation [#47663](https://github.com/ClickHouse/ClickHouse/pull/47663) ([cangyin](https://github.com/cangyin)).
* Fixed a bug where the `match` function (regex) with a pattern containing alternation produced an incorrect key condition. Closes #53222. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN [#51746](https://github.com/ClickHouse/ClickHouse/pull/51746) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support missed experimental `Object(Nullable(json))` subcolumns in query. [#54052](https://github.com/ClickHouse/ClickHouse/pull/54052) ([zps](https://github.com/VanDarkholme7)).
* Re-add fix for `accurateCastOrNull()` [#54629](https://github.com/ClickHouse/ClickHouse/pull/54629) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix detecting `DEFAULT` for columns of a Distributed table created without AS [#55060](https://github.com/ClickHouse/ClickHouse/pull/55060) ([Vitaly Baranov](https://github.com/vitlibar)).
* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Suppress error statistics update for internal exceptions [#55128](https://github.com/ClickHouse/ClickHouse/pull/55128) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix deadlock in backups [#55132](https://github.com/ClickHouse/ClickHouse/pull/55132) ([alesapin](https://github.com/alesapin)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning of extra columns in set. [#55172](https://github.com/ClickHouse/ClickHouse/pull/55172) ([Amos Bird](https://github.com/amosbird)).
* Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity [#55202](https://github.com/ClickHouse/ClickHouse/pull/55202) ([Duc Canh Le](https://github.com/canhld94)).
* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Avoid possible memory leaks in compressors in case of missing buffer finalization [#55262](https://github.com/ClickHouse/ClickHouse/pull/55262) ([Azat Khuzhin](https://github.com/azat)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree [#55276](https://github.com/ClickHouse/ClickHouse/pull/55276) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix a crash in MergeSortingPartialResultTransform (due to zero chunks after `remerge`) [#55335](https://github.com/ClickHouse/ClickHouse/pull/55335) ([Azat Khuzhin](https://github.com/azat)).
* Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception [#55338](https://github.com/ClickHouse/ClickHouse/pull/55338) ([Azat Khuzhin](https://github.com/azat)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix leak in StorageHDFS [#55370](https://github.com/ClickHouse/ClickHouse/pull/55370) ([Azat Khuzhin](https://github.com/azat)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix MySQL interface boolean representation [#55427](https://github.com/ClickHouse/ClickHouse/pull/55427) ([Serge Klochkov](https://github.com/slvrtrn)).
* Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting [#55479](https://github.com/ClickHouse/ClickHouse/pull/55479) ([Serge Klochkov](https://github.com/slvrtrn)).
* Make `use_mysql_types_in_show_columns` affect only `SHOW COLUMNS` [#55481](https://github.com/ClickHouse/ClickHouse/pull/55481) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix stack symbolizer parsing `DW_FORM_ref_addr` incorrectly and sometimes crashing [#55483](https://github.com/ClickHouse/ClickHouse/pull/55483) ([Michael Kolupaev](https://github.com/al13n321)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix Query Parameters not working with custom HTTP handlers [#55521](https://github.com/ClickHouse/ClickHouse/pull/55521) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix checking of non handled data for Values format [#55527](https://github.com/ClickHouse/ClickHouse/pull/55527) ([Azat Khuzhin](https://github.com/azat)).
* Fix 'Invalid cursor state' in odbc interacting with MS SQL Server [#55558](https://github.com/ClickHouse/ClickHouse/pull/55558) ([vdimir](https://github.com/vdimir)).
* Fix max execution time and 'break' overflow mode [#55577](https://github.com/ClickHouse/ClickHouse/pull/55577) ([Alexander Gololobov](https://github.com/davenger)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Disable wrong optimization and add a test [#55609](https://github.com/ClickHouse/ClickHouse/pull/55609) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#52352](https://github.com/ClickHouse/ClickHouse/issues/52352) [#55621](https://github.com/ClickHouse/ClickHouse/pull/55621) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test to avoid incorrect decimal sorting [#55662](https://github.com/ClickHouse/ClickHouse/pull/55662) ([Amos Bird](https://github.com/amosbird)).
* Fix progress bar for s3 and azure Cluster functions with url without globs [#55666](https://github.com/ClickHouse/ClickHouse/pull/55666) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fixes and improvements for Iceberg storage [#55695](https://github.com/ClickHouse/ClickHouse/pull/55695) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix data race in CreatingSetsTransform (v2) [#55786](https://github.com/ClickHouse/ClickHouse/pull/55786) ([Azat Khuzhin](https://github.com/azat)).
* Throw exception when parsing illegal string as float if precise_float_parsing is true [#55861](https://github.com/ClickHouse/ClickHouse/pull/55861) ([李扬](https://github.com/taiyang-li)).
* Disable predicate pushdown if the CTE contains stateful functions [#55871](https://github.com/ClickHouse/ClickHouse/pull/55871) ([Raúl Marín](https://github.com/Algunenano)).
* Fix normalize ASTSelectWithUnionQuery, as it was stripping `FORMAT` from the query [#55887](https://github.com/ClickHouse/ClickHouse/pull/55887) ([flynn](https://github.com/ucasfl)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* fix: StorageNull supports subcolumns [#55912](https://github.com/ClickHouse/ClickHouse/pull/55912) ([FFish](https://github.com/wxybear)).
* Do not write retriable errors for Replicated mutate/merge into error log [#55944](https://github.com/ClickHouse/ClickHouse/pull/55944) ([Azat Khuzhin](https://github.com/azat)).
* Fix `SHOW DATABASES LIMIT <N>` [#55962](https://github.com/ClickHouse/ClickHouse/pull/55962) ([Raúl Marín](https://github.com/Algunenano)).
* Fix autogenerated Protobuf schema with fields with underscore [#55974](https://github.com/ClickHouse/ClickHouse/pull/55974) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix dateTime64ToSnowflake64() with non-default scale [#55983](https://github.com/ClickHouse/ClickHouse/pull/55983) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix output/input of Arrow dictionary column [#55989](https://github.com/ClickHouse/ClickHouse/pull/55989) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix fetching schema from schema registry in AvroConfluent [#55991](https://github.com/ClickHouse/ClickHouse/pull/55991) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#55995](https://github.com/ClickHouse/ClickHouse/pull/55995) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix incorrect free space accounting for least_used JBOD policy [#56030](https://github.com/ClickHouse/ClickHouse/pull/56030) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing scalar issue when evaluating subqueries inside table functions [#56057](https://github.com/ClickHouse/ClickHouse/pull/56057) ([Amos Bird](https://github.com/amosbird)).
* Fix wrong query result when http_write_exception_in_output_format=1 [#56135](https://github.com/ClickHouse/ClickHouse/pull/56135) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix schema cache for fallback JSON->JSONEachRow with changed settings [#56172](https://github.com/ClickHouse/ClickHouse/pull/56172) ([Kruglov Pavel](https://github.com/Avogar)).
* Add error handler to odbc-bridge [#56185](https://github.com/ClickHouse/ClickHouse/pull/56185) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
### ClickHouse release 23.9, 2023-09-28
#### Backward Incompatible Change

View File

@ -313,9 +313,9 @@ set (DEBUG_INFO_FLAGS "-g -gdwarf-4")
option(DISABLE_OMIT_FRAME_POINTER "Disable omit frame pointer compiler optimization" OFF)
if (DISABLE_OMIT_FRAME_POINTER)
set (CMAKE_CXX_FLAGS_ADD "${CMAKE_CXX_FLAGS_ADD} -fno-omit-frame-pointer")
set (CMAKE_C_FLAGS_ADD "${CMAKE_C_FLAGS_ADD} -fno-omit-frame-pointer")
set (CMAKE_ASM_FLAGS_ADD "${CMAKE_ASM_FLAGS_ADD} -fno-omit-frame-pointer")
set (CMAKE_CXX_FLAGS_ADD "${CMAKE_CXX_FLAGS_ADD} -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer")
set (CMAKE_C_FLAGS_ADD "${CMAKE_C_FLAGS_ADD} -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer")
set (CMAKE_ASM_FLAGS_ADD "${CMAKE_ASM_FLAGS_ADD} -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer")
endif()
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${COMPILER_FLAGS} ${CMAKE_CXX_FLAGS_ADD}")

View File

@ -22,8 +22,6 @@ curl https://clickhouse.com/ | sh
## Upcoming Events
* [**v23.9 Community Call**]([https://clickhouse.com/company/events/v23-8-community-release-call](https://clickhouse.com/company/events/v23-9-community-release-call)?utm_source=github&utm_medium=social&utm_campaign=release-webinar-2023-08) - Sep 28 - 23.9 is rapidly approaching. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
* [**ClickHouse Meetup in Amsterdam**](https://www.meetup.com/clickhouse-netherlands-user-group/events/296334590/) - Oct 31
* [**ClickHouse Meetup in Beijing**](https://www.meetup.com/clickhouse-beijing-user-group/events/296334856/) - Nov 4
* [**ClickHouse Meetup in San Francisco**](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/296334923/) - Nov 8
* [**ClickHouse Meetup in Singapore**](https://www.meetup.com/clickhouse-singapore-meetup-group/events/296334976/) - Nov 15
@ -45,4 +43,4 @@ We are a globally diverse and distributed team, united behind a common goal of c
Check out our **current openings** here: https://clickhouse.com/company/careers
Cant find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!
Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!

View File

@ -13,9 +13,10 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 23.10 | ✔️ |
| 23.9 | ✔️ |
| 23.8 | ✔️ |
| 23.7 | ✔️ |
| 23.7 | ❌ |
| 23.6 | ❌ |
| 23.5 | ❌ |
| 23.4 | ❌ |

View File

@ -47,6 +47,10 @@ else ()
target_compile_definitions(common PUBLIC WITH_COVERAGE=0)
endif ()
if (TARGET ch_contrib::crc32_s390x)
target_link_libraries(common PUBLIC ch_contrib::crc32_s390x)
endif()
target_include_directories(common PUBLIC .. "${CMAKE_CURRENT_BINARY_DIR}/..")
target_link_libraries (common

View File

@ -35,6 +35,10 @@
#pragma clang diagnostic ignored "-Wreserved-identifier"
#endif
#if defined(__s390x__)
#include <base/crc32c_s390x.h>
#define CRC_INT s390x_crc32c
#endif
/**
* The std::string_view-like container to avoid creating strings to find substrings in the hash table.
@ -264,8 +268,8 @@ inline size_t hashLessThan8(const char * data, size_t size)
if (size >= 4)
{
UInt64 a = unalignedLoad<uint32_t>(data);
return hashLen16(size + (a << 3), unalignedLoad<uint32_t>(data + size - 4));
UInt64 a = unalignedLoadLittleEndian<uint32_t>(data);
return hashLen16(size + (a << 3), unalignedLoadLittleEndian<uint32_t>(data + size - 4));
}
if (size > 0)
@ -285,8 +289,8 @@ inline size_t hashLessThan16(const char * data, size_t size)
{
if (size > 8)
{
UInt64 a = unalignedLoad<UInt64>(data);
UInt64 b = unalignedLoad<UInt64>(data + size - 8);
UInt64 a = unalignedLoadLittleEndian<UInt64>(data);
UInt64 b = unalignedLoadLittleEndian<UInt64>(data + size - 8);
return hashLen16(a, rotateByAtLeast1(b + size, static_cast<UInt8>(size))) ^ b;
}
@ -315,13 +319,13 @@ struct CRC32Hash
do
{
UInt64 word = unalignedLoad<UInt64>(pos);
UInt64 word = unalignedLoadLittleEndian<UInt64>(pos);
res = static_cast<unsigned>(CRC_INT(res, word));
pos += 8;
} while (pos + 8 < end);
UInt64 word = unalignedLoad<UInt64>(end - 8); /// I'm not sure if this is normal.
UInt64 word = unalignedLoadLittleEndian<UInt64>(end - 8); /// I'm not sure if this is normal.
res = static_cast<unsigned>(CRC_INT(res, word));
return res;

26
base/base/crc32c_s390x.h Normal file
View File

@ -0,0 +1,26 @@
#pragma once
#include <crc32-s390x.h>
inline uint32_t s390x_crc32c_u8(uint32_t crc, uint8_t v)
{
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32c_u16(uint32_t crc, uint16_t v)
{
v = __builtin_bswap16(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32c_u32(uint32_t crc, uint32_t v)
{
v = __builtin_bswap32(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint64_t s390x_crc32c(uint64_t crc, uint64_t v)
{
v = __builtin_bswap64(v);
return crc32c_le_vx(static_cast<uint32_t>(crc), reinterpret_cast<unsigned char *>(&v), sizeof(uint64_t));
}

View File

@ -68,7 +68,7 @@ namespace Net
struct ProxyConfig
/// HTTP proxy server configuration.
{
ProxyConfig() : port(HTTP_PORT), protocol("http"), tunnel(true) { }
ProxyConfig() : port(HTTP_PORT), protocol("http"), tunnel(true), originalRequestProtocol("http") { }
std::string host;
/// Proxy server host name or IP address.
@ -87,6 +87,9 @@ namespace Net
/// A regular expression defining hosts for which the proxy should be bypassed,
/// e.g. "localhost|127\.0\.0\.1|192\.168\.0\.\d+". Can also be an empty
/// string to disable proxy bypassing.
std::string originalRequestProtocol;
/// Original request protocol (http or https).
/// Required in the case of: HTTPS request over HTTP proxy with tunneling (CONNECT) off.
};
HTTPClientSession();

View File

@ -418,7 +418,7 @@ void HTTPClientSession::reconnect()
std::string HTTPClientSession::proxyRequestPrefix() const
{
std::string result("http://");
std::string result(_proxyConfig.originalRequestProtocol + "://");
result.append(_host);
/// Do not append default by default, since this may break some servers.
/// One example of such server is GCS (Google Cloud Storage).

View File

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54479)
SET(VERSION_REVISION 54480)
SET(VERSION_MAJOR 23)
SET(VERSION_MINOR 10)
SET(VERSION_MINOR 11)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 8f9a227de1f530cdbda52c145d41a6b0f1d29961)
SET(VERSION_DESCRIBE v23.10.1.1-testing)
SET(VERSION_STRING 23.10.1.1)
SET(VERSION_GITHASH 13adae0e42fd48de600486fc5d4b64d39f80c43e)
SET(VERSION_DESCRIBE v23.11.1.1-testing)
SET(VERSION_STRING 23.11.1.1)
# end of autochange

View File

@ -3,4 +3,4 @@ It allows to integrate JEMalloc into CMake project.
- Remove JEMALLOC_HAVE_ATTR_FORMAT_GNU_PRINTF because it's non standard.
- Added JEMALLOC_CONFIG_MALLOC_CONF substitution
- Add musl support (USE_MUSL)
- Also note, that darwin build requires JEMALLOC_PREFIX, while others don not
- Also note, that darwin build requires JEMALLOC_PREFIX, while others do not

2
contrib/libhdfs3 vendored

@ -1 +1 @@
Subproject commit 377220ef351ae24994a5fcd2b5fa3930d00c4db0
Subproject commit bdcb91354b1c05b21e73043a112a6f1e3b013497

View File

@ -1,4 +1,4 @@
if(NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE AND NOT ARCH_S390X)
if(NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE)
option(ENABLE_HDFS "Enable HDFS" ${ENABLE_LIBRARIES})
elseif(ENABLE_HDFS)
message (${RECONFIGURE_MESSAGE_LEVEL} "Cannot use HDFS3 with current configuration")

2
contrib/orc vendored

@ -1 +1 @@
Subproject commit f31c271110a2f0dac908a152f11708193ae209ee
Subproject commit e24f2c2a3ca0769c96704ab20ad6f512a83ea2ad

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.9.2.56"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-keeper"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.9.2.56"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -30,7 +30,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="23.9.2.56"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image

View File

@ -19,7 +19,7 @@ For more information and documentation see https://clickhouse.com/.
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). Most ARM CPUs after 2017 support ARMv8.2-A. A notable exception is Raspberry Pi 4 from 2019 whose CPU only supports ARMv8.0-A.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A) and additionally the Load-Acquire RCpc register. The register is optional in version ARMv8.2-A and mandatory in [ARMv8.3-A](https://en.wikipedia.org/wiki/AArch64#ARMv8.3-A). Supported in Graviton >=2, Azure and GCP instances. Examples for unsupported devices are Raspberry Pi 4 (ARMv8.0-A) and Jetson AGX Xavier/Orin (ARMv8.2-A).
## How to use this image

View File

@ -72,6 +72,13 @@ function configure()
sudo chown clickhouse /etc/clickhouse-server/config.d/keeper_port.xml
sudo chgrp clickhouse /etc/clickhouse-server/config.d/keeper_port.xml
#Randomize merge tree setting allow_experimental_block_number_column
value=$(($RANDOM % 2))
sudo cat /etc/clickhouse-server/config.d/merge_tree_settings.xml \
| sed "s|<allow_experimental_block_number_column>[01]</allow_experimental_block_number_column>|<allow_experimental_block_number_column>$value</allow_experimental_block_number_column>|" \
> /etc/clickhouse-server/config.d/merge_tree_settings.xml.tmp
sudo mv /etc/clickhouse-server/config.d/merge_tree_settings.xml.tmp /etc/clickhouse-server/config.d/merge_tree_settings.xml
# for clickhouse-server (via service)
echo "ASAN_OPTIONS='malloc_context_size=10 verbosity=1 allocator_release_to_os_interval_ms=10000'" >> /etc/environment
# for clickhouse-client
@ -177,6 +184,9 @@ function stop()
echo "thread apply all backtrace (on stop)" >> /test_output/gdb.log
timeout 30m gdb -batch -ex 'thread apply all backtrace' -p "$pid" | ts '%Y-%m-%d %H:%M:%S' >> /test_output/gdb.log
clickhouse stop --force
else
echo -e "Warning: server did not stop yet$OK" >> /test_output/test_results.tsv
clickhouse stop --force
fi
}

View File

@ -2,5 +2,4 @@
set -x
service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test '';
timeout 40m gdb -q -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms --gtest_output='json:test_output/test_result.json' | tee test_output/test_result.txt

View File

@ -189,6 +189,7 @@ rg -Fav -e "Code: 236. DB::Exception: Cancelled merging parts" \
-e "ZooKeeperClient" \
-e "KEEPER_EXCEPTION" \
-e "DirectoryMonitor" \
-e "DistributedInsertQueue" \
-e "TABLE_IS_READ_ONLY" \
-e "Code: 1000, e.code() = 111, Connection refused" \
-e "UNFINISHED" \

View File

@ -75,7 +75,7 @@ sidebar_label: 2022
* Fix usage of nested columns with non-array columns with the same prefix [2] [#28762](https://github.com/ClickHouse/ClickHouse/pull/28762) ([Anton Popov](https://github.com/CurtizJ)).
* Lower compiled_expression_cache_size to 128MB [#28816](https://github.com/ClickHouse/ClickHouse/pull/28816) ([Maksim Kita](https://github.com/kitaisreal)).
* Column default dictGet identifier fix [#28863](https://github.com/ClickHouse/ClickHouse/pull/28863) ([Maksim Kita](https://github.com/kitaisreal)).
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix arcadia (pre)stable 21.10 build [#29250](https://github.com/ClickHouse/ClickHouse/pull/29250) ([DimasKovas](https://github.com/DimasKovas)).

View File

@ -29,7 +29,7 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* May be fix s3 tests [#29762](https://github.com/ClickHouse/ClickHouse/pull/29762) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix ca-bundle.crt in kerberized_hadoop/Dockerfile [#30358](https://github.com/ClickHouse/ClickHouse/pull/30358) ([Vladimir C](https://github.com/vdimir)).

View File

@ -14,6 +14,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -15,6 +15,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -13,6 +13,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -0,0 +1,406 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.10.1.1976-stable (13adae0e42f) FIXME as compared to v23.9.1.1854-stable (8f9a227de1f)
#### Backward Incompatible Change
* Rewrote the storage S3Queue completely: changed the way we keep information in ZooKeeper, which allows making fewer ZooKeeper requests; added caching of the ZooKeeper state in cases when we know the state will not change; improved the polling from the S3 process to make it less aggressive; changed the way TTL and the max set for tracked files are maintained, now it is a background process. Added `system.s3queue` and `system.s3queue_log` tables. Closes [#54998](https://github.com/ClickHouse/ClickHouse/issues/54998). [#54422](https://github.com/ClickHouse/ClickHouse/pull/54422) ([Kseniia Sumarokova](https://github.com/kssenii)).
* There is no longer an option to automatically remove broken data parts. This closes [#55174](https://github.com/ClickHouse/ClickHouse/issues/55174). [#55184](https://github.com/ClickHouse/ClickHouse/pull/55184) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. [#55186](https://github.com/ClickHouse/ClickHouse/pull/55186) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to return it back. [#55189](https://github.com/ClickHouse/ClickHouse/pull/55189) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rename the directory monitor concept into background INSERT. All settings `*directory_monitor*` have been renamed to `distributed_background_insert*`. **Backward compatibility should be preserved** (since the old settings have been added as aliases). [#55978](https://github.com/ClickHouse/ClickHouse/pull/55978) ([Azat Khuzhin](https://github.com/azat)).
* Do not mix up `send_timeout` and `receive_timeout`. [#56035](https://github.com/ClickHouse/ClickHouse/pull/56035) ([Azat Khuzhin](https://github.com/azat)).
* Comparison of time intervals with different units will throw an exception. This closes [#55942](https://github.com/ClickHouse/ClickHouse/issues/55942). You might have occasionally relied on the previous behavior, when the underlying numeric values were compared regardless of the units; see the sketch after this list. [#56090](https://github.com/ClickHouse/ClickHouse/pull/56090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
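A minimal sketch of the new interval-comparison behavior described above (the exact error message may differ):

```sql
-- Previously the underlying numeric values were compared (1 vs 24), which gave a misleading result.
-- In 23.10, comparing intervals with different units throws an exception.
SELECT toIntervalDay(1) = toIntervalHour(24);   -- now throws an exception

-- Comparing intervals of the same unit still works as before.
SELECT toIntervalHour(24) = toIntervalHour(24); -- returns 1
```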
#### New Feature
* Add function "arrayFold(x1, ..., xn, accum -> expression, array1, ..., arrayn, init_accum)" which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. [#49794](https://github.com/ClickHouse/ClickHouse/pull/49794) ([Lirikl](https://github.com/Lirikl)).
* Added aggregation function lttb which uses the [Largest-Triangle-Three-Buckets](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm for downsampling data for visualization. [#53145](https://github.com/ClickHouse/ClickHouse/pull/53145) ([Sinan](https://github.com/sinsinan)).
* Query `CHECK TABLE` has better performance and usability (it sends progress updates and is cancellable). Support checking a particular part with `CHECK TABLE ... PART 'part_name'`. [#53404](https://github.com/ClickHouse/ClickHouse/pull/53404) ([vdimir](https://github.com/vdimir)).
* Added function `jsonMergePatch`. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object. [#54364](https://github.com/ClickHouse/ClickHouse/pull/54364) ([Memo](https://github.com/Joeywzr)).
* Added a new SQL function `arrayRandomSample(arr, k)` which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. `SELECT arrayReduce('groupArraySample(3)', range(10))`. [#54391](https://github.com/ClickHouse/ClickHouse/pull/54391) ([itayisraelov](https://github.com/itayisraelov)).
* Added new function `getHttpHeader` to get the value of an HTTP request header used for a request to the ClickHouse server. Returns an empty string if the request is not made over the HTTP protocol or there is no such header. [#54813](https://github.com/ClickHouse/ClickHouse/pull/54813) ([凌涛](https://github.com/lingtaolf)).
* Introduce -ArgMin/-ArgMax aggregate combinators which allow aggregating only by the min/max values. One use case can be found in [#54818](https://github.com/ClickHouse/ClickHouse/issues/54818). This PR also reorganizes combinators into a dedicated folder. [#54947](https://github.com/ClickHouse/ClickHouse/pull/54947) ([Amos Bird](https://github.com/amosbird)).
* Allow to drop cache for Protobuf format with `SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]`. [#55064](https://github.com/ClickHouse/ClickHouse/pull/55064) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Add external HTTP Basic authenticator. [#55199](https://github.com/ClickHouse/ClickHouse/pull/55199) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Added function `byteSwap` which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally such as IPv4. [#55211](https://github.com/ClickHouse/ClickHouse/pull/55211) ([Priyansh Agrawal](https://github.com/Priyansh121096)).
* Added function `formatQuery()` which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function `formatQuerySingleLine()` which does the same but the returned string will not contain linebreaks. [#55239](https://github.com/ClickHouse/ClickHouse/pull/55239) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Added DWARF input format that reads debug symbols from an ELF executable/library/object file. [#55450](https://github.com/ClickHouse/ClickHouse/pull/55450) ([Michael Kolupaev](https://github.com/al13n321)).
* Allow to save unparsed records and errors in RabbitMQ, NATS and FileLog engines. Add virtual columns `_error` and `_raw_message` (for NATS and RabbitMQ), `_raw_record` (for FileLog) that are filled when ClickHouse fails to parse a new record. The behaviour is controlled by the storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ, `handle_error_mode` for FileLog, similar to `kafka_handle_error_mode`. If it is set to `default`, an exception will be thrown when ClickHouse fails to parse a record; if it is set to `stream`, the error and raw record will be saved into the virtual columns. Closes [#36035](https://github.com/ClickHouse/ClickHouse/issues/36035). [#55477](https://github.com/ClickHouse/ClickHouse/pull/55477) ([Kruglov Pavel](https://github.com/Avogar)).
* Keeper client improvement: add the `get_all_children_number` command that returns the number of all child nodes under a specific path. [#55485](https://github.com/ClickHouse/ClickHouse/pull/55485) ([guoxiaolong](https://github.com/guoxiaolongzte)).
* If a table has a space-filling curve in its key, e.g., `ORDER BY mortonEncode(x, y)`, the conditions on its arguments, e.g., `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30` can be used for indexing. A setting `analyze_index_with_space_filling_curves` is added to enable or disable this analysis. This closes [#41195](https://github.com/ClickHouse/ClickHouse/issues/41195). Continuation of [#4538](https://github.com/ClickHouse/ClickHouse/issues/4538). Continuation of [#6286](https://github.com/ClickHouse/ClickHouse/issues/6286). Continuation of [#28130](https://github.com/ClickHouse/ClickHouse/issues/28130). Continuation of [#41753](https://github.com/ClickHouse/ClickHouse/issues/41753). [#55642](https://github.com/ClickHouse/ClickHouse/pull/55642) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add setting `optimize_trivial_approximate_count_query` to use `count()` approximation for storage EmbeddedRocksDB. Enable trivial count for StorageJoin. [#55806](https://github.com/ClickHouse/ClickHouse/pull/55806) ([Duc Canh Le](https://github.com/canhld94)).
* Keeper client improvement: add the `get_direct_children_number` command that returns the number of direct child nodes under a path. [#55898](https://github.com/ClickHouse/ClickHouse/pull/55898) ([xuzifu666](https://github.com/xuzifu666)).
* Add statement `SHOW SETTING setting_name` which is a simpler version of existing statement `SHOW SETTINGS`. [#55979](https://github.com/ClickHouse/ClickHouse/pull/55979) ([Maksim Kita](https://github.com/kitaisreal)).
* Added support for passing data in the Npy format to ClickHouse, e.g. `SELECT * FROM file('example_array.npy', Npy)`. [#55982](https://github.com/ClickHouse/ClickHouse/pull/55982) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* This PR implements a new setting called `force_optimize_projection_name`; it takes the name of a projection as an argument. If its value is set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes [#55331](https://github.com/ClickHouse/ClickHouse/issues/55331). [#56134](https://github.com/ClickHouse/ClickHouse/pull/56134) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
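A few hedged usage sketches for some of the new functions above. These are illustrative only: the exact argument order of `arrayFold` may differ between builds (the sketch assumes the accumulator is the first lambda argument and the initial value is the last function argument), and outputs are not authoritative.

```sql
-- arrayFold: accumulate a sum over an array.
SELECT arrayFold((acc, x) -> acc + x, [1, 2, 3, 4], toUInt64(0)) AS total;

-- arrayRandomSample: pick k random elements from an array.
SELECT arrayRandomSample(range(10), 3) AS sample;

-- byteSwap: reverse the bytes of an unsigned integer (e.g. an IPv4 address stored as UInt32).
SELECT byteSwap(3351772109) AS swapped;

-- formatQuerySingleLine: normalize a SQL string without introducing line breaks.
SELECT formatQuerySingleLine('select   1, 2   from numbers(3)') AS formatted;
```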
#### Performance Improvement
* Add option `query_plan_preserve_num_streams_after_window_functions` to preserve the number of streams after evaluating window functions to allow parallel stream processing. [#50771](https://github.com/ClickHouse/ClickHouse/pull/50771) ([frinkr](https://github.com/frinkr)).
* Release more num_streams if data is small. [#53867](https://github.com/ClickHouse/ClickHouse/pull/53867) ([Jiebin Sun](https://github.com/jiebinn)).
* RoaringBitmaps are now optimized before serialization. [#55044](https://github.com/ClickHouse/ClickHouse/pull/55044) ([UnamedRus](https://github.com/UnamedRus)).
* Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. [#55069](https://github.com/ClickHouse/ClickHouse/pull/55069) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fix contention on Context lock, this significantly improves performance for a lot of short-running concurrent queries. [#55121](https://github.com/ClickHouse/ClickHouse/pull/55121) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved the performance of inverted index creation by 30%. This was achieved by replacing `std::unordered_map` with `absl::flat_hash_map`. [#55210](https://github.com/ClickHouse/ClickHouse/pull/55210) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support ORC filter push-down (row group level). [#55330](https://github.com/ClickHouse/ClickHouse/pull/55330) ([李扬](https://github.com/taiyang-li)).
* Improve performance of external aggregation with a lot of temporary files. [#55489](https://github.com/ClickHouse/ClickHouse/pull/55489) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. [#55654](https://github.com/ClickHouse/ClickHouse/pull/55654) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses [#55653](https://github.com/ClickHouse/ClickHouse/issues/55653)#issuecomment-1763766009 . [#55683](https://github.com/ClickHouse/ClickHouse/pull/55683) ([Amos Bird](https://github.com/amosbird)).
* Cache cast function in set during execution to improve the performance of function `IN` when set element type doesn't exactly match column type. [#55712](https://github.com/ClickHouse/ClickHouse/pull/55712) ([Duc Canh Le](https://github.com/canhld94)).
* Performance improvement for `ColumnVector::insertMany` and `ColumnVector::insertManyFrom`. [#55714](https://github.com/ClickHouse/ClickHouse/pull/55714) ([frinkr](https://github.com/frinkr)).
* Getting values from a map is a common operation. In practice, the key structures are usually the same within a map column, so we can try to predict the next row's key position and reduce the number of comparisons. [#55929](https://github.com/ClickHouse/ClickHouse/pull/55929) ([lgbo](https://github.com/lgbo-ustc)).
* Fix an issue where struct field pruning didn't work in some cases, for example `INSERT INTO FUNCTION file('test_parquet_struct', Parquet, 'x Tuple(a UInt32, b UInt32, c String)') SELECT tuple(number, rand(), concat('testxxxxxxx', toString(number))) FROM numbers(10)`; see the sketch after this list. [#56117](https://github.com/ClickHouse/ClickHouse/pull/56117) ([lgbo](https://github.com/lgbo-ustc)).
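A hedged sketch of the struct-field-pruning scenario from the last entry (the file name and schema are illustrative):

```sql
-- Write a Parquet file with a named Tuple (struct) column.
INSERT INTO FUNCTION file('test_parquet_struct.parquet', Parquet, 'x Tuple(a UInt32, b UInt32, c String)')
SELECT tuple(number, rand(), concat('test', toString(number))) FROM numbers(10);

-- Read back only one nested field; with field pruning, only the requested sub-column should be decoded.
SELECT x.a FROM file('test_parquet_struct.parquet', Parquet);
```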
#### Improvement
* This is the second part of Kusto Query Language dialect support. The [Phase 1 implementation](https://github.com/ClickHouse/ClickHouse/pull/37961) has been merged. [#42510](https://github.com/ClickHouse/ClickHouse/pull/42510) ([larryluogit](https://github.com/larryluogit)).
* Processor IDs were raw pointers cast to UInt64; they are now printed in a prettier manner. [#48852](https://github.com/ClickHouse/ClickHouse/pull/48852) ([Vlad Seliverstov](https://github.com/behebot)).
* Creating a direct dictionary with a lifetime field set will be rejected at create time. Fixes: [#27861](https://github.com/ClickHouse/ClickHouse/issues/27861). [#49043](https://github.com/ClickHouse/ClickHouse/pull/49043) ([Rory Crispin](https://github.com/RoryCrispin)).
* Allow parameters in queries with partitions like `ALTER TABLE t DROP PARTITION`. Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#49516](https://github.com/ClickHouse/ClickHouse/pull/49516) ([Nikolay Degterinsky](https://github.com/evillique)).
* Refactored the code related to `zookeeper_connection` and added a new column `xid` to the `system.zookeeper_connection` table. [#50702](https://github.com/ClickHouse/ClickHouse/pull/50702) ([helifu](https://github.com/helifu)).
* Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. [#51692](https://github.com/ClickHouse/ClickHouse/pull/51692) ([Raúl Marín](https://github.com/Algunenano)).
* Distributed queries executed in `async_socket_for_remote` mode (default) now respect `max_threads` limit. Previously, some queries could create excessive threads (up to `max_distributed_connections`), causing server performance issues. [#53504](https://github.com/ClickHouse/ClickHouse/pull/53504) ([filimonov](https://github.com/filimonov)).
* Display the correct server settings after reload. [#53774](https://github.com/ClickHouse/ClickHouse/pull/53774) ([helifu](https://github.com/helifu)).
* Add support for the mathematical minus character `−` in queries, similar to `-`. [#54100](https://github.com/ClickHouse/ClickHouse/pull/54100) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add replica groups to the Replicated database engine. Closes [#53620](https://github.com/ClickHouse/ClickHouse/issues/53620). [#54421](https://github.com/ClickHouse/ClickHouse/pull/54421) ([Nikolay Degterinsky](https://github.com/evillique)).
* This PR fixes a UBSan test error (see [#54566](https://github.com/ClickHouse/ClickHouse/pull/54566)). [#54568](https://github.com/ClickHouse/ClickHouse/pull/54568) ([JackyWoo](https://github.com/JackyWoo)).
* Support asynchronous inserts with external data via the native protocol. Previously it worked only if the data was inlined into the query. [#54730](https://github.com/ClickHouse/ClickHouse/pull/54730) ([Anton Popov](https://github.com/CurtizJ)).
* It is better to retry retriable S3 errors than to fail the whole query. The default value of `s3_retry_attempts` was increased. [#54770](https://github.com/ClickHouse/ClickHouse/pull/54770) ([Sema Checherinda](https://github.com/CheSema)).
* Optimised external aggregation memory consumption in case many temporary files were generated. [#54798](https://github.com/ClickHouse/ClickHouse/pull/54798) ([Nikita Taranov](https://github.com/nickitat)).
* Add the `hostname_levenshtein_distance` load balancing mode. [#54826](https://github.com/ClickHouse/ClickHouse/pull/54826) ([JackyWoo](https://github.com/JackyWoo)).
* Cache skippable entries while executing DDL from the ZooKeeper distributed DDL queue. [#54828](https://github.com/ClickHouse/ClickHouse/pull/54828) ([Duc Canh Le](https://github.com/canhld94)).
* Improve hiding secrets in logs. [#55089](https://github.com/ClickHouse/ClickHouse/pull/55089) ([Vitaly Baranov](https://github.com/vitlibar)).
* Added fields `substreams` and `filenames` to the `system.parts_columns` table. [#55108](https://github.com/ClickHouse/ClickHouse/pull/55108) ([Anton Popov](https://github.com/CurtizJ)).
* For now, the projection analysis will be performed only on top of the query plan. The setting `query_plan_optimize_projection` became obsolete (it had been enabled by default a long time ago). [#55112](https://github.com/ClickHouse/ClickHouse/pull/55112) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* When function `untuple()` is called on a tuple with named elements and itself has an alias (e.g. `SELECT untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias`), the result column name is now generated from the untuple alias and the tuple element alias (in the example: `untuple_alias.element_alias`); see the sketch after this list. [#55123](https://github.com/ClickHouse/ClickHouse/pull/55123) ([garcher22](https://github.com/garcher22)).
* Added setting `describe_include_virtual_columns`, which allows to include virtual columns of table into result of `DESCRIBE` query. Added setting `describe_compact_output`. If it is set to `true`, `DESCRIBE` query returns only names and types of columns without extra information. [#55129](https://github.com/ClickHouse/ClickHouse/pull/55129) ([Anton Popov](https://github.com/CurtizJ)).
* Sometimes `OPTIMIZE` with `optimize_throw_if_noop=1` may fail with the error `unknown reason` while the real cause is different projections in different parts. This behavior is fixed. [#55130](https://github.com/ClickHouse/ClickHouse/pull/55130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow to have several MaterializedPostgreSQL tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is a backward-incompatible change), but it can be turned on with the setting `materialized_postgresql_use_unique_replication_consumer_identifier`. Closes [#54918](https://github.com/ClickHouse/ClickHouse/issues/54918). [#55145](https://github.com/ClickHouse/ClickHouse/pull/55145) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow to parse negative DateTime64 and DateTime with fractional part from short strings. [#55146](https://github.com/ClickHouse/ClickHouse/pull/55146) ([Andrey Zvonov](https://github.com/zvonand)).
* To improve compatibility with MySQL: 1. `information_schema.tables` now includes the new field `table_rows`, and 2. `information_schema.columns` now includes the new field `extra`. [#55215](https://github.com/ClickHouse/ClickHouse/pull/55215) ([Robert Schulze](https://github.com/rschu1ze)).
* clickhouse-client will not show "0 rows in set" if the result is empty and an exception was thrown. [#55240](https://github.com/ClickHouse/ClickHouse/pull/55240) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Support renaming a table without the keyword `TABLE`, like `RENAME db.t1 TO db.t2`. [#55373](https://github.com/ClickHouse/ClickHouse/pull/55373) ([凌涛](https://github.com/lingtaolf)).
* Add `internal_replication` to `system.clusters`. [#55377](https://github.com/ClickHouse/ClickHouse/pull/55377) ([Konstantin Morozov](https://github.com/k-morozov)).
* Select remote proxy resolver based on request protocol, add proxy feature docs and remove `DB::ProxyConfiguration::Protocol::ANY`. [#55430](https://github.com/ClickHouse/ClickHouse/pull/55430) ([Arthur Passos](https://github.com/arthurpassos)).
* Avoid retrying keeper operations on INSERT after table shutdown. [#55519](https://github.com/ClickHouse/ClickHouse/pull/55519) ([Azat Khuzhin](https://github.com/azat)).
* Improved overall resilience for ClickHouse in case of many parts within partition (more than 1000). It might reduce the number of `TOO_MANY_PARTS` errors. [#55526](https://github.com/ClickHouse/ClickHouse/pull/55526) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Follow-up to https://github.com/ClickHouse/ClickHouse/pull/55184 to avoid falling into an `UNKNOWN_SETTING` error when the user uses the deleted MergeTree settings. [#55557](https://github.com/ClickHouse/ClickHouse/pull/55557) ([Jihyuk Bok](https://github.com/tomahawk28)).
* Updated `dashboard.html`: if there is only one chart, the "maximize" and "drag" buttons are not shown. [#55581](https://github.com/ClickHouse/ClickHouse/pull/55581) ([bhavuk2002](https://github.com/bhavuk2002)).
* Functions `toDayOfWeek()` (MySQL alias: `DAYOFWEEK()`), `toYearWeek()` (`YEARWEEK()`) and `toWeek()` (`WEEK()`) now support `String` arguments. This makes their behavior consistent with MySQL's behavior. [#55589](https://github.com/ClickHouse/ClickHouse/pull/55589) ([Robert Schulze](https://github.com/rschu1ze)).
* Implement query parameters support for `ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}`. Merges [#49516](https://github.com/ClickHouse/ClickHouse/issues/49516). Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#55604](https://github.com/ClickHouse/ClickHouse/pull/55604) ([alesapin](https://github.com/alesapin)).
* Inverted indexes no longer store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristic (the `density` parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristic based on the parameter `max_rows_per_postings_list` (default: 64k) is introduced, which directly controls the maximum allowed number of row ids in a postings list. [#55616](https://github.com/ClickHouse/ClickHouse/pull/55616) ([Harry Lee](https://github.com/HarryLeeIBM)).
* `SHOW COLUMNS` now correctly reports type `FixedString` as `BLOB` if setting `use_mysql_types_in_show_columns` is on. Also added two new settings, `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` to switch the output for types `String` and `FixedString` as `TEXT` or `BLOB`. [#55617](https://github.com/ClickHouse/ClickHouse/pull/55617) ([Serge Klochkov](https://github.com/slvrtrn)).
* During the startup of ReplicatedMergeTree tables, the ClickHouse server checks the set of parts for unexpected parts (which exist locally but not in ZooKeeper). All unexpected parts are moved to the detached directory, and instead of them the server tries to restore some ancestor (covered) parts. Now the server tries to restore the closest ancestors instead of random covered parts. [#55645](https://github.com/ClickHouse/ClickHouse/pull/55645) ([alesapin](https://github.com/alesapin)).
* The advanced dashboard now supports draggable charts on touch devices. This closes [#54206](https://github.com/ClickHouse/ClickHouse/issues/54206). [#55649](https://github.com/ClickHouse/ClickHouse/pull/55649) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Introduced setting `date_time_overflow_behavior` with possible values `ignore`, `throw`, `saturate` that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64. [#55696](https://github.com/ClickHouse/ClickHouse/pull/55696) ([Andrey Zvonov](https://github.com/zvonand)).
* Improve write performance to rocksdb. [#55732](https://github.com/ClickHouse/ClickHouse/pull/55732) ([Duc Canh Le](https://github.com/canhld94)).
* Use the default query format if declared when outputting exception with http_write_exception_in_output_format. [#55739](https://github.com/ClickHouse/ClickHouse/pull/55739) ([Raúl Marín](https://github.com/Algunenano)).
* Use upstream repo for apache datasketches. [#55787](https://github.com/ClickHouse/ClickHouse/pull/55787) ([Nikita Taranov](https://github.com/nickitat)).
* Add support for SHOW MERGES query. [#55815](https://github.com/ClickHouse/ClickHouse/pull/55815) ([megao](https://github.com/jetgm)).
* Provide a better message for common MV pitfalls. [#55826](https://github.com/ClickHouse/ClickHouse/pull/55826) ([Raúl Marín](https://github.com/Algunenano)).
* Reduced memory consumption during loading of hierarchical dictionaries. [#55838](https://github.com/ClickHouse/ClickHouse/pull/55838) ([Nikita Taranov](https://github.com/nickitat)).
* All dictionaries support setting `dictionary_use_async_executor`. [#55839](https://github.com/ClickHouse/ClickHouse/pull/55839) ([vdimir](https://github.com/vdimir)).
* If you dropped the current database, you will still be able to run some queries in `clickhouse-local` and switch to another database. This makes the behavior consistent with `clickhouse-client`. This closes [#55834](https://github.com/ClickHouse/ClickHouse/issues/55834). [#55853](https://github.com/ClickHouse/ClickHouse/pull/55853) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Functions `(add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond)` now support string-encoded date arguments, e.g. `SELECT addDays('2023-10-22', 1)`. This increases compatibility with MySQL and is needed by Tableau Online. [#55869](https://github.com/ClickHouse/ClickHouse/pull/55869) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduce setting `create_table_empty_primary_key_by_default` for default `ORDER BY ()`. [#55899](https://github.com/ClickHouse/ClickHouse/pull/55899) ([Srikanth Chekuri](https://github.com/srikanthccv)).
* Prevent excessive memory usage when deserializing AggregateFunctionTopKGenericData. [#55947](https://github.com/ClickHouse/ClickHouse/pull/55947) ([Raúl Marín](https://github.com/Algunenano)).
* The setting `apply_deleted_mask`, when disabled, allows reading rows that were marked as deleted by lightweight DELETE queries. This is useful for debugging. [#55952](https://github.com/ClickHouse/ClickHouse/pull/55952) ([Alexander Gololobov](https://github.com/davenger)).
* Allow skipping null values when serializing a tuple to JSON objects, which makes it possible to keep compatibility with Spark's `to_json` function; this is also useful for Gluten. [#55956](https://github.com/ClickHouse/ClickHouse/pull/55956) ([李扬](https://github.com/taiyang-li)).
* Functions `(add|sub)Date()` now support string-encoded date arguments, e.g. `SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE)`. The same support for string-encoded date arguments is added to the plus and minus operators, e.g. `SELECT '2023-10-23' + INTERVAL 1 DAY`. This increases compatibility with MySQL and is needed by Tableau Online. [#55960](https://github.com/ClickHouse/ClickHouse/pull/55960) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow unquoted strings with CR in CSV format. Closes [#39930](https://github.com/ClickHouse/ClickHouse/issues/39930). [#56046](https://github.com/ClickHouse/ClickHouse/pull/56046) ([Kruglov Pavel](https://github.com/Avogar)).
* On a Keeper with lots of watches, AsyncMetrics threads can consume 100% of CPU for a noticeable time in `DB::KeeperStorage::getSessionsWithWatchesCount()`. The fix is to avoid traversing the heavy `watches` and `list_watches` sets. [#56054](https://github.com/ClickHouse/ClickHouse/pull/56054) ([Alexander Gololobov](https://github.com/davenger)).
* Allow to run `clickhouse-keeper` using embedded config. [#56086](https://github.com/ClickHouse/ClickHouse/pull/56086) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a limit on the maximum value of the `queued.min.messages` configuration to avoid problems when starting to fetch data from Kafka. [#56121](https://github.com/ClickHouse/ClickHouse/pull/56121) ([Stas Morozov](https://github.com/r3b-fish)).
* Fixed a typo in the SQL function `minSampleSizeContinous` (renamed to `minSampleSizeContinuous`). The old name is preserved for backward compatibility. This closes: [#56139](https://github.com/ClickHouse/ClickHouse/issues/56139). [#56143](https://github.com/ClickHouse/ClickHouse/pull/56143) ([Dorota Szeremeta](https://github.com/orotaday)).
* Print the path of a corrupted part on disk before shutting down the server. Before this change, if a part was corrupted on disk and the server could not start, it was almost impossible to understand which part was broken. This is fixed. [#56181](https://github.com/ClickHouse/ClickHouse/pull/56181) ([Duc Canh Le](https://github.com/canhld94)).
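A few hedged SQL sketches for improvements in this section (the table `t`, the parameter name `part`, and the partition value are illustrative):

```sql
-- Query parameters in partition operations (ALTER TABLE ... DROP PARTITION).
SET param_part = '2023';
ALTER TABLE t DROP PARTITION {part:String};

-- untuple() with an alias: the result column is named `untuple_alias.element_alias`.
SELECT untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias;

-- Compact DESCRIBE output: only names and types of columns.
DESCRIBE TABLE t SETTINGS describe_compact_output = 1;

-- String-encoded date arguments in date arithmetic (MySQL compatibility).
SELECT addDays('2023-10-22', 1), addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE), '2023-10-23' + INTERVAL 1 DAY;
```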
#### Build/Testing/Packaging Improvement
* If the database is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* A resource with the source code, including submodules, is built in the Darwin special build task. It may be used to build ClickHouse without checking out submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix a build failure when ClickHouse is built with the AVX series of instructions enabled (e.g. `cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX=ON -DENABLE_AVX2=ON -DENABLE_AVX2_FOR_SPEC_OP=ON`): compilation of the bundled snappy library failed with errors such as `error: unknown type name '__m256i'` and `use of undeclared identifier '_mm256_lddqu_si256'`. The reason is that snappy did not enable `SNAPPY_HAVE_X86_CRC32`. [#55049](https://github.com/ClickHouse/ClickHouse/pull/55049) ([monchickey](https://github.com/monchickey)).
* Add `instance_env_variables` option to integration tests. [#55208](https://github.com/ClickHouse/ClickHouse/pull/55208) ([Arthur Passos](https://github.com/arthurpassos)).
* Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* In tests RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. [#55424](https://github.com/ClickHouse/ClickHouse/pull/55424) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix integration check python script to use gh api url - Add Readme for CI tests. [#55476](https://github.com/ClickHouse/ClickHouse/pull/55476) ([Max K.](https://github.com/mkaynov)).
* Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
* Check sha512 for tgz; use a proper repository for keeper; write only filenames to TGZ.sha512 files for tarball packages. Prerequisite for [#31473](https://github.com/ClickHouse/ClickHouse/issues/31473). [#55717](https://github.com/ClickHouse/ClickHouse/pull/55717) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Updated to get a free port for Azurite. [#55796](https://github.com/ClickHouse/ClickHouse/pull/55796) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Reduce the output info. [#55938](https://github.com/ClickHouse/ClickHouse/pull/55938) ([helifu](https://github.com/helifu)).
* Modified the error message difference between openssl and boringssl to fix the functional test. [#55975](https://github.com/ClickHouse/ClickHouse/pull/55975) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Changes to support HDFS on s390x. [#56128](https://github.com/ClickHouse/ClickHouse/pull/56128) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Fix flaky test of jbod balancer by relaxing the Gini coefficient and introducing more determinism in insertions. [#56175](https://github.com/ClickHouse/ClickHouse/pull/56175) ([Amos Bird](https://github.com/amosbird)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Skip hardlinking inverted index files in mutation [#47663](https://github.com/ClickHouse/ClickHouse/pull/47663) ([cangyin](https://github.com/cangyin)).
* Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN [#51746](https://github.com/ClickHouse/ClickHouse/pull/51746) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support missed Object(Nullable(json)) subcolumns in query. [#54052](https://github.com/ClickHouse/ClickHouse/pull/54052) ([zps](https://github.com/VanDarkholme7)).
* Re-add fix for `accurateCastOrNull()` [#54629](https://github.com/ClickHouse/ClickHouse/pull/54629) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix detecting DEFAULT for columns of a Distributed table created without AS [#55060](https://github.com/ClickHouse/ClickHouse/pull/55060) ([Vitaly Baranov](https://github.com/vitlibar)).
* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Suppress error statistics update for internal exceptions [#55128](https://github.com/ClickHouse/ClickHouse/pull/55128) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix deadlock in backups [#55132](https://github.com/ClickHouse/ClickHouse/pull/55132) ([alesapin](https://github.com/alesapin)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning of extra columns in set. [#55172](https://github.com/ClickHouse/ClickHouse/pull/55172) ([Amos Bird](https://github.com/amosbird)).
* Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity [#55202](https://github.com/ClickHouse/ClickHouse/pull/55202) ([Duc Canh Le](https://github.com/canhld94)).
* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Avoid possible memory leaks in compressors in case of missing buffer finalization [#55262](https://github.com/ClickHouse/ClickHouse/pull/55262) ([Azat Khuzhin](https://github.com/azat)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree [#55276](https://github.com/ClickHouse/ClickHouse/pull/55276) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix SIGSEGV in MergeSortingPartialResultTransform (due to zero chunks after remerge()) [#55335](https://github.com/ClickHouse/ClickHouse/pull/55335) ([Azat Khuzhin](https://github.com/azat)).
* Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception [#55338](https://github.com/ClickHouse/ClickHouse/pull/55338) ([Azat Khuzhin](https://github.com/azat)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix leak in StorageHDFS [#55370](https://github.com/ClickHouse/ClickHouse/pull/55370) ([Azat Khuzhin](https://github.com/azat)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix MySQL interface boolean representation [#55427](https://github.com/ClickHouse/ClickHouse/pull/55427) ([Serge Klochkov](https://github.com/slvrtrn)).
* Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting [#55479](https://github.com/ClickHouse/ClickHouse/pull/55479) ([Serge Klochkov](https://github.com/slvrtrn)).
* Make `use_mysql_types_in_show_columns` affect only `SHOW COLUMNS` [#55481](https://github.com/ClickHouse/ClickHouse/pull/55481) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix stack symbolizer parsing DW_FORM_ref_addr incorrectly and sometimes crashing [#55483](https://github.com/ClickHouse/ClickHouse/pull/55483) ([Michael Kolupaev](https://github.com/al13n321)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix Query Parameters not working with custom HTTP handlers [#55521](https://github.com/ClickHouse/ClickHouse/pull/55521) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix checking of non handled data for Values format [#55527](https://github.com/ClickHouse/ClickHouse/pull/55527) ([Azat Khuzhin](https://github.com/azat)).
* Fix 'Invalid cursor state' in odbc interacting with MS SQL Server [#55558](https://github.com/ClickHouse/ClickHouse/pull/55558) ([vdimir](https://github.com/vdimir)).
* Fix max execution time and 'break' overflow mode [#55577](https://github.com/ClickHouse/ClickHouse/pull/55577) ([Alexander Gololobov](https://github.com/davenger)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Disable wrong optimization and add a test [#55609](https://github.com/ClickHouse/ClickHouse/pull/55609) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#52352](https://github.com/ClickHouse/ClickHouse/issues/52352) [#55621](https://github.com/ClickHouse/ClickHouse/pull/55621) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test to avoid incorrect decimal sorting [#55662](https://github.com/ClickHouse/ClickHouse/pull/55662) ([Amos Bird](https://github.com/amosbird)).
* Fix progress bar for s3 and azure Cluster functions with url without globs [#55666](https://github.com/ClickHouse/ClickHouse/pull/55666) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fixes and improvements for Iceberg storage [#55695](https://github.com/ClickHouse/ClickHouse/pull/55695) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix data race in CreatingSetsTransform (v2) [#55786](https://github.com/ClickHouse/ClickHouse/pull/55786) ([Azat Khuzhin](https://github.com/azat)).
* Throw exception when parsing illegal string as float if precise_float_parsing is true [#55861](https://github.com/ClickHouse/ClickHouse/pull/55861) ([李扬](https://github.com/taiyang-li)).
* Disable predicate pushdown if the CTE contains stateful functions [#55871](https://github.com/ClickHouse/ClickHouse/pull/55871) ([Raúl Marín](https://github.com/Algunenano)).
* Fix normalize ASTSelectWithUnionQuery strip FORMAT of the query [#55887](https://github.com/ClickHouse/ClickHouse/pull/55887) ([flynn](https://github.com/ucasfl)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* fix: StorageNull supports subcolumns [#55912](https://github.com/ClickHouse/ClickHouse/pull/55912) ([FFish](https://github.com/wxybear)).
* Do not write retriable errors for Replicated mutate/merge into error log [#55944](https://github.com/ClickHouse/ClickHouse/pull/55944) ([Azat Khuzhin](https://github.com/azat)).
* Fix `SHOW DATABASES LIMIT <N>` [#55962](https://github.com/ClickHouse/ClickHouse/pull/55962) ([Raúl Marín](https://github.com/Algunenano)).
* Fix autogenerated Protobuf schema with fields with underscore [#55974](https://github.com/ClickHouse/ClickHouse/pull/55974) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix dateTime64ToSnowflake64() with non-default scale [#55983](https://github.com/ClickHouse/ClickHouse/pull/55983) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix output/input of Arrow dictionary column [#55989](https://github.com/ClickHouse/ClickHouse/pull/55989) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix fetching schema from schema registry in AvroConfluent [#55991](https://github.com/ClickHouse/ClickHouse/pull/55991) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#55995](https://github.com/ClickHouse/ClickHouse/pull/55995) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix incorrect free space accounting for least_used JBOD policy [#56030](https://github.com/ClickHouse/ClickHouse/pull/56030) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing scalar issue when evaluating subqueries inside table functions [#56057](https://github.com/ClickHouse/ClickHouse/pull/56057) ([Amos Bird](https://github.com/amosbird)).
* Fix wrong query result when http_write_exception_in_output_format=1 [#56135](https://github.com/ClickHouse/ClickHouse/pull/56135) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix schema cache for fallback JSON->JSONEachRow with changed settings [#56172](https://github.com/ClickHouse/ClickHouse/pull/56172) ([Kruglov Pavel](https://github.com/Avogar)).
* Add error handler to odbc-bridge [#56185](https://github.com/ClickHouse/ClickHouse/pull/56185) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Fix libssh+openssl3 & s390x (part 2)"'. [#55188](https://github.com/ClickHouse/ClickHouse/pull/55188) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Support SAMPLE BY for VIEW"'. [#55357](https://github.com/ClickHouse/ClickHouse/pull/55357) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Revert "refine error code of duplicated index in create query""'. [#55467](https://github.com/ClickHouse/ClickHouse/pull/55467) ([Han Fei](https://github.com/hanfei1991)).
* NO CL ENTRY: 'Update mysql.md - Remove the Private Preview Note'. [#55486](https://github.com/ClickHouse/ClickHouse/pull/55486) ([Ryadh DAHIMENE](https://github.com/Ryado)).
* NO CL ENTRY: 'Revert "Removed "maximize" and "drag" buttons from `dashboard` in case of single chart"'. [#55623](https://github.com/ClickHouse/ClickHouse/pull/55623) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix filtering by virtual columns with OR filter in query"'. [#55657](https://github.com/ClickHouse/ClickHouse/pull/55657) ([Antonio Andelic](https://github.com/antonio2368)).
* NO CL ENTRY: 'Revert "Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort"'. [#55682](https://github.com/ClickHouse/ClickHouse/pull/55682) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Integration check script fix ups"'. [#55694](https://github.com/ClickHouse/ClickHouse/pull/55694) ([alesapin](https://github.com/alesapin)).
* NO CL ENTRY: 'Revert "Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table"'. [#56103](https://github.com/ClickHouse/ClickHouse/pull/56103) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Add function getHttpHeader"'. [#56109](https://github.com/ClickHouse/ClickHouse/pull/56109) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix output/input of Arrow dictionary column"'. [#56150](https://github.com/ClickHouse/ClickHouse/pull/56150) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Set defaults_for_omitted_fields to true for hive text format [#49486](https://github.com/ClickHouse/ClickHouse/pull/49486) ([李扬](https://github.com/taiyang-li)).
* Fixing join tests with analyzer [#49555](https://github.com/ClickHouse/ClickHouse/pull/49555) ([vdimir](https://github.com/vdimir)).
* Make exception about `ALTER TABLE ... DROP COLUMN|INDEX|PROJECTION` more clear [#50181](https://github.com/ClickHouse/ClickHouse/pull/50181) ([Alexander Gololobov](https://github.com/davenger)).
* ANTI JOIN: Invalid number of rows in Chunk [#50944](https://github.com/ClickHouse/ClickHouse/pull/50944) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Analyzer: fix row policy [#53170](https://github.com/ClickHouse/ClickHouse/pull/53170) ([Dmitry Novik](https://github.com/novikd)).
* Add a test with Block structure mismatch in grace hash join. [#53278](https://github.com/ClickHouse/ClickHouse/pull/53278) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support skip_unused_shards in Analyzer [#53282](https://github.com/ClickHouse/ClickHouse/pull/53282) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Revert "Revert "Planner prepare filters for analysis"" [#53792](https://github.com/ClickHouse/ClickHouse/pull/53792) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add setting allow_experimental_partial_result [#54514](https://github.com/ClickHouse/ClickHouse/pull/54514) ([vdimir](https://github.com/vdimir)).
* Fix CI skip build and skip tests checks [#54532](https://github.com/ClickHouse/ClickHouse/pull/54532) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* `MergeTreeData::forcefullyMovePartToDetachedAndRemoveFromMemory` does not respect mutations [#54653](https://github.com/ClickHouse/ClickHouse/pull/54653) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix azure test by using unique names [#54738](https://github.com/ClickHouse/ClickHouse/pull/54738) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove tests [#54873](https://github.com/ClickHouse/ClickHouse/pull/54873) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Remove useless path from lockSharedData in StorageReplicatedMergeTree [#54989](https://github.com/ClickHouse/ClickHouse/pull/54989) ([Mike Kot](https://github.com/myrrc)).
* Fix broken test [#55002](https://github.com/ClickHouse/ClickHouse/pull/55002) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update version_date.tsv and changelogs after v23.8.3.48-lts [#55063](https://github.com/ClickHouse/ClickHouse/pull/55063) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Use `source` instead of `bash` for pre-build script [#55071](https://github.com/ClickHouse/ClickHouse/pull/55071) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Update s3queue.md to add experimental flag [#55093](https://github.com/ClickHouse/ClickHouse/pull/55093) ([Peignon Melvyn](https://github.com/melvynator)).
* Clean data dir and always start an old server version in aggregate functions compatibility test. [#55105](https://github.com/ClickHouse/ClickHouse/pull/55105) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update version_date.tsv and changelogs after v23.9.1.1854-stable [#55118](https://github.com/ClickHouse/ClickHouse/pull/55118) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Add links to check reports in status comment [#55122](https://github.com/ClickHouse/ClickHouse/pull/55122) ([vdimir](https://github.com/vdimir)).
* Bump croaring to 2.0.2 [#55127](https://github.com/ClickHouse/ClickHouse/pull/55127) ([Robert Schulze](https://github.com/rschu1ze)).
* check if block is empty after async insert retries [#55143](https://github.com/ClickHouse/ClickHouse/pull/55143) ([Han Fei](https://github.com/hanfei1991)).
* Improve linker detection on macOS [#55147](https://github.com/ClickHouse/ClickHouse/pull/55147) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix libssh+openssl3 & s390x [#55154](https://github.com/ClickHouse/ClickHouse/pull/55154) ([Boris Kuschel](https://github.com/bkuschel)).
* Fix file cache temporary file segment range in FileSegment::reserve [#55164](https://github.com/ClickHouse/ClickHouse/pull/55164) ([vdimir](https://github.com/vdimir)).
* Fix libssh+openssl3 & s390x (part 2) [#55187](https://github.com/ClickHouse/ClickHouse/pull/55187) ([Boris Kuschel](https://github.com/bkuschel)).
* Fix wrong test name [#55190](https://github.com/ClickHouse/ClickHouse/pull/55190) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix libssh+openssl3 & s390x (Part 2.1) [#55192](https://github.com/ClickHouse/ClickHouse/pull/55192) ([Boris Kuschel](https://github.com/bkuschel)).
* Reset signals caught by clickhouse-client if a pager is in use [#55193](https://github.com/ClickHouse/ClickHouse/pull/55193) ([Azat Khuzhin](https://github.com/azat)).
* Update README.md [#55209](https://github.com/ClickHouse/ClickHouse/pull/55209) ([Tyler Hannan](https://github.com/tylerhannan)).
* remove the blocker to grow the metadata file version [#55218](https://github.com/ClickHouse/ClickHouse/pull/55218) ([Sema Checherinda](https://github.com/CheSema)).
* Fix syntax highlight in client for spaceship operator [#55224](https://github.com/ClickHouse/ClickHouse/pull/55224) ([Azat Khuzhin](https://github.com/azat)).
* Fix mypy errors [#55228](https://github.com/ClickHouse/ClickHouse/pull/55228) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Upgrade MinIO to support accepting non signed requests [#55245](https://github.com/ClickHouse/ClickHouse/pull/55245) ([Azat Khuzhin](https://github.com/azat)).
* tests: switch test_throttling to S3 over https to make it more production like [#55247](https://github.com/ClickHouse/ClickHouse/pull/55247) ([Azat Khuzhin](https://github.com/azat)).
* Evaluate defaults during async insert safer [#55253](https://github.com/ClickHouse/ClickHouse/pull/55253) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix data race in context [#55260](https://github.com/ClickHouse/ClickHouse/pull/55260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do not allow tests with state ERROR be overwritten by PASSED [#55261](https://github.com/ClickHouse/ClickHouse/pull/55261) ([Azat Khuzhin](https://github.com/azat)).
* Fix query formatting for SYSTEM queries [#55277](https://github.com/ClickHouse/ClickHouse/pull/55277) ([Azat Khuzhin](https://github.com/azat)).
* Context added TSA [#55278](https://github.com/ClickHouse/ClickHouse/pull/55278) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve logging in query cache [#55296](https://github.com/ClickHouse/ClickHouse/pull/55296) ([Robert Schulze](https://github.com/rschu1ze)).
* MaterializedPostgreSQL: remove back check [#55297](https://github.com/ClickHouse/ClickHouse/pull/55297) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Make `use_mysql_types_in_show_columns` independent from the connection [#55298](https://github.com/ClickHouse/ClickHouse/pull/55298) ([Robert Schulze](https://github.com/rschu1ze)).
* Make `HandlingRuleHTTPHandlerFactory` more stupid, but less error prone [#55307](https://github.com/ClickHouse/ClickHouse/pull/55307) ([alesapin](https://github.com/alesapin)).
* Fix tsan issue in croaring [#55311](https://github.com/ClickHouse/ClickHouse/pull/55311) ([Robert Schulze](https://github.com/rschu1ze)).
* Refactorings and better documentation for `toStartOfInterval()` [#55327](https://github.com/ClickHouse/ClickHouse/pull/55327) ([Robert Schulze](https://github.com/rschu1ze)).
* Review [#51946](https://github.com/ClickHouse/ClickHouse/issues/51946) and partially revert it [#55336](https://github.com/ClickHouse/ClickHouse/pull/55336) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update README.md [#55339](https://github.com/ClickHouse/ClickHouse/pull/55339) ([Tyler Hannan](https://github.com/tylerhannan)).
* Context locks small fixes [#55352](https://github.com/ClickHouse/ClickHouse/pull/55352) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix bad test `01605_dictinct_two_level` [#55354](https://github.com/ClickHouse/ClickHouse/pull/55354) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test_max_rows_to_read_leaf_with_view flakiness (due to prefer_localhost_replica) [#55355](https://github.com/ClickHouse/ClickHouse/pull/55355) ([Azat Khuzhin](https://github.com/azat)).
* Better exception messages [#55356](https://github.com/ClickHouse/ClickHouse/pull/55356) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update CHANGELOG.md [#55359](https://github.com/ClickHouse/ClickHouse/pull/55359) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better recursion depth check [#55361](https://github.com/ClickHouse/ClickHouse/pull/55361) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#43085](https://github.com/ClickHouse/ClickHouse/issues/43085) [#55362](https://github.com/ClickHouse/ClickHouse/pull/55362) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix data-race in web disk [#55372](https://github.com/ClickHouse/ClickHouse/pull/55372) ([Azat Khuzhin](https://github.com/azat)).
* Disable skim under TSan (Rust does not support ThreadSanitizer) [#55378](https://github.com/ClickHouse/ClickHouse/pull/55378) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing thread accounting for insert_distributed_sync=1 [#55392](https://github.com/ClickHouse/ClickHouse/pull/55392) ([Azat Khuzhin](https://github.com/azat)).
* Improve tests for untuple() [#55425](https://github.com/ClickHouse/ClickHouse/pull/55425) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix test that never worked test_rabbitmq_random_detach [#55453](https://github.com/ClickHouse/ClickHouse/pull/55453) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix out of bound error in system.remote_data_paths + disk web [#55468](https://github.com/ClickHouse/ClickHouse/pull/55468) ([alesapin](https://github.com/alesapin)).
* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mike Kot](https://github.com/myrrc)).
* Bump curl to 8.4 [#55492](https://github.com/ClickHouse/ClickHouse/pull/55492) ([Robert Schulze](https://github.com/rschu1ze)).
* Minor fixes for 02882_replicated_fetch_checksums_doesnt_match.sql [#55493](https://github.com/ClickHouse/ClickHouse/pull/55493) ([vdimir](https://github.com/vdimir)).
* AggregatingTransform initGenerate race condition fix [#55495](https://github.com/ClickHouse/ClickHouse/pull/55495) ([Maksim Kita](https://github.com/kitaisreal)).
* HashTable resize exception handle fix [#55497](https://github.com/ClickHouse/ClickHouse/pull/55497) ([Maksim Kita](https://github.com/kitaisreal)).
* fix lots of 'Structure does not match' warnings in ci [#55503](https://github.com/ClickHouse/ClickHouse/pull/55503) ([Han Fei](https://github.com/hanfei1991)).
* Cleanup: parallel replica coordinator usage [#55515](https://github.com/ClickHouse/ClickHouse/pull/55515) ([Igor Nikonov](https://github.com/devcrafter)).
* add k-morozov to trusted contributors [#55523](https://github.com/ClickHouse/ClickHouse/pull/55523) ([Mike Kot](https://github.com/myrrc)).
* Forbid create inverted index if setting not enabled [#55529](https://github.com/ClickHouse/ClickHouse/pull/55529) ([flynn](https://github.com/ucasfl)).
* Better exception messages but without SEGFAULT [#55541](https://github.com/ClickHouse/ClickHouse/pull/55541) ([Antonio Andelic](https://github.com/antonio2368)).
* Avoid setting same promise twice [#55553](https://github.com/ClickHouse/ClickHouse/pull/55553) ([Antonio Andelic](https://github.com/antonio2368)).
* Better error message in case when merge selecting task failed. [#55554](https://github.com/ClickHouse/ClickHouse/pull/55554) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add a test [#55564](https://github.com/ClickHouse/ClickHouse/pull/55564) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Added healthcheck for LDAP [#55571](https://github.com/ClickHouse/ClickHouse/pull/55571) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix global context for tests with --gtest_filter [#55583](https://github.com/ClickHouse/ClickHouse/pull/55583) ([Azat Khuzhin](https://github.com/azat)).
* Fix replica groups for Replicated database engine [#55587](https://github.com/ClickHouse/ClickHouse/pull/55587) ([Azat Khuzhin](https://github.com/azat)).
* Remove unused protobuf includes [#55590](https://github.com/ClickHouse/ClickHouse/pull/55590) ([Raúl Marín](https://github.com/Algunenano)).
* Apply Context changes to standalone Keeper [#55591](https://github.com/ClickHouse/ClickHouse/pull/55591) ([Antonio Andelic](https://github.com/antonio2368)).
* Do not fail if label-to-remove does not exist in PR [#55592](https://github.com/ClickHouse/ClickHouse/pull/55592) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* CI: cast extra column expression `pull_request_number` to Int32 [#55599](https://github.com/ClickHouse/ClickHouse/pull/55599) ([Han Fei](https://github.com/hanfei1991)).
* Add back a test that was removed by mistake [#55605](https://github.com/ClickHouse/ClickHouse/pull/55605) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Bump croaring to v2.0.4 [#55606](https://github.com/ClickHouse/ClickHouse/pull/55606) ([Robert Schulze](https://github.com/rschu1ze)).
* byteswap: Add 16/32-byte integer support [#55607](https://github.com/ClickHouse/ClickHouse/pull/55607) ([Robert Schulze](https://github.com/rschu1ze)).
* Revert [#54421](https://github.com/ClickHouse/ClickHouse/issues/54421) [#55613](https://github.com/ClickHouse/ClickHouse/pull/55613) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix: race condition in kusto implementation [#55615](https://github.com/ClickHouse/ClickHouse/pull/55615) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Remove passed tests from `analyzer_tech_debt.txt` [#55618](https://github.com/ClickHouse/ClickHouse/pull/55618) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable 02161_addressToLineWithInlines [#55622](https://github.com/ClickHouse/ClickHouse/pull/55622) ([Michael Kolupaev](https://github.com/al13n321)).
* KeyCondition: preparation [#55625](https://github.com/ClickHouse/ClickHouse/pull/55625) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flakiness of test_system_merges (by increasing sleep interval properly) [#55627](https://github.com/ClickHouse/ClickHouse/pull/55627) ([Azat Khuzhin](https://github.com/azat)).
* fix `structure does not match` logs again [#55628](https://github.com/ClickHouse/ClickHouse/pull/55628) ([Han Fei](https://github.com/hanfei1991)).
* KeyCondition: small changes [#55640](https://github.com/ClickHouse/ClickHouse/pull/55640) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Resubmit [#54421](https://github.com/ClickHouse/ClickHouse/issues/54421) [#55641](https://github.com/ClickHouse/ClickHouse/pull/55641) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix some typos [#55646](https://github.com/ClickHouse/ClickHouse/pull/55646) ([alesapin](https://github.com/alesapin)).
* Show move/maximize only if there is more than a single chart [#55648](https://github.com/ClickHouse/ClickHouse/pull/55648) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable test_query_is_lock_free[detach table] for the analyzer [#55668](https://github.com/ClickHouse/ClickHouse/pull/55668) ([Raúl Marín](https://github.com/Algunenano)).
* Allow FINAL with parallel replicas with custom key [#55679](https://github.com/ClickHouse/ClickHouse/pull/55679) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix StorageMaterializedView::isRemote [#55681](https://github.com/ClickHouse/ClickHouse/pull/55681) ([vdimir](https://github.com/vdimir)).
* Bump gRPC to 1.34.1 [#55693](https://github.com/ClickHouse/ClickHouse/pull/55693) ([Robert Schulze](https://github.com/rschu1ze)).
* Randomize block_number column setting in ci [#55713](https://github.com/ClickHouse/ClickHouse/pull/55713) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Use pool for proxied S3 disk http sessions [#55718](https://github.com/ClickHouse/ClickHouse/pull/55718) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Analyzer: fix block structure mismatch in matview with engine distributed [#55741](https://github.com/ClickHouse/ClickHouse/pull/55741) ([vdimir](https://github.com/vdimir)).
* Use diff object again, since JSON API limits the files [#55750](https://github.com/ClickHouse/ClickHouse/pull/55750) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Big endian platform max intersection fix [#55756](https://github.com/ClickHouse/ClickHouse/pull/55756) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Remove temporary debug logging in MultiplexedConnections [#55764](https://github.com/ClickHouse/ClickHouse/pull/55764) ([Michael Kolupaev](https://github.com/al13n321)).
* Check if partition ID is `nullptr` [#55765](https://github.com/ClickHouse/ClickHouse/pull/55765) ([Antonio Andelic](https://github.com/antonio2368)).
* Control Keeper feature flag randomization with env [#55766](https://github.com/ClickHouse/ClickHouse/pull/55766) ([Antonio Andelic](https://github.com/antonio2368)).
* Added test to check CapnProto cache [#55769](https://github.com/ClickHouse/ClickHouse/pull/55769) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Query Cache: Only cache initial query [#55771](https://github.com/ClickHouse/ClickHouse/pull/55771) ([zhongyuankai](https://github.com/zhongyuankai)).
* Temporarily disable flaky test [#55772](https://github.com/ClickHouse/ClickHouse/pull/55772) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix test test_postgresql_replica_database_engine_2/test.py::test_replica_consumer [#55774](https://github.com/ClickHouse/ClickHouse/pull/55774) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort fix [#55775](https://github.com/ClickHouse/ClickHouse/pull/55775) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix black check [#55779](https://github.com/ClickHouse/ClickHouse/pull/55779) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Correctly grep fuzzer.log [#55780](https://github.com/ClickHouse/ClickHouse/pull/55780) ([Antonio Andelic](https://github.com/antonio2368)).
* Parallel replicas: cleanup, less copying during announcement [#55781](https://github.com/ClickHouse/ClickHouse/pull/55781) ([Igor Nikonov](https://github.com/devcrafter)).
* Enable test_mutation_simple with the analyzer [#55791](https://github.com/ClickHouse/ClickHouse/pull/55791) ([Raúl Marín](https://github.com/Algunenano)).
* Improve enrich image [#55793](https://github.com/ClickHouse/ClickHouse/pull/55793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Bump gRPC to v1.36.4 [#55811](https://github.com/ClickHouse/ClickHouse/pull/55811) ([Robert Schulze](https://github.com/rschu1ze)).
* Attempt to fix test_dictionaries_redis flakiness [#55813](https://github.com/ClickHouse/ClickHouse/pull/55813) ([Raúl Marín](https://github.com/Algunenano)).
* Add diagnostic checks for issue [#55041](https://github.com/ClickHouse/ClickHouse/issues/55041) [#55835](https://github.com/ClickHouse/ClickHouse/pull/55835) ([Robert Schulze](https://github.com/rschu1ze)).
* Correct aggregate functions ser/deserialization to be endianness-independent. [#55837](https://github.com/ClickHouse/ClickHouse/pull/55837) ([Austin Kothig](https://github.com/kothiga)).
* Bump gRPC to v1.37.1 [#55840](https://github.com/ClickHouse/ClickHouse/pull/55840) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix 00002_log_and_exception_messages_formatting [#55844](https://github.com/ClickHouse/ClickHouse/pull/55844) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix caching objects in pygithub, and changelogs [#55845](https://github.com/ClickHouse/ClickHouse/pull/55845) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v23.3.14.78-lts [#55847](https://github.com/ClickHouse/ClickHouse/pull/55847) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.9.2.56-stable [#55848](https://github.com/ClickHouse/ClickHouse/pull/55848) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.8.4.69-lts [#55849](https://github.com/ClickHouse/ClickHouse/pull/55849) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* fix node setting in the test [#55850](https://github.com/ClickHouse/ClickHouse/pull/55850) ([Sema Checherinda](https://github.com/CheSema)).
* Add load_metadata_threads to describe filesystem cache [#55863](https://github.com/ClickHouse/ClickHouse/pull/55863) ([Jordi Villar](https://github.com/jrdi)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix digest check in replicated ddl worker [#55877](https://github.com/ClickHouse/ClickHouse/pull/55877) ([Sergei Trifonov](https://github.com/serxa)).
* Test parallel replicas with rollup [#55886](https://github.com/ClickHouse/ClickHouse/pull/55886) ([Raúl Marín](https://github.com/Algunenano)).
* Fix some tests with Replicated database [#55889](https://github.com/ClickHouse/ClickHouse/pull/55889) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update stress.py [#55890](https://github.com/ClickHouse/ClickHouse/pull/55890) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Revert "Revert "Revert "Add settings for real-time updates during query execution""" [#55893](https://github.com/ClickHouse/ClickHouse/pull/55893) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make test `system_zookeeper_connection` better [#55900](https://github.com/ClickHouse/ClickHouse/pull/55900) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* A test `01019_alter_materialized_view_consistent` is unstable with Analyzer [#55901](https://github.com/ClickHouse/ClickHouse/pull/55901) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove C++ templates, because they are stupid [#55910](https://github.com/ClickHouse/ClickHouse/pull/55910) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Bump gRPC to v1.39.1 [#55914](https://github.com/ClickHouse/ClickHouse/pull/55914) ([Robert Schulze](https://github.com/rschu1ze)).
* Add sanity check to RPNBuilderFunctionTreeNode [#55915](https://github.com/ClickHouse/ClickHouse/pull/55915) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump gRPC to v1.42.0 [#55916](https://github.com/ClickHouse/ClickHouse/pull/55916) ([Robert Schulze](https://github.com/rschu1ze)).
* Set storage.has_lightweight_delete_parts flag when a part has been loaded [#55935](https://github.com/ClickHouse/ClickHouse/pull/55935) ([Alexander Gololobov](https://github.com/davenger)).
* Include information about supported versions in bug report issue template [#55937](https://github.com/ClickHouse/ClickHouse/pull/55937) ([Nikita Taranov](https://github.com/nickitat)).
* arrayFold: Switch accumulator and array arguments [#55948](https://github.com/ClickHouse/ClickHouse/pull/55948) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix overrides via connections_credentials in case root directives exist [#55949](https://github.com/ClickHouse/ClickHouse/pull/55949) ([Azat Khuzhin](https://github.com/azat)).
* Test for Bug 43644 [#55955](https://github.com/ClickHouse/ClickHouse/pull/55955) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump protobuf to v3.19.6 [#55963](https://github.com/ClickHouse/ClickHouse/pull/55963) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix possible performance test error [#55964](https://github.com/ClickHouse/ClickHouse/pull/55964) ([Azat Khuzhin](https://github.com/azat)).
* Avoid counting lost parts twice [#55987](https://github.com/ClickHouse/ClickHouse/pull/55987) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Convert unnecessary std::scoped_lock usage to std::lock_guard [#56006](https://github.com/ClickHouse/ClickHouse/pull/56006) ([Robert Schulze](https://github.com/rschu1ze)).
* Stress tests: Try to wait until server is responsive after gdb detach [#56009](https://github.com/ClickHouse/ClickHouse/pull/56009) ([Raúl Marín](https://github.com/Algunenano)).
* test_storage_s3_queue - add debug info [#56011](https://github.com/ClickHouse/ClickHouse/pull/56011) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Bump protobuf to v21.9 [#56014](https://github.com/ClickHouse/ClickHouse/pull/56014) ([Robert Schulze](https://github.com/rschu1ze)).
* Correct the implementation of function `jsonMergePatch` [#56020](https://github.com/ClickHouse/ClickHouse/pull/56020) ([Anton Popov](https://github.com/CurtizJ)).
* Fix 02438_sync_replica_lightweight [#56023](https://github.com/ClickHouse/ClickHouse/pull/56023) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix some bad code by making it worse [#56026](https://github.com/ClickHouse/ClickHouse/pull/56026) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix bash completion for mawk (and update format list and add one more delimiter) [#56050](https://github.com/ClickHouse/ClickHouse/pull/56050) ([Azat Khuzhin](https://github.com/azat)).
* Analyzer: Fix crash on window resolve [#56055](https://github.com/ClickHouse/ClickHouse/pull/56055) ([Dmitry Novik](https://github.com/novikd)).
* Fix function_json_value_return_type_allow_nullable setting name in doc [#56056](https://github.com/ClickHouse/ClickHouse/pull/56056) ([vdimir](https://github.com/vdimir)).
* Force shutdown in upgrade test [#56074](https://github.com/ClickHouse/ClickHouse/pull/56074) ([Raúl Marín](https://github.com/Algunenano)).
* Try enable `01154_move_partition_long` with s3 [#56080](https://github.com/ClickHouse/ClickHouse/pull/56080) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix race condition between DROP_RANGE and committing existing block [#56083](https://github.com/ClickHouse/ClickHouse/pull/56083) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix flakiness of 02263_lazy_mark_load [#56087](https://github.com/ClickHouse/ClickHouse/pull/56087) ([Michael Kolupaev](https://github.com/al13n321)).
* Make the code less bloated [#56091](https://github.com/ClickHouse/ClickHouse/pull/56091) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Maybe smaller binary [#56112](https://github.com/ClickHouse/ClickHouse/pull/56112) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove old trash from unit tests [#56113](https://github.com/ClickHouse/ClickHouse/pull/56113) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove some bloat [#56114](https://github.com/ClickHouse/ClickHouse/pull/56114) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test_format_schema_on_server flakiness [#56116](https://github.com/ClickHouse/ClickHouse/pull/56116) ([Azat Khuzhin](https://github.com/azat)).
* Fix: schedule delayed part checks correctly [#56123](https://github.com/ClickHouse/ClickHouse/pull/56123) ([Igor Nikonov](https://github.com/devcrafter)).
* Beautify `show merges` [#56124](https://github.com/ClickHouse/ClickHouse/pull/56124) ([Denny Crane](https://github.com/den-crane)).
* Better options for disabling frame pointer omitting [#56130](https://github.com/ClickHouse/ClickHouse/pull/56130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix: incorrect brace style in clang-format [#56133](https://github.com/ClickHouse/ClickHouse/pull/56133) ([Igor Nikonov](https://github.com/devcrafter)).
* Do not try to activate covered parts when handling unexpected parts [#56137](https://github.com/ClickHouse/ClickHouse/pull/56137) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Re-fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#56140](https://github.com/ClickHouse/ClickHouse/pull/56140) ([Michael Kolupaev](https://github.com/al13n321)).
* Update version_date.tsv and changelogs after v23.3.15.29-lts [#56145](https://github.com/ClickHouse/ClickHouse/pull/56145) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.8.5.16-lts [#56146](https://github.com/ClickHouse/ClickHouse/pull/56146) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.9.3.12-stable [#56147](https://github.com/ClickHouse/ClickHouse/pull/56147) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.7.6.111-stable [#56148](https://github.com/ClickHouse/ClickHouse/pull/56148) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fasttest timeout setting [#56160](https://github.com/ClickHouse/ClickHouse/pull/56160) ([Max K.](https://github.com/mkaynov)).
* Use monotonic clock for part check scheduling [#56162](https://github.com/ClickHouse/ClickHouse/pull/56162) ([Igor Nikonov](https://github.com/devcrafter)).
* More metrics for fs cache [#56165](https://github.com/ClickHouse/ClickHouse/pull/56165) ([Kseniia Sumarokova](https://github.com/kssenii)).
* FileCache minor changes [#56168](https://github.com/ClickHouse/ClickHouse/pull/56168) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update 01414_mutations_and_errors_zookeeper.sh [#56176](https://github.com/ClickHouse/ClickHouse/pull/56176) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Shard fs cache keys [#56194](https://github.com/ClickHouse/ClickHouse/pull/56194) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do less work when there are lots of read requests and watches for the same paths [#56197](https://github.com/ClickHouse/ClickHouse/pull/56197) ([Alexander Gololobov](https://github.com/davenger)).
* Remove skip_unused_shards tests from analyzer skiplist [#56200](https://github.com/ClickHouse/ClickHouse/pull/56200) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Easy tests fix for analyzer [#56211](https://github.com/ClickHouse/ClickHouse/pull/56211) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add a warning if delayed accounting is not enabled (breaks OSIOWaitMicroseconds) [#56227](https://github.com/ClickHouse/ClickHouse/pull/56227) ([Azat Khuzhin](https://github.com/azat)).
* Do not remove part if `Too many open files` is thrown [#56238](https://github.com/ClickHouse/ClickHouse/pull/56238) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix ORC commit [#56261](https://github.com/ClickHouse/ClickHouse/pull/56261) ([Raúl Marín](https://github.com/Algunenano)).
* Fix typo in largestTriangleThreeBuckets.md [#56263](https://github.com/ClickHouse/ClickHouse/pull/56263) ([Nikita Taranov](https://github.com/nickitat)).

View File

@ -0,0 +1,31 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.3.15.29-lts (218336662e4) FIXME as compared to v23.3.14.78-lts (c8f4ba52c65)
#### Build/Testing/Packaging Improvement
* Backported in [#55671](https://github.com/ClickHouse/ClickHouse/issues/55671): If the database is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* Backported in [#55734](https://github.com/ClickHouse/ClickHouse/issues/55734): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
* Backported in [#55829](https://github.com/ClickHouse/ClickHouse/issues/55829): Check sha512 for tgz; use a proper repository for keeper; write only filenames to TGZ.sha512 files for tarball packages. Prerequisite for [#31473](https://github.com/ClickHouse/ClickHouse/issues/31473). [#55717](https://github.com/ClickHouse/ClickHouse/pull/55717) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NO CL ENTRY
* NO CL ENTRY: 'Pin rust version to fix GLIBC compatibility check'. [#55788](https://github.com/ClickHouse/ClickHouse/pull/55788) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Fix incorrect createColumn call on join clause [#48998](https://github.com/ClickHouse/ClickHouse/pull/48998) ([Yi Sheng](https://github.com/ongkong)).
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Improve enrich image [#55793](https://github.com/ClickHouse/ClickHouse/pull/55793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,78 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.7.6.111-stable (6b047a47504) FIXME as compared to v23.7.5.30-stable (e86c21fb922)
#### Improvement
* Backported in [#54285](https://github.com/ClickHouse/ClickHouse/issues/54285): Enable allow_remove_stale_moving_parts by default. [#54260](https://github.com/ClickHouse/ClickHouse/pull/54260) ([vdimir](https://github.com/vdimir)).
#### Build/Testing/Packaging Improvement
* Backported in [#55291](https://github.com/ClickHouse/ClickHouse/issues/55291): Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checking out submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* Backported in [#54705](https://github.com/ClickHouse/ClickHouse/issues/54705): Enrich `changed_images.json` with the latest tag from master for images that are not changed in the pull request. [#54369](https://github.com/ClickHouse/ClickHouse/pull/54369) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#54683](https://github.com/ClickHouse/ClickHouse/issues/54683): We build and upload them for every push, which isn't worth it. [#54675](https://github.com/ClickHouse/ClickHouse/pull/54675) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#55407](https://github.com/ClickHouse/ClickHouse/issues/55407): Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#55723](https://github.com/ClickHouse/ClickHouse/issues/55723): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix recalculation of skip indexes and projections in `ALTER DELETE` queries [#52530](https://github.com/ClickHouse/ClickHouse/pull/52530) ([Anton Popov](https://github.com/CurtizJ)).
* RFC: Fix filtering by virtual columns with OR expression [#52653](https://github.com/ClickHouse/ClickHouse/pull/52653) ([Azat Khuzhin](https://github.com/azat)).
* Fix reading of unnecessary column in case of multistage `PREWHERE` [#52689](https://github.com/ClickHouse/ClickHouse/pull/52689) ([Anton Popov](https://github.com/CurtizJ)).
* Fix sorting of sparse columns with large limit [#52827](https://github.com/ClickHouse/ClickHouse/pull/52827) ([Anton Popov](https://github.com/CurtizJ)).
* Fix reading of empty `Nested(Array(LowCardinality(...)))` [#52949](https://github.com/ClickHouse/ClickHouse/pull/52949) ([Anton Popov](https://github.com/CurtizJ)).
* Fix adding sub-second intervals to DateTime [#53309](https://github.com/ClickHouse/ClickHouse/pull/53309) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix: moved to prewhere condition actions can lose column [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix crash in join on sparse column [#53548](https://github.com/ClickHouse/ClickHouse/pull/53548) ([vdimir](https://github.com/vdimir)).
* Fix named_collection_admin alias [#54066](https://github.com/ClickHouse/ClickHouse/pull/54066) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix rows_before_limit_at_least for DelayedSource. [#54122](https://github.com/ClickHouse/ClickHouse/pull/54122) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix: allow IPv6 for bloom filter [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Check for overflow before addition in `analysisOfVariance` function [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)).
* reproduce and fix the bug in removeSharedRecursive [#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)).
* Fix aggregate projections with normalized states [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)).
* Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare case of CHECKSUM_DOESNT_MATCH error [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)).
* Fix zero copy garbage [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix "Invalid number of rows in Chunk" in MaterializedPostgreSQL [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Prevent attaching parts from tables with different projections or indices [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NO CL CATEGORY
* Backported in [#55704](https://github.com/ClickHouse/ClickHouse/issues/55704):. [#55657](https://github.com/ClickHouse/ClickHouse/pull/55657) ([Antonio Andelic](https://github.com/antonio2368)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Merge pull request [#52395](https://github.com/ClickHouse/ClickHouse/issues/52395) from azat/rust/reproducible-builds"'. [#55517](https://github.com/ClickHouse/ClickHouse/pull/55517) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Test libunwind changes. [#51436](https://github.com/ClickHouse/ClickHouse/pull/51436) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Retry blob listing in test_alter_moving_garbage [#52193](https://github.com/ClickHouse/ClickHouse/pull/52193) ([vdimir](https://github.com/vdimir)).
* add tests with connection reset by peer error, and retry it inside client [#52441](https://github.com/ClickHouse/ClickHouse/pull/52441) ([Sema Checherinda](https://github.com/CheSema)).
* Small fix for HTTPHeaderFilter [#53146](https://github.com/ClickHouse/ClickHouse/pull/53146) ([San](https://github.com/santrancisco)).
* fix Logical Error in AsynchronousBoundedReadBuffer [#53651](https://github.com/ClickHouse/ClickHouse/pull/53651) ([Sema Checherinda](https://github.com/CheSema)).
* Replace dlcdn.apache.org by archive domain [#54081](https://github.com/ClickHouse/ClickHouse/pull/54081) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix segfault in system.zookeeper [#54326](https://github.com/ClickHouse/ClickHouse/pull/54326) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix CI skip build and skip tests checks [#54532](https://github.com/ClickHouse/ClickHouse/pull/54532) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Update WebObjectStorage.cpp [#54695](https://github.com/ClickHouse/ClickHouse/pull/54695) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* check if block is empty after async insert retries [#55143](https://github.com/ClickHouse/ClickHouse/pull/55143) ([Han Fei](https://github.com/hanfei1991)).
* MaterializedPostgreSQL: remove back check [#55297](https://github.com/ClickHouse/ClickHouse/pull/55297) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mike Kot](https://github.com/myrrc)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,24 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.8.5.16-lts (e8a1af5fe2f) FIXME as compared to v23.8.4.69-lts (d4d1e7b9ded)
#### Build/Testing/Packaging Improvement
* Backported in [#55830](https://github.com/ClickHouse/ClickHouse/issues/55830): Check sha512 for tgz; use a proper repository for keeper; write only filenames to TGZ.sha512 files for tarball packages. Prerequisite for [#31473](https://github.com/ClickHouse/ClickHouse/issues/31473). [#55717](https://github.com/ClickHouse/ClickHouse/pull/55717) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Bring relevant commits from backport/23.8/55336 to 23.8 [#56029](https://github.com/ClickHouse/ClickHouse/pull/56029) ([Austin Kothig](https://github.com/kothiga)).

View File

@ -0,0 +1,20 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.9.3.12-stable (b7230b06563) FIXME as compared to v23.9.2.56-stable (a1bf3f1de55)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -11,6 +11,8 @@ This is intended for continuous integration checks that run on Linux servers. If
The cross-build for macOS is based on the [Build instructions](../development/build.md), follow them first.
The following sections provide a walk-through for building ClickHouse for `x86_64` macOS. If you're targeting ARM architecture, simply substitute all occurrences of `x86_64` with `aarch64`. For example, replace `x86_64-apple-darwin` with `aarch64-apple-darwin` throughout the steps.
## Install Clang-17
Follow the instructions from https://apt.llvm.org/ for your Ubuntu or Debian setup.
@ -30,13 +32,13 @@ export CCTOOLS=$(cd ~/cctools && pwd)
mkdir ${CCTOOLS}
cd ${CCTOOLS}
git clone https://github.com/tpoechtrager/apple-libtapi.git
git clone --depth=1 https://github.com/tpoechtrager/apple-libtapi.git
cd apple-libtapi
INSTALLPREFIX=${CCTOOLS} ./build.sh
./install.sh
cd ..
git clone https://github.com/tpoechtrager/cctools-port.git
git clone --depth=1 https://github.com/tpoechtrager/cctools-port.git
cd cctools-port/cctools
./configure --prefix=$(readlink -f ${CCTOOLS}) --with-libtapi=$(readlink -f ${CCTOOLS}) --target=x86_64-apple-darwin
make install
@ -46,7 +48,7 @@ Also, we need to download macOS X SDK into the working tree.
``` bash
cd ClickHouse/cmake/toolchain/darwin-x86_64
curl -L 'https://github.com/phracker/MacOSX-SDKs/releases/download/10.15/MacOSX10.15.sdk.tar.xz' | tar xJ --strip-components=1
curl -L 'https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz' | tar xJ --strip-components=1
```
## Build ClickHouse {#build-clickhouse}

View File

@ -23,7 +23,7 @@ sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
``` bash
cd ClickHouse
mkdir build-riscv64
CC=clang-16 CXX=clang++-16 cmake . -Bbuild-riscv64 -G Ninja -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-riscv64.cmake -DGLIBC_COMPATIBILITY=OFF -DENABLE_LDAP=OFF -DOPENSSL_NO_ASM=ON -DENABLE_JEMALLOC=ON -DENABLE_PARQUET=OFF -DENABLE_GRPC=OFF -DENABLE_HDFS=OFF -DENABLE_MYSQL=OFF
CC=clang-17 CXX=clang++-17 cmake . -Bbuild-riscv64 -G Ninja -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-riscv64.cmake -DGLIBC_COMPATIBILITY=OFF -DENABLE_LDAP=OFF -DOPENSSL_NO_ASM=ON -DENABLE_JEMALLOC=ON -DENABLE_PARQUET=OFF -DENABLE_GRPC=OFF -DENABLE_HDFS=OFF -DENABLE_MYSQL=OFF
ninja -C build-riscv64
```

View File

@ -219,13 +219,21 @@ You can also run your custom-built ClickHouse binary with the config file from t
## IDE (Integrated Development Environment) {#ide-integrated-development-environment}
If you do not know which IDE to use, we recommend that you use CLion. CLion is commercial software, but it offers 30 days free trial period. It is also free of charge for students. CLion can be used both on Linux and on macOS.
**CLion (recommended)**
KDevelop and QTCreator are other great alternatives of an IDE for developing ClickHouse. KDevelop comes in as a very handy IDE although unstable. If KDevelop crashes after a while upon opening project, you should click “Stop All” button as soon as it has opened the list of projects files. After doing so KDevelop should be fine to work with.
If you do not know which IDE to use, we recommend that you use [CLion](https://www.jetbrains.com/clion/). CLion is commercial software but it offers a 30 day free trial. It is also free of charge for students. CLion can be used on both Linux and macOS.
As simple code editors, you can use Sublime Text or Visual Studio Code, or Kate (all of which are available on Linux).
A few things to know when using CLion to develop ClickHouse:
Just in case, it is worth mentioning that CLion creates `build` path on its own, it also on its own selects `debug` for build type, for configuration it uses a version of CMake that is defined in CLion and not the one installed by you, and finally, CLion will use `make` to run build tasks instead of `ninja`. This is normal behaviour, just keep that in mind to avoid confusion.
- CLion creates a `build` path on its own and automatically selects `debug` for the build type
- It uses a version of CMake that is defined in CLion and not the one installed by you
- CLion will use `make` to run build tasks instead of `ninja` (this is normal behavior)
**Other alternatives**
[KDevelop](https://kdevelop.org/) and [QTCreator](https://www.qt.io/product/development-tools) are other great alternative IDEs for developing ClickHouse. While KDevelop is a great IDE, it is sometimes unstable. If KDevelop crashes when opening a project, you should click the “Stop All” button as soon as it has opened the list of project's files. After doing so, KDevelop should be fine to work with.
Other IDEs you can use are [Sublime Text](https://www.sublimetext.com/), [Visual Studio Code](https://code.visualstudio.com/), or [Kate](https://kate-editor.org/) (all of which are available on Linux). If you are using VS Code, we recommend using the [clangd extension](https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-clangd) to replace IntelliSense as it is much more performant.
## Writing Code {#writing-code}

View File

@ -28,7 +28,6 @@ SETTINGS
kafka_topic_list = 'topic1,topic2,...',
kafka_group_name = 'group_name',
kafka_format = 'data_format'[,]
[kafka_row_delimiter = 'delimiter_symbol',]
[kafka_schema = '',]
[kafka_num_consumers = N,]
[kafka_max_block_size = 0,]
@ -53,7 +52,6 @@ Required parameters:
Optional parameters:
- `kafka_row_delimiter` — Delimiter character, which ends the message. **This setting is deprecated and is no longer used; it is only kept for compatibility reasons.**
- `kafka_schema` — Parameter that must be used if the format requires a schema definition. For example, [Cap'n Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `kafka_num_consumers` — The number of consumers per table. Specify more consumers if the throughput of one consumer is insufficient. The total number of consumers should not exceed the number of partitions in the topic, since only one consumer can be assigned per partition, and must not be greater than the number of physical cores on the server where ClickHouse is deployed. Default: `1`.
- `kafka_max_block_size` — The maximum batch size (in messages) for poll. Default: [max_insert_block_size](../../../operations/settings/settings.md#setting-max_insert_block_size).
@ -64,7 +62,7 @@ Optional parameters:
- `kafka_poll_max_batch_size` — Maximum amount of messages to be polled in a single Kafka poll. Default: [max_block_size](../../../operations/settings/settings.md#setting-max_block_size).
- `kafka_flush_interval_ms` — Timeout for flushing data from Kafka. Default: [stream_flush_interval_ms](../../../operations/settings/settings.md#stream-flush-interval-ms).
- `kafka_thread_per_consumer` — Provide an independent thread for each consumer. When enabled, every consumer flushes the data independently, in parallel (otherwise, rows from several consumers are squashed to form one block). Default: `0`.
- `kafka_handle_error_mode` — How to handle errors for Kafka engine. Possible values: default, stream.
- `kafka_handle_error_mode` — How to handle errors for Kafka engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
- `kafka_commit_on_select` — Commit messages when select query is made. Default: `false`.
- `kafka_max_rows_per_message` — The maximum number of rows written in one kafka message for row-based formats. Default : `1`.
@ -249,6 +247,13 @@ Example:
- `_headers.name` — Array of message's headers keys.
- `_headers.value` — Array of message's headers values.
Additional virtual columns when `kafka_handle_error_mode='stream'`:
- `_raw_message` - Raw message that couldn't be parsed successfully.
- `_error` - Exception message raised during failed parsing.
Note: the `_raw_message` and `_error` virtual columns are filled only when an exception occurs during parsing; they are always empty when the message was parsed successfully.
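For illustration, a minimal sketch of using `kafka_handle_error_mode='stream'`. The broker address, topic, consumer group, and table name below are assumptions, and in practice the data is usually consumed through a materialized view rather than a direct `SELECT`:

``` sql
CREATE TABLE kafka_events
(
    message String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'clickhouse_events',
    kafka_format = 'JSONEachRow',
    kafka_handle_error_mode = 'stream';

-- Messages that fail to parse do not throw an exception; instead the raw payload
-- and the parsing error are exposed through the virtual columns:
SELECT _raw_message, _error
FROM kafka_events
WHERE _error != '';
```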
## Data formats support {#data-formats-support}
Kafka engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -25,7 +25,6 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
nats_url = 'host:port',
nats_subjects = 'subject1,subject2,...',
nats_format = 'data_format'[,]
[nats_row_delimiter = 'delimiter_symbol',]
[nats_schema = '',]
[nats_num_consumers = N,]
[nats_queue_group = 'group_name',]
@ -40,7 +39,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[nats_password = 'password',]
[nats_token = 'clickhouse',]
[nats_startup_connect_tries = '5']
[nats_max_rows_per_message = 1]
[nats_max_rows_per_message = 1,]
[nats_handle_error_mode = 'default']
```
Required parameters:
@ -51,7 +51,6 @@ Required parameters:
Optional parameters:
- `nats_row_delimiter` — Delimiter character, which ends the message. **This setting is deprecated and is no longer used; it is only kept for compatibility reasons.**
- `nats_schema` — Parameter that must be used if the format requires a schema definition. For example, [Cap'n Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `nats_num_consumers` — The number of consumers per table. Default: `1`. Specify more consumers if the throughput of one consumer is insufficient.
- `nats_queue_group` — Name for queue group of NATS subscribers. Default is the table name.
@ -66,6 +65,7 @@ Optional parameters:
- `nats_token` - NATS auth token.
- `nats_startup_connect_tries` - Number of connect tries at startup. Default: `5`.
- `nats_max_rows_per_message` — The maximum number of rows written in one NATS message for row-based formats. (default : `1`).
- `nats_handle_error_mode` — How to handle errors for NATS engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
SSL connection:
@ -165,6 +165,14 @@ If you want to change the target table by using `ALTER`, we recommend disabling
- `_subject` - NATS message subject.
Additional virtual columns when `nats_handle_error_mode='stream'`:
- `_raw_message` - Raw message that couldn't be parsed successfully.
- `_error` - Exception message raised during failed parsing.
Note: the `_raw_message` and `_error` virtual columns are filled only when an exception occurs during parsing; they are always empty when the message was parsed successfully.
## Data formats support {#data-formats-support}
NATS engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -28,7 +28,6 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[rabbitmq_exchange_type = 'exchange_type',]
[rabbitmq_routing_key_list = 'key1,key2,...',]
[rabbitmq_secure = 0,]
[rabbitmq_row_delimiter = 'delimiter_symbol',]
[rabbitmq_schema = '',]
[rabbitmq_num_consumers = N,]
[rabbitmq_num_queues = N,]
@ -45,7 +44,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[rabbitmq_username = '',]
[rabbitmq_password = '',]
[rabbitmq_commit_on_select = false,]
[rabbitmq_max_rows_per_message = 1]
[rabbitmq_max_rows_per_message = 1,]
[rabbitmq_handle_error_mode = 'default']
```
Required parameters:
@ -58,7 +58,6 @@ Optional parameters:
- `rabbitmq_exchange_type` — The type of RabbitMQ exchange: `direct`, `fanout`, `topic`, `headers`, `consistent_hash`. Default: `fanout`.
- `rabbitmq_routing_key_list` — A comma-separated list of routing keys.
- `rabbitmq_row_delimiter` — Delimiter character, which ends the message. **This setting is deprecated and is no longer used; it is only kept for compatibility reasons.**
- `rabbitmq_schema` — Parameter that must be used if the format requires a schema definition. For example, [Cap'n Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `rabbitmq_num_consumers` — The number of consumers per table. Specify more consumers if the throughput of one consumer is insufficient. Default: `1`.
- `rabbitmq_num_queues` — Total number of queues. Increasing this number can significantly improve performance. Default: `1`.
@ -78,6 +77,7 @@ Optional parameters:
- `rabbitmq_max_rows_per_message` — The maximum number of rows written in one RabbitMQ message for row-based formats. Default : `1`.
- `rabbitmq_empty_queue_backoff_start` — A start backoff point to reschedule read if the rabbitmq queue is empty.
- `rabbitmq_empty_queue_backoff_end` — An end backoff point to reschedule read if the rabbitmq queue is empty.
- `rabbitmq_handle_error_mode` — How to handle errors for RabbitMQ engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
@ -191,6 +191,13 @@ Example:
- `_message_id` - messageID of the received message; non-empty if it was set when the message was published.
- `_timestamp` - timestamp of the received message; non-empty if it was set when the message was published.
Additional virtual columns when `rabbitmq_handle_error_mode='stream'`:
- `_raw_message` - Raw message that couldn't be parsed successfully.
- `_error` - Exception message raised during failed parsing.
Note: the `_raw_message` and `_error` virtual columns are filled only when an exception occurs during parsing; they are always empty when the message was parsed successfully.
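As a sketch of how the `stream` mode might be wired up in practice, unparseable messages can be routed to a separate dead-letter table with a materialized view. The connection parameters, exchange name, and table names below are assumptions:

``` sql
CREATE TABLE rabbit_queue (value UInt64)
ENGINE = RabbitMQ
SETTINGS
    rabbitmq_host_port = 'localhost:5672',
    rabbitmq_exchange_name = 'events',
    rabbitmq_format = 'JSONEachRow',
    rabbitmq_handle_error_mode = 'stream';

-- Dead-letter table for messages that could not be parsed.
CREATE TABLE rabbit_dead_letters
(
    raw String,
    error String
)
ENGINE = MergeTree
ORDER BY tuple();

-- Persist the raw payload and the parsing error of every failed message.
CREATE MATERIALIZED VIEW rabbit_dead_letters_mv TO rabbit_dead_letters AS
SELECT _raw_message AS raw, _error AS error
FROM rabbit_queue
WHERE _error != '';
```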
## Data formats support {#data-formats-support}
RabbitMQ engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -866,6 +866,7 @@ Tags:
- `prefer_not_to_merge` — Disables merging of data parts on this volume. When this setting is enabled, merging data on this volume is not allowed. This allows controlling how ClickHouse works with slow disks.
- `perform_ttl_move_on_insert` — Disables TTL move on data part INSERT. By default (if enabled), when we insert a data part that has already expired according to the TTL move rule, it immediately goes to the volume/disk declared in the move rule. This can significantly slow down inserts if the destination volume/disk is slow (e.g. S3). If disabled, the already expired data part is written to the default volume and then immediately moved to the TTL volume.
- `load_balancing` - Policy for disk balancing, `round_robin` or `least_used`.
- `least_used_ttl_ms` - Configure timeout (in milliseconds) for updating the available space on all disks (`0` - update always, `-1` - never update, default is `60000`). Note: if the disk is used by ClickHouse only and is not subject to online filesystem resize/shrink, you can use `-1`; in all other cases it is not recommended, since it will eventually lead to incorrect space distribution.
Configuration examples:

View File

@ -48,61 +48,61 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
#### policy_name
`policy_name` - (optionally) policy name, it will be used to store temporary files for async send
`policy_name` - (optionally) policy name, it will be used to store temporary files for background send
**See Also**
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) setting
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) for the examples
### Distributed Settings
#### fsync_after_insert
`fsync_after_insert` - do the `fsync` for the file data after asynchronous insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
`fsync_after_insert` - do the `fsync` for the file data after background insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
#### fsync_directories
`fsync_directories` - do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to asynchronous inserts on Distributed table (after insert, after sending the data to shard, etc).
`fsync_directories` - do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to background inserts on Distributed table (after insert, after sending the data to shard, etc).
#### bytes_to_throw_insert
`bytes_to_throw_insert` - if more than this number of compressed bytes will be pending for async INSERT, an exception will be thrown. 0 - do not throw. Default 0.
`bytes_to_throw_insert` - if more than this number of compressed bytes will be pending for background INSERT, an exception will be thrown. 0 - do not throw. Default 0.
#### bytes_to_delay_insert
`bytes_to_delay_insert` - if more than this number of compressed bytes will be pending for async INSERT, the query will be delayed. 0 - do not delay. Default 0.
`bytes_to_delay_insert` - if more than this number of compressed bytes will be pending for background INSERT, the query will be delayed. 0 - do not delay. Default 0.
#### max_delay_to_insert
`max_delay_to_insert` - max delay of inserting data into Distributed table in seconds, if there are a lot of pending bytes for async send. Default 60.
`max_delay_to_insert` - max delay of inserting data into Distributed table in seconds, if there are a lot of pending bytes for background send. Default 60.
#### monitor_batch_inserts
#### background_insert_batch
`monitor_batch_inserts` - same as [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)
`background_insert_batch` - same as [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)
#### monitor_split_batch_on_failure
#### background_insert_split_batch_on_failure
`monitor_split_batch_on_failure` - same as [distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)
`background_insert_split_batch_on_failure` - same as [distributed_background_insert_split_batch_on_failure](../../../operations/settings/settings.md#distributed_background_insert_split_batch_on_failure)
#### monitor_sleep_time_ms
#### background_insert_sleep_time_ms
`monitor_sleep_time_ms` - same as [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)
`background_insert_sleep_time_ms` - same as [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)
#### monitor_max_sleep_time_ms
#### background_insert_max_sleep_time_ms
`monitor_max_sleep_time_ms` - same as [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)
`background_insert_max_sleep_time_ms` - same as [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)
:::note
**Durability settings** (`fsync_...`):
- Affect only asynchronous INSERTs (i.e. `insert_distributed_sync=false`) when data first stored on the initiator node disk and later asynchronously send to shards.
- Affect only background INSERTs (i.e. `distributed_foreground_insert=false`) when data is first stored on the initiator node disk and later sent to the shards in the background.
- May significantly decrease the inserts' performance.
- Affect writing the data stored inside the Distributed table folder on the **node which accepted your insert**. If you need guarantees of writing data to the underlying MergeTree tables, see the durability settings (`...fsync...`) in `system.merge_tree_settings`.
For **Insert limit settings** (`..._insert`) see also:
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) setting
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) setting
- `bytes_to_throw_insert` is handled before `bytes_to_delay_insert`, so you should not set it to a value less than `bytes_to_delay_insert`
:::
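As an illustration, here is a minimal sketch of how these per-table settings can be supplied when defining a `Distributed` table (the cluster, database and table names below are hypothetical, and the byte thresholds are illustrative):
```sql
CREATE TABLE dist_hits AS default.hits
ENGINE = Distributed(my_cluster, default, hits, rand())
SETTINGS
    fsync_after_insert = 1,               -- fsync file data after each background insert
    fsync_directories = 1,                -- fsync directory metadata as well
    bytes_to_throw_insert = 10000000000,  -- throw if more than ~10 GB of compressed data is pending
    bytes_to_delay_insert = 5000000000,   -- start delaying inserts at ~5 GB pending
    max_delay_to_insert = 60;             -- delay an INSERT for at most 60 seconds
```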
@ -232,7 +232,7 @@ You should be concerned about the sharding scheme in the following cases:
- Queries are used that require joining data (`IN` or `JOIN`) by a specific key. If data is sharded by this key, you can use local `IN` or `JOIN` instead of `GLOBAL IN` or `GLOBAL JOIN`, which is much more efficient.
- A large number of servers is used (hundreds or more) with a large number of small queries, for example, queries for data of individual clients (e.g. websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, you can set up bi-level sharding: divide the entire cluster into “layers”, where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. `Distributed` tables are created for each layer, and a single shared distributed table is created for global queries.
Data is written asynchronously. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
Data is written in the background. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms) and [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by the [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
If the server ceased to exist or had a rough restart (for example, due to a hardware failure) after an `INSERT` to a `Distributed` table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is transferred to the `broken` subdirectory and no longer used.
@ -0,0 +1,105 @@
---
slug: /en/engines/table-engines/special/filelog
sidebar_position: 160
sidebar_label: FileLog
---
# FileLog Engine {#filelog-engine}
This engine allows processing application log files as a stream of records.
`FileLog` lets you:
- Subscribe to log files.
- Process new records as they are appended to subscribed log files.
## Creating a Table {#creating-a-table}
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = FileLog('path_to_logs', 'format_name') SETTINGS
[poll_timeout_ms = 0,]
[poll_max_batch_size = 0,]
[max_block_size = 0,]
[max_threads = 0,]
[poll_directory_watch_events_backoff_init = 500,]
[poll_directory_watch_events_backoff_max = 32000,]
[poll_directory_watch_events_backoff_factor = 2,]
[handle_error_mode = 'default']
```
Engine arguments:
- `path_to_logs` - Path to log files to subscribe to. It can be a path to a directory with log files or to a single log file. Note that ClickHouse allows only paths inside the `user_files` directory.
- `format_name` - Record format. Note that FileLog processes each line in a file as a separate record, and not all data formats are suitable for it.
Optional parameters:
- `poll_timeout_ms` - Timeout for single poll from log file. Default: [stream_poll_timeout_ms](../../../operations/settings/settings.md#stream_poll_timeout_ms).
- `poll_max_batch_size` — Maximum amount of records to be polled in a single poll. Default: [max_block_size](../../../operations/settings/settings.md#setting-max_block_size).
- `max_block_size` — The maximum batch size (in records) for poll. Default: [max_insert_block_size](../../../operations/settings/settings.md#setting-max_insert_block_size).
- `max_threads` - Maximum number of threads to parse files. Default: `0`, which means the number will be max(1, physical_cpu_cores / 4).
- `poll_directory_watch_events_backoff_init` - The initial sleep value for watch directory thread. Default: `500`.
- `poll_directory_watch_events_backoff_max` - The max sleep value for watch directory thread. Default: `32000`.
- `poll_directory_watch_events_backoff_factor` - The speed of backoff, exponential by default. Default: `2`.
- `handle_error_mode` — How to handle errors for the FileLog engine. Possible values: `default` (an exception will be thrown if we fail to parse a record), `stream` (the exception message and the raw record will be saved in the virtual columns `_error` and `_raw_record`).
## Description {#description}
The delivered records are tracked automatically, so each record in a log file is only counted once.
`SELECT` is not particularly useful for reading records (except for debugging), because each record can be read only once. It is more practical to create real-time threads using [materialized views](../../../sql-reference/statements/create/view.md). To do this:
1. Use the engine to create a FileLog table and consider it a data stream.
2. Create a table with the desired structure.
3. Create a materialized view that converts data from the engine and puts it into a previously created table.
When a `MATERIALIZED VIEW` is attached to the engine, it starts collecting data in the background. This allows you to continually receive records from log files and convert them to the required format using `SELECT`.
One FileLog table can have as many materialized views as you like; they do not read data from the table directly, but receive new records (in blocks). This way you can write to several tables with different levels of detail (with grouping/aggregation and without).
Example:
``` sql
CREATE TABLE logs (
timestamp UInt64,
level String,
message String
) ENGINE = FileLog('user_files/my_app/app.log', 'JSONEachRow');
CREATE TABLE daily (
day Date,
level String,
total UInt64
  ) ENGINE = SummingMergeTree() ORDER BY (day, level);
CREATE MATERIALIZED VIEW consumer TO daily
  AS SELECT toDate(toDateTime(timestamp)) AS day, level, count() AS total
  FROM logs GROUP BY day, level;
SELECT level, sum(total) FROM daily GROUP BY level;
```
To stop receiving stream data or to change the conversion logic, detach the materialized view:
``` sql
DETACH TABLE consumer;
ATTACH TABLE consumer;
```
If you want to change the target table by using `ALTER`, we recommend disabling the materialized view to avoid discrepancies between the target table and the data from the view.
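A sketch of that workflow, reusing the `consumer` view and `daily` table from the example above (the added column is hypothetical):
```sql
DETACH TABLE consumer;                                      -- stop feeding the target table
ALTER TABLE daily ADD COLUMN max_level String DEFAULT '';
ATTACH TABLE consumer;                                      -- re-attach the view once the target table is adjusted
```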
## Virtual Columns {#virtual-columns}
- `_filename` - Name of the log file.
- `_offset` - Offset in the log file.
Additional virtual columns when `handle_error_mode='stream'`:
- `_raw_record` - Raw record that couldn't be parsed successfully.
- `_error` - Exception message happened during failed parsing.
Note: the `_raw_record` and `_error` virtual columns are filled only in case of an exception during parsing; they are always empty when the record was parsed successfully.
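For example, when the table was created with `handle_error_mode = 'stream'`, a query such as the following (a sketch; `logs` refers to a FileLog table like the one in the example above) can be used to inspect records that failed to parse:
```sql
SELECT _filename, _offset, _raw_record, _error
FROM logs
WHERE _error != ''
SETTINGS stream_like_engine_allow_direct_select = 1;  -- direct SELECT from a stream-like engine may require this setting
```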
@ -74,6 +74,7 @@ The supported formats are:
| [ArrowStream](#data-format-arrow-stream) | ✔ | ✔ |
| [ORC](#data-format-orc) | ✔ | ✔ |
| [One](#data-format-one) | ✔ | ✗ |
| [Npy](#data-format-npy) | ✔ | ✗ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNames](#rowbinarywithnamesandtypes) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
@ -2454,6 +2455,50 @@ Result:
└──────────────┘
```
## Npy {#data-format-npy}
This format is designed to load a NumPy array from a `.npy` file into ClickHouse. The NumPy file format is a binary format used for efficiently storing arrays of numerical data. During import, ClickHouse treats the top-level dimension as an array of rows with a single column. Supported Npy data types and their corresponding ClickHouse types:
| Npy type | ClickHouse type |
|:--------:|:---------------:|
| b1 | UInt8 |
| i1 | Int8 |
| i2 | Int16 |
| i4 | Int32 |
| i8 | Int64 |
| u1 | UInt8 |
| u2 | UInt16 |
| u4 | UInt32 |
| u8 | UInt64 |
| f4 | Float32 |
| f8 | Float64 |
| S | String |
| U | String |
**Example of saving an array in .npy format using Python**
```Python
import numpy as np
arr = np.array([[[1],[2],[3]],[[4],[5],[6]]])
np.save('example_array.npy', arr)
```
**Example of reading a NumPy file in ClickHouse**
Query:
```sql
SELECT *
FROM file('example_array.npy', Npy)
```
Result:
```
┌─array─────────┐
│ [[1],[2],[3]] │
│ [[4],[5],[6]] │
└───────────────┘
```
## LineAsString {#lineasstring}
In this format, every line of input data is interpreted as a single string value. This format can only be parsed for a table with a single field of type [String](/docs/en/sql-reference/data-types/string.md). The remaining columns must be set to [DEFAULT](/docs/en/sql-reference/statements/create/table.md/#default) or [MATERIALIZED](/docs/en/sql-reference/statements/create/table.md/#materialized), or omitted.
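As a quick illustration (a sketch using the `format` table function; the input text is arbitrary):
```sql
-- Each input line becomes one row in a single String column
SELECT * FROM format(LineAsString, 'the first line\nthe second line');
```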
@ -1823,6 +1823,10 @@ The trailing slash is mandatory.
## Prometheus {#prometheus}
:::note
ClickHouse Cloud does not currently support connecting to Prometheus. To be notified when this feature is supported, please contact support@clickhouse.com.
:::
Exposing metrics data for scraping from [Prometheus](https://prometheus.io).
Settings:
@ -2758,3 +2762,7 @@ Proxy settings are determined in the following order:
ClickHouse will check the highest priority resolver type for the request protocol. If it is not defined,
it will check the next highest priority resolver type, until it reaches the environment resolver.
This also allows a mix of resolver types to be used.
### disable_tunneling_for_https_requests_over_http_proxy {#disable_tunneling_for_https_requests_over_http_proxy}
By default, tunneling (i.e. `HTTP CONNECT`) is used to make `HTTPS` requests over an `HTTP` proxy. This setting can be used to disable it.
@ -106,6 +106,15 @@ Possible values:
Default value: 0.
## max_bytes_before_external_sort {#settings-max_bytes_before_external_sort}
Enables or disables execution of `ORDER BY` clauses in external memory. See [ORDER BY Implementation Details](../../sql-reference/statements/select/order-by.md#implementation-details)
- Maximum volume of RAM (in bytes) that can be used by a single [ORDER BY](../../sql-reference/statements/select/order-by.md) operation. The recommended value is half of the available system memory.
- 0 — `ORDER BY` in external memory disabled.
Default value: 0.
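For example (a sketch; the 10 GB threshold is illustrative):
```sql
SELECT number
FROM numbers(10000000)
ORDER BY number DESC
SETTINGS max_bytes_before_external_sort = 10000000000;  -- spill the sort to disk once it uses ~10 GB of RAM
```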
## max_rows_to_sort {#max-rows-to-sort}
A maximum number of rows before sorting. This allows you to limit memory consumption when sorting.
@ -163,7 +172,27 @@ If you set `timeout_before_checking_execution_speed `to 0, ClickHouse will use c
## timeout_overflow_mode {#timeout-overflow-mode}
What to do if the query is run longer than max_execution_time: throw or break. By default, throw.
What to do if the query is run longer than `max_execution_time`: `throw` or `break`. By default, `throw`.
## max_execution_time_leaf
Similar semantics to `max_execution_time`, but only applied on leaf nodes for distributed or remote queries.
For example, if we want to limit the execution time on a leaf node to `10s` but have no limit on the initiating node, instead of having `max_execution_time` in the nested subquery settings:
``` sql
SELECT count() FROM cluster(cluster, view(SELECT * FROM t SETTINGS max_execution_time = 10));
```
We can use `max_execution_time_leaf` as the query settings:
``` sql
SELECT count() FROM cluster(cluster, view(SELECT * FROM t)) SETTINGS max_execution_time_leaf = 10;
```
## timeout_overflow_mode_leaf
What to do when a query on a leaf node runs longer than `max_execution_time_leaf`: `throw` or `break`. By default, `throw`.
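For example, to stop processing on the leaf nodes after 10 seconds and return the partial result instead of throwing (building on the query above):
```sql
SELECT count() FROM cluster(cluster, view(SELECT * FROM t))
SETTINGS max_execution_time_leaf = 10, timeout_overflow_mode_leaf = 'break';
```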
## min_execution_speed {#min-execution-speed}
@ -2473,7 +2473,7 @@ See also:
- [distributed_replica_error_cap](#distributed_replica_error_cap)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
Base interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. The actual interval grows exponentially in the event of errors.
@ -2483,9 +2483,9 @@ Possible values:
Default value: 100 milliseconds.
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
Maximum interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. Limits exponential growth of the interval set in the [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) setting.
Maximum interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. Limits exponential growth of the interval set in the [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms) setting.
Possible values:
@ -2493,7 +2493,7 @@ Possible values:
Default value: 30000 milliseconds (30 seconds).
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
Enables/disables inserted data sending in batches.
@ -2506,13 +2506,13 @@ Possible values:
Default value: 0.
## distributed_directory_monitor_split_batch_on_failure {#distributed_directory_monitor_split_batch_on_failure}
## distributed_background_insert_split_batch_on_failure {#distributed_background_insert_split_batch_on_failure}
Enables/disables splitting batches on failures.
Sometimes sending a particular batch to the remote shard may fail because of a complex pipeline downstream (i.e. a `MATERIALIZED VIEW` with `GROUP BY`) due to a `Memory limit exceeded` or similar error. In this case, retrying will not help (and this will stall distributed sends for the table), but sending the files from that batch one by one may allow the INSERT to succeed.
So installing this setting to `1` will disable batching for such batches (i.e. temporary disables `distributed_directory_monitor_batch_inserts` for failed batches).
So setting this setting to `1` will disable batching for such batches (i.e. it temporarily disables `distributed_background_insert_batch` for failed batches).
Possible values:
@ -2695,15 +2695,15 @@ Possible values:
Default value: 0.
## insert_distributed_sync {#insert_distributed_sync}
## distributed_foreground_insert {#distributed_foreground_insert}
Enables or disables synchronous data insertion into a [Distributed](../../engines/table-engines/special/distributed.md/#distributed) table.
By default, when inserting data into a `Distributed` table, the ClickHouse server sends data to cluster nodes in asynchronous mode. When `insert_distributed_sync=1`, the data is processed synchronously, and the `INSERT` operation succeeds only after all the data is saved on all shards (at least one replica for each shard if `internal_replication` is true).
By default, when inserting data into a `Distributed` table, the ClickHouse server sends data to cluster nodes in background mode. When `distributed_foreground_insert=1`, the data is processed synchronously, and the `INSERT` operation succeeds only after all the data is saved on all shards (at least one replica for each shard if `internal_replication` is true).
Possible values:
- 0 — Data is inserted in asynchronous mode.
- 0 — Data is inserted in background mode.
- 1 — Data is inserted in synchronous mode.
Default value: `0`.
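A minimal sketch (the Distributed table and values are hypothetical):
```sql
-- Return only after the data has been written on every shard
INSERT INTO dist_table SETTINGS distributed_foreground_insert = 1
VALUES (1, 'a');
```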
@ -2762,7 +2762,7 @@ Result:
## use_compact_format_in_distributed_parts_names {#use_compact_format_in_distributed_parts_names}
Uses compact format for storing blocks for async (`insert_distributed_sync`) INSERT into tables with `Distributed` engine.
Uses compact format for storing blocks for background (`distributed_foreground_insert`) INSERT into tables with `Distributed` engine.
Possible values:
@ -2772,7 +2772,7 @@ Possible values:
Default value: `1`.
:::note
- with `use_compact_format_in_distributed_parts_names=0` changes from cluster definition will not be applied for async INSERT.
- with `use_compact_format_in_distributed_parts_names=0` changes from cluster definition will not be applied for background INSERT.
- with `use_compact_format_in_distributed_parts_names=1` changing the order of the nodes in the cluster definition, will change the `shard_index`/`replica_index` so be aware.
:::
@ -3944,6 +3944,16 @@ Possible values:
Default value: `0`.
## force_optimize_projection_name {#force-optimize-projection_name}
If it is set to a non-empty string, check that this projection is used in the query at least once.
Possible values:
- string: name of the projection used in the query
Default value: `''`.
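For example (a sketch; the table and projection names are hypothetical):
```sql
-- Checks that the `region_totals` projection is actually used by the query
SELECT region, sum(amount)
FROM sales
GROUP BY region
SETTINGS force_optimize_projection_name = 'region_totals';
```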
## alter_sync {#alter-sync}
Allows to set up waiting for actions to be executed on replicas by [ALTER](../../sql-reference/statements/alter/index.md), [OPTIMIZE](../../sql-reference/statements/optimize.md) or [TRUNCATE](../../sql-reference/statements/truncate.md) queries.
@ -3956,6 +3966,10 @@ Possible values:
Default value: `1`.
:::note
`alter_sync` is applicable to `Replicated` tables only; it does nothing for alters of non-`Replicated` tables.
:::
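For example (a sketch; `replicated_table` is hypothetical):
```sql
SET alter_sync = 2;                      -- wait for all replicas to execute the query
OPTIMIZE TABLE replicated_table FINAL;
```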
## replication_wait_for_inactive_replica_timeout {#replication-wait-for-inactive-replica-timeout}
Specifies how long (in seconds) to wait for inactive replicas to execute [ALTER](../../sql-reference/statements/alter/index.md), [OPTIMIZE](../../sql-reference/statements/optimize.md) or [TRUNCATE](../../sql-reference/statements/truncate.md) queries.
@ -4780,6 +4794,10 @@ a Tuple(
)
```
## analyze_index_with_space_filling_curves
If a table has a space-filling curve in its index, e.g. `ORDER BY mortonEncode(x, y)`, and the query has conditions on its arguments, e.g. `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30`, use the space-filling curve for index analysis.
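A sketch of such a table and query (names are hypothetical):
```sql
CREATE TABLE points (x UInt32, y UInt32)
ENGINE = MergeTree
ORDER BY mortonEncode(x, y);

-- The range conditions on x and y can be mapped onto ranges of the Morton curve,
-- so the primary index can be used instead of a full scan.
SELECT count() FROM points
WHERE x >= 10 AND x <= 20 AND y >= 20 AND y <= 30;
```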
## dictionary_use_async_executor {#dictionary_use_async_executor}
Execute a pipeline for reading the dictionary source in several threads. It is only supported by dictionaries with a local CLICKHOUSE source.
@ -78,6 +78,11 @@ If procfs is supported and enabled on the system, ClickHouse server collects the
- `OSReadBytes`
- `OSWriteBytes`
:::note
`OSIOWaitMicroseconds` is disabled by default in Linux kernels starting from 5.14.x.
You can enable it using `sudo sysctl kernel.task_delayacct=1` or by creating a `.conf` file in `/etc/sysctl.d/` with `kernel.task_delayacct = 1`
:::
## Related content
- Blog: [System Tables and a window into the internals of ClickHouse](https://clickhouse.com/blog/clickhouse-debugging-issues-with-system-tables)
@ -115,7 +115,7 @@ Parameters:
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.
@ -44,7 +44,7 @@ INSERT INTO map_map VALUES
('2000-01-01', '2000-01-01 00:00:00', (['c', 'd', 'e'], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', (['d', 'e', 'f'], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', (['f', 'g', 'g'], [10, 10, 10]));
SELECT
timeslot,
sumMap(status),
@ -317,6 +317,15 @@ FROM people
└────────┴───────────────────────────┘
```
## -ArgMin
The suffix -ArgMin can be appended to the name of any aggregate function. In this case, the aggregate function accepts an additional argument, which should be any comparable expression. The aggregate function processes only the rows that have the minimum value for the specified extra expression.
Examples: `sumArgMin(column, expr)`, `countArgMin(expr)`, `avgArgMin(x, expr)` and so on.
## -ArgMax
Similar to suffix -ArgMin but processes only the rows that have the maximum value for the specified extra expression.
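For example (a sketch; the `events` table and its columns are hypothetical):
```sql
-- Sum of `value` taken only over the rows with the minimal `ts`,
-- and only over the rows with the maximal `ts`
SELECT sumArgMin(value, ts), sumArgMax(value, ts)
FROM events;
```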
## Related Content
@ -60,7 +60,7 @@ SELECT largestTriangleThreeBuckets(4)(x, y) FROM largestTriangleThreeBuckets_tes
Result:
``` text
┌────────largestTriangleThreeBuckets(3)(x, y)───────────┐
┌────────largestTriangleThreeBuckets(4)(x, y)───────────┐
│ [(1,10),(3,15),(5,40),(10,70)] │
└───────────────────────────────────────────────────────┘
```
@ -899,11 +899,11 @@ Other types are not supported yet. The function returns the attribute for the pr
Data must completely fit into RAM.
## Dictionary Updates {#dictionary-updates}
## Refreshing dictionary data using LIFETIME {#lifetime}
ClickHouse periodically updates the dictionaries. The update interval for fully downloaded dictionaries and the invalidation interval for cached dictionaries are defined in the `lifetime` tag in seconds.
ClickHouse periodically updates dictionaries based on the `LIFETIME` tag (defined in seconds). `LIFETIME` is the update interval for fully downloaded dictionaries and the invalidation interval for cached dictionaries.
Dictionary updates (other than loading for first use) do not block queries. During updates, the old version of a dictionary is used. If an error occurs during an update, the error is written to the server log, and queries continue using the old version of dictionaries.
During updates, the old version of a dictionary can still be queried. Dictionary updates (other than when loading the dictionary for first use) do not block queries. If an error occurs during an update, the error is written to the server log and queries can continue using the old version of the dictionary. If a dictionary update is successful, the old version of the dictionary is replaced atomically.
Example of settings:
@ -1557,10 +1557,10 @@ Returns for a given date, the number of days passed since [1 January 0000](https
toDaysSinceYearZero(date[, time_zone])
```
Aliases: `TO_DAYS`
Alias: `TO_DAYS`
**Arguments**
- `date` — The date to calculate the number of days passed since year zero from. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `time_zone` — A constant String value or an expression representing the time zone. [String types](../../sql-reference/data-types/string.md)
@ -1584,6 +1584,56 @@ Result:
└────────────────────────────────────────────┘
```
**See Also**
- [fromDaysSinceYearZero](#fromDaysSinceYearZero)
## fromDaysSinceYearZero
Returns for a given number of days passed since [1 January 0000](https://en.wikipedia.org/wiki/Year_zero) the corresponding date in the [proleptic Gregorian calendar defined by ISO 8601](https://en.wikipedia.org/wiki/Gregorian_calendar#Proleptic_Gregorian_calendar). The calculation is the same as in MySQL's [`FROM_DAYS()`](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_from-days) function.
The result is undefined if it cannot be represented within the bounds of the [Date](../../sql-reference/data-types/date.md) type.
**Syntax**
``` sql
fromDaysSinceYearZero(days)
```
Alias: `FROM_DAYS`
**Arguments**
- `days` — The number of days passed since year zero.
**Returned value**
The date corresponding to the number of days passed since year zero.
Type: [Date](../../sql-reference/data-types/date.md).
**Example**
``` sql
SELECT fromDaysSinceYearZero(739136), fromDaysSinceYearZero(toDaysSinceYearZero(toDate('2023-09-08')));
```
Result:
``` text
┌─fromDaysSinceYearZero(739136)─┬─fromDaysSinceYearZero(toDaysSinceYearZero(toDate('2023-09-08')))─┐
│ 2023-09-08 │ 2023-09-08 │
└───────────────────────────────┴──────────────────────────────────────────────────────────────────┘
```
**See Also**
- [toDaysSinceYearZero](#toDaysSinceYearZero)
## fromDaysSinceYearZero32
Like [fromDaysSinceYearZero](#fromDaysSinceYearZero) but returns a [Date32](../../sql-reference/data-types/date32.md).
## age
Returns the `unit` component of the difference between `startdate` and `enddate`. The difference is calculated using a precision of 1 microsecond.
@ -2016,7 +2066,7 @@ Result:
## addDate
Adds the time interval or date interval to the provided date or date with time.
Adds the time interval to the provided date, date with time or String-encoded date / date with time.
If the addition results in a value outside the bounds of the data type, the result is undefined.
@ -2028,7 +2078,7 @@ addDate(date, interval)
**Arguments**
- `date` — The date or date with time to which `interval` is added. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `date` — The date or date with time to which `interval` is added. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md), [DateTime64](../../sql-reference/data-types/datetime64.md), or [String](../../sql-reference/data-types/string.md)
- `interval` — Interval to add. [Interval](../../sql-reference/data-types/special-data-types/interval.md).
**Returned value**
@ -2059,7 +2109,7 @@ Alias: `ADDDATE`
## subDate
Subtracts the time interval or date interval from the provided date or date with time.
Subtracts the time interval from the provided date, date with time or String-encoded date / date with time.
If the subtraction results in a value outside the bounds of the data type, the result is undefined.
@ -2071,7 +2121,7 @@ subDate(date, interval)
**Arguments**
- `date` — The date or date with time from which `interval` is subtracted. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `date` — The date or date with time from which `interval` is subtracted. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md), [DateTime64](../../sql-reference/data-types/datetime64.md), or [String](../../sql-reference/data-types/string.md)
- `interval` — Interval to subtract. [Interval](../../sql-reference/data-types/special-data-types/interval.md).
**Returned value**
@ -68,45 +68,6 @@ WHERE macro = 'test';
└───────┴──────────────┘
```
## getHttpHeader
Returns the value of specified http header.If there is no such header or the request method is not http, it will return empty string.
**Syntax**
```sql
getHttpHeader(name);
```
**Arguments**
- `name` — Http header name .[String](../../sql-reference/data-types/string.md#string)
**Returned value**
Value of the specified header.
Type:[String](../../sql-reference/data-types/string.md#string).
When we use `clickhouse-client` to execute this function, we'll always get empty string, because client doesn't use http protocol.
```sql
SELECT getHttpHeader('test')
```
result:
```text
┌─getHttpHeader('test')─┐
│ │
└───────────────────────┘
```
Try to use http request:
```shell
echo "select getHttpHeader('X-Clickhouse-User')" | curl -H 'X-ClickHouse-User: default' -H 'X-ClickHouse-Key: ' 'http://localhost:8123/' -d @-
#result
default
```
## FQDN
Returns the fully qualified domain name of the ClickHouse server.
@ -10,10 +10,10 @@ Attaches a table or a dictionary, for example, when moving a database to another
**Syntax**
``` sql
ATTACH TABLE|DICTIONARY [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] ...
ATTACH TABLE|DICTIONARY|DATABASE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] ...
```
The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table or the dictionary to the server. After executing the `ATTACH` query, the server will know about the existence of the table or the dictionary.
The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the specified table, dictionary or database to the server. After executing the `ATTACH` query, the server will know about the existence of the table, dictionary or database.
If a table was previously detached ([DETACH](../../sql-reference/statements/detach.md) query), meaning that its structure is known, you can use shorthand without defining the structure.
@ -79,3 +79,13 @@ Attaches a previously detached dictionary.
``` sql
ATTACH DICTIONARY [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
```
## Attach Existing Database
Attaches a previously detached database.
**Syntax**
``` sql
ATTACH DATABASE [IF NOT EXISTS] name [ENGINE=<database engine>] [ON CLUSTER cluster]
```
@ -136,8 +136,30 @@ Output:
└──────────────┴───────────┴──────────────────────────────────────────┘
```
If the checksums.txt file is missing, it can be restored. It will be recalculated and rewritten during the execution of the CHECK TABLE command for the specific partition, and the status will still be reported as 'success.'"
If the checksums.txt file is missing, it can be restored. It will be recalculated and rewritten during the execution of the CHECK TABLE command for the specific partition, and the status will still be reported as 'is_passed = 1'.
You can check all existing `(Replicated)MergeTree` tables at once by using the `CHECK ALL TABLES` query.
```sql
CHECK ALL TABLES
FORMAT PrettyCompactMonoBlock
SETTINGS check_query_single_value_result = 0
```
```text
┌─database─┬─table────┬─part_path───┬─is_passed─┬─message─┐
│ default │ t2 │ all_1_95_3 │ 1 │ │
│ db1 │ table_01 │ all_39_39_0 │ 1 │ │
│ default │ t1 │ all_39_39_0 │ 1 │ │
│ db1 │ t1 │ all_39_39_0 │ 1 │ │
│ db1 │ table_01 │ all_1_6_1 │ 1 │ │
│ default │ t1 │ all_1_6_1 │ 1 │ │
│ db1 │ t1 │ all_1_6_1 │ 1 │ │
│ db1 │ table_01 │ all_7_38_2 │ 1 │ │
│ db1 │ t1 │ all_7_38_2 │ 1 │ │
│ default │ t1 │ all_7_38_2 │ 1 │ │
└──────────┴──────────┴─────────────┴───────────┴─────────┘
```
## If the Data Is Corrupted
@ -228,7 +228,7 @@ hex(hexed): 5A90B714
Calculated columns (synonym). Columns of this type are not stored in the table and it is not possible to INSERT values into them.
When SELECT queries explicitly reference columns of this type, the value is computed at query time from `expr`. By default, `SELECT *` excludes ALIAS columns. This behavior can be disabled with setting `asteriks_include_alias_columns`.
When SELECT queries explicitly reference columns of this type, the value is computed at query time from `expr`. By default, `SELECT *` excludes ALIAS columns. This behavior can be disabled with setting `asterisk_include_alias_columns`.
When using the ALTER query to add new columns, old data for these columns is not written. Instead, when reading old data that does not have values for the new columns, expressions are computed on the fly by default. However, if running the expressions requires different columns that are not indicated in the query, these columns will additionally be read, but only for the blocks of data that need it.
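A short sketch (the table is hypothetical):
```sql
CREATE TABLE t (x UInt32, x_doubled UInt32 ALIAS x * 2) ENGINE = Memory;
INSERT INTO t VALUES (1);

SELECT * FROM t;                                                -- returns only `x`
SELECT * FROM t SETTINGS asterisk_include_alias_columns = 1;    -- also returns `x_doubled`
```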
@ -4,60 +4,88 @@ sidebar_position: 36
sidebar_label: DELETE
description: Lightweight deletes simplify the process of deleting data from the database.
keywords: [delete]
title: DELETE Statement
title: The Lightweight DELETE Statement
---
The lightweight `DELETE` statement removes rows from the table `[db.]table` that match the expression `expr`. It is only available for the *MergeTree table engine family.
``` sql
DELETE FROM [db.]table [ON CLUSTER cluster] WHERE expr
DELETE FROM [db.]table [ON CLUSTER cluster] WHERE expr;
```
`DELETE FROM` removes rows from the table `[db.]table` that match the expression `expr`. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries. Cleanup of data happens asynchronously in the background. This feature is only available for the MergeTree table engine family.
It is called "lightweight `DELETE`" to contrast it with the [ALTER table DELETE](/en/sql-reference/statements/alter/delete) command, which is a heavyweight process.
For example, the following query deletes all rows from the `hits` table where the `Title` column contains the text `hello`:
## Examples
```sql
-- Deletes all rows from the `hits` table where the `Title` column contains the text `hello`
DELETE FROM hits WHERE Title LIKE '%hello%';
```
Lightweight deletes are asynchronous by default. Set `mutations_sync` equal to 1 to wait for one replica to process the statement, and set `mutations_sync` to 2 to wait for all replicas.
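For example (a sketch, reusing the `hits` table from the example above):
```sql
SET mutations_sync = 2;                         -- wait for all replicas to process the delete
DELETE FROM hits WHERE Title LIKE '%hello%';
```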
## Lightweight `DELETE` does not delete data from storage immediately
With lightweight `DELETE`, deleted rows are internally marked as deleted immediately and will be automatically filtered out of all subsequent queries. However, cleanup of data happens during the next merge. As a result, it is possible that for an unspecified period, data is not actually deleted from storage and is only marked as deleted.
If you need to guarantee that your data is deleted from storage in a predictable time, consider using the [ALTER table DELETE](/en/sql-reference/statements/alter/delete) command. Note that deleting data using `ALTER table DELETE` may consume significant resources as it recreates all affected parts.
## Deleting large amounts of data
Large deletes can negatively affect ClickHouse performance. If you are attempting to delete all rows from a table, consider using the [`TRUNCATE TABLE`](/en/sql-reference/statements/truncate) command.
If you anticipate frequent deletes, consider using a [custom partitioning key](/en/engines/table-engines/mergetree-family/custom-partitioning-key). You can then use the [`ALTER TABLE...DROP PARTITION`](/en/sql-reference/statements/alter/partition#drop-partitionpart) command to quickly drop all rows associated with that partition.
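A sketch, assuming a table partitioned by `toYYYYMM(EventDate)` (the table and partition value are hypothetical):
```sql
-- Drops every row of October 2023 at once, much cheaper than a row-level DELETE
ALTER TABLE hits_by_month DROP PARTITION 202310;
```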
## Limitations of lightweight `DELETE`
### Lightweight `DELETE`s do not work with projections
Currently, `DELETE` does not work for tables with projections. This is because rows in a projection may be affected by a `DELETE` operation and may require the projection to be rebuilt, negatively affecting `DELETE` performance.
## Performance considerations when using lightweight `DELETE`
**Deleting large volumes of data with the lightweight `DELETE` statement can negatively affect SELECT query performance.**
The following can also negatively impact lightweight `DELETE` performance:
- A heavy `WHERE` condition in a `DELETE` query.
- If the mutations queue is filled with many other mutations, this can possibly lead to performance issues as all mutations on a table are executed sequentially.
- The affected table having a very large number of data parts.
- Having a lot of data in compact parts. In a Compact part, all columns are stored in one file.
## Delete permissions
`DELETE` requires the `ALTER DELETE` privilege. To enable `DELETE` statements on a specific table for a given user, run the following command:
:::note
`DELETE FROM` requires the `ALTER DELETE` privilege:
```sql
grant ALTER DELETE ON db.table to username;
GRANT ALTER DELETE ON db.table TO username;
```
:::
## Lightweight Delete Internals
## How lightweight DELETEs work internally in ClickHouse
The idea behind Lightweight Delete is that when a `DELETE FROM table ...` query is executed ClickHouse only saves a mask where each row is marked as either “existing” or as “deleted”. Those “deleted” rows become invisible for subsequent queries, but physically the rows are removed only later by subsequent merges. Writing this mask is usually much more lightweight than what is done by `ALTER table DELETE ...` query.
1. A "mask" is applied to affected rows
### How it is implemented
The mask is implemented as a hidden `_row_exists` system column that stores True for all visible rows and False for deleted ones. This column is only present in a part if some rows in this part were deleted. In other words, the column is not persisted when it has all values equal to True.
When a `DELETE FROM table ...` query is executed, ClickHouse saves a mask where each row is marked as either “existing” or as “deleted”. Those “deleted” rows are omitted for subsequent queries. However, rows are actually only removed later by subsequent merges. Writing this mask is much more lightweight than what is done by an `ALTER table DELETE` query.
## SELECT query
When the column is present `SELECT ... FROM table WHERE condition` query internally is extended by an additional predicate on `_row_exists` and becomes similar to
The mask is implemented as a hidden `_row_exists` system column that stores `True` for all visible rows and `False` for deleted ones. This column is only present in a part if some rows in the part were deleted. This column does not exist when a part has all values equal to `True`.
2. `SELECT` queries are transformed to include the mask
When a masked column is used in a query, the `SELECT ... FROM table WHERE condition` query internally is extended by the predicate on `_row_exists` and is transformed to:
```sql
SELECT ... FROM table PREWHERE _row_exists WHERE condition
SELECT ... FROM table PREWHERE _row_exists WHERE condition
```
At execution time the column `_row_exists` is read to figure out which rows are not visible and if there are many deleted rows it can figure out which granules can be fully skipped when reading the rest of the columns.
At execution time, the column `_row_exists` is read to determine which rows should not be returned. If there are many deleted rows, ClickHouse can determine which granules can be fully skipped when reading the rest of the columns.
## DELETE query
`DELETE FROM table WHERE condition` is translated into `ALTER table UPDATE _row_exists = 0 WHERE condition` mutation. Internally this mutation is executed in 2 steps:
1. `SELECT count() FROM table WHERE condition` for each individual part to figure out if the part is affected.
2. Mutate affected parts, and make hardlinks for unaffected parts. Mutating a part in fact only writes `_row_exists` column and just hardlinks all other columns files in the case of Wide parts. But for Compact parts, all columns are rewritten because they all are stored together in one file.
3. `DELETE` queries are transformed to `ALTER table UPDATE` queries
So if we compare Lightweight Delete to `ALTER DELETE` in the first step they both do the same thing to figure out which parts are affected, but in the second step `ALTER DELETE` does much more work because it reads and rewrites all columns files for the affected parts.
The `DELETE FROM table WHERE condition` is translated into an `ALTER table UPDATE _row_exists = 0 WHERE condition` mutation.
With the described implementation now we can see what can negatively affect 'DELETE FROM' execution time:
- Heavy WHERE condition in DELETE query
- Mutations queue filled with other mutations, because all mutations on a table are executed sequentially
- Table having a very large number of data parts
- Having a lot of data in Compact parts—in a Compact part, all columns are stored in one file.
Internally, this mutation is executed in two steps:
:::note
Currently, Lightweight delete does not work for tables with projection as rows in projection may be affected and require the projection to be rebuilt. Rebuilding projection makes the deletion not lightweight, so this is not supported.
:::
1. A `SELECT count() FROM table WHERE condition` command is executed for each individual part to determine if the part is affected.
2. Based on the commands above, affected parts are then mutated, and hardlinks are created for unaffected parts. In the case of wide parts, the `_row_exists` column for each row is updated and all other columns' files are hardlinked. For compact parts, all columns are re-written because they are all stored together in one file.
From the steps above, we can see that lightweight deletes using the masking technique improves performance over traditional `ALTER table DELETE` commands because `ALTER table DELETE` reads and re-writes all the columns' files for affected parts.
## Related content
@ -5,17 +5,17 @@ sidebar_label: DETACH
title: "DETACH Statement"
---
Makes the server "forget" about the existence of a table, a materialized view, or a dictionary.
Makes the server "forget" about the existence of a table, a materialized view, a dictionary, or a database.
**Syntax**
``` sql
DETACH TABLE|VIEW|DICTIONARY [IF EXISTS] [db.]name [ON CLUSTER cluster] [PERMANENTLY] [SYNC]
DETACH TABLE|VIEW|DICTIONARY|DATABASE [IF EXISTS] [db.]name [ON CLUSTER cluster] [PERMANENTLY] [SYNC]
```
Detaching does not delete the data or metadata of a table, a materialized view or a dictionary. If an entity was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view/dictionary again. If an entity was detached `PERMANENTLY`, there will be no automatic recall.
Detaching does not delete the data or metadata of a table, a materialized view, a dictionary or a database. If an entity was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view/dictionary/database again. If an entity was detached `PERMANENTLY`, there will be no automatic recall.
Whether a table or a dictionary was detached permanently or not, in both cases you can reattach them using the [ATTACH](../../sql-reference/statements/attach.md) query.
Whether a table, a dictionary or a database was detached permanently or not, in both cases you can reattach them using the [ATTACH](../../sql-reference/statements/attach.md) query.
System log tables can also be attached back (e.g. `query_log`, `text_log`, etc.). Other system tables can't be reattached. On the next server launch the server will recall those tables again.
`ATTACH MATERIALIZED VIEW` does not work with short syntax (without `SELECT`), but you can attach it using the `ATTACH TABLE` query.
@ -668,6 +668,15 @@ If either `LIKE` or `ILIKE` clause is specified, the query returns a list of sys
Returns a list of merges. All merges are listed in the [system.merges](../../operations/system-tables/merges.md) table.
- `table` -- Table name.
- `database` -- The name of the database the table is in.
- `estimate_complete` -- The estimated time to complete (in seconds).
- `elapsed` -- The time elapsed (in seconds) since the merge started.
- `progress` -- The percentage of completed work (0-100 percent).
- `is_mutation` -- 1 if this process is a part mutation.
- `size_compressed` -- The total size of the compressed data of the merged parts.
- `memory_usage` -- Memory consumption of the merge process.
**Syntax**
@ -686,10 +695,9 @@ SHOW MERGES;
Result:
```text
┌─table──────┬─database─┬─estimate_complete─┬─────elapsed─┬─progress─┬─is_mutation─┬─size─────┬─mem───────┐
│ your_table │ default │ 0.14 │ 0.365592338 │ 0.73 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────────┴──────────┴─────────────┴────────────┴─────────┘
┌─table──────┬─database─┬─estimate_complete─┬─elapsed─┬─progress─┬─is_mutation─┬─size_compressed─┬─memory_usage─┐
│ your_table │ default │ 0.14 │ 0.36 │ 73.01 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────┴──────────┴─────────────┴─────────────────┴──────────────┘
```
Query:
@ -701,9 +709,8 @@ SHOW MERGES LIKE 'your_t%' LIMIT 1;
Result:
```text
┌─table──────┬─database─┬─estimate_complete─┬─────elapsed─┬─progress─┬─is_mutation─┬─size─────┬─mem───────┐
│ your_table │ default │ 0.05 │ 1.727629065 │ 0.97 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────────┴──────────┴─────────────┴────────────┴─────────┘
┌─table──────┬─database─┬─estimate_complete─┬─elapsed─┬─progress─┬─is_mutation─┬─size_compressed─┬─memory_usage─┐
│ your_table │ default │ 0.14 │ 0.36 │ 73.01 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────┴──────────┴─────────────┴─────────────────┴──────────────┘
```
@ -166,7 +166,7 @@ Aborts ClickHouse process (like `kill -9 {$ pid_clickhouse-server}`)
## Managing Distributed Tables
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into these tables, ClickHouse first creates a queue of the data that should be sent to cluster nodes, then asynchronously sends it. You can manage queue processing with the [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) queries. You can also synchronously insert distributed data with the [insert_distributed_sync](../../operations/settings/settings.md#insert_distributed_sync) setting.
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into these tables, ClickHouse first creates a queue of the data that should be sent to cluster nodes, then asynchronously sends it. You can manage queue processing with the [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) queries. You can also synchronously insert distributed data with the [distributed_foreground_insert](../../operations/settings/settings.md#distributed_foreground_insert) setting.
### STOP DISTRIBUTED SENDS
@ -22,7 +22,7 @@ sidebar_label: Distributed
Смотрите также:
- настройка `insert_distributed_sync`
- настройка `distributed_foreground_insert`
- [MergeTree](../mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) для примера
Пример:
@ -131,7 +131,7 @@ logs - имя кластера в конфигурационном файле с
- используются запросы, требующие соединение данных (IN, JOIN) по определённому ключу - тогда если данные шардированы по этому ключу, то можно использовать локальные IN, JOIN вместо GLOBAL IN, GLOBAL JOIN, что кардинально более эффективно.
- используется большое количество серверов (сотни и больше) и большое количество маленьких запросов (запросы отдельных клиентов - сайтов, рекламодателей, партнёров) - тогда, для того, чтобы маленькие запросы не затрагивали весь кластер, имеет смысл располагать данные одного клиента на одном шарде, или сделать двухуровневое шардирование: разбить весь кластер на «слои», где слой может состоять из нескольких шардов; данные для одного клиента располагаются на одном слое, но в один слой можно по мере необходимости добавлять шарды, в рамках которых данные распределены произвольным образом; создаются распределённые таблицы на каждый слой и одна общая распределённая таблица для глобальных запросов.
Запись данных осуществляется полностью асинхронно. При вставке в таблицу, блок данных сначала записывается в файловую систему. Затем, в фоновом режиме отправляются на удалённые серверы при первой возможности. Период отправки регулируется настройками [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) и [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms). Движок таблиц `Distributed` отправляет каждый файл со вставленными данными отдельно, но можно включить пакетную отправку данных настройкой [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts). Эта настройка улучшает производительность кластера за счет более оптимального использования ресурсов сервера-отправителя и сети. Необходимо проверять, что данные отправлены успешно, для этого проверьте список файлов (данных, ожидающих отправки) в каталоге таблицы `/var/lib/clickhouse/data/database/table/`. Количество потоков для выполнения фоновых задач можно задать с помощью настройки [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size).
Запись данных осуществляется полностью асинхронно. При вставке в таблицу, блок данных сначала записывается в файловую систему. Затем, в фоновом режиме отправляются на удалённые серверы при первой возможности. Период отправки регулируется настройками [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms) и [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms). Движок таблиц `Distributed` отправляет каждый файл со вставленными данными отдельно, но можно включить пакетную отправку данных настройкой [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch). Эта настройка улучшает производительность кластера за счет более оптимального использования ресурсов сервера-отправителя и сети. Необходимо проверять, что данные отправлены успешно, для этого проверьте список файлов (данных, ожидающих отправки) в каталоге таблицы `/var/lib/clickhouse/data/database/table/`. Количество потоков для выполнения фоновых задач можно задать с помощью настройки [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size).
Если после INSERT-а в Distributed таблицу, сервер перестал существовать или был грубо перезапущен (например, в следствие аппаратного сбоя), то записанные данные могут быть потеряны. Если в директории таблицы обнаружен повреждённый кусок данных, то он переносится в поддиректорию broken и больше не используется.
@ -4,17 +4,17 @@ sidebar_position: 12
sidebar_label: Tutorial
---
# ClickHouse Tutorial {#clickhouse-tutorial}
# Руководство {#clickhouse-tutorial}
## What to Expect from This Tutorial? {#what-to-expect-from-this-tutorial}
## Что вы получите, пройдя это руководство? {#what-to-expect-from-this-tutorial}
By going through this tutorial, youll learn how to set up a simple ClickHouse cluster. Itll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries.
Пройдя это руководство, вы научитесь устанавливать простой кластер ClickHouse. Он будет небольшим, но отказоустойчивым и масштабируемым. Далее мы воспользуемся одним из готовых наборов данных для наполнения кластера данными и выполнения над ними нескольких демонстрационных запросов.
## Single Node Setup {#single-node-setup}
## Установка на одном узле {#single-node-setup}
To postpone the complexities of a distributed environment, well start with deploying ClickHouse on a single server or virtual machine. ClickHouse is usually installed from [deb](../getting-started/install.md#install-from-deb-packages) or [rpm](../getting-started/install.md#from-rpm-packages) packages, but there are [alternatives](../getting-started/install.md#from-docker-image) for the operating systems that do no support them.
Чтобы не погружаться сразу в сложности распределённого окружения, мы начнём с развёртывания ClickHouse на одном сервере или одной виртуальной машине. ClickHouse обычно устанавливается из [deb](../getting-started/install.md#install-from-deb-packages)- или [rpm](../getting-started/install.md#from-rpm-packages)-пакетов, но есть и [альтернативы](../getting-started/install.md#from-docker-image) для операционных систем без соответствующих пакетных менеджеров.
For example, you have chosen `deb` packages and executed:
Например, выбираем нужные `deb`-пакеты и выполняем:
``` bash
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
@ -30,49 +30,49 @@ sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you've set up a password.
```
What do we have in the packages that got installed:
Что мы получим по результатам установки этих пакетов:
- `clickhouse-client` package contains [clickhouse-client](../interfaces/cli.md) application, interactive ClickHouse console client.
- `clickhouse-common` package contains a ClickHouse executable file.
- `clickhouse-server` package contains configuration files to run ClickHouse as a server.
- с пакетом `clickhouse-client` будет установлена программа [clickhouse-client](../interfaces/cli.md) — интерактивный консольный клиент ClickHouse.
- пакет `clickhouse-common` включает исполняемый файл ClickHouse.
- пакет `clickhouse-server` содержит конфигурационные файлы для запуска ClickHouse в качестве сервера.
Server config files are located in `/etc/clickhouse-server/`. Before going further, please notice the `<path>` element in `config.xml`. Path determines the location for data storage, so it should be located on volume with large disk capacity; the default value is `/var/lib/clickhouse/`. If you want to adjust the configuration, its not handy to directly edit `config.xml` file, considering it might get rewritten on future package updates. The recommended way to override the config elements is to create [files in config.d directory](../operations/configuration-files.md) which serve as “patches” to config.xml.
Файлы конфигурации сервера располагаются в каталоге `/etc/clickhouse-server/`. Прежде чем идти дальше, обратите внимание на элемент `<path>` в файле `config.xml`. Путь, задаваемый этим элементом, определяет местоположение данных, таким образом, он должен быть расположен на томе большой ёмкости; значение по умолчанию — `/var/lib/clickhouse/`. Если вы хотите изменить конфигурацию, то лучше не редактировать вручную файл `config.xml`, поскольку он может быть переписан будущими пакетными обновлениями; рекомендуется создать файлы с необходимыми конфигурационными элементами [в каталоге config.d](../operations/configuration-files.md), которые рассматриваются как “патчи” к config.xml.
As you might have noticed, `clickhouse-server` is not launched automatically after package installation. It wont be automatically restarted after updates, either. The way you start the server depends on your init system, usually, it is:
Вы могли заметить, что `clickhouse-server` не запускается автоматически после установки пакетов. Также сервер не будет автоматически перезапускаться после обновлений. Способ запуска сервера зависит от используемой подсистемы инициализации, обычно это делается так:
``` bash
sudo service clickhouse-server start
```
or
или
``` bash
sudo /etc/init.d/clickhouse-server start
```
The default location for server logs is `/var/log/clickhouse-server/`. The server is ready to handle client connections once it logs the `Ready for connections` message.
Журналы сервера по умолчанию ведутся в `/var/log/clickhouse-server/`. Как только в журнале появится сообщение `Ready for connections` — сервер готов принимать клиентские соединения.
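One simple way to wait for that message, assuming the default log file name, is to follow the log:
``` bash
sudo tail -f /var/log/clickhouse-server/clickhouse-server.log | grep --line-buffered 'Ready for connections'
```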
Once the `clickhouse-server` is up and running, we can use `clickhouse-client` to connect to the server and run some test queries like `SELECT 'Hello, world!';`.
Теперь, когда `clickhouse-server` запущен, можно подключиться к нему с использованием `clickhouse-client` и выполнить тестовый запрос, например, `SELECT 'Hello, world!';`.
<details markdown="1">
<summary>Quick tips for clickhouse-client</summary>
<summary>Советы по использованию clickhouse-client</summary>
Interactive mode:
Интерактивный режим:
``` bash
clickhouse-client
clickhouse-client --host=... --port=... --user=... --password=...
```
Enable multiline queries:
Включить многострочный режим запросов:
``` bash
clickhouse-client -m
clickhouse-client --multiline
```
Run queries in batch-mode:
Включить пакетный режим запуска запросов:
``` bash
clickhouse-client --query='SELECT 1'
@ -80,7 +80,7 @@ echo 'SELECT 1' | clickhouse-client
clickhouse-client <<< 'SELECT 1'
```
Insert data from a file in specified format:
Вставить данные из файла заданного формата:
``` bash
clickhouse-client --query='INSERT INTO table VALUES' < data.txt
@ -89,39 +89,39 @@ clickhouse-client --query='INSERT INTO table FORMAT TabSeparated' < data.tsv
</details>
## Import Sample Dataset {#import-sample-dataset}
## Загрузка набора данных из примеров {#import-sample-dataset}
Now it's time to fill our ClickHouse server with some sample data. In this tutorial, we'll use some anonymized metric data. There are [multiple ways to import the dataset](../getting-started/example-datasets/metrica.md), and for the sake of the tutorial, we'll go with the most realistic one.
Настало время загрузить в ClickHouse данные из примеров. В этом руководстве мы используем анонимизированные данные посещений сайтов (веб-метрики). Существует [множество способов импортировать набор данных](../getting-started/example-datasets/metrica.md), но для целей данного руководства мы используем наиболее практичный из них.
### Download and Extract Table Data {#download-and-extract-table-data}
### Загрузка и извлечение табличных данных {#download-and-extract-table-data}
``` bash
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
```
The extracted files are about 10GB in size.
Распакованные файлы занимают около 10 ГБ.
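If you want a quick sanity check before importing, you can look at the extracted files (exact sizes and row counts vary between dataset revisions):
``` bash
ls -lh hits_v1.tsv visits_v1.tsv
wc -l hits_v1.tsv visits_v1.tsv   # rough row counts of the two tables
```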
### Create Tables {#create-tables}
### Создание таблиц {#create-tables}
As in most database management systems, ClickHouse logically groups tables into “databases”. There's a `default` database, but we'll create a new one named `tutorial`:
ClickHouse, как и большинство СУБД, логически объединяет таблицы в «базы данных». Существует база данных по умолчанию — `default`, но мы создадим новую, дав ей наименование `tutorial`:
``` bash
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
```
The syntax for creating tables is way more complicated compared to databases (see [reference](../sql-reference/statements/create/table.md)). In general, a `CREATE TABLE` statement has to specify three key things (a minimal sketch follows the list below):
Синтаксис создания таблиц сложнее, чем синтаксис создания баз данных (см. [руководство по SQL](../sql-reference/statements/create/table.md)). Оператор `CREATE TABLE` должен задавать три ключевых момента (минимальный набросок приведён после списка ниже):
1. Name of table to create.
2. Table schema, i.e. list of columns and their [data types](../sql-reference/data-types/index.md).
3. [Table engine](../engines/table-engines/index.md) and its settings, which determines all the details on how queries to this table will be physically executed.
1. Имя создаваемой таблицы.
2. Схему таблицы, то есть список столбцов и их [типы данных](../sql-reference/data-types/index.md).
3. [Движок таблицы](../engines/table-engines/index.md) и его параметры, которые определяют все детали того, как запросы к данной таблице будут физически исполняться.
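Here is a minimal sketch of how these three parts fit together; the table and column names are invented purely for illustration, and the real statements for the tutorial follow below:
``` bash
clickhouse-client --query "
CREATE TABLE tutorial.example            -- 1. name of the table to create
(
    EventDate Date,                      -- 2. schema: columns and their data types
    UserID    UInt64,
    URL       String
)
ENGINE = MergeTree()                     -- 3. table engine and its settings
ORDER BY (EventDate, UserID)
"
```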
There are only two tables to create:
Мы создадим всего лишь две таблицы:
- `hits` is a table with each action done by all users on all websites covered by the service.
- `visits` is a table that contains pre-built sessions instead of individual actions.
- таблицу `hits` с действиями, осуществлёнными всеми пользователями на всех сайтах, обслуживаемых сервисом;
- таблицу `visits`, содержащую посещения — преднастроенные сессии вместо каждого действия.
Let's see and execute the real create table queries for these tables:
Выполним операторы `CREATE TABLE` для создания этих таблиц:
``` sql
CREATE TABLE tutorial.hits_v1
@ -462,22 +462,22 @@ ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
```
You can execute those queries using the interactive mode of `clickhouse-client` (just launch it in a terminal without specifying a query in advance) or try some [alternative interface](../interfaces/index.md) if you want.
Эти операторы можно выполнить с использованием интерактивного режима в `clickhouse-client` (запустите его из командной строки, не указывая заранее запросы) или, при желании, воспользоваться [альтернативным интерфейсом](../interfaces/index.md).
As we can see, `hits_v1` uses the [basic MergeTree engine](../engines/table-engines/mergetree-family/mergetree.md), while the `visits_v1` uses the [Collapsing](../engines/table-engines/mergetree-family/collapsingmergetree.md) variant.
Как вы можете видеть, `hits_v1` использует [базовый вариант движка MergeTree](../engines/table-engines/mergetree-family/mergetree.md), тогда как `visits_v1` использует вариант [Collapsing](../engines/table-engines/mergetree-family/collapsingmergetree.md).
### Import Data {#import-data}
### Импорт данных {#import-data}
Data import to ClickHouse is done via [INSERT INTO](../sql-reference/statements/insert-into.md) query like in many other SQL databases. However, data is usually provided in one of the [supported serialization formats](../interfaces/formats.md) instead of `VALUES` clause (which is also supported).
Импорт данных в ClickHouse выполняется оператором [INSERT INTO](../sql-reference/statements/insert-into.md) как в большинстве SQL-систем. Однако данные для вставки в таблицы ClickHouse обычно предоставляются в одном из [поддерживаемых форматов](../interfaces/formats.md) вместо их непосредственного указания в предложении `VALUES` (хотя и этот способ поддерживается).
The files we downloaded earlier are in tab-separated format, so here's how to import them via the console client:
В нашем случае файлы были загружены ранее в формате со значениями, разделёнными знаком табуляции; импортируем их, указав соответствующие запросы в аргументах командной строки:
``` bash
clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
```
ClickHouse has a lot of [settings to tune](../operations/settings/index.md), and one way to specify them in the console client is via arguments, as we can see with `--max_insert_block_size`. The easiest way to figure out what settings are available, what they mean and what the defaults are is to query the `system.settings` table:
ClickHouse оснащён множеством [изменяемых настроек](../operations/settings/index.md), и один из способов их указать — передать их в качестве аргументов при запуске консольного клиента, как вы видели в этом примере с `--max_insert_block_size`. Простейший способ узнать, какие настройки доступны, что они означают и какие у них значения по умолчанию — запросить содержимое таблицы `system.settings`:
``` sql
SELECT name, value, changed, description
@ -488,23 +488,23 @@ FORMAT TSV
max_insert_block_size 1048576 0 "The maximum block size for insertion, if we control the creation of blocks for insertion."
```
Optionally you can [OPTIMIZE](../sql-reference/statements/optimize.md) the tables after import. Tables that are configured with an engine from the MergeTree family always do merges of data parts in the background to optimize data storage (or at least check if it makes sense). These queries force the table engine to do storage optimization right now instead of some time later:
Можно также применить оператор [OPTIMIZE](../sql-reference/statements/optimize.md) к таблицам после импорта. Для таблиц, созданных с движками семейства MergeTree, слияние частей загруженных данных выполняется в фоновом режиме (по крайней мере проверяется, имеет ли смысл его осуществить); этот оператор принудительно запускает соответствующие процессы слияния вместо того, чтобы эти действия были выполнены в фоне когда-нибудь позже.
``` bash
clickhouse-client --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
clickhouse-client --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
```
These queries start an I/O and CPU intensive operation, so if the table consistently receives new data, it's better to leave it alone and let merges run in the background.
Эти запросы запускают интенсивные по отношению к вводу-выводу и процессорным ресурсам операции, таким образом, если таблица всё ещё получает новые данные, лучше дать возможность слияниям запуститься в фоне.
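If you are curious whether such merges are running at the moment, you can peek at the `system.merges` table; this is an optional check, not a required step of the tutorial:
``` bash
clickhouse-client --query "SELECT database, table, round(elapsed, 1) AS elapsed_sec, round(progress, 2) AS progress FROM system.merges"
```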
Now we can check if the table import was successful:
Проверим, успешно ли загрузились данные:
``` bash
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
```
## Example Queries {#example-queries}
## Примеры запросов {#example-queries}
``` sql
SELECT
@ -526,18 +526,18 @@ FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403)
```
## Cluster Deployment {#cluster-deployment}
## Кластерное развёртывание {#cluster-deployment}
A ClickHouse cluster is a homogeneous cluster. Steps to set up:
Кластер ClickHouse — гомогенный, то есть все узлы в нём равны, ведущие и ведомые не выделяются. Шаги по установке кластера:
1. Install ClickHouse server on all machines of the cluster
2. Set up cluster configs in configuration files
3. Create local tables on each instance
4. Create a [Distributed table](../engines/table-engines/special/distributed.md)
1. Установить сервер ClickHouse на всех узлах будущего кластера.
2. Прописать кластерные конфигурации в конфигурационных файлах.
3. Создать локальные таблицы на каждом экземпляре.
4. Создать [распределённую таблицу](../engines/table-engines/special/distributed.md).
A [Distributed table](../engines/table-engines/special/distributed.md) is actually a kind of “view” to local tables of a ClickHouse cluster. A SELECT query from a distributed table executes using the resources of all the cluster's shards. You may specify configs for multiple clusters and create multiple distributed tables providing views to different clusters.
[Распределённая таблица](../engines/table-engines/special/distributed.md) — в некотором смысле «представление» над локальными таблицами кластера ClickHouse. Запрос SELECT к распределённой таблице выполняется на всех узлах кластера. Вы можете указать конфигурации для нескольких кластеров и создать множество распределённых таблиц, «смотрящих» на разные кластеры.
Example config for a cluster with three shards, one replica each:
Пример конфигурации кластера с тремя сегментами и одной репликой для каждого:
``` xml
<remote_servers>
@ -564,38 +564,38 @@ Example config for a cluster with three shards, one replica each:
</remote_servers>
```
For further demonstration, let's create a new local table with the same `CREATE TABLE` query that we used for `hits_v1`, but a different table name:
Далее создадим новую локальную таблицу с помощью того же запроса `CREATE TABLE`, что использовался для таблицы `hits_v1`, но с другим именем:
``` sql
CREATE TABLE tutorial.hits_local (...) ENGINE = MergeTree() ...
```
Creating a distributed table providing a view into local tables of the cluster:
Создадим распределённую таблицу, обеспечивающую представление над локальными таблицами кластера:
``` sql
CREATE TABLE tutorial.hits_all AS tutorial.hits_local
ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand());
```
A common practice is to create similar Distributed tables on all machines of the cluster. It allows running distributed queries on any machine of the cluster. Also there's an alternative option to create a temporary distributed table for a given SELECT query using the [remote](../sql-reference/table-functions/remote.md) table function.
Стандартная практика — создание одинаковых распределённых таблиц на всех узлах кластера. Это позволит запускать распределённые запросы с любого узла. Альтернативой может быть создание временной распределённой таблицы для заданного отдельно взятого запроса с использованием табличной функции [remote](../sql-reference/table-functions/remote.md).
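As a sketch, such a one-off query could look like this; the host names are placeholders and should be replaced with the addresses of your own shards:
``` bash
clickhouse-client --query "
SELECT count()
FROM remote('node1,node2,node3', tutorial.hits_local)
"
```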
Let's run [INSERT SELECT](../sql-reference/statements/insert-into.md) into the Distributed table to spread the table to multiple servers.
Выполним [INSERT SELECT](../sql-reference/statements/insert-into.md) в распределённую таблицу, чтобы распределить данные по нескольким узлам.
``` sql
INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1;
```
:::danger Notice
This approach is not suitable for the sharding of large tables. There's a separate tool [clickhouse-copier](../operations/utilities/clickhouse-copier.md) that can re-shard arbitrarily large tables.
:::danger Внимание!
Этот подход не годится для сегментирования больших таблиц. Есть инструмент [clickhouse-copier](../operations/utilities/clickhouse-copier.md), специально предназначенный для перераспределения любых больших таблиц.
:::
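A simple way to confirm that the rows were actually spread across the shards is to compare the count seen through the Distributed table with the local count on the node you are connected to:
``` bash
clickhouse-client --query "SELECT count() FROM tutorial.hits_all"    # total across all shards
clickhouse-client --query "SELECT count() FROM tutorial.hits_local"  # rows stored on this node only
```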
As you could expect, computationally heavy queries run roughly three times faster when they utilize 3 servers instead of one.
Как и следовало ожидать, вычислительно сложные запросы работают втрое быстрее, если они выполняются на трёх серверах, а не на одном.
In this case, we have used a cluster with 3 shards, and each contains a single replica.
В данном случае мы использовали кластер из трёх сегментов с одной репликой для каждого.
To provide resilience in a production environment, we recommend that each shard should contain 2-3 replicas spread between multiple availability zones or datacenters (or at least racks). Note that ClickHouse supports an unlimited number of replicas.
В производственных окружениях для обеспечения надёжности мы рекомендуем, чтобы каждый сегмент был защищён 2—3 репликами, разнесёнными на разные зоны отказоустойчивости или разные центры обработки данных (или хотя бы разные стойки). Особо отметим, что ClickHouse поддерживает неограниченное количество реплик.
Example config for a cluster of one shard containing three replicas:
Пример конфигурации кластера с одним сегментом и тремя репликами:
``` xml
<remote_servers>
@ -619,13 +619,13 @@ Example config for a cluster of one shard containing three replicas:
</remote_servers>
```
To enable native replication [ZooKeeper](http://zookeeper.apache.org/) is required. ClickHouse takes care of data consistency on all replicas and runs restore procedures after failures automatically. It's recommended to deploy the ZooKeeper cluster on separate servers (where no other processes, including ClickHouse, are running).
Для работы встроенной репликации необходимо использовать [ZooKeeper](http://zookeeper.apache.org/). ClickHouse заботится о согласованности данных на всех репликах и автоматически запускает процедуры восстановления в случае сбоев. Рекомендуется развёртывание кластера ZooKeeper на отдельных серверах (на которых не запущено других процессов, в том числе ClickHouse).
:::note Note
ZooKeeper is not a strict requirement: in some simple cases, you can duplicate the data by writing it into all the replicas from your application code. This approach is **not** recommended, because in this case ClickHouse won't be able to guarantee data consistency on all replicas. Thus it becomes the responsibility of your application.
:::note Примечание
Использование ZooKeeper — нестрогая рекомендация: можно продублировать данные, записывая их непосредственно из приложения на несколько реплик. Но этот подход **не рекомендуется** в общем случае, поскольку ClickHouse не сможет гарантировать согласованность данных на всех репликах; обеспечение согласованности станет заботой вашего приложения.
:::
ZooKeeper locations are specified in the configuration file:
Адреса узлов ZooKeeper указываются в файле конфигурации:
``` xml
<zookeeper>
@ -644,7 +644,7 @@ ZooKeeper locations are specified in the configuration file:
</zookeeper>
```
Also, we need to set macros for identifying each shard and replica, which are used on table creation:
Также необходимо в секции macros указать идентификаторы для сегментов и реплик, которые понадобятся при создании таблиц:
``` xml
<macros>
@ -653,7 +653,7 @@ Also, we need to set macros for identifying each shard and replica which are use
</macros>
```
If there are no replicas at the moment of replicated table creation, a new first replica is instantiated. If there are already live replicas, the new replica clones data from the existing ones. You have an option to create all replicated tables first and then insert data into them. Another option is to create some replicas and add the others after or during data insertion.
Если в момент создания реплицированной таблицы ни одной реплики ещё нет, то будет создана первая из них. Если уже есть работающие реплики, то в новые реплики данные будут склонированы из существующих. Есть возможность вначале создать все реплицируемые таблицы, а затем вставить в них данные. Но можно вначале создать только часть реплик и добавить ещё несколько после вставки или в процессе вставки данных.
``` sql
CREATE TABLE tutorial.hits_replica (...)
@ -664,10 +664,10 @@ ENGINE = ReplicatedMergeTree(
...
```
Here we use the [ReplicatedMergeTree](../engines/table-engines/mergetree-family/replication.md) table engine. In the parameters we specify the ZooKeeper path containing the shard and replica identifiers.
Здесь мы используем движок [ReplicatedMergeTree](../engines/table-engines/mergetree-family/replication.md). В параметрах указываем путь в ZooKeeper, содержащий идентификаторы сегмента и реплики.
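A fuller sketch of such a definition, using the `{shard}` and `{replica}` macros configured above; the table name, the column list and the `/clickhouse/tables/...` path convention are illustrative, not taken from the tutorial tables:
``` bash
clickhouse-client --query "
CREATE TABLE tutorial.hits_replica_demo
(
    EventDate Date,
    UserID    UInt64,
    URL       String
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/hits_replica_demo',  -- ZooKeeper path for this table
    '{replica}'                                      -- replica name from the macros section
)
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, UserID)
"
```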
``` sql
INSERT INTO tutorial.hits_replica SELECT * FROM tutorial.hits_local;
```
Replication operates in multi-master mode. Data can be loaded into any replica, and the system then syncs it with the other instances automatically. Replication is asynchronous, so at a given moment not all replicas may contain recently inserted data. At least one replica should be up to allow data ingestion. The others will sync up data and repair consistency once they become active again. Note that with this approach there is a small chance of losing recently inserted data.
Репликация работает в режиме мультимастера. Это означает, что данные могут быть загружены на любую из реплик, и система автоматически синхронизирует данные между остальными репликами. Репликация асинхронна, то есть в конкретный момент времени не все реплики могут содержать недавно добавленные данные. Как минимум одна реплика должна быть в строю для приёма данных. Прочие реплики синхронизируются и восстановят согласованное состояние, как только снова станут активными. Заметим, что при таком подходе есть вероятность утраты недавно добавленных данных.
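To see how a replica is doing, the `system.replicas` table can be queried on any node; the table name in the filter is the one from the example above:
``` bash
clickhouse-client --query "
SELECT database, table, is_leader, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE table = 'hits_replica'
"
```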

View File

@ -108,6 +108,15 @@ sidebar_label: "Ограничения на сложность запроса"
Значение по умолчанию — 0.
## max_bytes_before_external_sort {#settings-max_bytes_before_external_sort}
Включает или отключает выполнение `ORDER BY` во внешней памяти. См. [Детали реализации ORDER BY](../../sql-reference/statements/select/order-by.md#implementation-details).
Возможные значения:
- Максимальный объем оперативной памяти (в байтах), который может использоваться одной операцией [ORDER BY](../../sql-reference/statements/select/order-by.md). Рекомендуемое значение — половина доступной системной памяти.
- 0 — `ORDER BY` во внешней памяти отключен.
Значение по умолчанию: 0.
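For example, the setting can be passed as a command-line argument of `clickhouse-client`; the threshold value, table and column names below are placeholders:
``` bash
clickhouse-client --max_bytes_before_external_sort=10000000000 \
    --query "SELECT some_column FROM my_table ORDER BY some_column LIMIT 10"
```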
## max_rows_to_sort {#max-rows-to-sort}
Максимальное количество строк до сортировки. Позволяет ограничить потребление оперативной памяти при сортировке.

View File

@ -2136,7 +2136,7 @@ SELECT * FROM test_table
- [distributed_replica_error_cap](#settings-distributed_replica_error_cap)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
Основной интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Фактический интервал растёт экспоненциально при возникновении ошибок.
@ -2146,9 +2146,9 @@ SELECT * FROM test_table
Значение по умолчанию: 100 миллисекунд.
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
Максимальный интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Ограничивает экспоненциальный рост интервала, установленного настройкой [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms).
Максимальный интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Ограничивает экспоненциальный рост интервала, установленного настройкой [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms).
Возможные значения:
@ -2156,7 +2156,7 @@ SELECT * FROM test_table
Значение по умолчанию: 30000 миллисекунд (30 секунд).
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
Включает/выключает пакетную отправку вставленных данных.
@ -2323,11 +2323,11 @@ SELECT * FROM test_table
Значение по умолчанию: 0.
## insert_distributed_sync {#insert_distributed_sync}
## distributed_foreground_insert {#distributed_foreground_insert}
Включает или отключает режим синхронного добавления данных в распределенные таблицы (таблицы с движком [Distributed](../../engines/table-engines/special/distributed.md#distributed)).
По умолчанию ClickHouse вставляет данные в распределённую таблицу в асинхронном режиме. Если `insert_distributed_sync=1`, то данные вставляются синхронно, а запрос `INSERT` считается выполненным успешно, когда данные записаны на все шарды (по крайней мере на одну реплику для каждого шарда, если `internal_replication = true`).
По умолчанию ClickHouse вставляет данные в распределённую таблицу в асинхронном режиме. Если `distributed_foreground_insert=1`, то данные вставляются синхронно, а запрос `INSERT` считается выполненным успешно, когда данные записаны на все шарды (по крайней мере на одну реплику для каждого шарда, если `internal_replication = true`).
Возможные значения:

View File

@ -111,7 +111,7 @@ $ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --bas
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.

View File

@ -128,7 +128,7 @@ SYSTEM RELOAD CONFIG [ON CLUSTER cluster_name]
## Управление распределёнными таблицами {#query-language-system-distributed}
ClickHouse может оперировать [распределёнными](../../sql-reference/statements/system.md) таблицами. Когда пользователь вставляет данные в эти таблицы, ClickHouse сначала формирует очередь из данных, которые должны быть отправлены на узлы кластера, а затем асинхронно отправляет подготовленные данные. Вы можете управлять очередью с помощью запросов [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) и [FLUSH DISTRIBUTED](#query_language-system-flush-distributed). Также есть возможность синхронно вставлять распределенные данные с помощью настройки [insert_distributed_sync](../../operations/settings/settings.md#insert_distributed_sync).
ClickHouse может оперировать [распределёнными](../../sql-reference/statements/system.md) таблицами. Когда пользователь вставляет данные в эти таблицы, ClickHouse сначала формирует очередь из данных, которые должны быть отправлены на узлы кластера, а затем асинхронно отправляет подготовленные данные. Вы можете управлять очередью с помощью запросов [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) и [FLUSH DISTRIBUTED](#query_language-system-flush-distributed). Также есть возможность синхронно вставлять распределенные данные с помощью настройки [distributed_foreground_insert](../../operations/settings/settings.md#distributed_foreground_insert).
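For example, to force all queued blocks of a Distributed table to be sent immediately (the table name is a placeholder):
``` bash
clickhouse-client --query "SYSTEM FLUSH DISTRIBUTED db.distributed_table"
```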
### STOP DISTRIBUTED SENDS {#query_language-system-stop-distributed-sends}

View File

@ -43,7 +43,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
**详见**
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) 设置
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) 设置
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) 查看示例
**分布式设置**
@ -58,24 +58,24 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
- `max_delay_to_insert` - 最大延迟多少秒插入数据到分布式表如果有很多挂起字节异步发送。默认值60。
- `monitor_batch_inserts` - 等同于 [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)
- `background_insert_batch` - 等同于 [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)
- `monitor_split_batch_on_failure` - 等同于[distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)
- `background_insert_split_batch_on_failure` - 等同于[distributed_background_insert_split_batch_on_failure](../../../operations/settings/settings.md#distributed_background_insert_split_batch_on_failure)
- `monitor_sleep_time_ms` - 等同于 [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)
- `background_insert_sleep_time_ms` - 等同于 [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)
- `monitor_max_sleep_time_ms` - 等同于 [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)
- `background_insert_max_sleep_time_ms` - 等同于 [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)
::note
**稳定性设置** (`fsync_...`):
- 只影响异步插入(例如:`insert_distributed_sync=false`), 当数据首先存储在启动节点磁盘上然后再异步发送到shard。
- 只影响异步插入(例如:`distributed_foreground_insert=false`), 当数据首先存储在启动节点磁盘上然后再异步发送到shard。
— 可能会显著降低`insert`的性能
- 影响将存储在分布式表文件夹中的数据写入 **接受您插入的节点** 。如果你需要保证写入数据到底层的MergeTree表中请参阅 `system.merge_tree_settings` 中的持久性设置(`...fsync...`)
**插入限制设置** (`..._insert`) 请见:
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) 设置
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) 设置
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) 设置
- `bytes_to_throw_insert``bytes_to_delay_insert` 之前处理,所以你不应该设置它的值小于 `bytes_to_delay_insert`
:::
@ -209,7 +209,7 @@ SELECT 查询会被发送到所有分片,并且无论数据在分片中如何
- 使用需要特定键连接数据( IN 或 JOIN )的查询。如果数据是用该键进行分片,则应使用本地 IN 或 JOIN 而不是 GLOBAL IN 或 GLOBAL JOIN这样效率更高。
- 使用大量服务器(上百或更多),但有大量小查询(个别客户的查询 - 网站,广告商或合作伙伴)。为了使小查询不影响整个集群,让单个客户的数据处于单个分片上是有意义的。或者 你可以配置两级分片:将整个集群划分为«层»,一个层可以包含多个分片。单个客户的数据位于单个层上,根据需要将分片添加到层中,层中的数据随机分布。然后给每层创建分布式表,再创建一个全局的分布式表用于全局的查询。
数据是异步写入的。对于分布式表的 INSERT数据块只写本地文件系统。之后会尽快地在后台发送到远程服务器。发送数据的周期性是由[distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)和[distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)设置。分布式引擎会分别发送每个插入数据的文件,但是你可以使用[distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)设置启用批量发送文件。该设置通过更好地利用本地服务器和网络资源来提高集群性能。你应该检查表目录`/var/lib/clickhouse/data/database/table/`中的文件列表(等待发送的数据)来检查数据是否发送成功。执行后台任务的线程数可以通过[background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size)设置。
数据是异步写入的。对于分布式表的 INSERT数据块只写本地文件系统。之后会尽快地在后台发送到远程服务器。发送数据的周期性是由[distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)和[distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)设置。分布式引擎会分别发送每个插入数据的文件,但是你可以使用[distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)设置启用批量发送文件。该设置通过更好地利用本地服务器和网络资源来提高集群性能。你应该检查表目录`/var/lib/clickhouse/data/database/table/`中的文件列表(等待发送的数据)来检查数据是否发送成功。执行后台任务的线程数可以通过[background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size)设置。
如果在 INSERT 到分布式表时服务器节点丢失或重启设备故障则插入的数据可能会丢失。如果在表目录中检测到损坏的数据分片则会将其转移到«broken»子目录并不再使用。

View File

@ -1088,7 +1088,7 @@ ClickHouse生成异常
- [表引擎分布式](../../engines/table-engines/special/distributed.md)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
对于基本间隔 [分布](../../engines/table-engines/special/distributed.md) 表引擎发送数据。 在发生错误时,实际间隔呈指数级增长。
@ -1098,9 +1098,9 @@ ClickHouse生成异常
默认值100毫秒。
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
的最大间隔 [分布](../../engines/table-engines/special/distributed.md) 表引擎发送数据。 限制在设置的区间的指数增长 [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) 设置。
的最大间隔 [分布](../../engines/table-engines/special/distributed.md) 表引擎发送数据。 限制在设置的区间的指数增长 [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms) 设置。
可能的值:
@ -1108,7 +1108,7 @@ ClickHouse生成异常
默认值30000毫秒30秒
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
启用/禁用批量发送插入的数据。

View File

@ -100,7 +100,7 @@ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --base-
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.

View File

@ -93,7 +93,7 @@ SYSTEM RELOAD CONFIG [ON CLUSTER cluster_name]
## Managing Distributed Tables {#query-language-system-distributed}
ClickHouse可以管理 [distribute](../../engines/table-engines/special/distributed.md)表。当用户向这类表插入数据时ClickHouse首先为需要发送到集群节点的数据创建一个队列然后异步的发送它们。你可以维护队列的处理过程通过[STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), 以及 [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends)。你也可以设置 `insert_distributed_sync`参数来以同步的方式插入分布式数据。
ClickHouse可以管理 [distribute](../../engines/table-engines/special/distributed.md)表。当用户向这类表插入数据时ClickHouse首先为需要发送到集群节点的数据创建一个队列然后异步的发送它们。你可以维护队列的处理过程通过[STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), 以及 [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends)。你也可以设置 `distributed_foreground_insert`参数来以同步的方式插入分布式数据。
### STOP DISTRIBUTED SENDS {#query_language-system-stop-distributed-sends}

View File

@ -4,6 +4,7 @@
#include <iostream>
#include <fstream>
#include <iomanip>
#include <optional>
#include <random>
#include <string_view>
#include <pcg_random.hpp>
@ -28,10 +29,12 @@
#include <Interpreters/Context.h>
#include <Client/Connection.h>
#include <Common/InterruptListener.h>
#include <Common/Config/configReadClient.h>
#include <Common/Config/ConfigProcessor.h>
#include <Common/Config/getClientConfigPath.h>
#include <Common/TerminalSize.h>
#include <Common/StudentTTest.h>
#include <Common/CurrentMetrics.h>
#include <Common/ErrorCodes.h>
#include <filesystem>
@ -156,7 +159,17 @@ public:
if (home_path_cstr)
home_path = home_path_cstr;
configReadClient(config(), home_path);
std::optional<std::string> config_path;
if (config().has("config-file"))
config_path.emplace(config().getString("config-file"));
else
config_path = getClientConfigPath(home_path);
if (config_path.has_value())
{
ConfigProcessor config_processor(*config_path);
auto loaded_config = config_processor.loadConfig();
config().add(loaded_config.configuration);
}
}
int main(const std::vector<std::string> &) override

View File

@ -25,7 +25,8 @@
#include <Common/Exception.h>
#include <Common/formatReadable.h>
#include <Common/TerminalSize.h>
#include <Common/Config/configReadClient.h>
#include <Common/Config/ConfigProcessor.h>
#include <Common/Config/getClientConfigPath.h>
#include <Core/QueryProcessingStage.h>
#include <Columns/ColumnString.h>
@ -131,69 +132,64 @@ void Client::showWarnings()
}
}
void Client::parseConnectionsCredentials()
void Client::parseConnectionsCredentials(Poco::Util::AbstractConfiguration & config, const std::string & connection_name)
{
/// It is not possible to correctly handle multiple --host --port options.
if (hosts_and_ports.size() >= 2)
return;
std::optional<String> host;
std::optional<String> default_connection_name;
if (hosts_and_ports.empty())
{
if (config().has("host"))
host = config().getString("host");
if (config.has("host"))
default_connection_name = config.getString("host");
}
else
{
host = hosts_and_ports.front().host;
default_connection_name = hosts_and_ports.front().host;
}
String connection;
if (config().has("connection"))
connection = config().getString("connection");
if (!connection_name.empty())
connection = connection_name;
else
connection = host.value_or("localhost");
connection = default_connection_name.value_or("localhost");
Strings keys;
config().keys("connections_credentials", keys);
config.keys("connections_credentials", keys);
bool connection_found = false;
for (const auto & key : keys)
{
const String & prefix = "connections_credentials." + key;
const String & connection_name = config().getString(prefix + ".name", "");
if (connection_name != connection)
const String & name = config.getString(prefix + ".name", "");
if (name != connection)
continue;
connection_found = true;
String connection_hostname;
if (config().has(prefix + ".hostname"))
connection_hostname = config().getString(prefix + ".hostname");
if (config.has(prefix + ".hostname"))
connection_hostname = config.getString(prefix + ".hostname");
else
connection_hostname = connection_name;
connection_hostname = name;
if (hosts_and_ports.empty())
config().setString("host", connection_hostname);
if (config().has(prefix + ".port") && hosts_and_ports.empty())
config().setInt("port", config().getInt(prefix + ".port"));
if (config().has(prefix + ".secure") && !config().has("secure"))
config().setBool("secure", config().getBool(prefix + ".secure"));
if (config().has(prefix + ".user") && !config().has("user"))
config().setString("user", config().getString(prefix + ".user"));
if (config().has(prefix + ".password") && !config().has("password"))
config().setString("password", config().getString(prefix + ".password"));
if (config().has(prefix + ".database") && !config().has("database"))
config().setString("database", config().getString(prefix + ".database"));
if (config().has(prefix + ".history_file") && !config().has("history_file"))
config.setString("host", connection_hostname);
if (config.has(prefix + ".port"))
config.setInt("port", config.getInt(prefix + ".port"));
if (config.has(prefix + ".secure"))
config.setBool("secure", config.getBool(prefix + ".secure"));
if (config.has(prefix + ".user"))
config.setString("user", config.getString(prefix + ".user"));
if (config.has(prefix + ".password"))
config.setString("password", config.getString(prefix + ".password"));
if (config.has(prefix + ".database"))
config.setString("database", config.getString(prefix + ".database"));
if (config.has(prefix + ".history_file"))
{
String history_file = config().getString(prefix + ".history_file");
String history_file = config.getString(prefix + ".history_file");
if (history_file.starts_with("~") && !home_path.empty())
history_file = home_path + "/" + history_file.substr(1);
config().setString("history_file", history_file);
config.setString("history_file", history_file);
}
}
if (config().has("connection") && !connection_found)
if (!connection_name.empty() && !connection_found)
throw Exception(ErrorCodes::NO_ELEMENTS_IN_CONFIG, "No such connection '{}' in connections_credentials", connection);
}
@ -263,7 +259,20 @@ void Client::initialize(Poco::Util::Application & self)
if (home_path_cstr)
home_path = home_path_cstr;
configReadClient(config(), home_path);
std::optional<std::string> config_path;
if (config().has("config-file"))
config_path.emplace(config().getString("config-file"));
else
config_path = getClientConfigPath(home_path);
if (config_path.has_value())
{
ConfigProcessor config_processor(*config_path);
auto loaded_config = config_processor.loadConfig();
parseConnectionsCredentials(*loaded_config.configuration, config().getString("connection", ""));
config().add(loaded_config.configuration);
}
else if (config().has("connection"))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "--connection was specified, but config does not exist");
/** getenv is thread-safe in Linux glibc and in all sane libc implementations.
* But the standard does not guarantee that subsequent calls will not rewrite the value by returned pointer.
@ -286,8 +295,6 @@ void Client::initialize(Poco::Util::Application & self)
if (env_password && !config().has("password"))
config().setString("password", env_password);
parseConnectionsCredentials();
// global_context->setApplicationType(Context::ApplicationType::CLIENT);
global_context->setQueryParameters(query_parameters);
@ -1454,7 +1461,9 @@ int mainEntryClickHouseClient(int argc, char ** argv)
try
{
DB::Client client;
// Initialize command line options
client.init(argc, argv);
/// Initialize config file
return client.run();
}
catch (const DB::Exception & e)

View File

@ -47,7 +47,7 @@ protected:
private:
void printChangedSettings() const;
void showWarnings();
void parseConnectionsCredentials();
void parseConnectionsCredentials(Poco::Util::AbstractConfiguration & config, const std::string & connection_name);
std::vector<String> loadWarningMessages();
};
}

View File

@ -58,7 +58,7 @@ void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & c
/// Override important settings
settings_pull.readonly = 1;
settings_pull.prefer_localhost_replica = false;
settings_push.insert_distributed_sync = true;
settings_push.distributed_foreground_insert = true;
settings_push.prefer_localhost_replica = false;
set_default_value(settings_pull.load_balancing, LoadBalancing::NEAREST_HOSTNAME);
@ -66,7 +66,7 @@ void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & c
set_default_value(settings_pull.max_block_size, 8192UL);
set_default_value(settings_pull.preferred_block_size_bytes, 0);
set_default_value(settings_push.insert_distributed_timeout, 0);
set_default_value(settings_push.distributed_background_insert_timeout, 0);
set_default_value(settings_push.alter_sync, 2);
}

View File

@ -536,6 +536,16 @@ static void sanityChecks(Server & server)
{
}
try
{
const char * filename = "/proc/sys/kernel/task_delayacct";
if (readNumber(filename) == 0)
server.context()->addWarningMessage("Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. Check " + String(filename));
}
catch (...) // NOLINT(bugprone-empty-catch)
{
}
std::string dev_id = getBlockDeviceId(data_path);
if (getBlockDeviceType(dev_id) == BlockDeviceType::ROT && getBlockDeviceReadAheadBytes(dev_id) == 0)
server.context()->addWarningMessage("Rotational disk with disabled readahead is in use. Performance can be degraded. Used for data: " + String(data_path));

View File

@ -1472,6 +1472,15 @@
<!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
<!-- You can also configure rocksdb like this: -->
<!-- Full list of options:
- options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/options.h#L1452
- column_family_options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/options.h#L66
- block_based_table_options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/table/block_based/block_based_table_factory.cc#L228
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/table.h#L129
-->
<!--
<rocksdb>
<options>
@ -1480,6 +1489,9 @@
<column_family_options>
<num_levels>2</num_levels>
</column_family_options>
<block_based_table_options>
<block_size>1024</block_size>
</block_based_table_options>
<tables>
<table>
<name>TABLE</name>
@ -1489,6 +1501,9 @@
<column_family_options>
<num_levels>2</num_levels>
</column_family_options>
<block_based_table_options>
<block_size>1024</block_size>
</block_based_table_options>
</table>
</tables>
</rocksdb>

View File

@ -1,5 +1,5 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <AggregateFunctions/Combinators/AggregateFunctionCombinatorFactory.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include <DataTypes/DataTypeNullable.h>

View File

@ -151,10 +151,12 @@ public:
}
else if (BitmapKind::Bitmap == kind)
{
auto size = roaring_bitmap->getSizeInBytes();
std::unique_ptr<RoaringBitmap> bitmap = std::make_unique<RoaringBitmap>(*roaring_bitmap);
bitmap->runOptimize();
auto size = bitmap->getSizeInBytes();
writeVarUInt(size, out);
std::unique_ptr<char[]> buf(new char[size]);
roaring_bitmap->write(buf.get());
bitmap->write(buf.get());
out.write(buf.get(), size);
}
}

View File

@ -95,7 +95,7 @@ public:
size_t size = set.size();
writeVarUInt(size, buf);
for (const auto & elem : set)
writeIntBinary(elem, buf);
writeBinaryLittleEndian(elem.key, buf);
}
void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, std::optional<size_t> /* version */, Arena *) const override

View File

@ -2,7 +2,7 @@
#include <unordered_set>
#include <AggregateFunctions/AggregateFunctionNull.h>
#include <AggregateFunctions/Combinators/AggregateFunctionNull.h>
#include <Columns/ColumnsNumber.h>

View File

@ -1,138 +0,0 @@
#include "AggregateFunctionMap.h"
#include "AggregateFunctions/AggregateFunctionCombinatorFactory.h"
#include "Functions/FunctionHelpers.h"
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}
class AggregateFunctionCombinatorMap final : public IAggregateFunctionCombinator
{
public:
String getName() const override { return "Map"; }
DataTypes transformArguments(const DataTypes & arguments) const override
{
if (arguments.empty())
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Incorrect number of arguments for aggregate function with {} suffix", getName());
const auto * map_type = checkAndGetDataType<DataTypeMap>(arguments[0].get());
if (map_type)
{
if (arguments.size() > 1)
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "{} combinator takes only one map argument", getName());
return DataTypes({map_type->getValueType()});
}
// we need this part just to pass to redirection for mapped arrays
auto check_func = [](DataTypePtr t) { return t->getTypeId() == TypeIndex::Array; };
const auto * tup_type = checkAndGetDataType<DataTypeTuple>(arguments[0].get());
if (tup_type)
{
const auto & types = tup_type->getElements();
bool arrays_match = arguments.size() == 1 && types.size() >= 2 && std::all_of(types.begin(), types.end(), check_func);
if (arrays_match)
{
const auto * val_array_type = assert_cast<const DataTypeArray *>(types[1].get());
return DataTypes({val_array_type->getNestedType()});
}
}
else
{
bool arrays_match = arguments.size() >= 2 && std::all_of(arguments.begin(), arguments.end(), check_func);
if (arrays_match)
{
const auto * val_array_type = assert_cast<const DataTypeArray *>(arguments[1].get());
return DataTypes({val_array_type->getNestedType()});
}
}
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregate function {} requires map as argument", getName());
}
AggregateFunctionPtr transformAggregateFunction(
const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties &,
const DataTypes & arguments,
const Array & params) const override
{
const auto * map_type = checkAndGetDataType<DataTypeMap>(arguments[0].get());
if (map_type)
{
const auto & key_type = map_type->getKeyType();
switch (key_type->getTypeId())
{
case TypeIndex::Enum8:
case TypeIndex::Int8:
return std::make_shared<AggregateFunctionMap<Int8>>(nested_function, arguments);
case TypeIndex::Enum16:
case TypeIndex::Int16:
return std::make_shared<AggregateFunctionMap<Int16>>(nested_function, arguments);
case TypeIndex::Int32:
return std::make_shared<AggregateFunctionMap<Int32>>(nested_function, arguments);
case TypeIndex::Int64:
return std::make_shared<AggregateFunctionMap<Int64>>(nested_function, arguments);
case TypeIndex::Int128:
return std::make_shared<AggregateFunctionMap<Int128>>(nested_function, arguments);
case TypeIndex::Int256:
return std::make_shared<AggregateFunctionMap<Int256>>(nested_function, arguments);
case TypeIndex::UInt8:
return std::make_shared<AggregateFunctionMap<UInt8>>(nested_function, arguments);
case TypeIndex::Date:
case TypeIndex::UInt16:
return std::make_shared<AggregateFunctionMap<UInt16>>(nested_function, arguments);
case TypeIndex::DateTime:
case TypeIndex::UInt32:
return std::make_shared<AggregateFunctionMap<UInt32>>(nested_function, arguments);
case TypeIndex::UInt64:
return std::make_shared<AggregateFunctionMap<UInt64>>(nested_function, arguments);
case TypeIndex::UInt128:
return std::make_shared<AggregateFunctionMap<UInt128>>(nested_function, arguments);
case TypeIndex::UInt256:
return std::make_shared<AggregateFunctionMap<UInt256>>(nested_function, arguments);
case TypeIndex::UUID:
return std::make_shared<AggregateFunctionMap<UUID>>(nested_function, arguments);
case TypeIndex::IPv4:
return std::make_shared<AggregateFunctionMap<IPv4>>(nested_function, arguments);
case TypeIndex::IPv6:
return std::make_shared<AggregateFunctionMap<IPv6>>(nested_function, arguments);
case TypeIndex::FixedString:
case TypeIndex::String:
return std::make_shared<AggregateFunctionMap<String>>(nested_function, arguments);
default:
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Map key type {} is not is not supported by combinator {}", key_type->getName(), getName());
}
}
else
{
// in case of tuple of arrays or just arrays (checked in transformArguments), try to redirect to sum/min/max-MappedArrays to implement old behavior
auto nested_func_name = nested_function->getName();
if (nested_func_name == "sum" || nested_func_name == "min" || nested_func_name == "max")
{
AggregateFunctionProperties out_properties;
auto & aggr_func_factory = AggregateFunctionFactory::instance();
return aggr_func_factory.get(nested_func_name + "MappedArrays", arguments, params, out_properties);
}
else
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregation '{}Map' is not implemented for mapped arrays",
nested_func_name);
}
}
};
void registerAggregateFunctionCombinatorMap(AggregateFunctionCombinatorFactory & factory)
{
factory.registerCombinator(std::make_shared<AggregateFunctionCombinatorMap>());
}
}

View File

@ -72,14 +72,14 @@ public:
{
writeBinary(has(), buf);
if (has())
writeBinary(value, buf);
writeBinaryLittleEndian(value, buf);
}
void read(ReadBuffer & buf, const ISerialization & /*serialization*/, Arena *)
{
readBinary(has_value, buf);
if (has())
readBinary(value, buf);
readBinaryLittleEndian(value, buf);
}
@ -1275,13 +1275,13 @@ struct AggregateFunctionAnyHeavyData : Data
void write(WriteBuffer & buf, const ISerialization & serialization) const
{
Data::write(buf, serialization);
writeBinary(counter, buf);
writeBinaryLittleEndian(counter, buf);
}
void read(ReadBuffer & buf, const ISerialization & serialization, Arena * arena)
{
Data::read(buf, serialization, arena);
readBinary(counter, buf);
readBinaryLittleEndian(counter, buf);
}
static const char * name() { return "anyHeavy"; }

View File

@ -19,7 +19,6 @@
#include <Common/assert_cast.h>
#include <AggregateFunctions/IAggregateFunction.h>
#include <AggregateFunctions/AggregateFunctionNull.h>
#include <type_traits>
#include <bitset>

View File

@ -34,7 +34,7 @@ struct TheilsUData : CrossTabData
for (const auto & [key, value] : count_ab)
{
Float64 value_ab = value;
Float64 value_b = count_b.at(key.items[1]);
Float64 value_b = count_b.at(key.items[UInt128::_impl::little(1)]);
dep += (value_ab / count) * log(value_ab / value_b);
}

View File

@ -8,8 +8,6 @@
#include <IO/WriteHelpers.h>
#include <Common/assert_cast.h>
#include <AggregateFunctions/AggregateFunctionNull.h>
namespace DB
{
struct Settings;

View File

@ -1,20 +1,21 @@
include("${ClickHouse_SOURCE_DIR}/cmake/dbms_glob_sources.cmake")
add_headers_and_sources(clickhouse_aggregate_functions .)
add_headers_and_sources(clickhouse_aggregate_functions Combinators)
extract_into_parent_list(clickhouse_aggregate_functions_sources dbms_sources
IAggregateFunction.cpp
AggregateFunctionFactory.cpp
AggregateFunctionCombinatorFactory.cpp
AggregateFunctionState.cpp
Combinators/AggregateFunctionCombinatorFactory.cpp
Combinators/AggregateFunctionState.cpp
AggregateFunctionCount.cpp
parseAggregateFunctionParameters.cpp
)
extract_into_parent_list(clickhouse_aggregate_functions_headers dbms_headers
IAggregateFunction.h
IAggregateFunctionCombinator.h
Combinators/IAggregateFunctionCombinator.h
AggregateFunctionFactory.h
AggregateFunctionCombinatorFactory.h
AggregateFunctionState.h
Combinators/AggregateFunctionCombinatorFactory.h
Combinators/AggregateFunctionState.h
AggregateFunctionCount.cpp
FactoryHelpers.h
parseAggregateFunctionParameters.h

View File

@ -0,0 +1,93 @@
#include "AggregateFunctionArgMinMax.h"
#include "AggregateFunctionCombinatorFactory.h"
#include <AggregateFunctions/AggregateFunctionMinMaxAny.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypeString.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}
namespace
{
template <template <typename> class Data>
class AggregateFunctionCombinatorArgMinMax final : public IAggregateFunctionCombinator
{
public:
String getName() const override { return Data<SingleValueDataGeneric<>>::name(); }
DataTypes transformArguments(const DataTypes & arguments) const override
{
if (arguments.empty())
throw Exception(
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Incorrect number of arguments for aggregate function with {} suffix",
getName());
return DataTypes(arguments.begin(), arguments.end() - 1);
}
AggregateFunctionPtr transformAggregateFunction(
const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties &,
const DataTypes & arguments,
const Array & params) const override
{
const DataTypePtr & argument_type = arguments.back();
WhichDataType which(argument_type);
#define DISPATCH(TYPE) \
if (which.idx == TypeIndex::TYPE) \
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<TYPE>>>>(nested_function, arguments, params); /// NOLINT
FOR_NUMERIC_TYPES(DISPATCH)
#undef DISPATCH
if (which.idx == TypeIndex::Date)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<DataTypeDate::FieldType>>>>(
nested_function, arguments, params);
if (which.idx == TypeIndex::DateTime)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<DataTypeDateTime::FieldType>>>>(
nested_function, arguments, params);
if (which.idx == TypeIndex::DateTime64)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<DateTime64>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal32)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal32>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal64)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal64>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal128)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal128>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal256)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal256>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::String)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataString>>>(nested_function, arguments, params);
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataGeneric<>>>>(nested_function, arguments, params);
}
};
template <typename Data>
struct AggregateFunctionArgMinDataCapitalized : AggregateFunctionMinData<Data>
{
static const char * name() { return "ArgMin"; }
};
template <typename Data>
struct AggregateFunctionArgMaxDataCapitalized : AggregateFunctionMaxData<Data>
{
static const char * name() { return "ArgMax"; }
};
}
void registerAggregateFunctionCombinatorMinMax(AggregateFunctionCombinatorFactory & factory)
{
factory.registerCombinator(std::make_shared<AggregateFunctionCombinatorArgMinMax<AggregateFunctionArgMinDataCapitalized>>());
factory.registerCombinator(std::make_shared<AggregateFunctionCombinatorArgMinMax<AggregateFunctionArgMaxDataCapitalized>>());
}
}

View File

@ -0,0 +1,111 @@
#pragma once
#include <AggregateFunctions/IAggregateFunction.h>
namespace DB
{
template <typename Key>
class AggregateFunctionArgMinMax final : public IAggregateFunctionHelper<AggregateFunctionArgMinMax<Key>>
{
private:
AggregateFunctionPtr nested_function;
SerializationPtr serialization;
size_t key_col;
size_t key_offset;
Key & key(AggregateDataPtr __restrict place) const { return *reinterpret_cast<Key *>(place + key_offset); }
const Key & key(ConstAggregateDataPtr __restrict place) const { return *reinterpret_cast<const Key *>(place + key_offset); }
public:
AggregateFunctionArgMinMax(AggregateFunctionPtr nested_function_, const DataTypes & arguments, const Array & params)
: IAggregateFunctionHelper<AggregateFunctionArgMinMax<Key>>{arguments, params, nested_function_->getResultType()}
, nested_function{nested_function_}
, serialization(arguments.back()->getDefaultSerialization())
, key_col{arguments.size() - 1}
, key_offset{(nested_function->sizeOfData() + alignof(Key) - 1) / alignof(Key) * alignof(Key)}
{
}
String getName() const override { return nested_function->getName() + Key::name(); }
bool isState() const override { return nested_function->isState(); }
bool isVersioned() const override { return nested_function->isVersioned(); }
size_t getVersionFromRevision(size_t revision) const override { return nested_function->getVersionFromRevision(revision); }
size_t getDefaultVersion() const override { return nested_function->getDefaultVersion(); }
bool allocatesMemoryInArena() const override { return nested_function->allocatesMemoryInArena() || Key::allocatesMemoryInArena(); }
bool hasTrivialDestructor() const override { return nested_function->hasTrivialDestructor(); }
size_t sizeOfData() const override { return key_offset + sizeof(Key); }
size_t alignOfData() const override { return nested_function->alignOfData(); }
void create(AggregateDataPtr __restrict place) const override
{
nested_function->create(place);
new (place + key_offset) Key;
}
void destroy(AggregateDataPtr __restrict place) const noexcept override { nested_function->destroy(place); }
void destroyUpToState(AggregateDataPtr __restrict place) const noexcept override { nested_function->destroyUpToState(place); }
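/// add(): if this row's key beats the current best, restart the nested state from this row alone;
/// if the key ties with the current best, also feed the row into the nested state.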
void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override
{
if (key(place).changeIfBetter(*columns[key_col], row_num, arena))
{
nested_function->destroy(place);
nested_function->create(place);
nested_function->add(place, columns, row_num, arena);
}
else if (key(place).isEqualTo(*columns[key_col], row_num))
{
nested_function->add(place, columns, row_num, arena);
}
}
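/// merge() applies the same rule between two states: when the other state's key wins, the nested
/// state is rebuilt from rhs; when the keys are equal, the nested states are merged.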
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, Arena * arena) const override
{
if (key(place).changeIfBetter(key(rhs), arena))
{
nested_function->destroy(place);
nested_function->create(place);
nested_function->merge(place, rhs, arena);
}
else if (key(place).isEqualTo(key(rhs)))
{
nested_function->merge(place, rhs, arena);
}
}
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf, std::optional<size_t> version) const override
{
nested_function->serialize(place, buf, version);
key(place).write(buf, *serialization);
}
void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, std::optional<size_t> version, Arena * arena) const override
{
nested_function->deserialize(place, buf, version, arena);
key(place).read(buf, *serialization, arena);
}
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
{
nested_function->insertResultInto(place, to, arena);
}
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
{
nested_function->insertMergeResultInto(place, to, arena);
}
AggregateFunctionPtr getNestedFunction() const override { return nested_function; }
};
}
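
A note on the state layout above: the wrapper keeps the nested function's state and its own key in one flat allocation. `key_offset` rounds the nested state's size up to `alignof(Key)`, `sizeOfData` adds `sizeof(Key)`, and `create` placement-constructs the key at that offset. A self-contained sketch of this technique, assuming stand-in types rather than the real `SingleValueData*` classes:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <new>

struct NestedState { double sum = 0; };   // stand-in for the nested aggregate's state
struct Key { int64_t best = 0; };         // stand-in for SingleValueDataFixed<...>

// Round the nested size up to the key's alignment, exactly like the constructor above.
constexpr size_t key_offset = (sizeof(NestedState) + alignof(Key) - 1) / alignof(Key) * alignof(Key);
constexpr size_t total_size = key_offset + sizeof(Key);   // sizeOfData()

int main()
{
    alignas(std::max_align_t) unsigned char place[total_size];   // one flat "place" buffer
    new (place) NestedState{};              // nested_function->create(place)
    new (place + key_offset) Key{};         // new (place + key_offset) Key

    auto & nested = *reinterpret_cast<NestedState *>(place);
    auto & key = *reinterpret_cast<Key *>(place + key_offset);
    nested.sum += 42.0;
    key.best = 7;
    std::cout << nested.sum << ' ' << key.best << '\n';   // 42 7
}
```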

View File

@ -1,7 +1,7 @@
#include <AggregateFunctions/AggregateFunctionArray.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <Common/typeid_cast.h>
#include "AggregateFunctionArray.h"
#include "AggregateFunctionCombinatorFactory.h"
#include <Common/typeid_cast.h>
namespace DB
{

View File

@ -1,6 +1,6 @@
#include <Common/StringUtils/StringUtils.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include "AggregateFunctionCombinatorFactory.h"
#include <Common/StringUtils/StringUtils.h>
namespace DB
{

View File

@ -1,7 +1,6 @@
#pragma once
#include <AggregateFunctions/IAggregateFunctionCombinator.h>
#include "IAggregateFunctionCombinator.h"
#include <string>
#include <unordered_map>

View File

@ -1,9 +1,9 @@
#include <AggregateFunctions/AggregateFunctionDistinct.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include "AggregateFunctionDistinct.h"
#include "AggregateFunctionCombinatorFactory.h"
#include <AggregateFunctions/Helpers.h>
#include <Common/typeid_cast.h>
namespace DB
{
struct Settings;

View File

@ -1,5 +1,6 @@
#include <AggregateFunctions/AggregateFunctionForEach.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include "AggregateFunctionForEach.h"
#include "AggregateFunctionCombinatorFactory.h"
#include <Common/typeid_cast.h>

View File

@ -1,5 +1,5 @@
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <AggregateFunctions/AggregateFunctionIf.h>
#include "AggregateFunctionCombinatorFactory.h"
#include "AggregateFunctionIf.h"
#include "AggregateFunctionNull.h"
namespace DB

View File

@ -1,46 +1,42 @@
#pragma once
#include <unordered_map>
#include <base/sort.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/IAggregateFunction.h>
#include <Columns/ColumnFixedString.h>
#include <Columns/ColumnMap.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnVector.h>
#include <Core/ColumnWithTypeAndName.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeMap.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypesNumber.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include "DataTypes/Serializations/ISerialization.h"
#include <base/IPv4andIPv6.h>
#include "base/types.h"
#include <Common/formatIPv6.h>
#include <Common/Arena.h>
#include "AggregateFunctions/AggregateFunctionFactory.h"
#include "AggregateFunctionCombinatorFactory.h"
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}
namespace
{
template <typename KeyType>
struct AggregateFunctionMapCombinatorData
{
using SearchType = KeyType;
std::unordered_map<KeyType, AggregateDataPtr> merged_maps;
static void writeKey(KeyType key, WriteBuffer & buf) { writeBinary(key, buf); }
static void readKey(KeyType & key, ReadBuffer & buf) { readBinary(key, buf); }
static void writeKey(KeyType key, WriteBuffer & buf) { writeBinaryLittleEndian(key, buf); }
static void readKey(KeyType & key, ReadBuffer & buf) { readBinaryLittleEndian(key, buf); }
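/// The *LittleEndian helpers pin the serialized byte order, so map states written on a
/// big-endian host read back correctly on a little-endian one (and vice versa).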
};
template <>
@ -54,11 +50,7 @@ struct AggregateFunctionMapCombinatorData<String>
size_t operator()(std::string_view str) const { return hash_type{}(str); }
};
#ifdef __cpp_lib_generic_unordered_lookup
using SearchType = std::string_view;
#else
using SearchType = std::string;
#endif
std::unordered_map<String, AggregateDataPtr, StringHash, std::equal_to<>> merged_maps;
static void writeKey(String key, WriteBuffer & buf)
@ -179,11 +171,7 @@ public:
else
key_ref = assert_cast<const ColumnString &>(key_column).getDataAt(offset + i);
#ifdef __cpp_lib_generic_unordered_lookup
key = key_ref.toView();
#else
key = key_ref.toString();
#endif
}
else
{
@ -347,4 +335,132 @@ public:
AggregateFunctionPtr getNestedFunction() const override { return nested_func; }
};
class AggregateFunctionCombinatorMap final : public IAggregateFunctionCombinator
{
public:
String getName() const override { return "Map"; }
DataTypes transformArguments(const DataTypes & arguments) const override
{
if (arguments.empty())
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Incorrect number of arguments for aggregate function with {} suffix", getName());
const auto * map_type = checkAndGetDataType<DataTypeMap>(arguments[0].get());
if (map_type)
{
if (arguments.size() > 1)
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "{} combinator takes only one map argument", getName());
return DataTypes({map_type->getValueType()});
}
// This part exists only so that mapped-array arguments can pass through and be redirected later (see transformAggregateFunction).
auto check_func = [](DataTypePtr t) { return t->getTypeId() == TypeIndex::Array; };
const auto * tup_type = checkAndGetDataType<DataTypeTuple>(arguments[0].get());
if (tup_type)
{
const auto & types = tup_type->getElements();
bool arrays_match = arguments.size() == 1 && types.size() >= 2 && std::all_of(types.begin(), types.end(), check_func);
if (arrays_match)
{
const auto * val_array_type = assert_cast<const DataTypeArray *>(types[1].get());
return DataTypes({val_array_type->getNestedType()});
}
}
else
{
bool arrays_match = arguments.size() >= 2 && std::all_of(arguments.begin(), arguments.end(), check_func);
if (arrays_match)
{
const auto * val_array_type = assert_cast<const DataTypeArray *>(arguments[1].get());
return DataTypes({val_array_type->getNestedType()});
}
}
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregate function {} requires map as argument", getName());
}
AggregateFunctionPtr transformAggregateFunction(
const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties &,
const DataTypes & arguments,
const Array & params) const override
{
const auto * map_type = checkAndGetDataType<DataTypeMap>(arguments[0].get());
if (map_type)
{
const auto & key_type = map_type->getKeyType();
switch (key_type->getTypeId())
{
case TypeIndex::Enum8:
case TypeIndex::Int8:
return std::make_shared<AggregateFunctionMap<Int8>>(nested_function, arguments);
case TypeIndex::Enum16:
case TypeIndex::Int16:
return std::make_shared<AggregateFunctionMap<Int16>>(nested_function, arguments);
case TypeIndex::Int32:
return std::make_shared<AggregateFunctionMap<Int32>>(nested_function, arguments);
case TypeIndex::Int64:
return std::make_shared<AggregateFunctionMap<Int64>>(nested_function, arguments);
case TypeIndex::Int128:
return std::make_shared<AggregateFunctionMap<Int128>>(nested_function, arguments);
case TypeIndex::Int256:
return std::make_shared<AggregateFunctionMap<Int256>>(nested_function, arguments);
case TypeIndex::UInt8:
return std::make_shared<AggregateFunctionMap<UInt8>>(nested_function, arguments);
case TypeIndex::Date:
case TypeIndex::UInt16:
return std::make_shared<AggregateFunctionMap<UInt16>>(nested_function, arguments);
case TypeIndex::DateTime:
case TypeIndex::UInt32:
return std::make_shared<AggregateFunctionMap<UInt32>>(nested_function, arguments);
case TypeIndex::UInt64:
return std::make_shared<AggregateFunctionMap<UInt64>>(nested_function, arguments);
case TypeIndex::UInt128:
return std::make_shared<AggregateFunctionMap<UInt128>>(nested_function, arguments);
case TypeIndex::UInt256:
return std::make_shared<AggregateFunctionMap<UInt256>>(nested_function, arguments);
case TypeIndex::UUID:
return std::make_shared<AggregateFunctionMap<UUID>>(nested_function, arguments);
case TypeIndex::IPv4:
return std::make_shared<AggregateFunctionMap<IPv4>>(nested_function, arguments);
case TypeIndex::IPv6:
return std::make_shared<AggregateFunctionMap<IPv6>>(nested_function, arguments);
case TypeIndex::FixedString:
case TypeIndex::String:
return std::make_shared<AggregateFunctionMap<String>>(nested_function, arguments);
default:
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Map key type {} is not is not supported by combinator {}", key_type->getName(), getName());
}
}
else
{
// For a tuple of arrays or plain arrays (validated in transformArguments), redirect to sum/min/max-MappedArrays to preserve the old behavior.
auto nested_func_name = nested_function->getName();
if (nested_func_name == "sum" || nested_func_name == "min" || nested_func_name == "max")
{
AggregateFunctionProperties out_properties;
auto & aggr_func_factory = AggregateFunctionFactory::instance();
return aggr_func_factory.get(nested_func_name + "MappedArrays", arguments, params, out_properties);
}
else
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregation '{}Map' is not implemented for mapped arrays",
nested_func_name);
}
}
};
}
void registerAggregateFunctionCombinatorMap(AggregateFunctionCombinatorFactory & factory)
{
factory.registerCombinator(std::make_shared<AggregateFunctionCombinatorMap>());
}
}
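
One detail of this file worth calling out: the old `#ifdef __cpp_lib_generic_unordered_lookup` branches are gone, and the `String`-keyed map relies on C++20 heterogeneous lookup, where a hash functor exposing `is_transparent` plus `std::equal_to<>` lets `find` accept a `std::string_view` without building a temporary `String`. A standalone sketch of that setup (illustrative names, compiled with `-std=c++20`):

```cpp
#include <iostream>
#include <string>
#include <string_view>
#include <unordered_map>

struct StringHash
{
    using hash_type = std::hash<std::string_view>;
    using is_transparent = void;   // opt in to heterogeneous lookup

    size_t operator()(std::string_view str) const { return hash_type{}(str); }
};

int main()
{
    // Both the hasher and std::equal_to<> are transparent, so lookups may use string_view keys.
    std::unordered_map<std::string, int, StringHash, std::equal_to<>> merged_maps;
    merged_maps["clickhouse"] = 1;

    std::string_view key = "clickhouse";
    if (auto it = merged_maps.find(key); it != merged_maps.end())   // no temporary std::string built
        std::cout << it->second << '\n';
}
```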

View File

@ -1,7 +1,7 @@
#include <AggregateFunctions/AggregateFunctionMerge.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include "AggregateFunctionMerge.h"
#include "AggregateFunctionCombinatorFactory.h"
#include <DataTypes/DataTypeAggregateFunction.h>
namespace DB
{

View File

@ -1,9 +1,12 @@
#include <DataTypes/DataTypeNullable.h>
#include <AggregateFunctions/AggregateFunctionNull.h>
#include "AggregateFunctionNull.h"
#include "AggregateFunctionState.h"
#include "AggregateFunctionSimpleState.h"
#include "AggregateFunctionCombinatorFactory.h"
#include <AggregateFunctions/AggregateFunctionNothing.h>
#include <AggregateFunctions/AggregateFunctionState.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <AggregateFunctions/AggregateFunctionSimpleState.h>
#include <AggregateFunctions/AggregateFunctionCount.h>
#include <DataTypes/DataTypeNullable.h>
namespace DB
{

Some files were not shown because too many files have changed in this diff.