Merge branch 'master' into async-loader-integration

This commit is contained in:
serxa 2023-11-07 10:25:38 +00:00
commit 19aa1375fc
647 changed files with 12008 additions and 4309 deletions

View File

@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v23.10, 2023-11-02](#2310)**<br/>
**[ClickHouse release v23.9, 2023-09-28](#239)**<br/>
**[ClickHouse release v23.8 LTS, 2023-08-31](#238)**<br/>
**[ClickHouse release v23.7, 2023-07-27](#237)**<br/>
@ -12,6 +13,184 @@
# 2023 Changelog
### ClickHouse release 23.10, 2023-11-02
#### Backward Incompatible Change
* There is no longer an option to automatically remove broken data parts. This closes [#55174](https://github.com/ClickHouse/ClickHouse/issues/55174). [#55184](https://github.com/ClickHouse/ClickHouse/pull/55184) ([Alexey Milovidov](https://github.com/alexey-milovidov)). [#55557](https://github.com/ClickHouse/ClickHouse/pull/55557) ([Jihyuk Bok](https://github.com/tomahawk28)).
* The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. [#55186](https://github.com/ClickHouse/ClickHouse/pull/55186) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate your help in bringing it back. [#55189](https://github.com/ClickHouse/ClickHouse/pull/55189) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rename the directory monitor concept into background INSERT. All the settings `*directory_monitor*` have been renamed to `distributed_background_insert*`. *Backward compatibility should be preserved* (since the old settings have been added as aliases). [#55978](https://github.com/ClickHouse/ClickHouse/pull/55978) ([Azat Khuzhin](https://github.com/azat)).
* Do not interpret the `send_timeout` set on the client side as the `receive_timeout` on the server side and vice versa. [#56035](https://github.com/ClickHouse/ClickHouse/pull/56035) ([Azat Khuzhin](https://github.com/azat)).
* Comparison of time intervals with different units will throw an exception. This closes [#55942](https://github.com/ClickHouse/ClickHouse/issues/55942). You might have occasionally relied on the previous behavior, when the underlying numeric values were compared regardless of the units (see the example after this list). [#56090](https://github.com/ClickHouse/ClickHouse/pull/56090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rewrote the experimental `S3Queue` table engine completely: changed the way we keep information in ZooKeeper, which allows making fewer ZooKeeper requests; added caching of ZooKeeper state in cases when we know the state will not change; made the polling from S3 less aggressive; changed the way the TTL and the maximum set of tracked files are maintained, now as a background process. Added `system.s3queue` and `system.s3queue_log` tables. Closes [#54998](https://github.com/ClickHouse/ClickHouse/issues/54998). [#54422](https://github.com/ClickHouse/ClickHouse/pull/54422) ([Kseniia Sumarokova](https://github.com/kssenii)).
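
As a minimal illustration of the interval-comparison change above (not part of the release notes; the exact error text may differ), comparing intervals of different units now raises an exception instead of comparing the underlying numbers:

```sql
-- 23.10 behavior as described in the entry above.
SELECT INTERVAL 1 HOUR = INTERVAL 1 HOUR;    -- OK: same unit
SELECT INTERVAL 1 HOUR = INTERVAL 60 MINUTE; -- now throws; previously compared the raw values 1 and 60
```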
#### New Feature
* Add function `arrayFold(accumulator, x1, ..., xn -> expression, initial, array1, ..., arrayn)` which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. [#49794](https://github.com/ClickHouse/ClickHouse/pull/49794) ([Lirikl](https://github.com/Lirikl)).
* Support for `Npy` format. `SELECT * FROM file('example_array.npy', Npy)`. [#55982](https://github.com/ClickHouse/ClickHouse/pull/55982) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* If a table has a space-filling curve in its key, e.g., `ORDER BY mortonEncode(x, y)`, the conditions on its arguments, e.g., `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30` can be used for indexing. A setting `analyze_index_with_space_filling_curves` is added to enable or disable this analysis. This closes [#41195](https://github.com/ClickHouse/ClickHouse/issue/41195). Continuation of [#4538](https://github.com/ClickHouse/ClickHouse/pull/4538). Continuation of [#6286](https://github.com/ClickHouse/ClickHouse/pull/6286). Continuation of [#28130](https://github.com/ClickHouse/ClickHouse/pull/28130). Continuation of [#41753](https://github.com/ClickHouse/ClickHouse/pull/#41753). [#55642](https://github.com/ClickHouse/ClickHouse/pull/55642) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added a new setting `force_optimize_projection_name`, which takes the name of a projection as an argument. If its value is set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes [#55331](https://github.com/ClickHouse/ClickHouse/issues/55331). [#56134](https://github.com/ClickHouse/ClickHouse/pull/56134) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Support asynchronous inserts with external data via the native protocol. Previously, this worked only if the data was inlined into the query. [#54730](https://github.com/ClickHouse/ClickHouse/pull/54730) ([Anton Popov](https://github.com/CurtizJ)).
* Added aggregation function `lttb` which uses the [Largest-Triangle-Three-Buckets](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm for downsampling data for visualization. [#53145](https://github.com/ClickHouse/ClickHouse/pull/53145) ([Sinan](https://github.com/sinsinan)).
* The query `CHECK TABLE` has better performance and usability (it sends progress updates and is cancellable). Checking a particular part is supported with `CHECK TABLE ... PART 'part_name'`. [#53404](https://github.com/ClickHouse/ClickHouse/pull/53404) ([vdimir](https://github.com/vdimir)).
* Added function `jsonMergePatch`. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object. [#54364](https://github.com/ClickHouse/ClickHouse/pull/54364) ([Memo](https://github.com/Joeywzr)).
* The second part of Kusto Query Language dialect support. [Phase 1 implementation](https://github.com/ClickHouse/ClickHouse/pull/37961) has been merged. [#42510](https://github.com/ClickHouse/ClickHouse/pull/42510) ([larryluogit](https://github.com/larryluogit)).
* Added a new SQL function, `arrayRandomSample(arr, k)`, which returns a sample of `k` elements from the input array; a short illustration follows this list. Similar functionality could previously be achieved only with less convenient syntax, e.g. `SELECT arrayReduce('groupArraySample(3)', range(10))`. [#54391](https://github.com/ClickHouse/ClickHouse/pull/54391) ([itayisraelov](https://github.com/itayisraelov)).
* Introduce `-ArgMin`/`-ArgMax` aggregate combinators which allow aggregating by min/max values only. One use case can be found in [#54818](https://github.com/ClickHouse/ClickHouse/issues/54818). This PR also reorganizes combinators into a dedicated folder. [#54947](https://github.com/ClickHouse/ClickHouse/pull/54947) ([Amos Bird](https://github.com/amosbird)).
* Allow dropping the schema cache for the Protobuf format with `SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]`. [#55064](https://github.com/ClickHouse/ClickHouse/pull/55064) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Add external HTTP Basic authenticator. [#55199](https://github.com/ClickHouse/ClickHouse/pull/55199) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Added function `byteSwap` which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally such as IPv4. [#55211](https://github.com/ClickHouse/ClickHouse/pull/55211) ([Priyansh Agrawal](https://github.com/Priyansh121096)).
* Added function `formatQuery()` which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function `formatQuerySingleLine()` which does the same but the returned string will not contain linebreaks. [#55239](https://github.com/ClickHouse/ClickHouse/pull/55239) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Added `DWARF` input format that reads debug symbols from an ELF executable/library/object file. [#55450](https://github.com/ClickHouse/ClickHouse/pull/55450) ([Michael Kolupaev](https://github.com/al13n321)).
* Allow saving unparsed records and errors in the RabbitMQ, NATS and FileLog engines. Add virtual columns `_error` and `_raw_message` (for NATS and RabbitMQ), and `_raw_record` (for FileLog) that are filled when ClickHouse fails to parse a new record. The behaviour is controlled by the storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ, and `handle_error_mode` for FileLog, similar to `kafka_handle_error_mode`. If it is set to `default`, an exception is thrown when ClickHouse fails to parse a record; if it is set to `stream`, the error and the raw record are saved into the virtual columns. Closes [#36035](https://github.com/ClickHouse/ClickHouse/issues/36035). [#55477](https://github.com/ClickHouse/ClickHouse/pull/55477) ([Kruglov Pavel](https://github.com/Avogar)).
* Keeper client improvement: add the `get_all_children_number` command that returns the number of all child nodes under a specific path. [#55485](https://github.com/ClickHouse/ClickHouse/pull/55485) ([guoxiaolong](https://github.com/guoxiaolongzte)).
* Keeper client improvement: add the `get_direct_children_number` command that returns the number of direct child nodes under a path. [#55898](https://github.com/ClickHouse/ClickHouse/pull/55898) ([xuzifu666](https://github.com/xuzifu666)).
* Add statement `SHOW SETTING setting_name` which is a simpler version of the existing statement `SHOW SETTINGS`. [#55979](https://github.com/ClickHouse/ClickHouse/pull/55979) ([Maksim Kita](https://github.com/kitaisreal)).
* Added fields `substreams` and `filenames` to the `system.parts_columns` table. [#55108](https://github.com/ClickHouse/ClickHouse/pull/55108) ([Anton Popov](https://github.com/CurtizJ)).
* Add support for the `SHOW MERGES` query. [#55815](https://github.com/ClickHouse/ClickHouse/pull/55815) ([megao](https://github.com/jetgm)).
* Introduce a setting `create_table_empty_primary_key_by_default` for default `ORDER BY ()`. [#55899](https://github.com/ClickHouse/ClickHouse/pull/55899) ([Srikanth Chekuri](https://github.com/srikanthccv)).
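
A few hedged examples of the new SQL functions from the entries above (argument values are arbitrary, and outputs are omitted because they are random or depend on formatting):

```sql
-- Sketches based on the signatures given in this list.
SELECT arrayRandomSample([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3) AS sample;          -- 3 randomly chosen elements
SELECT byteSwap(3351772109) AS swapped;                                          -- bytes of the unsigned integer reversed
SELECT formatQuery('select  a,   b from tab where a > 3') AS pretty;             -- formatted, possibly multi-line SQL
SELECT formatQuerySingleLine('select  a,   b from tab where a > 3') AS one_line; -- same, without line breaks
SELECT jsonMergePatch('{"a": 1, "b": 2}', '{"b": 3, "c": 4}') AS merged;         -- a single merged JSON object
```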
#### Performance Improvement
* Add option `query_plan_preserve_num_streams_after_window_functions` to preserve the number of streams after evaluating window functions to allow parallel stream processing. [#50771](https://github.com/ClickHouse/ClickHouse/pull/50771) ([frinkr](https://github.com/frinkr)).
* Release more streams if data is small. [#53867](https://github.com/ClickHouse/ClickHouse/pull/53867) ([Jiebin Sun](https://github.com/jiebinn)).
* RoaringBitmaps are now optimized before serialization. [#55044](https://github.com/ClickHouse/ClickHouse/pull/55044) ([UnamedRus](https://github.com/UnamedRus)).
* Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. [#55069](https://github.com/ClickHouse/ClickHouse/pull/55069) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fix contention on Context lock, this significantly improves performance for a lot of short-running concurrent queries. [#55121](https://github.com/ClickHouse/ClickHouse/pull/55121) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved the performance of inverted index creation by 30%. This was achieved by replacing `std::unordered_map` with `absl::flat_hash_map`. [#55210](https://github.com/ClickHouse/ClickHouse/pull/55210) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support ORC filter push down (rowgroup level). [#55330](https://github.com/ClickHouse/ClickHouse/pull/55330) ([李扬](https://github.com/taiyang-li)).
* Improve performance of external aggregation with a lot of temporary files. [#55489](https://github.com/ClickHouse/ClickHouse/pull/55489) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. [#55654](https://github.com/ClickHouse/ClickHouse/pull/55654) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses [#55653](https://github.com/ClickHouse/ClickHouse/issues/55653#issuecomment-1763766009). [#55683](https://github.com/ClickHouse/ClickHouse/pull/55683) ([Amos Bird](https://github.com/amosbird)).
* Cache CAST function in set during execution to improve the performance of function `IN` when set element type doesn't exactly match column type. [#55712](https://github.com/ClickHouse/ClickHouse/pull/55712) ([Duc Canh Le](https://github.com/canhld94)).
* Performance improvement for `ColumnVector::insertMany` and `ColumnVector::insertManyFrom`. [#55714](https://github.com/ClickHouse/ClickHouse/pull/55714) ([frinkr](https://github.com/frinkr)).
* Optimized Map subscript operations by predicting the next row's key position and reducing the number of comparisons. [#55929](https://github.com/ClickHouse/ClickHouse/pull/55929) ([lgbo](https://github.com/lgbo-ustc)).
* Support struct fields pruning in Parquet (in previous versions it didn't work in some cases). [#56117](https://github.com/ClickHouse/ClickHouse/pull/56117) ([lgbo](https://github.com/lgbo-ustc)).
* Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. [#51692](https://github.com/ClickHouse/ClickHouse/pull/51692) ([Raúl Marín](https://github.com/Algunenano)).
* Optimized external aggregation memory consumption in case many temporary files were generated. [#54798](https://github.com/ClickHouse/ClickHouse/pull/54798) ([Nikita Taranov](https://github.com/nickitat)).
* Distributed queries executed in `async_socket_for_remote` mode (default) now respect `max_threads` limit. Previously, some queries could create excessive threads (up to `max_distributed_connections`), causing server performance issues. [#53504](https://github.com/ClickHouse/ClickHouse/pull/53504) ([filimonov](https://github.com/filimonov)).
* Cache skippable entries while executing DDL from the ZooKeeper distributed DDL queue. [#54828](https://github.com/ClickHouse/ClickHouse/pull/54828) ([Duc Canh Le](https://github.com/canhld94)).
* Experimental inverted indexes do not store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristic (the `density` parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristic based on the parameter `max_rows_per_postings_list` (default: 64k) is introduced, which directly controls the maximum allowed number of row ids in a postings list. [#55616](https://github.com/ClickHouse/ClickHouse/pull/55616) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Improve write performance to `EmbeddedRocksDB` tables. [#55732](https://github.com/ClickHouse/ClickHouse/pull/55732) ([Duc Canh Le](https://github.com/canhld94)).
* Improved overall resilience for ClickHouse in case of many parts within a partition (more than 1000). It might reduce the number of `TOO_MANY_PARTS` errors. [#55526](https://github.com/ClickHouse/ClickHouse/pull/55526) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Reduced memory consumption during loading of hierarchical dictionaries. [#55838](https://github.com/ClickHouse/ClickHouse/pull/55838) ([Nikita Taranov](https://github.com/nickitat)).
* All dictionaries support setting `dictionary_use_async_executor`. [#55839](https://github.com/ClickHouse/ClickHouse/pull/55839) ([vdimir](https://github.com/vdimir)).
* Prevent excessive memory usage when deserializing AggregateFunctionTopKGenericData. [#55947](https://github.com/ClickHouse/ClickHouse/pull/55947) ([Raúl Marín](https://github.com/Algunenano)).
* On a Keeper with lots of watches, AsyncMetrics threads could consume 100% of CPU for a noticeable time in `DB::KeeperStorage::getSessionsWithWatchesCount()`. The fix is to avoid traversing the heavy `watches` and `list_watches` sets. [#56054](https://github.com/ClickHouse/ClickHouse/pull/56054) ([Alexander Gololobov](https://github.com/davenger)).
* Add setting `optimize_trivial_approximate_count_query` to use `count()` approximation for storage EmbeddedRocksDB. Enable trivial count for StorageJoin. [#55806](https://github.com/ClickHouse/ClickHouse/pull/55806) ([Duc Canh Le](https://github.com/canhld94)).
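
For the approximate-count setting in the previous entry, a minimal sketch might look as follows (the table name and schema are made up for illustration; the setting lets `count()` use the storage's estimate instead of a full scan):

```sql
-- Hypothetical EmbeddedRocksDB table used only to show the setting.
CREATE TABLE kv_store (key UInt64, value String) ENGINE = EmbeddedRocksDB PRIMARY KEY key;
SELECT count() FROM kv_store SETTINGS optimize_trivial_approximate_count_query = 1;
```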
#### Improvement
* Functions `toDayOfWeek()` (MySQL alias: `DAYOFWEEK()`), `toYearWeek()` (`YEARWEEK()`) and `toWeek()` (`WEEK()`) now support `String` arguments. This makes their behavior consistent with MySQL's behavior. [#55589](https://github.com/ClickHouse/ClickHouse/pull/55589) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduced setting `date_time_overflow_behavior` with possible values `ignore`, `throw`, `saturate` that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64. [#55696](https://github.com/ClickHouse/ClickHouse/pull/55696) ([Andrey Zvonov](https://github.com/zvonand)).
* Implement query parameters support for `ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}`. Merges [#49516](https://github.com/ClickHouse/ClickHouse/issues/49516). Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#55604](https://github.com/ClickHouse/ClickHouse/pull/55604) ([alesapin](https://github.com/alesapin)).
* Print processor ids in a prettier manner in EXPLAIN. [#48852](https://github.com/ClickHouse/ClickHouse/pull/48852) ([Vlad Seliverstov](https://github.com/behebot)).
* Creating a direct dictionary with a lifetime field will be rejected at create time (as the lifetime does not make sense for direct dictionaries). Fixes: [#27861](https://github.com/ClickHouse/ClickHouse/issues/27861). [#49043](https://github.com/ClickHouse/ClickHouse/pull/49043) ([Rory Crispin](https://github.com/RoryCrispin)).
* Allow parameters in queries with partitions like `ALTER TABLE t DROP PARTITION`. Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#49516](https://github.com/ClickHouse/ClickHouse/pull/49516) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add a new column `xid` for `system.zookeeper_connection`. [#50702](https://github.com/ClickHouse/ClickHouse/pull/50702) ([helifu](https://github.com/helifu)).
* Display the correct server settings in `system.server_settings` after configuration reload. [#53774](https://github.com/ClickHouse/ClickHouse/pull/53774) ([helifu](https://github.com/helifu)).
* Add support for the mathematical minus `−` character in queries, similar to `-`. [#54100](https://github.com/ClickHouse/ClickHouse/pull/54100) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add replica groups to the experimental `Replicated` database engine. Closes [#53620](https://github.com/ClickHouse/ClickHouse/issues/53620). [#54421](https://github.com/ClickHouse/ClickHouse/pull/54421) ([Nikolay Degterinsky](https://github.com/evillique)).
* It is better to retry retriable S3 errors than to fail the whole query. A bigger value is set for `s3_retry_attempts` by default. [#54770](https://github.com/ClickHouse/ClickHouse/pull/54770) ([Sema Checherinda](https://github.com/CheSema)).
* Add load balancing mode `hostname_levenshtein_distance`. [#54826](https://github.com/ClickHouse/ClickHouse/pull/54826) ([JackyWoo](https://github.com/JackyWoo)).
* Improve hiding secrets in logs. [#55089](https://github.com/ClickHouse/ClickHouse/pull/55089) ([Vitaly Baranov](https://github.com/vitlibar)).
* For now, projection analysis will be performed only on top of the query plan. The setting `query_plan_optimize_projection` became obsolete (it had been enabled by default a long time ago). [#55112](https://github.com/ClickHouse/ClickHouse/pull/55112) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* When the function `untuple` is called on a tuple with named elements and itself has an alias (e.g. `select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias`), the result column name is now generated from the untuple alias and the tuple element alias (in the example: `untuple_alias.element_alias`). [#55123](https://github.com/ClickHouse/ClickHouse/pull/55123) ([garcher22](https://github.com/garcher22)).
* Added setting `describe_include_virtual_columns`, which allows including the virtual columns of a table in the result of a `DESCRIBE` query. Added setting `describe_compact_output`: if it is set to `true`, the `DESCRIBE` query returns only names and types of columns without extra information (see the examples after this list). [#55129](https://github.com/ClickHouse/ClickHouse/pull/55129) ([Anton Popov](https://github.com/CurtizJ)).
* Sometimes `OPTIMIZE` with `optimize_throw_if_noop=1` could fail with the error `unknown reason`, while the real cause was different projections in different parts. This behavior is fixed. [#55130](https://github.com/ClickHouse/ClickHouse/pull/55130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow having several `MaterializedPostgreSQL` tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is a backward-incompatible change), but it can be turned on with the setting `materialized_postgresql_use_unique_replication_consumer_identifier`. Closes [#54918](https://github.com/ClickHouse/ClickHouse/issues/54918). [#55145](https://github.com/ClickHouse/ClickHouse/pull/55145) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow parsing negative `DateTime64` and `DateTime` with a fractional part from short strings. [#55146](https://github.com/ClickHouse/ClickHouse/pull/55146) ([Andrey Zvonov](https://github.com/zvonand)).
* To improve compatibility with MySQL, `information_schema.tables` now includes the new field `table_rows`, and `information_schema.columns` now includes the new field `extra`. [#55215](https://github.com/ClickHouse/ClickHouse/pull/55215) ([Robert Schulze](https://github.com/rschu1ze)).
* clickhouse-client no longer shows "0 rows in set" if the row count is zero and an exception was thrown. [#55240](https://github.com/ClickHouse/ClickHouse/pull/55240) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Support renaming a table without the keyword `TABLE`, e.g. `RENAME db.t1 to db.t2`. [#55373](https://github.com/ClickHouse/ClickHouse/pull/55373) ([凌涛](https://github.com/lingtaolf)).
* Add `internal_replication` to `system.clusters`. [#55377](https://github.com/ClickHouse/ClickHouse/pull/55377) ([Konstantin Morozov](https://github.com/k-morozov)).
* Select remote proxy resolver based on request protocol, add proxy feature docs and remove `DB::ProxyConfiguration::Protocol::ANY`. [#55430](https://github.com/ClickHouse/ClickHouse/pull/55430) ([Arthur Passos](https://github.com/arthurpassos)).
* Avoid retrying keeper operations on INSERT after table shutdown. [#55519](https://github.com/ClickHouse/ClickHouse/pull/55519) ([Azat Khuzhin](https://github.com/azat)).
* `SHOW COLUMNS` now correctly reports type `FixedString` as `BLOB` if setting `use_mysql_types_in_show_columns` is on. Also added two new settings, `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` to switch the output for types `String` and `FixedString` as `TEXT` or `BLOB`. [#55617](https://github.com/ClickHouse/ClickHouse/pull/55617) ([Serge Klochkov](https://github.com/slvrtrn)).
* During startup of ReplicatedMergeTree tables, the ClickHouse server checks the set of parts for unexpected parts (which exist locally but not in ZooKeeper). All unexpected parts are moved to the detached directory, and instead of them the server tries to restore some ancestor (covered) parts. Now the server tries to restore the closest ancestors instead of random covered parts. [#55645](https://github.com/ClickHouse/ClickHouse/pull/55645) ([alesapin](https://github.com/alesapin)).
* The advanced dashboard now supports draggable charts on touch devices. This closes [#54206](https://github.com/ClickHouse/ClickHouse/issues/54206). [#55649](https://github.com/ClickHouse/ClickHouse/pull/55649) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Use the default query format if declared when outputting exception with `http_write_exception_in_output_format`. [#55739](https://github.com/ClickHouse/ClickHouse/pull/55739) ([Raúl Marín](https://github.com/Algunenano)).
* Provide a better message for common MATERIALIZED VIEW pitfalls. [#55826](https://github.com/ClickHouse/ClickHouse/pull/55826) ([Raúl Marín](https://github.com/Algunenano)).
* If you dropped the current database, you will still be able to run some queries in `clickhouse-local` and switch to another database. This makes the behavior consistent with `clickhouse-client`. This closes [#55834](https://github.com/ClickHouse/ClickHouse/issues/55834). [#55853](https://github.com/ClickHouse/ClickHouse/pull/55853) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Functions `(add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond)` now support string-encoded date arguments, e.g. `SELECT addDays('2023-10-22', 1)`. This increases compatibility with MySQL and is needed by Tableau Online. [#55869](https://github.com/ClickHouse/ClickHouse/pull/55869) ([Robert Schulze](https://github.com/rschu1ze)).
* The setting `apply_deleted_mask`, when disabled, allows reading rows that were marked as deleted by lightweight DELETE queries. This is useful for debugging. [#55952](https://github.com/ClickHouse/ClickHouse/pull/55952) ([Alexander Gololobov](https://github.com/davenger)).
* Allow skipping `null` values when serializing Tuple to JSON objects, which makes it possible to keep compatibility with Spark's `to_json` function and is also useful for Gluten. [#55956](https://github.com/ClickHouse/ClickHouse/pull/55956) ([李扬](https://github.com/taiyang-li)).
* Functions `(add|sub)Date()` now support string-encoded date arguments, e.g. `SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE)`. The same support for string-encoded date arguments is added to the plus and minus operators, e.g. `SELECT '2023-10-23' + INTERVAL 1 DAY`. This increases compatibility with MySQL and is needed by Tableau Online. [#55960](https://github.com/ClickHouse/ClickHouse/pull/55960) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow unquoted strings with CR (`\r`) in CSV format. Closes [#39930](https://github.com/ClickHouse/ClickHouse/issues/39930). [#56046](https://github.com/ClickHouse/ClickHouse/pull/56046) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow running `clickhouse-keeper` using an embedded config. [#56086](https://github.com/ClickHouse/ClickHouse/pull/56086) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a limit on the maximum configuration value for `queued.min.messages` to avoid problems with starting to fetch data from Kafka. [#56121](https://github.com/ClickHouse/ClickHouse/pull/56121) ([Stas Morozov](https://github.com/r3b-fish)).
* Fixed a typo in the SQL function `minSampleSizeContinous` (renamed to `minSampleSizeContinuous`). The old name is preserved for backward compatibility. This closes: [#56139](https://github.com/ClickHouse/ClickHouse/issues/56139). [#56143](https://github.com/ClickHouse/ClickHouse/pull/56143) ([Dorota Szeremeta](https://github.com/orotaday)).
* Print the path of broken parts on disk before shutting down the server. Before this change, if a part was corrupted on disk and the server could not start, it was almost impossible to understand which part was broken. This is fixed. [#56181](https://github.com/ClickHouse/ClickHouse/pull/56181) ([Duc Canh Le](https://github.com/canhld94)).
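
Hedged examples for a few of the improvements above (string-encoded date arguments and the new `DESCRIBE` settings); the values and the `system.one` table are chosen only for illustration:

```sql
SELECT addDays('2023-10-22', 1);                           -- string-encoded date argument
SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE);  -- (add|sub)Date() with a string date
SELECT '2023-10-23' + INTERVAL 1 DAY;                      -- plus/minus operators accept string dates too

SET describe_include_virtual_columns = 1, describe_compact_output = 1;
DESCRIBE TABLE system.one;                                 -- names and types only, virtual columns included where present
```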
#### Build/Testing/Packaging Improvement
* If the database in Docker is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* A resource with the source code, including submodules, is built in the Darwin special build task. It may be used to build ClickHouse without checking out the submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* An error was occurring when building ClickHouse with the AVX series of instructions enabled globally (which isn't recommended). The reason is that snappy does not enable `SNAPPY_HAVE_X86_CRC32`. [#55049](https://github.com/ClickHouse/ClickHouse/pull/55049) ([monchickey](https://github.com/monchickey)).
* Solve issue with launching standalone `clickhouse-keeper` from `clickhouse-server` package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* In the tests, RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. [#55424](https://github.com/ClickHouse/ClickHouse/pull/55424) ([Ilya Yatsishin](https://github.com/qoega)).
* Modified the error message difference between openssl and boringssl to fix the functional test. [#55975](https://github.com/ClickHouse/ClickHouse/pull/55975) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Use upstream repo for apache datasketches. [#55787](https://github.com/ClickHouse/ClickHouse/pull/55787) ([Nikita Taranov](https://github.com/nickitat)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Skip hardlinking inverted index files in mutation [#47663](https://github.com/ClickHouse/ClickHouse/pull/47663) ([cangyin](https://github.com/cangyin)).
* Fixed a bug where the `match` function (regex) with a pattern containing alternation produced an incorrect key condition. Closes #53222. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN [#51746](https://github.com/ClickHouse/ClickHouse/pull/51746) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support missed experimental `Object(Nullable(json))` subcolumns in query. [#54052](https://github.com/ClickHouse/ClickHouse/pull/54052) ([zps](https://github.com/VanDarkholme7)).
* Re-add fix for `accurateCastOrNull()` [#54629](https://github.com/ClickHouse/ClickHouse/pull/54629) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix detecting `DEFAULT` for columns of a Distributed table created without AS [#55060](https://github.com/ClickHouse/ClickHouse/pull/55060) ([Vitaly Baranov](https://github.com/vitlibar)).
* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Suppress error statistics update for internal exceptions [#55128](https://github.com/ClickHouse/ClickHouse/pull/55128) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix deadlock in backups [#55132](https://github.com/ClickHouse/ClickHouse/pull/55132) ([alesapin](https://github.com/alesapin)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning of extra columns in set. [#55172](https://github.com/ClickHouse/ClickHouse/pull/55172) ([Amos Bird](https://github.com/amosbird)).
* Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity [#55202](https://github.com/ClickHouse/ClickHouse/pull/55202) ([Duc Canh Le](https://github.com/canhld94)).
* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Avoid possible memory leaks in compressors in case of missing buffer finalization [#55262](https://github.com/ClickHouse/ClickHouse/pull/55262) ([Azat Khuzhin](https://github.com/azat)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree [#55276](https://github.com/ClickHouse/ClickHouse/pull/55276) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix a crash in MergeSortingPartialResultTransform (due to zero chunks after `remerge`) [#55335](https://github.com/ClickHouse/ClickHouse/pull/55335) ([Azat Khuzhin](https://github.com/azat)).
* Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception [#55338](https://github.com/ClickHouse/ClickHouse/pull/55338) ([Azat Khuzhin](https://github.com/azat)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix leak in StorageHDFS [#55370](https://github.com/ClickHouse/ClickHouse/pull/55370) ([Azat Khuzhin](https://github.com/azat)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix MySQL interface boolean representation [#55427](https://github.com/ClickHouse/ClickHouse/pull/55427) ([Serge Klochkov](https://github.com/slvrtrn)).
* Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting [#55479](https://github.com/ClickHouse/ClickHouse/pull/55479) ([Serge Klochkov](https://github.com/slvrtrn)).
* Make `use_mysql_types_in_show_columns` affect only `SHOW COLUMNS` [#55481](https://github.com/ClickHouse/ClickHouse/pull/55481) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix stack symbolizer parsing `DW_FORM_ref_addr` incorrectly and sometimes crashing [#55483](https://github.com/ClickHouse/ClickHouse/pull/55483) ([Michael Kolupaev](https://github.com/al13n321)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix Query Parameters not working with custom HTTP handlers [#55521](https://github.com/ClickHouse/ClickHouse/pull/55521) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix checking of non handled data for Values format [#55527](https://github.com/ClickHouse/ClickHouse/pull/55527) ([Azat Khuzhin](https://github.com/azat)).
* Fix 'Invalid cursor state' in odbc interacting with MS SQL Server [#55558](https://github.com/ClickHouse/ClickHouse/pull/55558) ([vdimir](https://github.com/vdimir)).
* Fix max execution time and 'break' overflow mode [#55577](https://github.com/ClickHouse/ClickHouse/pull/55577) ([Alexander Gololobov](https://github.com/davenger)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Disable wrong optimization and add a test [#55609](https://github.com/ClickHouse/ClickHouse/pull/55609) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#52352](https://github.com/ClickHouse/ClickHouse/issues/52352) [#55621](https://github.com/ClickHouse/ClickHouse/pull/55621) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test to avoid incorrect decimal sorting [#55662](https://github.com/ClickHouse/ClickHouse/pull/55662) ([Amos Bird](https://github.com/amosbird)).
* Fix progress bar for s3 and azure Cluster functions with url without globs [#55666](https://github.com/ClickHouse/ClickHouse/pull/55666) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fixes and improvements for Iceberg storage [#55695](https://github.com/ClickHouse/ClickHouse/pull/55695) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix data race in CreatingSetsTransform (v2) [#55786](https://github.com/ClickHouse/ClickHouse/pull/55786) ([Azat Khuzhin](https://github.com/azat)).
* Throw exception when parsing illegal string as float if precise_float_parsing is true [#55861](https://github.com/ClickHouse/ClickHouse/pull/55861) ([李扬](https://github.com/taiyang-li)).
* Disable predicate pushdown if the CTE contains stateful functions [#55871](https://github.com/ClickHouse/ClickHouse/pull/55871) ([Raúl Marín](https://github.com/Algunenano)).
* Fix normalize ASTSelectWithUnionQuery, as it was stripping `FORMAT` from the query [#55887](https://github.com/ClickHouse/ClickHouse/pull/55887) ([flynn](https://github.com/ucasfl)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* fix: StorageNull supports subcolumns [#55912](https://github.com/ClickHouse/ClickHouse/pull/55912) ([FFish](https://github.com/wxybear)).
* Do not write retriable errors for Replicated mutate/merge into error log [#55944](https://github.com/ClickHouse/ClickHouse/pull/55944) ([Azat Khuzhin](https://github.com/azat)).
* Fix `SHOW DATABASES LIMIT <N>` [#55962](https://github.com/ClickHouse/ClickHouse/pull/55962) ([Raúl Marín](https://github.com/Algunenano)).
* Fix autogenerated Protobuf schema with fields with underscore [#55974](https://github.com/ClickHouse/ClickHouse/pull/55974) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix dateTime64ToSnowflake64() with non-default scale [#55983](https://github.com/ClickHouse/ClickHouse/pull/55983) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix output/input of Arrow dictionary column [#55989](https://github.com/ClickHouse/ClickHouse/pull/55989) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix fetching schema from schema registry in AvroConfluent [#55991](https://github.com/ClickHouse/ClickHouse/pull/55991) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#55995](https://github.com/ClickHouse/ClickHouse/pull/55995) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix incorrect free space accounting for least_used JBOD policy [#56030](https://github.com/ClickHouse/ClickHouse/pull/56030) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing scalar issue when evaluating subqueries inside table functions [#56057](https://github.com/ClickHouse/ClickHouse/pull/56057) ([Amos Bird](https://github.com/amosbird)).
* Fix wrong query result when http_write_exception_in_output_format=1 [#56135](https://github.com/ClickHouse/ClickHouse/pull/56135) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix schema cache for fallback JSON->JSONEachRow with changed settings [#56172](https://github.com/ClickHouse/ClickHouse/pull/56172) ([Kruglov Pavel](https://github.com/Avogar)).
* Add error handler to odbc-bridge [#56185](https://github.com/ClickHouse/ClickHouse/pull/56185) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
### ClickHouse release 23.9, 2023-09-28
#### Backward Incompatible Change

View File

@ -321,9 +321,9 @@ set (DEBUG_INFO_FLAGS "-g -gdwarf-4")
option(DISABLE_OMIT_FRAME_POINTER "Disable omit frame pointer compiler optimization" OFF)
if (DISABLE_OMIT_FRAME_POINTER)
set (CMAKE_CXX_FLAGS_ADD "${CMAKE_CXX_FLAGS_ADD} -fno-omit-frame-pointer")
set (CMAKE_C_FLAGS_ADD "${CMAKE_C_FLAGS_ADD} -fno-omit-frame-pointer")
set (CMAKE_ASM_FLAGS_ADD "${CMAKE_ASM_FLAGS_ADD} -fno-omit-frame-pointer")
set (CMAKE_CXX_FLAGS_ADD "${CMAKE_CXX_FLAGS_ADD} -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer")
set (CMAKE_C_FLAGS_ADD "${CMAKE_C_FLAGS_ADD} -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer")
set (CMAKE_ASM_FLAGS_ADD "${CMAKE_ASM_FLAGS_ADD} -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer")
endif()
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${COMPILER_FLAGS} ${CMAKE_CXX_FLAGS_ADD}")

View File

@ -22,8 +22,6 @@ curl https://clickhouse.com/ | sh
## Upcoming Events
* [**v23.9 Community Call**]([https://clickhouse.com/company/events/v23-8-community-release-call](https://clickhouse.com/company/events/v23-9-community-release-call)?utm_source=github&utm_medium=social&utm_campaign=release-webinar-2023-08) - Sep 28 - 23.9 is rapidly approaching. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
* [**ClickHouse Meetup in Amsterdam**](https://www.meetup.com/clickhouse-netherlands-user-group/events/296334590/) - Oct 31
* [**ClickHouse Meetup in Beijing**](https://www.meetup.com/clickhouse-beijing-user-group/events/296334856/) - Nov 4
* [**ClickHouse Meetup in San Francisco**](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/296334923/) - Nov 8
* [**ClickHouse Meetup in Singapore**](https://www.meetup.com/clickhouse-singapore-meetup-group/events/296334976/) - Nov 15
@ -45,4 +43,4 @@ We are a globally diverse and distributed team, united behind a common goal of c
Check out our **current openings** here: https://clickhouse.com/company/careers
Cant find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!
Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!

View File

@ -13,9 +13,10 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 23.10 | ✔️ |
| 23.9 | ✔️ |
| 23.8 | ✔️ |
| 23.7 | ✔️ |
| 23.7 | ❌ |
| 23.6 | ❌ |
| 23.5 | ❌ |
| 23.4 | ❌ |

View File

@ -45,6 +45,10 @@ else ()
target_compile_definitions(common PUBLIC WITH_COVERAGE=0)
endif ()
if (TARGET ch_contrib::crc32_s390x)
target_link_libraries(common PUBLIC ch_contrib::crc32_s390x)
endif()
target_include_directories(common PUBLIC .. "${CMAKE_CURRENT_BINARY_DIR}/..")
target_link_libraries (common

View File

@ -35,6 +35,10 @@
#pragma clang diagnostic ignored "-Wreserved-identifier"
#endif
#if defined(__s390x__)
#include <base/crc32c_s390x.h>
#define CRC_INT s390x_crc32c
#endif
/**
* The std::string_view-like container to avoid creating strings to find substrings in the hash table.
@ -264,8 +268,8 @@ inline size_t hashLessThan8(const char * data, size_t size)
if (size >= 4)
{
UInt64 a = unalignedLoad<uint32_t>(data);
return hashLen16(size + (a << 3), unalignedLoad<uint32_t>(data + size - 4));
UInt64 a = unalignedLoadLittleEndian<uint32_t>(data);
return hashLen16(size + (a << 3), unalignedLoadLittleEndian<uint32_t>(data + size - 4));
}
if (size > 0)
@ -285,8 +289,8 @@ inline size_t hashLessThan16(const char * data, size_t size)
{
if (size > 8)
{
UInt64 a = unalignedLoad<UInt64>(data);
UInt64 b = unalignedLoad<UInt64>(data + size - 8);
UInt64 a = unalignedLoadLittleEndian<UInt64>(data);
UInt64 b = unalignedLoadLittleEndian<UInt64>(data + size - 8);
return hashLen16(a, rotateByAtLeast1(b + size, static_cast<UInt8>(size))) ^ b;
}
@ -315,13 +319,13 @@ struct CRC32Hash
do
{
UInt64 word = unalignedLoad<UInt64>(pos);
UInt64 word = unalignedLoadLittleEndian<UInt64>(pos);
res = static_cast<unsigned>(CRC_INT(res, word));
pos += 8;
} while (pos + 8 < end);
UInt64 word = unalignedLoad<UInt64>(end - 8); /// I'm not sure if this is normal.
UInt64 word = unalignedLoadLittleEndian<UInt64>(end - 8); /// I'm not sure if this is normal.
res = static_cast<unsigned>(CRC_INT(res, word));
return res;

26
base/base/crc32c_s390x.h Normal file
View File

@ -0,0 +1,26 @@
#pragma once
#include <crc32-s390x.h>
inline uint32_t s390x_crc32c_u8(uint32_t crc, uint8_t v)
{
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32c_u16(uint32_t crc, uint16_t v)
{
v = __builtin_bswap16(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32c_u32(uint32_t crc, uint32_t v)
{
v = __builtin_bswap32(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint64_t s390x_crc32c(uint64_t crc, uint64_t v)
{
v = __builtin_bswap64(v);
return crc32c_le_vx(static_cast<uint32_t>(crc), reinterpret_cast<unsigned char *>(&v), sizeof(uint64_t));
}

View File

@ -68,7 +68,7 @@ namespace Net
struct ProxyConfig
/// HTTP proxy server configuration.
{
ProxyConfig() : port(HTTP_PORT), protocol("http"), tunnel(true) { }
ProxyConfig() : port(HTTP_PORT), protocol("http"), tunnel(true), originalRequestProtocol("http") { }
std::string host;
/// Proxy server host name or IP address.
@ -87,6 +87,9 @@ namespace Net
/// A regular expression defining hosts for which the proxy should be bypassed,
/// e.g. "localhost|127\.0\.0\.1|192\.168\.0\.\d+". Can also be an empty
/// string to disable proxy bypassing.
std::string originalRequestProtocol;
/// Original request protocol (http or https).
/// Required in the case of: HTTPS request over HTTP proxy with tunneling (CONNECT) off.
};
HTTPClientSession();

View File

@ -418,7 +418,7 @@ void HTTPClientSession::reconnect()
std::string HTTPClientSession::proxyRequestPrefix() const
{
std::string result("http://");
std::string result(_proxyConfig.originalRequestProtocol + "://");
result.append(_host);
/// Do not append default by default, since this may break some servers.
/// One example of such server is GCS (Google Cloud Storage).

View File

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54479)
SET(VERSION_REVISION 54480)
SET(VERSION_MAJOR 23)
SET(VERSION_MINOR 10)
SET(VERSION_MINOR 11)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 8f9a227de1f530cdbda52c145d41a6b0f1d29961)
SET(VERSION_DESCRIBE v23.10.1.1-testing)
SET(VERSION_STRING 23.10.1.1)
SET(VERSION_GITHASH 13adae0e42fd48de600486fc5d4b64d39f80c43e)
SET(VERSION_DESCRIBE v23.11.1.1-testing)
SET(VERSION_STRING 23.11.1.1)
# end of autochange

View File

@ -3,4 +3,4 @@ It allows to integrate JEMalloc into CMake project.
- Remove JEMALLOC_HAVE_ATTR_FORMAT_GNU_PRINTF because it's non standard.
- Added JEMALLOC_CONFIG_MALLOC_CONF substitution
- Add musl support (USE_MUSL)
- Also note, that darwin build requires JEMALLOC_PREFIX, while others don not
- Also note, that darwin build requires JEMALLOC_PREFIX, while others do not

2
contrib/libhdfs3 vendored

@ -1 +1 @@
Subproject commit 377220ef351ae24994a5fcd2b5fa3930d00c4db0
Subproject commit bdcb91354b1c05b21e73043a112a6f1e3b013497

View File

@ -1,4 +1,4 @@
if(NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE AND NOT ARCH_S390X)
if(NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE)
option(ENABLE_HDFS "Enable HDFS" ${ENABLE_LIBRARIES})
elseif(ENABLE_HDFS)
message (${RECONFIGURE_MESSAGE_LEVEL} "Cannot use HDFS3 with current configuration")

View File

@ -1,3 +1,10 @@
option (ENABLE_SSH "Enable support for SSH keys and protocol" ON)
if (NOT ENABLE_SSH)
message(STATUS "Not using SSH")
return()
endif()
set(LIB_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/libssh")
set(LIB_BINARY_DIR "${ClickHouse_BINARY_DIR}/contrib/libssh")
# Specify search path for CMake modules to be loaded by include()

2
contrib/orc vendored

@ -1 +1 @@
Subproject commit f31c271110a2f0dac908a152f11708193ae209ee
Subproject commit e24f2c2a3ca0769c96704ab20ad6f512a83ea2ad

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.9.2.56"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-keeper"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.9.2.56"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -30,7 +30,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="23.9.2.56"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image

View File

@ -19,7 +19,7 @@ For more information and documentation see https://clickhouse.com/.
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). Most ARM CPUs after 2017 support ARMv8.2-A. A notable exception is Raspberry Pi 4 from 2019 whose CPU only supports ARMv8.0-A.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A) and additionally the Load-Acquire RCpc register. The register is optional in version ARMv8.2-A and mandatory in [ARMv8.3-A](https://en.wikipedia.org/wiki/AArch64#ARMv8.3-A). Supported in Graviton >=2, Azure and GCP instances. Examples for unsupported devices are Raspberry Pi 4 (ARMv8.0-A) and Jetson AGX Xavier/Orin (ARMv8.2-A).
## How to use this image

View File

@ -72,6 +72,13 @@ function configure()
sudo chown clickhouse /etc/clickhouse-server/config.d/keeper_port.xml
sudo chgrp clickhouse /etc/clickhouse-server/config.d/keeper_port.xml
#Randomize merge tree setting allow_experimental_block_number_column
value=$(($RANDOM % 2))
sudo cat /etc/clickhouse-server/config.d/merge_tree_settings.xml \
| sed "s|<allow_experimental_block_number_column>[01]</allow_experimental_block_number_column>|<allow_experimental_block_number_column>$value</allow_experimental_block_number_column>|" \
> /etc/clickhouse-server/config.d/merge_tree_settings.xml.tmp
sudo mv /etc/clickhouse-server/config.d/merge_tree_settings.xml.tmp /etc/clickhouse-server/config.d/merge_tree_settings.xml
# for clickhouse-server (via service)
echo "ASAN_OPTIONS='malloc_context_size=10 verbosity=1 allocator_release_to_os_interval_ms=10000'" >> /etc/environment
# for clickhouse-client
@ -177,6 +184,9 @@ function stop()
echo "thread apply all backtrace (on stop)" >> /test_output/gdb.log
timeout 30m gdb -batch -ex 'thread apply all backtrace' -p "$pid" | ts '%Y-%m-%d %H:%M:%S' >> /test_output/gdb.log
clickhouse stop --force
else
echo -e "Warning: server did not stop yet$OK" >> /test_output/test_results.tsv
clickhouse stop --force
fi
}

View File

@ -2,5 +2,4 @@
set -x
service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test '';
timeout 40m gdb -q -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms --gtest_output='json:test_output/test_result.json' | tee test_output/test_result.txt

View File

@ -189,6 +189,7 @@ rg -Fav -e "Code: 236. DB::Exception: Cancelled merging parts" \
-e "ZooKeeperClient" \
-e "KEEPER_EXCEPTION" \
-e "DirectoryMonitor" \
-e "DistributedInsertQueue" \
-e "TABLE_IS_READ_ONLY" \
-e "Code: 1000, e.code() = 111, Connection refused" \
-e "UNFINISHED" \

View File

@ -75,7 +75,7 @@ sidebar_label: 2022
* Fix usage of nested columns with non-array columns with the same prefix [2] [#28762](https://github.com/ClickHouse/ClickHouse/pull/28762) ([Anton Popov](https://github.com/CurtizJ)).
* Lower compiled_expression_cache_size to 128MB [#28816](https://github.com/ClickHouse/ClickHouse/pull/28816) ([Maksim Kita](https://github.com/kitaisreal)).
* Column default dictGet identifier fix [#28863](https://github.com/ClickHouse/ClickHouse/pull/28863) ([Maksim Kita](https://github.com/kitaisreal)).
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix arcadia (pre)stable 21.10 build [#29250](https://github.com/ClickHouse/ClickHouse/pull/29250) ([DimasKovas](https://github.com/DimasKovas)).

View File

@ -29,7 +29,7 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* May be fix s3 tests [#29762](https://github.com/ClickHouse/ClickHouse/pull/29762) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix ca-bundle.crt in kerberized_hadoop/Dockerfile [#30358](https://github.com/ClickHouse/ClickHouse/pull/30358) ([Vladimir C](https://github.com/vdimir)).

View File

@ -14,6 +14,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -15,6 +15,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -13,6 +13,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -0,0 +1,406 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.10.1.1976-stable (13adae0e42f) as compared to v23.9.1.1854-stable (8f9a227de1f)
#### Backward Incompatible Change
* Rewrote the storage S3Queue completely: changed the way we keep information in ZooKeeper, which allows making fewer ZooKeeper requests; added caching of the ZooKeeper state in cases when we know the state will not change; made the polling from the S3 process less aggressive; changed the way the TTL and the maximum set of tracked files are maintained (now it is a background process). Added `system.s3queue` and `system.s3queue_log` tables. Closes [#54998](https://github.com/ClickHouse/ClickHouse/issues/54998). [#54422](https://github.com/ClickHouse/ClickHouse/pull/54422) ([Kseniia Sumarokova](https://github.com/kssenii)).
* There is no longer an option to automatically remove broken data parts. This closes [#55174](https://github.com/ClickHouse/ClickHouse/issues/55174). [#55184](https://github.com/ClickHouse/ClickHouse/pull/55184) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. [#55186](https://github.com/ClickHouse/ClickHouse/pull/55186) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to return it back. [#55189](https://github.com/ClickHouse/ClickHouse/pull/55189) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rename the directory monitor concept to background INSERT. All `*directory_monitor*` settings have been renamed to `distributed_background_insert*`. **Backward compatibility should be preserved** (the old settings have been kept as aliases). [#55978](https://github.com/ClickHouse/ClickHouse/pull/55978) ([Azat Khuzhin](https://github.com/azat)).
* Do not mix up `send_timeout` and `receive_timeout`. [#56035](https://github.com/ClickHouse/ClickHouse/pull/56035) ([Azat Khuzhin](https://github.com/azat)).
* Comparison of time intervals with different units will throw an exception. This closes [#55942](https://github.com/ClickHouse/ClickHouse/issues/55942). You might have occasionally relied on the previous behavior, when the underlying numeric values were compared regardless of the units (see the sketch after this list). [#56090](https://github.com/ClickHouse/ClickHouse/pull/56090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
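A rough sketch of the new interval-comparison behavior, based on the description above (not an excerpt from the PR or its tests):

```sql
-- Per the entry above, this previously compared the raw numeric values (2 < 25)
-- regardless of units, even though 2 days is longer than 25 hours;
-- now comparing intervals of different units throws an exception.
SELECT toIntervalDay(2) < toIntervalHour(25);

-- Comparing intervals of the same unit keeps working as before.
SELECT toIntervalHour(24) < toIntervalHour(25);
```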
#### New Feature
* Add function "arrayFold(x1, ..., xn, accum -> expression, array1, ..., arrayn, init_accum)" which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. [#49794](https://github.com/ClickHouse/ClickHouse/pull/49794) ([Lirikl](https://github.com/Lirikl)).
* Added aggregation function lttb which uses the [Largest-Triangle-Three-Buckets](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm for downsampling data for visualization. [#53145](https://github.com/ClickHouse/ClickHouse/pull/53145) ([Sinan](https://github.com/sinsinan)).
* Query `CHECK TABLE` has better performance and usability (sends progress updates, cancellable). Support checking a particular part with `CHECK TABLE ... PART 'part_name'`. [#53404](https://github.com/ClickHouse/ClickHouse/pull/53404) ([vdimir](https://github.com/vdimir)).
* Added function `jsonMergePatch`. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object. [#54364](https://github.com/ClickHouse/ClickHouse/pull/54364) ([Memo](https://github.com/Joeywzr)).
* Added a new SQL function `arrayRandomSample(arr, k)` which returns a sample of `k` elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. `SELECT arrayReduce('groupArraySample(3)', range(10))`. See the illustrative queries after this list. [#54391](https://github.com/ClickHouse/ClickHouse/pull/54391) ([itayisraelov](https://github.com/itayisraelov)).
* Added a new function `getHttpHeader` to get the value of an HTTP request header used for a request to the ClickHouse server. It returns an empty string if the request was not made over the HTTP protocol or there is no such header. [#54813](https://github.com/ClickHouse/ClickHouse/pull/54813) ([凌涛](https://github.com/lingtaolf)).
* Introduce `-ArgMin`/`-ArgMax` aggregate combinators which allow aggregating by min/max values only. One use case can be found in [#54818](https://github.com/ClickHouse/ClickHouse/issues/54818). This PR also reorganizes combinators into a dedicated folder. [#54947](https://github.com/ClickHouse/ClickHouse/pull/54947) ([Amos Bird](https://github.com/amosbird)).
* Allow to drop cache for Protobuf format with `SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]`. [#55064](https://github.com/ClickHouse/ClickHouse/pull/55064) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Add external HTTP Basic authenticator. [#55199](https://github.com/ClickHouse/ClickHouse/pull/55199) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Added function `byteSwap` which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally such as IPv4. [#55211](https://github.com/ClickHouse/ClickHouse/pull/55211) ([Priyansh Agrawal](https://github.com/Priyansh121096)).
* Added function `formatQuery()` which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function `formatQuerySingleLine()` which does the same but the returned string will not contain linebreaks. [#55239](https://github.com/ClickHouse/ClickHouse/pull/55239) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Added DWARF input format that reads debug symbols from an ELF executable/library/object file. [#55450](https://github.com/ClickHouse/ClickHouse/pull/55450) ([Michael Kolupaev](https://github.com/al13n321)).
* Allow saving unparsed records and errors in the RabbitMQ, NATS and FileLog engines. Add virtual columns `_error` and `_raw_message` (for NATS and RabbitMQ) and `_raw_record` (for FileLog) that are filled when ClickHouse fails to parse a new record. The behaviour is controlled by the storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ and `handle_error_mode` for FileLog, similar to `kafka_handle_error_mode`. If it is set to `default`, an exception will be thrown when ClickHouse fails to parse a record; if it is set to `stream`, the error and raw record will be saved into the virtual columns. Closes [#36035](https://github.com/ClickHouse/ClickHouse/issues/36035). [#55477](https://github.com/ClickHouse/ClickHouse/pull/55477) ([Kruglov Pavel](https://github.com/Avogar)).
* Keeper client improvement: add the `get_all_children_number` command that returns the number of all child nodes under a specific path. [#55485](https://github.com/ClickHouse/ClickHouse/pull/55485) ([guoxiaolong](https://github.com/guoxiaolongzte)).
* If a table has a space-filling curve in its key, e.g., `ORDER BY mortonEncode(x, y)`, the conditions on its arguments, e.g., `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30`, can be used for indexing (see the sketch after this list). A setting `analyze_index_with_space_filling_curves` is added to enable or disable this analysis. This closes [#41195](https://github.com/ClickHouse/ClickHouse/issues/41195). Continuation of [#4538](https://github.com/ClickHouse/ClickHouse/issues/4538). Continuation of [#6286](https://github.com/ClickHouse/ClickHouse/issues/6286). Continuation of [#28130](https://github.com/ClickHouse/ClickHouse/issues/28130). Continuation of [#41753](https://github.com/ClickHouse/ClickHouse/issues/41753). [#55642](https://github.com/ClickHouse/ClickHouse/pull/55642) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add setting `optimize_trivial_approximate_count_query` to use `count()` approximation for storage EmbeddedRocksDB. Enable trivial count for StorageJoin. [#55806](https://github.com/ClickHouse/ClickHouse/pull/55806) ([Duc Canh Le](https://github.com/canhld94)).
* Keeper client improvement: add the `get_direct_children_number` command that returns the number of direct child nodes under a path. [#55898](https://github.com/ClickHouse/ClickHouse/pull/55898) ([xuzifu666](https://github.com/xuzifu666)).
* Add the statement `SHOW SETTING setting_name` which is a simpler version of the existing statement `SHOW SETTINGS`. [#55979](https://github.com/ClickHouse/ClickHouse/pull/55979) ([Maksim Kita](https://github.com/kitaisreal)).
* Add the ability to pass data in the Npy format to ClickHouse, e.g. `SELECT * FROM file('example_array.npy', Npy)`. [#55982](https://github.com/ClickHouse/ClickHouse/pull/55982) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Implement a new setting called `force_optimize_projection_name` which takes the name of a projection as an argument. If its value is set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes [#55331](https://github.com/ClickHouse/ClickHouse/issues/55331). [#56134](https://github.com/ClickHouse/ClickHouse/pull/56134) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
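A few illustrative queries for some of the new functions above (`arrayRandomSample`, `byteSwap`, `formatQuerySingleLine`); these are sketches based on the descriptions in this list, not excerpts from the documentation:

```sql
-- Take a random sample of 3 elements from an array.
SELECT arrayRandomSample([1, 2, 3, 4, 5, 6], 3);

-- Reverse the bytes of an unsigned integer, e.g. an IPv4 address stored as UInt32.
SELECT byteSwap(toUInt32(3351772109));

-- Return a formatted, single-line version of a query string.
SELECT formatQuerySingleLine('select   a, b   from   tab   where  a > 3');
```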
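And a minimal sketch of the space-filling-curve indexing described above; the table name `points` is hypothetical, and the analysis is controlled by the `analyze_index_with_space_filling_curves` setting mentioned in that entry:

```sql
-- A table whose sorting key is a Morton curve over (x, y).
CREATE TABLE points
(
    x UInt32,
    y UInt32
)
ENGINE = MergeTree
ORDER BY mortonEncode(x, y);

-- With analyze_index_with_space_filling_curves enabled, range conditions
-- on the curve's arguments can be used for index analysis.
SELECT count()
FROM points
WHERE x >= 10 AND x <= 20 AND y >= 20 AND y <= 30;
```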
#### Performance Improvement
* Add option `query_plan_preserve_num_streams_after_window_functions` to preserve the number of streams after evaluating window functions to allow parallel stream processing. [#50771](https://github.com/ClickHouse/ClickHouse/pull/50771) ([frinkr](https://github.com/frinkr)).
* Release more num_streams if data is small. [#53867](https://github.com/ClickHouse/ClickHouse/pull/53867) ([Jiebin Sun](https://github.com/jiebinn)).
* RoaringBitmaps are now optimized before serialization. [#55044](https://github.com/ClickHouse/ClickHouse/pull/55044) ([UnamedRus](https://github.com/UnamedRus)).
* Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. [#55069](https://github.com/ClickHouse/ClickHouse/pull/55069) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fix contention on the Context lock; this significantly improves performance for many short-running concurrent queries. [#55121](https://github.com/ClickHouse/ClickHouse/pull/55121) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved the performance of inverted index creation by 30%. This was achieved by replacing `std::unordered_map` with `absl::flat_hash_map`. [#55210](https://github.com/ClickHouse/ClickHouse/pull/55210) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support ORC filter push-down (row-group level). [#55330](https://github.com/ClickHouse/ClickHouse/pull/55330) ([李扬](https://github.com/taiyang-li)).
* Improve performance of external aggregation with a lot of temporary files. [#55489](https://github.com/ClickHouse/ClickHouse/pull/55489) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. [#55654](https://github.com/ClickHouse/ClickHouse/pull/55654) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses [#55653 (comment)](https://github.com/ClickHouse/ClickHouse/issues/55653#issuecomment-1763766009). [#55683](https://github.com/ClickHouse/ClickHouse/pull/55683) ([Amos Bird](https://github.com/amosbird)).
* Cache cast function in set during execution to improve the performance of function `IN` when set element type doesn't exactly match column type. [#55712](https://github.com/ClickHouse/ClickHouse/pull/55712) ([Duc Canh Le](https://github.com/canhld94)).
* Performance improvement for `ColumnVector::insertMany` and `ColumnVector::insertManyFrom`. [#55714](https://github.com/ClickHouse/ClickHouse/pull/55714) ([frinkr](https://github.com/frinkr)).
* Improve the performance of getting values from a map. In practice, the key structures are usually the same within a map column, so we try to predict the next row's key position and reduce the number of comparisons. [#55929](https://github.com/ClickHouse/ClickHouse/pull/55929) ([lgbo](https://github.com/lgbo-ustc)).
* Fix an issue where struct field pruning doesn't work in some cases, for example: `INSERT INTO FUNCTION file('test_parquet_struct', Parquet, 'x Tuple(a UInt32, b UInt32, c String)') SELECT tuple(number, rand(), concat('testxxxxxxx', toString(number))) FROM numbers(10)`. [#56117](https://github.com/ClickHouse/ClickHouse/pull/56117) ([lgbo](https://github.com/lgbo-ustc)).
#### Improvement
* This is the second part of Kusto Query Language dialect support. [Phase 1 implementation ](https://github.com/ClickHouse/ClickHouse/pull/37961) has been merged. [#42510](https://github.com/ClickHouse/ClickHouse/pull/42510) ([larryluogit](https://github.com/larryluogit)).
* Processor IDs are raw pointers cast to `UInt64`; print them in a prettier manner. [#48852](https://github.com/ClickHouse/ClickHouse/pull/48852) ([Vlad Seliverstov](https://github.com/behebot)).
* Creating a direct dictionary with a lifetime field set will be rejected at create time. Fixes: [#27861](https://github.com/ClickHouse/ClickHouse/issues/27861). [#49043](https://github.com/ClickHouse/ClickHouse/pull/49043) ([Rory Crispin](https://github.com/RoryCrispin)).
* Allow parameters in queries with partitions like `ALTER TABLE t DROP PARTITION`. Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#49516](https://github.com/ClickHouse/ClickHouse/pull/49516) ([Nikolay Degterinsky](https://github.com/evillique)).
* Refactor the code related to `zookeeper_connection` and add a new column `xid` to the `system.zookeeper_connection` table. [#50702](https://github.com/ClickHouse/ClickHouse/pull/50702) ([helifu](https://github.com/helifu)).
* Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. [#51692](https://github.com/ClickHouse/ClickHouse/pull/51692) ([Raúl Marín](https://github.com/Algunenano)).
* Distributed queries executed in `async_socket_for_remote` mode (default) now respect `max_threads` limit. Previously, some queries could create excessive threads (up to `max_distributed_connections`), causing server performance issues. [#53504](https://github.com/ClickHouse/ClickHouse/pull/53504) ([filimonov](https://github.com/filimonov)).
* Display the correct server settings after reload. [#53774](https://github.com/ClickHouse/ClickHouse/pull/53774) ([helifu](https://github.com/helifu)).
* Add support for the mathematical minus character `−` (U+2212) in queries, similar to `-`. [#54100](https://github.com/ClickHouse/ClickHouse/pull/54100) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add replica groups to the Replicated database engine. Closes [#53620](https://github.com/ClickHouse/ClickHouse/issues/53620). [#54421](https://github.com/ClickHouse/ClickHouse/pull/54421) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix a UBSan test error seen [here](https://github.com/ClickHouse/ClickHouse/pull/54566). [#54568](https://github.com/ClickHouse/ClickHouse/pull/54568) ([JackyWoo](https://github.com/JackyWoo)).
* Support asynchronous inserts with external data via the native protocol. Previously it worked only if the data was inlined into the query. [#54730](https://github.com/ClickHouse/ClickHouse/pull/54730) ([Anton Popov](https://github.com/CurtizJ)).
* It is better to retry retriable S3 errors than to fail the query entirely. Set a bigger default value for `s3_retry_attempts`. [#54770](https://github.com/ClickHouse/ClickHouse/pull/54770) ([Sema Checherinda](https://github.com/CheSema)).
* Optimised external aggregation memory consumption in case many temporary files were generated. [#54798](https://github.com/ClickHouse/ClickHouse/pull/54798) ([Nikita Taranov](https://github.com/nickitat)).
* Add load balancing test_hostname_levenshtein_distance. [#54826](https://github.com/ClickHouse/ClickHouse/pull/54826) ([JackyWoo](https://github.com/JackyWoo)).
* Cache skippable entries while executing DDL from the ZooKeeper distributed DDL queue. [#54828](https://github.com/ClickHouse/ClickHouse/pull/54828) ([Duc Canh Le](https://github.com/canhld94)).
* Improve hiding secrets in logs. [#55089](https://github.com/ClickHouse/ClickHouse/pull/55089) ([Vitaly Baranov](https://github.com/vitlibar)).
* Added fields `substreams` and `filenames` to the `system.parts_columns` table. [#55108](https://github.com/ClickHouse/ClickHouse/pull/55108) ([Anton Popov](https://github.com/CurtizJ)).
* For now, the projection analysis will be performed only on top of the query plan. The setting `query_plan_optimize_projection` became obsolete (it was enabled by default a long time ago). [#55112](https://github.com/ClickHouse/ClickHouse/pull/55112) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* When the function `untuple()` is called on a tuple with named elements and itself has an alias (e.g. `SELECT untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias`), the result column name is now generated from the untuple alias and the tuple element alias (in the example: `untuple_alias.element_alias`). [#55123](https://github.com/ClickHouse/ClickHouse/pull/55123) ([garcher22](https://github.com/garcher22)).
* Added the setting `describe_include_virtual_columns`, which allows including virtual columns of a table in the result of a `DESCRIBE` query. Added the setting `describe_compact_output`; if it is set to `true`, the `DESCRIBE` query returns only names and types of columns without extra information (see the example after this list). [#55129](https://github.com/ClickHouse/ClickHouse/pull/55129) ([Anton Popov](https://github.com/CurtizJ)).
* Sometimes `OPTIMIZE` with `optimize_throw_if_noop=1` could fail with an `unknown reason` error while the real cause was different projections in different parts. This behavior is fixed. [#55130](https://github.com/ClickHouse/ClickHouse/pull/55130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow having several MaterializedPostgreSQL tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is a backward-incompatible change), but it can be turned on with the setting `materialized_postgresql_use_unique_replication_consumer_identifier`. Closes [#54918](https://github.com/ClickHouse/ClickHouse/issues/54918). [#55145](https://github.com/ClickHouse/ClickHouse/pull/55145) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow to parse negative DateTime64 and DateTime with fractional part from short strings. [#55146](https://github.com/ClickHouse/ClickHouse/pull/55146) ([Andrey Zvonov](https://github.com/zvonand)).
* To improve compatibility with MySQL, `information_schema.tables` now includes the new field `table_rows`, and `information_schema.columns` now includes the new field `extra`. [#55215](https://github.com/ClickHouse/ClickHouse/pull/55215) ([Robert Schulze](https://github.com/rschu1ze)).
* `clickhouse-client` won't show `0 rows in set` if the count is zero and an exception was thrown. [#55240](https://github.com/ClickHouse/ClickHouse/pull/55240) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Support renaming a table without the `TABLE` keyword, e.g. `RENAME db.t1 TO db.t2`. [#55373](https://github.com/ClickHouse/ClickHouse/pull/55373) ([凌涛](https://github.com/lingtaolf)).
* Add `internal_replication` to `system.clusters`. [#55377](https://github.com/ClickHouse/ClickHouse/pull/55377) ([Konstantin Morozov](https://github.com/k-morozov)).
* Select remote proxy resolver based on request protocol, add proxy feature docs and remove `DB::ProxyConfiguration::Protocol::ANY`. [#55430](https://github.com/ClickHouse/ClickHouse/pull/55430) ([Arthur Passos](https://github.com/arthurpassos)).
* Avoid retrying keeper operations on INSERT after table shutdown. [#55519](https://github.com/ClickHouse/ClickHouse/pull/55519) ([Azat Khuzhin](https://github.com/azat)).
* Improved overall resilience for ClickHouse in case of many parts within partition (more than 1000). It might reduce the number of `TOO_MANY_PARTS` errors. [#55526](https://github.com/ClickHouse/ClickHouse/pull/55526) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Follow-up to https://github.com/ClickHouse/ClickHouse/pull/55184: do not fall into an `UNKNOWN_SETTING` error when the user uses deleted MergeTree settings. [#55557](https://github.com/ClickHouse/ClickHouse/pull/55557) ([Jihyuk Bok](https://github.com/tomahawk28)).
* Updated `dashboard.html`: if there is only one chart, the "maximize" and "drag" buttons are not shown. [#55581](https://github.com/ClickHouse/ClickHouse/pull/55581) ([bhavuk2002](https://github.com/bhavuk2002)).
* Functions `toDayOfWeek()` (MySQL alias: `DAYOFWEEK()`), `toYearWeek()` (`YEARWEEK()`) and `toWeek()` (`WEEK()`) now support `String` arguments. This makes their behavior consistent with MySQL's behavior. [#55589](https://github.com/ClickHouse/ClickHouse/pull/55589) ([Robert Schulze](https://github.com/rschu1ze)).
* Implement query parameters support for `ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}` (see the sketch after this list). Merges [#49516](https://github.com/ClickHouse/ClickHouse/issues/49516). Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#55604](https://github.com/ClickHouse/ClickHouse/pull/55604) ([alesapin](https://github.com/alesapin)).
* Inverted indexes do not store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristics (`density` parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristics based on parameter `max_rows_per_postings_list` (default: 64k) is introduced which directly controls the maximum allowed number of row ids in a postings list. [#55616](https://github.com/ClickHouse/ClickHouse/pull/55616) ([Harry Lee](https://github.com/HarryLeeIBM)).
* `SHOW COLUMNS` now correctly reports type `FixedString` as `BLOB` if setting `use_mysql_types_in_show_columns` is on. Also added two new settings, `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` to switch the output for types `String` and `FixedString` as `TEXT` or `BLOB`. [#55617](https://github.com/ClickHouse/ClickHouse/pull/55617) ([Serge Klochkov](https://github.com/slvrtrn)).
* During startup of ReplicatedMergeTree tables, the ClickHouse server checks the set of parts for unexpected parts (which exist locally but not in ZooKeeper). All unexpected parts are moved to the detached directory, and instead of them the server tries to restore some ancestor (covered) parts. Now the server tries to restore the closest ancestors instead of random covered parts. [#55645](https://github.com/ClickHouse/ClickHouse/pull/55645) ([alesapin](https://github.com/alesapin)).
* The advanced dashboard now supports draggable charts on touch devices. This closes [#54206](https://github.com/ClickHouse/ClickHouse/issues/54206). [#55649](https://github.com/ClickHouse/ClickHouse/pull/55649) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Introduced setting `date_time_overflow_behavior` with possible values `ignore`, `throw`, `saturate` that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64. [#55696](https://github.com/ClickHouse/ClickHouse/pull/55696) ([Andrey Zvonov](https://github.com/zvonand)).
* Improve write performance to rocksdb. [#55732](https://github.com/ClickHouse/ClickHouse/pull/55732) ([Duc Canh Le](https://github.com/canhld94)).
* When outputting an exception with `http_write_exception_in_output_format`, use the default query format if one is declared. [#55739](https://github.com/ClickHouse/ClickHouse/pull/55739) ([Raúl Marín](https://github.com/Algunenano)).
* Use upstream repo for apache datasketches. [#55787](https://github.com/ClickHouse/ClickHouse/pull/55787) ([Nikita Taranov](https://github.com/nickitat)).
* Add support for SHOW MERGES query. [#55815](https://github.com/ClickHouse/ClickHouse/pull/55815) ([megao](https://github.com/jetgm)).
* Provide a better message for common MV pitfalls. [#55826](https://github.com/ClickHouse/ClickHouse/pull/55826) ([Raúl Marín](https://github.com/Algunenano)).
* Reduced memory consumption during loading of hierarchical dictionaries. [#55838](https://github.com/ClickHouse/ClickHouse/pull/55838) ([Nikita Taranov](https://github.com/nickitat)).
* All dictionaries support setting `dictionary_use_async_executor`. [#55839](https://github.com/ClickHouse/ClickHouse/pull/55839) ([vdimir](https://github.com/vdimir)).
* If you dropped the current database, you will still be able to run some queries in `clickhouse-local` and switch to another database. This makes the behavior consistent with `clickhouse-client`. This closes [#55834](https://github.com/ClickHouse/ClickHouse/issues/55834). [#55853](https://github.com/ClickHouse/ClickHouse/pull/55853) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Functions `(add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond)` now support string-encoded date arguments, e.g. `SELECT addDays('2023-10-22', 1)`. This increases compatibility with MySQL and is needed by Tableau Online. [#55869](https://github.com/ClickHouse/ClickHouse/pull/55869) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduce setting `create_table_empty_primary_key_by_default` for default `ORDER BY ()`. [#55899](https://github.com/ClickHouse/ClickHouse/pull/55899) ([Srikanth Chekuri](https://github.com/srikanthccv)).
* Prevent excessive memory usage when deserializing `AggregateFunctionTopKGenericData`. [#55947](https://github.com/ClickHouse/ClickHouse/pull/55947) ([Raúl Marín](https://github.com/Algunenano)).
* The setting `apply_deleted_mask`, when disabled, allows reading rows that were marked as deleted by lightweight DELETE queries. This is useful for debugging. [#55952](https://github.com/ClickHouse/ClickHouse/pull/55952) ([Alexander Gololobov](https://github.com/davenger)).
* Allow skipping null values when serializing tuples to JSON objects, which makes it possible to keep compatibility with Spark's `to_json` function; this is also useful for Gluten. [#55956](https://github.com/ClickHouse/ClickHouse/pull/55956) ([李扬](https://github.com/taiyang-li)).
* Functions `(add|sub)Date()` now support string-encoded date arguments, e.g. `SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE)`. The same support for string-encoded date arguments is added to the plus and minus operators, e.g. `SELECT '2023-10-23' + INTERVAL 1 DAY`. This increases compatibility with MySQL and is needed by Tableau Online. [#55960](https://github.com/ClickHouse/ClickHouse/pull/55960) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow unquoted strings with CR in CSV format. Closes [#39930](https://github.com/ClickHouse/ClickHouse/issues/39930). [#56046](https://github.com/ClickHouse/ClickHouse/pull/56046) ([Kruglov Pavel](https://github.com/Avogar)).
* On a Keeper with lots of watches, AsyncMetrics threads can consume 100% of CPU for a noticeable time in `DB::KeeperStorage::getSessionsWithWatchesCount()`. The fix is to avoid traversing the heavy `watches` and `list_watches` sets. [#56054](https://github.com/ClickHouse/ClickHouse/pull/56054) ([Alexander Gololobov](https://github.com/davenger)).
* Allow to run `clickhouse-keeper` using embedded config. [#56086](https://github.com/ClickHouse/ClickHouse/pull/56086) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a limit on the maximum configuration value for `queued.min.messages` to avoid problems with starting to fetch data from Kafka. [#56121](https://github.com/ClickHouse/ClickHouse/pull/56121) ([Stas Morozov](https://github.com/r3b-fish)).
* Fixed a typo in the SQL function `minSampleSizeContinous` (renamed to `minSampleSizeContinuous`). The old name is preserved for backward compatibility. This closes: [#56139](https://github.com/ClickHouse/ClickHouse/issues/56139). [#56143](https://github.com/ClickHouse/ClickHouse/pull/56143) ([Dorota Szeremeta](https://github.com/orotaday)).
* Print the path of a corrupted part on disk before shutting down the server. Previously, if a part was corrupted on disk and the server could not start, it was almost impossible to understand which part was broken. This is fixed. [#56181](https://github.com/ClickHouse/ClickHouse/pull/56181) ([Duc Canh Le](https://github.com/canhld94)).
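A small illustration of the new `DESCRIBE` settings mentioned above (a sketch; `t` is a hypothetical table):

```sql
-- Compact output: only column names and types.
DESCRIBE TABLE t SETTINGS describe_compact_output = 1;

-- Also include the table's virtual columns in the result.
DESCRIBE TABLE t SETTINGS describe_include_virtual_columns = 1;
```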
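And a hedged sketch of query parameters in partition operations as described above; the table, parameter name, and partition value are hypothetical, and parameters can equally be passed via `--param_<name>` in `clickhouse-client`:

```sql
-- Define a query parameter in an interactive session.
SET param_partition_to_drop = '2023-10-01';

-- Use the parameter where a partition expression is expected
-- (assuming the table's partition key is a String-typed expression).
ALTER TABLE t DROP PARTITION {partition_to_drop:String};
```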
#### Build/Testing/Packaging Improvement
* If the database is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* A resource with the source code including submodules is built in the Darwin special build task. It may be used to build ClickHouse without checking out submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix a build failure when building ClickHouse with the AVX series of instructions enabled, e.g. `cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX=ON -DENABLE_AVX2=ON -DENABLE_AVX2_FOR_SPEC_OP=ON`. The build of snappy failed with errors such as `unknown type name '__m256i'` and `use of undeclared identifier '_mm256_lddqu_si256'`. The reason is that snappy does not enable `SNAPPY_HAVE_X86_CRC32`. [#55049](https://github.com/ClickHouse/ClickHouse/pull/55049) ([monchickey](https://github.com/monchickey)).
* Add `instance_env_variables` option to integration tests. [#55208](https://github.com/ClickHouse/ClickHouse/pull/55208) ([Arthur Passos](https://github.com/arthurpassos)).
* Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* In tests RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. [#55424](https://github.com/ClickHouse/ClickHouse/pull/55424) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix integration check python script to use gh api url - Add Readme for CI tests. [#55476](https://github.com/ClickHouse/ClickHouse/pull/55476) ([Max K.](https://github.com/mkaynov)).
* Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
* Check sha512 for tgz; use a proper repository for keeper; write only filenames to TGZ.sha512 files for tarball packages. Prerequisite for [#31473](https://github.com/ClickHouse/ClickHouse/issues/31473). [#55717](https://github.com/ClickHouse/ClickHouse/pull/55717) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Updated to get free port for azurite. [#55796](https://github.com/ClickHouse/ClickHouse/pull/55796) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Reduce the output info. [#55938](https://github.com/ClickHouse/ClickHouse/pull/55938) ([helifu](https://github.com/helifu)).
* Modified the error message difference between openssl and boringssl to fix the functional test. [#55975](https://github.com/ClickHouse/ClickHouse/pull/55975) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Changes to support the HDFS for s390x. [#56128](https://github.com/ClickHouse/ClickHouse/pull/56128) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Fix flaky test of jbod balancer by relaxing the Gini coefficient and introducing more determinism in insertions. [#56175](https://github.com/ClickHouse/ClickHouse/pull/56175) ([Amos Bird](https://github.com/amosbird)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Skip hardlinking inverted index files in mutation [#47663](https://github.com/ClickHouse/ClickHouse/pull/47663) ([cangyin](https://github.com/cangyin)).
* Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN [#51746](https://github.com/ClickHouse/ClickHouse/pull/51746) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support missed Object(Nullable(json)) subcolumns in query. [#54052](https://github.com/ClickHouse/ClickHouse/pull/54052) ([zps](https://github.com/VanDarkholme7)).
* Re-add fix for `accurateCastOrNull()` [#54629](https://github.com/ClickHouse/ClickHouse/pull/54629) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix detecting DEFAULT for columns of a Distributed table created without AS [#55060](https://github.com/ClickHouse/ClickHouse/pull/55060) ([Vitaly Baranov](https://github.com/vitlibar)).
* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Suppress error statistics update for internal exceptions [#55128](https://github.com/ClickHouse/ClickHouse/pull/55128) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix deadlock in backups [#55132](https://github.com/ClickHouse/ClickHouse/pull/55132) ([alesapin](https://github.com/alesapin)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning of extra columns in set. [#55172](https://github.com/ClickHouse/ClickHouse/pull/55172) ([Amos Bird](https://github.com/amosbird)).
* Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity [#55202](https://github.com/ClickHouse/ClickHouse/pull/55202) ([Duc Canh Le](https://github.com/canhld94)).
* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Avoid possible memory leaks in compressors in case of missing buffer finalization [#55262](https://github.com/ClickHouse/ClickHouse/pull/55262) ([Azat Khuzhin](https://github.com/azat)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree [#55276](https://github.com/ClickHouse/ClickHouse/pull/55276) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix SIGSEGV in MergeSortingPartialResultTransform (due to zero chunks after remerge()) [#55335](https://github.com/ClickHouse/ClickHouse/pull/55335) ([Azat Khuzhin](https://github.com/azat)).
* Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception [#55338](https://github.com/ClickHouse/ClickHouse/pull/55338) ([Azat Khuzhin](https://github.com/azat)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix leak in StorageHDFS [#55370](https://github.com/ClickHouse/ClickHouse/pull/55370) ([Azat Khuzhin](https://github.com/azat)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix MySQL interface boolean representation [#55427](https://github.com/ClickHouse/ClickHouse/pull/55427) ([Serge Klochkov](https://github.com/slvrtrn)).
* Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting [#55479](https://github.com/ClickHouse/ClickHouse/pull/55479) ([Serge Klochkov](https://github.com/slvrtrn)).
* Make `use_mysql_types_in_show_columns` affect only `SHOW COLUMNS` [#55481](https://github.com/ClickHouse/ClickHouse/pull/55481) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix stack symbolizer parsing DW_FORM_ref_addr incorrectly and sometimes crashing [#55483](https://github.com/ClickHouse/ClickHouse/pull/55483) ([Michael Kolupaev](https://github.com/al13n321)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix Query Parameters not working with custom HTTP handlers [#55521](https://github.com/ClickHouse/ClickHouse/pull/55521) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix checking of non handled data for Values format [#55527](https://github.com/ClickHouse/ClickHouse/pull/55527) ([Azat Khuzhin](https://github.com/azat)).
* Fix 'Invalid cursor state' in odbc interacting with MS SQL Server [#55558](https://github.com/ClickHouse/ClickHouse/pull/55558) ([vdimir](https://github.com/vdimir)).
* Fix max execution time and 'break' overflow mode [#55577](https://github.com/ClickHouse/ClickHouse/pull/55577) ([Alexander Gololobov](https://github.com/davenger)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Disable wrong optimization and add a test [#55609](https://github.com/ClickHouse/ClickHouse/pull/55609) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#52352](https://github.com/ClickHouse/ClickHouse/issues/52352) [#55621](https://github.com/ClickHouse/ClickHouse/pull/55621) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test to avoid incorrect decimal sorting [#55662](https://github.com/ClickHouse/ClickHouse/pull/55662) ([Amos Bird](https://github.com/amosbird)).
* Fix progress bar for s3 and azure Cluster functions with url without globs [#55666](https://github.com/ClickHouse/ClickHouse/pull/55666) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fixes and improvements for Iceberg storage [#55695](https://github.com/ClickHouse/ClickHouse/pull/55695) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix data race in CreatingSetsTransform (v2) [#55786](https://github.com/ClickHouse/ClickHouse/pull/55786) ([Azat Khuzhin](https://github.com/azat)).
* Throw exception when parsing illegal string as float if precise_float_parsing is true [#55861](https://github.com/ClickHouse/ClickHouse/pull/55861) ([李扬](https://github.com/taiyang-li)).
* Disable predicate pushdown if the CTE contains stateful functions [#55871](https://github.com/ClickHouse/ClickHouse/pull/55871) ([Raúl Marín](https://github.com/Algunenano)).
* Fix normalize ASTSelectWithUnionQuery strip FORMAT of the query [#55887](https://github.com/ClickHouse/ClickHouse/pull/55887) ([flynn](https://github.com/ucasfl)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* fix: StorageNull supports subcolumns [#55912](https://github.com/ClickHouse/ClickHouse/pull/55912) ([FFish](https://github.com/wxybear)).
* Do not write retriable errors for Replicated mutate/merge into error log [#55944](https://github.com/ClickHouse/ClickHouse/pull/55944) ([Azat Khuzhin](https://github.com/azat)).
* Fix `SHOW DATABASES LIMIT <N>` [#55962](https://github.com/ClickHouse/ClickHouse/pull/55962) ([Raúl Marín](https://github.com/Algunenano)).
* Fix autogenerated Protobuf schema with fields with underscore [#55974](https://github.com/ClickHouse/ClickHouse/pull/55974) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix dateTime64ToSnowflake64() with non-default scale [#55983](https://github.com/ClickHouse/ClickHouse/pull/55983) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix output/input of Arrow dictionary column [#55989](https://github.com/ClickHouse/ClickHouse/pull/55989) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix fetching schema from schema registry in AvroConfluent [#55991](https://github.com/ClickHouse/ClickHouse/pull/55991) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#55995](https://github.com/ClickHouse/ClickHouse/pull/55995) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix incorrect free space accounting for least_used JBOD policy [#56030](https://github.com/ClickHouse/ClickHouse/pull/56030) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing scalar issue when evaluating subqueries inside table functions [#56057](https://github.com/ClickHouse/ClickHouse/pull/56057) ([Amos Bird](https://github.com/amosbird)).
* Fix wrong query result when http_write_exception_in_output_format=1 [#56135](https://github.com/ClickHouse/ClickHouse/pull/56135) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix schema cache for fallback JSON->JSONEachRow with changed settings [#56172](https://github.com/ClickHouse/ClickHouse/pull/56172) ([Kruglov Pavel](https://github.com/Avogar)).
* Add error handler to odbc-bridge [#56185](https://github.com/ClickHouse/ClickHouse/pull/56185) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Fix libssh+openssl3 & s390x (part 2)"'. [#55188](https://github.com/ClickHouse/ClickHouse/pull/55188) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Support SAMPLE BY for VIEW"'. [#55357](https://github.com/ClickHouse/ClickHouse/pull/55357) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Revert "refine error code of duplicated index in create query""'. [#55467](https://github.com/ClickHouse/ClickHouse/pull/55467) ([Han Fei](https://github.com/hanfei1991)).
* NO CL ENTRY: 'Update mysql.md - Remove the Private Preview Note'. [#55486](https://github.com/ClickHouse/ClickHouse/pull/55486) ([Ryadh DAHIMENE](https://github.com/Ryado)).
* NO CL ENTRY: 'Revert "Removed "maximize" and "drag" buttons from `dashboard` in case of single chart"'. [#55623](https://github.com/ClickHouse/ClickHouse/pull/55623) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix filtering by virtual columns with OR filter in query"'. [#55657](https://github.com/ClickHouse/ClickHouse/pull/55657) ([Antonio Andelic](https://github.com/antonio2368)).
* NO CL ENTRY: 'Revert "Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort"'. [#55682](https://github.com/ClickHouse/ClickHouse/pull/55682) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Integration check script fix ups"'. [#55694](https://github.com/ClickHouse/ClickHouse/pull/55694) ([alesapin](https://github.com/alesapin)).
* NO CL ENTRY: 'Revert "Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table"'. [#56103](https://github.com/ClickHouse/ClickHouse/pull/56103) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Add function getHttpHeader"'. [#56109](https://github.com/ClickHouse/ClickHouse/pull/56109) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix output/input of Arrow dictionary column"'. [#56150](https://github.com/ClickHouse/ClickHouse/pull/56150) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Set defaults_for_omitted_fields to true for hive text format [#49486](https://github.com/ClickHouse/ClickHouse/pull/49486) ([李扬](https://github.com/taiyang-li)).
* Fixing join tests with analyzer [#49555](https://github.com/ClickHouse/ClickHouse/pull/49555) ([vdimir](https://github.com/vdimir)).
* Make exception about `ALTER TABLE ... DROP COLUMN|INDEX|PROJECTION` more clear [#50181](https://github.com/ClickHouse/ClickHouse/pull/50181) ([Alexander Gololobov](https://github.com/davenger)).
* ANTI JOIN: Invalid number of rows in Chunk [#50944](https://github.com/ClickHouse/ClickHouse/pull/50944) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Analyzer: fix row policy [#53170](https://github.com/ClickHouse/ClickHouse/pull/53170) ([Dmitry Novik](https://github.com/novikd)).
* Add a test with Block structure mismatch in grace hash join. [#53278](https://github.com/ClickHouse/ClickHouse/pull/53278) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support skip_unused_shards in Analyzer [#53282](https://github.com/ClickHouse/ClickHouse/pull/53282) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Revert "Revert "Planner prepare filters for analysis"" [#53792](https://github.com/ClickHouse/ClickHouse/pull/53792) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add setting allow_experimental_partial_result [#54514](https://github.com/ClickHouse/ClickHouse/pull/54514) ([vdimir](https://github.com/vdimir)).
* Fix CI skip build and skip tests checks [#54532](https://github.com/ClickHouse/ClickHouse/pull/54532) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* `MergeTreeData::forcefullyMovePartToDetachedAndRemoveFromMemory` does not respect mutations [#54653](https://github.com/ClickHouse/ClickHouse/pull/54653) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix azure test by using unique names [#54738](https://github.com/ClickHouse/ClickHouse/pull/54738) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove tests [#54873](https://github.com/ClickHouse/ClickHouse/pull/54873) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Remove useless path from lockSharedData in StorageReplicatedMergeTree [#54989](https://github.com/ClickHouse/ClickHouse/pull/54989) ([Mike Kot](https://github.com/myrrc)).
* Fix broken test [#55002](https://github.com/ClickHouse/ClickHouse/pull/55002) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update version_date.tsv and changelogs after v23.8.3.48-lts [#55063](https://github.com/ClickHouse/ClickHouse/pull/55063) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Use `source` instead of `bash` for pre-build script [#55071](https://github.com/ClickHouse/ClickHouse/pull/55071) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Update s3queue.md to add experimental flag [#55093](https://github.com/ClickHouse/ClickHouse/pull/55093) ([Peignon Melvyn](https://github.com/melvynator)).
* Clean data dir and always start an old server version in aggregate functions compatibility test. [#55105](https://github.com/ClickHouse/ClickHouse/pull/55105) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update version_date.tsv and changelogs after v23.9.1.1854-stable [#55118](https://github.com/ClickHouse/ClickHouse/pull/55118) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Add links to check reports in status comment [#55122](https://github.com/ClickHouse/ClickHouse/pull/55122) ([vdimir](https://github.com/vdimir)).
* Bump croaring to 2.0.2 [#55127](https://github.com/ClickHouse/ClickHouse/pull/55127) ([Robert Schulze](https://github.com/rschu1ze)).
* check if block is empty after async insert retries [#55143](https://github.com/ClickHouse/ClickHouse/pull/55143) ([Han Fei](https://github.com/hanfei1991)).
* Improve linker detection on macOS [#55147](https://github.com/ClickHouse/ClickHouse/pull/55147) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix libssh+openssl3 & s390x [#55154](https://github.com/ClickHouse/ClickHouse/pull/55154) ([Boris Kuschel](https://github.com/bkuschel)).
* Fix file cache temporary file segment range in FileSegment::reserve [#55164](https://github.com/ClickHouse/ClickHouse/pull/55164) ([vdimir](https://github.com/vdimir)).
* Fix libssh+openssl3 & s390x (part 2) [#55187](https://github.com/ClickHouse/ClickHouse/pull/55187) ([Boris Kuschel](https://github.com/bkuschel)).
* Fix wrong test name [#55190](https://github.com/ClickHouse/ClickHouse/pull/55190) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix libssh+openssl3 & s390x (Part 2.1) [#55192](https://github.com/ClickHouse/ClickHouse/pull/55192) ([Boris Kuschel](https://github.com/bkuschel)).
* Reset signals caught by clickhouse-client if a pager is in use [#55193](https://github.com/ClickHouse/ClickHouse/pull/55193) ([Azat Khuzhin](https://github.com/azat)).
* Update README.md [#55209](https://github.com/ClickHouse/ClickHouse/pull/55209) ([Tyler Hannan](https://github.com/tylerhannan)).
* remove the blocker to grow the metadata file version [#55218](https://github.com/ClickHouse/ClickHouse/pull/55218) ([Sema Checherinda](https://github.com/CheSema)).
* Fix syntax highlight in client for spaceship operator [#55224](https://github.com/ClickHouse/ClickHouse/pull/55224) ([Azat Khuzhin](https://github.com/azat)).
* Fix mypy errors [#55228](https://github.com/ClickHouse/ClickHouse/pull/55228) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Upgrade MinIO to support accepting non signed requests [#55245](https://github.com/ClickHouse/ClickHouse/pull/55245) ([Azat Khuzhin](https://github.com/azat)).
* tests: switch test_throttling to S3 over https to make it more production like [#55247](https://github.com/ClickHouse/ClickHouse/pull/55247) ([Azat Khuzhin](https://github.com/azat)).
* Evaluate defaults during async insert safer [#55253](https://github.com/ClickHouse/ClickHouse/pull/55253) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix data race in context [#55260](https://github.com/ClickHouse/ClickHouse/pull/55260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do not allow tests with state ERROR be overwritten by PASSED [#55261](https://github.com/ClickHouse/ClickHouse/pull/55261) ([Azat Khuzhin](https://github.com/azat)).
* Fix query formatting for SYSTEM queries [#55277](https://github.com/ClickHouse/ClickHouse/pull/55277) ([Azat Khuzhin](https://github.com/azat)).
* Context added TSA [#55278](https://github.com/ClickHouse/ClickHouse/pull/55278) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve logging in query cache [#55296](https://github.com/ClickHouse/ClickHouse/pull/55296) ([Robert Schulze](https://github.com/rschu1ze)).
* MaterializedPostgreSQL: remove back check [#55297](https://github.com/ClickHouse/ClickHouse/pull/55297) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Make `use_mysql_types_in_show_columns` independent from the connection [#55298](https://github.com/ClickHouse/ClickHouse/pull/55298) ([Robert Schulze](https://github.com/rschu1ze)).
* Make `HandlingRuleHTTPHandlerFactory` more stupid, but less error prone [#55307](https://github.com/ClickHouse/ClickHouse/pull/55307) ([alesapin](https://github.com/alesapin)).
* Fix tsan issue in croaring [#55311](https://github.com/ClickHouse/ClickHouse/pull/55311) ([Robert Schulze](https://github.com/rschu1ze)).
* Refactorings and better documentation for `toStartOfInterval()` [#55327](https://github.com/ClickHouse/ClickHouse/pull/55327) ([Robert Schulze](https://github.com/rschu1ze)).
* Review [#51946](https://github.com/ClickHouse/ClickHouse/issues/51946) and partially revert it [#55336](https://github.com/ClickHouse/ClickHouse/pull/55336) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update README.md [#55339](https://github.com/ClickHouse/ClickHouse/pull/55339) ([Tyler Hannan](https://github.com/tylerhannan)).
* Context locks small fixes [#55352](https://github.com/ClickHouse/ClickHouse/pull/55352) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix bad test `01605_dictinct_two_level` [#55354](https://github.com/ClickHouse/ClickHouse/pull/55354) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test_max_rows_to_read_leaf_with_view flakiness (due to prefer_localhost_replica) [#55355](https://github.com/ClickHouse/ClickHouse/pull/55355) ([Azat Khuzhin](https://github.com/azat)).
* Better exception messages [#55356](https://github.com/ClickHouse/ClickHouse/pull/55356) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update CHANGELOG.md [#55359](https://github.com/ClickHouse/ClickHouse/pull/55359) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better recursion depth check [#55361](https://github.com/ClickHouse/ClickHouse/pull/55361) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#43085](https://github.com/ClickHouse/ClickHouse/issues/43085) [#55362](https://github.com/ClickHouse/ClickHouse/pull/55362) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix data-race in web disk [#55372](https://github.com/ClickHouse/ClickHouse/pull/55372) ([Azat Khuzhin](https://github.com/azat)).
* Disable skim under TSan (Rust does not support ThreadSanitizer) [#55378](https://github.com/ClickHouse/ClickHouse/pull/55378) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing thread accounting for insert_distributed_sync=1 [#55392](https://github.com/ClickHouse/ClickHouse/pull/55392) ([Azat Khuzhin](https://github.com/azat)).
* Improve tests for untuple() [#55425](https://github.com/ClickHouse/ClickHouse/pull/55425) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix test that never worked test_rabbitmq_random_detach [#55453](https://github.com/ClickHouse/ClickHouse/pull/55453) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix out of bound error in system.remote_data_paths + disk web [#55468](https://github.com/ClickHouse/ClickHouse/pull/55468) ([alesapin](https://github.com/alesapin)).
* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mike Kot](https://github.com/myrrc)).
* Bump curl to 8.4 [#55492](https://github.com/ClickHouse/ClickHouse/pull/55492) ([Robert Schulze](https://github.com/rschu1ze)).
* Minor fixes for 02882_replicated_fetch_checksums_doesnt_match.sql [#55493](https://github.com/ClickHouse/ClickHouse/pull/55493) ([vdimir](https://github.com/vdimir)).
* AggregatingTransform initGenerate race condition fix [#55495](https://github.com/ClickHouse/ClickHouse/pull/55495) ([Maksim Kita](https://github.com/kitaisreal)).
* HashTable resize exception handle fix [#55497](https://github.com/ClickHouse/ClickHouse/pull/55497) ([Maksim Kita](https://github.com/kitaisreal)).
* fix lots of 'Structure does not match' warnings in ci [#55503](https://github.com/ClickHouse/ClickHouse/pull/55503) ([Han Fei](https://github.com/hanfei1991)).
* Cleanup: parallel replica coordinator usage [#55515](https://github.com/ClickHouse/ClickHouse/pull/55515) ([Igor Nikonov](https://github.com/devcrafter)).
* add k-morozov to trusted contributors [#55523](https://github.com/ClickHouse/ClickHouse/pull/55523) ([Mike Kot](https://github.com/myrrc)).
* Forbid create inverted index if setting not enabled [#55529](https://github.com/ClickHouse/ClickHouse/pull/55529) ([flynn](https://github.com/ucasfl)).
* Better exception messages but without SEGFAULT [#55541](https://github.com/ClickHouse/ClickHouse/pull/55541) ([Antonio Andelic](https://github.com/antonio2368)).
* Avoid setting same promise twice [#55553](https://github.com/ClickHouse/ClickHouse/pull/55553) ([Antonio Andelic](https://github.com/antonio2368)).
* Better error message in case when merge selecting task failed. [#55554](https://github.com/ClickHouse/ClickHouse/pull/55554) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add a test [#55564](https://github.com/ClickHouse/ClickHouse/pull/55564) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Added healthcheck for LDAP [#55571](https://github.com/ClickHouse/ClickHouse/pull/55571) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix global context for tests with --gtest_filter [#55583](https://github.com/ClickHouse/ClickHouse/pull/55583) ([Azat Khuzhin](https://github.com/azat)).
* Fix replica groups for Replicated database engine [#55587](https://github.com/ClickHouse/ClickHouse/pull/55587) ([Azat Khuzhin](https://github.com/azat)).
* Remove unused protobuf includes [#55590](https://github.com/ClickHouse/ClickHouse/pull/55590) ([Raúl Marín](https://github.com/Algunenano)).
* Apply Context changes to standalone Keeper [#55591](https://github.com/ClickHouse/ClickHouse/pull/55591) ([Antonio Andelic](https://github.com/antonio2368)).
* Do not fail if label-to-remove does not exist in PR [#55592](https://github.com/ClickHouse/ClickHouse/pull/55592) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* CI: cast extra column expression `pull_request_number` to Int32 [#55599](https://github.com/ClickHouse/ClickHouse/pull/55599) ([Han Fei](https://github.com/hanfei1991)).
* Add back a test that was removed by mistake [#55605](https://github.com/ClickHouse/ClickHouse/pull/55605) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Bump croaring to v2.0.4 [#55606](https://github.com/ClickHouse/ClickHouse/pull/55606) ([Robert Schulze](https://github.com/rschu1ze)).
* byteswap: Add 16/32-byte integer support [#55607](https://github.com/ClickHouse/ClickHouse/pull/55607) ([Robert Schulze](https://github.com/rschu1ze)).
* Revert [#54421](https://github.com/ClickHouse/ClickHouse/issues/54421) [#55613](https://github.com/ClickHouse/ClickHouse/pull/55613) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix: race condition in kusto implementation [#55615](https://github.com/ClickHouse/ClickHouse/pull/55615) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Remove passed tests from `analyzer_tech_debt.txt` [#55618](https://github.com/ClickHouse/ClickHouse/pull/55618) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable 02161_addressToLineWithInlines [#55622](https://github.com/ClickHouse/ClickHouse/pull/55622) ([Michael Kolupaev](https://github.com/al13n321)).
* KeyCondition: preparation [#55625](https://github.com/ClickHouse/ClickHouse/pull/55625) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flakiness of test_system_merges (by increasing sleep interval properly) [#55627](https://github.com/ClickHouse/ClickHouse/pull/55627) ([Azat Khuzhin](https://github.com/azat)).
* fix `structure does not match` logs again [#55628](https://github.com/ClickHouse/ClickHouse/pull/55628) ([Han Fei](https://github.com/hanfei1991)).
* KeyCondition: small changes [#55640](https://github.com/ClickHouse/ClickHouse/pull/55640) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Resubmit [#54421](https://github.com/ClickHouse/ClickHouse/issues/54421) [#55641](https://github.com/ClickHouse/ClickHouse/pull/55641) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix some typos [#55646](https://github.com/ClickHouse/ClickHouse/pull/55646) ([alesapin](https://github.com/alesapin)).
* Show move/maximize only if there is more than a single chart [#55648](https://github.com/ClickHouse/ClickHouse/pull/55648) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable test_query_is_lock_free[detach table] for the analyzer [#55668](https://github.com/ClickHouse/ClickHouse/pull/55668) ([Raúl Marín](https://github.com/Algunenano)).
* Allow FINAL with parallel replicas with custom key [#55679](https://github.com/ClickHouse/ClickHouse/pull/55679) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix StorageMaterializedView::isRemote [#55681](https://github.com/ClickHouse/ClickHouse/pull/55681) ([vdimir](https://github.com/vdimir)).
* Bump gRPC to 1.34.1 [#55693](https://github.com/ClickHouse/ClickHouse/pull/55693) ([Robert Schulze](https://github.com/rschu1ze)).
* Randomize block_number column setting in ci [#55713](https://github.com/ClickHouse/ClickHouse/pull/55713) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Use pool for proxied S3 disk http sessions [#55718](https://github.com/ClickHouse/ClickHouse/pull/55718) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Analyzer: fix block structure mismatch in matview with engine distributed [#55741](https://github.com/ClickHouse/ClickHouse/pull/55741) ([vdimir](https://github.com/vdimir)).
* Use diff object again, since JSON API limits the files [#55750](https://github.com/ClickHouse/ClickHouse/pull/55750) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Big endian platform max intersection fix [#55756](https://github.com/ClickHouse/ClickHouse/pull/55756) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Remove temporary debug logging in MultiplexedConnections [#55764](https://github.com/ClickHouse/ClickHouse/pull/55764) ([Michael Kolupaev](https://github.com/al13n321)).
* Check if partition ID is `nullptr` [#55765](https://github.com/ClickHouse/ClickHouse/pull/55765) ([Antonio Andelic](https://github.com/antonio2368)).
* Control Keeper feature flag randomization with env [#55766](https://github.com/ClickHouse/ClickHouse/pull/55766) ([Antonio Andelic](https://github.com/antonio2368)).
* Added test to check CapnProto cache [#55769](https://github.com/ClickHouse/ClickHouse/pull/55769) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Query Cache: Only cache initial query [#55771](https://github.com/ClickHouse/ClickHouse/pull/55771) ([zhongyuankai](https://github.com/zhongyuankai)).
* Temporarily disable flaky test [#55772](https://github.com/ClickHouse/ClickHouse/pull/55772) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix test test_postgresql_replica_database_engine_2/test.py::test_replica_consumer [#55774](https://github.com/ClickHouse/ClickHouse/pull/55774) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort fix [#55775](https://github.com/ClickHouse/ClickHouse/pull/55775) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix black check [#55779](https://github.com/ClickHouse/ClickHouse/pull/55779) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Correctly grep fuzzer.log [#55780](https://github.com/ClickHouse/ClickHouse/pull/55780) ([Antonio Andelic](https://github.com/antonio2368)).
* Parallel replicas: cleanup, less copying during announcement [#55781](https://github.com/ClickHouse/ClickHouse/pull/55781) ([Igor Nikonov](https://github.com/devcrafter)).
* Enable test_mutation_simple with the analyzer [#55791](https://github.com/ClickHouse/ClickHouse/pull/55791) ([Raúl Marín](https://github.com/Algunenano)).
* Improve enrich image [#55793](https://github.com/ClickHouse/ClickHouse/pull/55793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Bump gRPC to v1.36.4 [#55811](https://github.com/ClickHouse/ClickHouse/pull/55811) ([Robert Schulze](https://github.com/rschu1ze)).
* Attempt to fix test_dictionaries_redis flakiness [#55813](https://github.com/ClickHouse/ClickHouse/pull/55813) ([Raúl Marín](https://github.com/Algunenano)).
* Add diagnostic checks for issue [#55041](https://github.com/ClickHouse/ClickHouse/issues/55041) [#55835](https://github.com/ClickHouse/ClickHouse/pull/55835) ([Robert Schulze](https://github.com/rschu1ze)).
* Correct aggregate functions ser/deserialization to be endianness-independent. [#55837](https://github.com/ClickHouse/ClickHouse/pull/55837) ([Austin Kothig](https://github.com/kothiga)).
* Bump gRPC to v1.37.1 [#55840](https://github.com/ClickHouse/ClickHouse/pull/55840) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix 00002_log_and_exception_messages_formatting [#55844](https://github.com/ClickHouse/ClickHouse/pull/55844) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix caching objects in pygithub, and changelogs [#55845](https://github.com/ClickHouse/ClickHouse/pull/55845) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v23.3.14.78-lts [#55847](https://github.com/ClickHouse/ClickHouse/pull/55847) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.9.2.56-stable [#55848](https://github.com/ClickHouse/ClickHouse/pull/55848) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.8.4.69-lts [#55849](https://github.com/ClickHouse/ClickHouse/pull/55849) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* fix node setting in the test [#55850](https://github.com/ClickHouse/ClickHouse/pull/55850) ([Sema Checherinda](https://github.com/CheSema)).
* Add load_metadata_threads to describe filesystem cache [#55863](https://github.com/ClickHouse/ClickHouse/pull/55863) ([Jordi Villar](https://github.com/jrdi)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix digest check in replicated ddl worker [#55877](https://github.com/ClickHouse/ClickHouse/pull/55877) ([Sergei Trifonov](https://github.com/serxa)).
* Test parallel replicas with rollup [#55886](https://github.com/ClickHouse/ClickHouse/pull/55886) ([Raúl Marín](https://github.com/Algunenano)).
* Fix some tests with Replicated database [#55889](https://github.com/ClickHouse/ClickHouse/pull/55889) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update stress.py [#55890](https://github.com/ClickHouse/ClickHouse/pull/55890) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Revert "Revert "Revert "Add settings for real-time updates during query execution""" [#55893](https://github.com/ClickHouse/ClickHouse/pull/55893) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make test `system_zookeeper_connection` better [#55900](https://github.com/ClickHouse/ClickHouse/pull/55900) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* A test `01019_alter_materialized_view_consistent` is unstable with Analyzer [#55901](https://github.com/ClickHouse/ClickHouse/pull/55901) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove C++ templates, because they are stupid [#55910](https://github.com/ClickHouse/ClickHouse/pull/55910) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Bump gRPC to v1.39.1 [#55914](https://github.com/ClickHouse/ClickHouse/pull/55914) ([Robert Schulze](https://github.com/rschu1ze)).
* Add sanity check to RPNBuilderFunctionTreeNode [#55915](https://github.com/ClickHouse/ClickHouse/pull/55915) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump gRPC to v1.42.0 [#55916](https://github.com/ClickHouse/ClickHouse/pull/55916) ([Robert Schulze](https://github.com/rschu1ze)).
* Set storage.has_lightweight_delete_parts flag when a part has been loaded [#55935](https://github.com/ClickHouse/ClickHouse/pull/55935) ([Alexander Gololobov](https://github.com/davenger)).
* Include information about supported versions in bug report issue template [#55937](https://github.com/ClickHouse/ClickHouse/pull/55937) ([Nikita Taranov](https://github.com/nickitat)).
* arrayFold: Switch accumulator and array arguments [#55948](https://github.com/ClickHouse/ClickHouse/pull/55948) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix overrides via connections_credentials in case of root directives exists [#55949](https://github.com/ClickHouse/ClickHouse/pull/55949) ([Azat Khuzhin](https://github.com/azat)).
* Test for Bug 43644 [#55955](https://github.com/ClickHouse/ClickHouse/pull/55955) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump protobuf to v3.19.6 [#55963](https://github.com/ClickHouse/ClickHouse/pull/55963) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix possible performance test error [#55964](https://github.com/ClickHouse/ClickHouse/pull/55964) ([Azat Khuzhin](https://github.com/azat)).
* Avoid counting lost parts twice [#55987](https://github.com/ClickHouse/ClickHouse/pull/55987) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Convert unnecessary std::scoped_lock usage to std::lock_guard [#56006](https://github.com/ClickHouse/ClickHouse/pull/56006) ([Robert Schulze](https://github.com/rschu1ze)).
* Stress tests: Try to wait until server is responsive after gdb detach [#56009](https://github.com/ClickHouse/ClickHouse/pull/56009) ([Raúl Marín](https://github.com/Algunenano)).
* test_storage_s3_queue - add debug info [#56011](https://github.com/ClickHouse/ClickHouse/pull/56011) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Bump protobuf to v21.9 [#56014](https://github.com/ClickHouse/ClickHouse/pull/56014) ([Robert Schulze](https://github.com/rschu1ze)).
* Correct the implementation of function `jsonMergePatch` [#56020](https://github.com/ClickHouse/ClickHouse/pull/56020) ([Anton Popov](https://github.com/CurtizJ)).
* Fix 02438_sync_replica_lightweight [#56023](https://github.com/ClickHouse/ClickHouse/pull/56023) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix some bad code by making it worse [#56026](https://github.com/ClickHouse/ClickHouse/pull/56026) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix bash completion for mawk (and update format list and add one more delimiter) [#56050](https://github.com/ClickHouse/ClickHouse/pull/56050) ([Azat Khuzhin](https://github.com/azat)).
* Analyzer: Fix crash on window resolve [#56055](https://github.com/ClickHouse/ClickHouse/pull/56055) ([Dmitry Novik](https://github.com/novikd)).
* Fix function_json_value_return_type_allow_nullable setting name in doc [#56056](https://github.com/ClickHouse/ClickHouse/pull/56056) ([vdimir](https://github.com/vdimir)).
* Force shutdown in upgrade test [#56074](https://github.com/ClickHouse/ClickHouse/pull/56074) ([Raúl Marín](https://github.com/Algunenano)).
* Try enable `01154_move_partition_long` with s3 [#56080](https://github.com/ClickHouse/ClickHouse/pull/56080) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix race condition between DROP_RANGE and committing existing block [#56083](https://github.com/ClickHouse/ClickHouse/pull/56083) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix flakiness of 02263_lazy_mark_load [#56087](https://github.com/ClickHouse/ClickHouse/pull/56087) ([Michael Kolupaev](https://github.com/al13n321)).
* Make the code less bloated [#56091](https://github.com/ClickHouse/ClickHouse/pull/56091) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Maybe smaller binary [#56112](https://github.com/ClickHouse/ClickHouse/pull/56112) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove old trash from unit tests [#56113](https://github.com/ClickHouse/ClickHouse/pull/56113) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove some bloat [#56114](https://github.com/ClickHouse/ClickHouse/pull/56114) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test_format_schema_on_server flakiness [#56116](https://github.com/ClickHouse/ClickHouse/pull/56116) ([Azat Khuzhin](https://github.com/azat)).
* Fix: schedule delayed part checks correctly [#56123](https://github.com/ClickHouse/ClickHouse/pull/56123) ([Igor Nikonov](https://github.com/devcrafter)).
* Beautify `show merges` [#56124](https://github.com/ClickHouse/ClickHouse/pull/56124) ([Denny Crane](https://github.com/den-crane)).
* Better options for disabling frame pointer omitting [#56130](https://github.com/ClickHouse/ClickHouse/pull/56130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix: incorrect brace style in clang-format [#56133](https://github.com/ClickHouse/ClickHouse/pull/56133) ([Igor Nikonov](https://github.com/devcrafter)).
* Do not try to activate covered parts when handling unexpected parts [#56137](https://github.com/ClickHouse/ClickHouse/pull/56137) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Re-fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#56140](https://github.com/ClickHouse/ClickHouse/pull/56140) ([Michael Kolupaev](https://github.com/al13n321)).
* Update version_date.tsv and changelogs after v23.3.15.29-lts [#56145](https://github.com/ClickHouse/ClickHouse/pull/56145) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.8.5.16-lts [#56146](https://github.com/ClickHouse/ClickHouse/pull/56146) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.9.3.12-stable [#56147](https://github.com/ClickHouse/ClickHouse/pull/56147) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.7.6.111-stable [#56148](https://github.com/ClickHouse/ClickHouse/pull/56148) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fasttest timeout setting [#56160](https://github.com/ClickHouse/ClickHouse/pull/56160) ([Max K.](https://github.com/mkaynov)).
* Use monotonic clock for part check scheduling [#56162](https://github.com/ClickHouse/ClickHouse/pull/56162) ([Igor Nikonov](https://github.com/devcrafter)).
* More metrics for fs cache [#56165](https://github.com/ClickHouse/ClickHouse/pull/56165) ([Kseniia Sumarokova](https://github.com/kssenii)).
* FileCache minor changes [#56168](https://github.com/ClickHouse/ClickHouse/pull/56168) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update 01414_mutations_and_errors_zookeeper.sh [#56176](https://github.com/ClickHouse/ClickHouse/pull/56176) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Shard fs cache keys [#56194](https://github.com/ClickHouse/ClickHouse/pull/56194) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do less work when there are lots of read requests and watches for the same paths [#56197](https://github.com/ClickHouse/ClickHouse/pull/56197) ([Alexander Gololobov](https://github.com/davenger)).
* Remove skip_unused_shards tests from analyzer skiplist [#56200](https://github.com/ClickHouse/ClickHouse/pull/56200) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Easy tests fix for analyzer [#56211](https://github.com/ClickHouse/ClickHouse/pull/56211) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add a warning if delayed accounting is not enabled (breaks OSIOWaitMicroseconds) [#56227](https://github.com/ClickHouse/ClickHouse/pull/56227) ([Azat Khuzhin](https://github.com/azat)).
* Do not remove part if `Too many open files` is thrown [#56238](https://github.com/ClickHouse/ClickHouse/pull/56238) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix ORC commit [#56261](https://github.com/ClickHouse/ClickHouse/pull/56261) ([Raúl Marín](https://github.com/Algunenano)).
* Fix typo in largestTriangleThreeBuckets.md [#56263](https://github.com/ClickHouse/ClickHouse/pull/56263) ([Nikita Taranov](https://github.com/nickitat)).

View File

@ -0,0 +1,31 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.3.15.29-lts (218336662e4) FIXME as compared to v23.3.14.78-lts (c8f4ba52c65)
#### Build/Testing/Packaging Improvement
* Backported in [#55671](https://github.com/ClickHouse/ClickHouse/issues/55671): If the database is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* Backported in [#55734](https://github.com/ClickHouse/ClickHouse/issues/55734): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
* Backported in [#55829](https://github.com/ClickHouse/ClickHouse/issues/55829): Check sha512 for tgz; use a proper repository for keeper; write only filenames to TGZ.sha512 files for tarball packages. Prerequisite for [#31473](https://github.com/ClickHouse/ClickHouse/issues/31473). [#55717](https://github.com/ClickHouse/ClickHouse/pull/55717) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NO CL ENTRY
* NO CL ENTRY: 'Pin rust version to fix GLIBC compatibility check'. [#55788](https://github.com/ClickHouse/ClickHouse/pull/55788) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Fix incorrect createColumn call on join clause [#48998](https://github.com/ClickHouse/ClickHouse/pull/48998) ([Yi Sheng](https://github.com/ongkong)).
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Improve enrich image [#55793](https://github.com/ClickHouse/ClickHouse/pull/55793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,78 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.7.6.111-stable (6b047a47504) FIXME as compared to v23.7.5.30-stable (e86c21fb922)
#### Improvement
* Backported in [#54285](https://github.com/ClickHouse/ClickHouse/issues/54285): Enable allow_remove_stale_moving_parts by default. [#54260](https://github.com/ClickHouse/ClickHouse/pull/54260) ([vdimir](https://github.com/vdimir)).
#### Build/Testing/Packaging Improvement
* Backported in [#55291](https://github.com/ClickHouse/ClickHouse/issues/55291): Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checking out submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* Backported in [#54705](https://github.com/ClickHouse/ClickHouse/issues/54705): Enrich `changed_images.json` with the latest tag from master for images that are not changed in the pull request. [#54369](https://github.com/ClickHouse/ClickHouse/pull/54369) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Backported in [#54683](https://github.com/ClickHouse/ClickHouse/issues/54683): We build and upload them for every push, which isn't worth it. [#54675](https://github.com/ClickHouse/ClickHouse/pull/54675) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#55407](https://github.com/ClickHouse/ClickHouse/issues/55407): Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Backported in [#55723](https://github.com/ClickHouse/ClickHouse/issues/55723): Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix recalculation of skip indexes and projections in `ALTER DELETE` queries [#52530](https://github.com/ClickHouse/ClickHouse/pull/52530) ([Anton Popov](https://github.com/CurtizJ)).
* RFC: Fix filtering by virtual columns with OR expression [#52653](https://github.com/ClickHouse/ClickHouse/pull/52653) ([Azat Khuzhin](https://github.com/azat)).
* Fix reading of unnecessary column in case of multistage `PREWHERE` [#52689](https://github.com/ClickHouse/ClickHouse/pull/52689) ([Anton Popov](https://github.com/CurtizJ)).
* Fix sorting of sparse columns with large limit [#52827](https://github.com/ClickHouse/ClickHouse/pull/52827) ([Anton Popov](https://github.com/CurtizJ)).
* Fix reading of empty `Nested(Array(LowCardinality(...)))` [#52949](https://github.com/ClickHouse/ClickHouse/pull/52949) ([Anton Popov](https://github.com/CurtizJ)).
* Fix adding sub-second intervals to DateTime [#53309](https://github.com/ClickHouse/ClickHouse/pull/53309) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix: moved to prewhere condition actions can lose column [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix crash in join on sparse column [#53548](https://github.com/ClickHouse/ClickHouse/pull/53548) ([vdimir](https://github.com/vdimir)).
* Fix named_collection_admin alias [#54066](https://github.com/ClickHouse/ClickHouse/pull/54066) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix rows_before_limit_at_least for DelayedSource. [#54122](https://github.com/ClickHouse/ClickHouse/pull/54122) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix: allow IPv6 for bloom filter [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Check for overflow before addition in `analysisOfVariance` function [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)).
* reproduce and fix the bug in removeSharedRecursive [#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)).
* Fix aggregate projections with normalized states [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)).
* Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare case of CHECKSUM_DOESNT_MATCH error [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)).
* Fix zero copy garbage [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix "Invalid number of rows in Chunk" in MaterializedPostgreSQL [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Prevent attaching parts from tables with different projections or indices [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NO CL CATEGORY
* Backported in [#55704](https://github.com/ClickHouse/ClickHouse/issues/55704):. [#55657](https://github.com/ClickHouse/ClickHouse/pull/55657) ([Antonio Andelic](https://github.com/antonio2368)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Merge pull request [#52395](https://github.com/ClickHouse/ClickHouse/issues/52395) from azat/rust/reproducible-builds"'. [#55517](https://github.com/ClickHouse/ClickHouse/pull/55517) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Test libunwind changes. [#51436](https://github.com/ClickHouse/ClickHouse/pull/51436) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Retry blob listing in test_alter_moving_garbage [#52193](https://github.com/ClickHouse/ClickHouse/pull/52193) ([vdimir](https://github.com/vdimir)).
* add tests with connection reset by peer error, and retry it inside client [#52441](https://github.com/ClickHouse/ClickHouse/pull/52441) ([Sema Checherinda](https://github.com/CheSema)).
* Small fix for HTTPHeaderFilter [#53146](https://github.com/ClickHouse/ClickHouse/pull/53146) ([San](https://github.com/santrancisco)).
* fix Logical Error in AsynchronousBoundedReadBuffer [#53651](https://github.com/ClickHouse/ClickHouse/pull/53651) ([Sema Checherinda](https://github.com/CheSema)).
* Replace dlcdn.apache.org by archive domain [#54081](https://github.com/ClickHouse/ClickHouse/pull/54081) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix segfault in system.zookeeper [#54326](https://github.com/ClickHouse/ClickHouse/pull/54326) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix CI skip build and skip tests checks [#54532](https://github.com/ClickHouse/ClickHouse/pull/54532) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Update WebObjectStorage.cpp [#54695](https://github.com/ClickHouse/ClickHouse/pull/54695) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* check if block is empty after async insert retries [#55143](https://github.com/ClickHouse/ClickHouse/pull/55143) ([Han Fei](https://github.com/hanfei1991)).
* MaterializedPostgreSQL: remove back check [#55297](https://github.com/ClickHouse/ClickHouse/pull/55297) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mike Kot](https://github.com/myrrc)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -0,0 +1,24 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.8.5.16-lts (e8a1af5fe2f) FIXME as compared to v23.8.4.69-lts (d4d1e7b9ded)
#### Build/Testing/Packaging Improvement
* Backported in [#55830](https://github.com/ClickHouse/ClickHouse/issues/55830): Check sha512 for tgz; use a proper repository for keeper; write only filenames to TGZ.sha512 files for tarball packages. Prerequisite for [#31473](https://github.com/ClickHouse/ClickHouse/issues/31473). [#55717](https://github.com/ClickHouse/ClickHouse/pull/55717) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Bring relevant commits from backport/23.8/55336 to 23.8 [#56029](https://github.com/ClickHouse/ClickHouse/pull/56029) ([Austin Kothig](https://github.com/kothiga)).

View File

@ -0,0 +1,20 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.9.3.12-stable (b7230b06563) FIXME as compared to v23.9.2.56-stable (a1bf3f1de55)
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).

View File

@ -67,22 +67,30 @@ Implementations of `ReadBuffer`/`WriteBuffer` are used for working with files an
Read/WriteBuffers only deal with bytes. There are functions from `ReadHelpers` and `WriteHelpers` header files to help with formatting input/output. For example, there are helpers to write a number in decimal format.
Lets look at what happens when you want to write a result set in `JSON` format to stdout. You have a result set ready to be fetched from `IBlockInputStream`. You create `WriteBufferFromFileDescriptor(STDOUT_FILENO)` to write bytes to stdout. You create `JSONRowOutputStream`, initialized with that `WriteBuffer`, to write rows in `JSON` to stdout. You create `BlockOutputStreamFromRowOutputStream` on top of it, to represent it as `IBlockOutputStream`. Then you call `copyData` to transfer data from `IBlockInputStream` to `IBlockOutputStream`, and everything works. Internally, `JSONRowOutputStream` will write various JSON delimiters and call the `IDataType::serializeTextJSON` method with a reference to `IColumn` and the row number as arguments. Consequently, `IDataType::serializeTextJSON` will call a method from `WriteHelpers.h`: for example, `writeText` for numeric types and `writeJSONString` for `DataTypeString`.
Let's examine what happens when you want to write a result set in `JSON` format to stdout.
You have a result set ready to be fetched from a pulling `QueryPipeline`.
First, you create a `WriteBufferFromFileDescriptor(STDOUT_FILENO)` to write bytes to stdout.
Next, you connect the result from the query pipeline to `JSONRowOutputFormat`, which is initialized with that `WriteBuffer`, to write rows in `JSON` format to stdout.
This can be done via the `complete` method, which turns a pulling `QueryPipeline` into a completed `QueryPipeline`.
Internally, `JSONRowOutputFormat` will write various JSON delimiters and call the `IDataType::serializeTextJSON` method with a reference to `IColumn` and the row number as arguments. Consequently, `IDataType::serializeTextJSON` will call a method from `WriteHelpers.h`: for example, `writeText` for numeric types and `writeJSONString` for `DataTypeString`.
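For orientation, here is the user-visible counterpart of the pipeline described above: a query whose result set is rendered by the JSON row output format and written to stdout. This is a minimal sketch; the query itself is an arbitrary example, not taken from the text.
```sql
-- Ask the server to render the result set in the JSON format. Internally this goes
-- through an output format (the JSON row output format described above) that writes
-- into a WriteBuffer, which ultimately ends up on the client's stdout.
SELECT number, toString(number) AS str
FROM system.numbers
LIMIT 3
FORMAT JSON;
```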
## Tables {#tables}
The `IStorage` interface represents tables. Different implementations of that interface are different table engines. Examples are `StorageMergeTree`, `StorageMemory`, and so on. Instances of these classes are just tables.
The key `IStorage` methods are `read` and `write`. There are also `alter`, `rename`, `drop`, and so on. The `read` method accepts the following arguments: the set of columns to read from a table, the `AST` query to consider, and the desired number of streams to return. It returns one or multiple `IBlockInputStream` objects and information about the stage of data processing that was completed inside a table engine during query execution.
The key methods in `IStorage` are `read` and `write`, along with others such as `alter`, `rename`, and `drop`. The `read` method accepts the following arguments: a set of columns to read from a table, the `AST` query to consider, and the desired number of streams. It returns a `Pipe`.
In most cases, the read method is only responsible for reading the specified columns from a table, not for any further data processing. All further data processing is done by the query interpreter and is outside the responsibility of `IStorage`.
In most cases, the read method is responsible only for reading the specified columns from a table, not for any further data processing.
All subsequent data processing is handled by another part of the pipeline, which falls outside the responsibility of `IStorage`.
But there are notable exceptions:
- The AST query is passed to the `read` method, and the table engine can use it to derive index usage and to read less data from a table.
- Sometimes the table engine can process data itself to a specific stage. For example, `StorageDistributed` can send a query to remote servers, ask them to process data to a stage where data from different remote servers can be merged, and return that preprocessed data. The query interpreter then finishes processing the data.
The tables `read` method can return multiple `IBlockInputStream` objects to allow parallel data processing. These multiple block input streams can read from a table in parallel. Then you can wrap these streams with various transformations (such as expression evaluation or filtering) that can be calculated independently and create a `UnionBlockInputStream` on top of them, to read from multiple streams in parallel.
The table's `read` method can return a `Pipe` consisting of multiple `Processors`. These `Processors` can read from a table in parallel.
Then, you can connect these processors with various other transformations (such as expression evaluation or filtering), which can be calculated independently.
And then, create a `QueryPipeline` on top of them, and execute it via `PipelineExecutor`.
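To see the processors that make up a concrete pipeline, you can use `EXPLAIN PIPELINE`; the query below is a minimal sketch over an arbitrary example source, not taken from the text, and its exact output depends on the server version and settings.
```sql
-- Shows the chain of processors (source, transforms, aggregation, etc.)
-- that the interpreter builds into a QueryPipeline for this query.
EXPLAIN PIPELINE
SELECT sum(number)
FROM numbers_mt(1000000);
```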
There are also `TableFunction`s. These are functions that return a temporary `IStorage` object to use in the `FROM` clause of a query.
@ -98,9 +106,19 @@ A hand-written recursive descent parser parses a query. For example, `ParserSele
## Interpreters {#interpreters}
Interpreters are responsible for creating the query execution pipeline from an `AST`. There are simple interpreters, such as `InterpreterExistsQuery` and `InterpreterDropQuery`, or the more sophisticated `InterpreterSelectQuery`. The query execution pipeline is a combination of block input or output streams. For example, the result of interpreting the `SELECT` query is the `IBlockInputStream` to read the result set from; the result of the `INSERT` query is the `IBlockOutputStream` to write data for insertion to, and the result of interpreting the `INSERT SELECT` query is the `IBlockInputStream` that returns an empty result set on the first read, but that copies data from `SELECT` to `INSERT` at the same time.
Interpreters are responsible for creating the query execution pipeline from an AST. There are simple interpreters, such as `InterpreterExistsQuery` and `InterpreterDropQuery`, as well as the more sophisticated `InterpreterSelectQuery`.
`InterpreterSelectQuery` uses `ExpressionAnalyzer` and `ExpressionActions` machinery for query analysis and transformations. This is where most rule-based query optimizations are done. `ExpressionAnalyzer` is quite messy and should be rewritten: various query transformations and optimizations should be extracted to separate classes to allow modular transformations of query.
The query execution pipeline is a combination of processors that can consume and produce chunks (sets of columns with specific types).
A processor communicates via ports and can have multiple input ports and multiple output ports.
A more detailed description can be found in [src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h).
For example, the result of interpreting the `SELECT` query is a "pulling" `QueryPipeline` which has a special output port to read the result set from.
The result of the `INSERT` query is a "pushing" `QueryPipeline` with an input port to write data for insertion.
And the result of interpreting the `INSERT SELECT` query is a "completed" `QueryPipeline` that has no inputs or outputs but copies data from `SELECT` to `INSERT` simultaneously.
`InterpreterSelectQuery` uses `ExpressionAnalyzer` and `ExpressionActions` machinery for query analysis and transformations. This is where most rule-based query optimizations are performed. `ExpressionAnalyzer` is quite messy and should be rewritten: various query transformations and optimizations should be extracted into separate classes to allow for modular transformations of the query.
To address current problems that exist in interpreters, a new `InterpreterSelectQueryAnalyzer` is being developed. It is a new version of `InterpreterSelectQuery` that does not use `ExpressionAnalyzer` and introduces an additional abstraction level between `AST` and `QueryPipeline` called `QueryTree`. It is not production-ready yet, but it can be tested with the `allow_experimental_analyzer` flag.
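As a minimal sketch, the analyzer can be enabled per session or per query through that setting; the query below is an arbitrary example, not taken from the text.
```sql
-- Enable the experimental analyzer (InterpreterSelectQueryAnalyzer / QueryTree) for the session.
SET allow_experimental_analyzer = 1;

-- Or enable it for a single query only via the SETTINGS clause.
SELECT number * 2 AS doubled
FROM system.numbers
LIMIT 5
SETTINGS allow_experimental_analyzer = 1;
```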
## Functions {#functions}

View File

@ -11,6 +11,8 @@ This is intended for continuous integration checks that run on Linux servers. If
The cross-build for macOS is based on the [Build instructions](../development/build.md), follow them first.
The following sections provide a walk-through for building ClickHouse for `x86_64` macOS. If you're targeting ARM architecture, simply substitute all occurrences of `x86_64` with `aarch64`. For example, replace `x86_64-apple-darwin` with `aarch64-apple-darwin` throughout the steps.
## Install Clang-17
Follow the instructions from https://apt.llvm.org/ for your Ubuntu or Debian setup.
@ -30,13 +32,13 @@ export CCTOOLS=$(cd ~/cctools && pwd)
mkdir ${CCTOOLS}
cd ${CCTOOLS}
git clone https://github.com/tpoechtrager/apple-libtapi.git
git clone --depth=1 https://github.com/tpoechtrager/apple-libtapi.git
cd apple-libtapi
INSTALLPREFIX=${CCTOOLS} ./build.sh
./install.sh
cd ..
git clone https://github.com/tpoechtrager/cctools-port.git
git clone --depth=1 https://github.com/tpoechtrager/cctools-port.git
cd cctools-port/cctools
./configure --prefix=$(readlink -f ${CCTOOLS}) --with-libtapi=$(readlink -f ${CCTOOLS}) --target=x86_64-apple-darwin
make install
@ -46,7 +48,7 @@ Also, we need to download macOS X SDK into the working tree.
``` bash
cd ClickHouse/cmake/toolchain/darwin-x86_64
curl -L 'https://github.com/phracker/MacOSX-SDKs/releases/download/10.15/MacOSX10.15.sdk.tar.xz' | tar xJ --strip-components=1
curl -L 'https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz' | tar xJ --strip-components=1
```
## Build ClickHouse {#build-clickhouse}

View File

@ -23,7 +23,7 @@ sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
``` bash
cd ClickHouse
mkdir build-riscv64
CC=clang-16 CXX=clang++-16 cmake . -Bbuild-riscv64 -G Ninja -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-riscv64.cmake -DGLIBC_COMPATIBILITY=OFF -DENABLE_LDAP=OFF -DOPENSSL_NO_ASM=ON -DENABLE_JEMALLOC=ON -DENABLE_PARQUET=OFF -DENABLE_GRPC=OFF -DENABLE_HDFS=OFF -DENABLE_MYSQL=OFF
CC=clang-17 CXX=clang++-17 cmake . -Bbuild-riscv64 -G Ninja -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-riscv64.cmake -DGLIBC_COMPATIBILITY=OFF -DENABLE_LDAP=OFF -DOPENSSL_NO_ASM=ON -DENABLE_JEMALLOC=ON -DENABLE_PARQUET=OFF -DENABLE_GRPC=OFF -DENABLE_HDFS=OFF -DENABLE_MYSQL=OFF
ninja -C build-riscv64
```

View File

@ -219,13 +219,21 @@ You can also run your custom-built ClickHouse binary with the config file from t
## IDE (Integrated Development Environment) {#ide-integrated-development-environment}
If you do not know which IDE to use, we recommend that you use CLion. CLion is commercial software, but it offers 30 days free trial period. It is also free of charge for students. CLion can be used both on Linux and on macOS.
**CLion (recommended)**
KDevelop and QTCreator are other great alternatives of an IDE for developing ClickHouse. KDevelop comes in as a very handy IDE although unstable. If KDevelop crashes after a while upon opening project, you should click “Stop All” button as soon as it has opened the list of projects files. After doing so KDevelop should be fine to work with.
If you do not know which IDE to use, we recommend that you use [CLion](https://www.jetbrains.com/clion/). CLion is commercial software but it offers a 30 day free trial. It is also free of charge for students. CLion can be used on both Linux and macOS.
As simple code editors, you can use Sublime Text or Visual Studio Code, or Kate (all of which are available on Linux).
A few things to know when using CLion to develop ClickHouse:
Just in case, it is worth mentioning that CLion creates `build` path on its own, it also on its own selects `debug` for build type, for configuration it uses a version of CMake that is defined in CLion and not the one installed by you, and finally, CLion will use `make` to run build tasks instead of `ninja`. This is normal behaviour, just keep that in mind to avoid confusion.
- CLion creates a `build` path on its own and automatically selects `debug` for the build type
- It uses a version of CMake that is defined in CLion and not the one installed by you
- CLion will use `make` to run build tasks instead of `ninja` (this is normal behavior)
**Other alternatives**
[KDevelop](https://kdevelop.org/) and [QTCreator](https://www.qt.io/product/development-tools) are other great alternative IDEs for developing ClickHouse. While KDevelop is a great IDE, it is sometimes unstable. If KDevelop crashes when opening a project, you should click the “Stop All” button as soon as it has opened the list of project's files. After doing so, KDevelop should be fine to work with.
Other IDEs you can use are [Sublime Text](https://www.sublimetext.com/), [Visual Studio Code](https://code.visualstudio.com/), or [Kate](https://kate-editor.org/) (all of which are available on Linux). If you are using VS Code, we recommend using the [clangd extension](https://marketplace.visualstudio.com/items?itemName=llvm-vs-code-extensions.vscode-clangd) to replace IntelliSense as it is much more performant.
## Writing Code {#writing-code}

View File

@ -345,7 +345,7 @@ struct ExtractDomain
**7.** For abstract classes (interfaces) you can add the `I` prefix.
``` cpp
class IBlockInputStream
class IProcessor
```
**8.** If you use a variable locally, you can use the short name.

View File

@ -28,7 +28,6 @@ SETTINGS
kafka_topic_list = 'topic1,topic2,...',
kafka_group_name = 'group_name',
kafka_format = 'data_format'[,]
[kafka_row_delimiter = 'delimiter_symbol',]
[kafka_schema = '',]
[kafka_num_consumers = N,]
[kafka_max_block_size = 0,]
@ -53,7 +52,6 @@ Required parameters:
Optional parameters:
- `kafka_row_delimiter` — Delimiter character, which ends the message. **This setting is deprecated and is no longer used; it is left only for compatibility reasons.**
- `kafka_schema` — Parameter that must be used if the format requires a schema definition. For example, [Capn Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `kafka_num_consumers` — The number of consumers per table. Specify more consumers if the throughput of one consumer is insufficient. The total number of consumers should not exceed the number of partitions in the topic, since only one consumer can be assigned per partition, and must not be greater than the number of physical cores on the server where ClickHouse is deployed. Default: `1`.
- `kafka_max_block_size` — The maximum batch size (in messages) for poll. Default: [max_insert_block_size](../../../operations/settings/settings.md#setting-max_insert_block_size).
@ -64,7 +62,7 @@ Optional parameters:
- `kafka_poll_max_batch_size` — Maximum amount of messages to be polled in a single Kafka poll. Default: [max_block_size](../../../operations/settings/settings.md#setting-max_block_size).
- `kafka_flush_interval_ms` — Timeout for flushing data from Kafka. Default: [stream_flush_interval_ms](../../../operations/settings/settings.md#stream-flush-interval-ms).
- `kafka_thread_per_consumer` — Provide independent thread for each consumer. When enabled, every consumer flush the data independently, in parallel (otherwise — rows from several consumers squashed to form one block). Default: `0`.
- `kafka_handle_error_mode` — How to handle errors for Kafka engine. Possible values: default, stream.
- `kafka_handle_error_mode` — How to handle errors for Kafka engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
- `kafka_commit_on_select` — Commit messages when select query is made. Default: `false`.
- `kafka_max_rows_per_message` — The maximum number of rows written in one kafka message for row-based formats. Default : `1`.
@ -249,6 +247,13 @@ Example:
- `_headers.name` — Array of message's headers keys.
- `_headers.value` — Array of message's headers values.
Additional virtual columns when `kafka_handle_error_mode='stream'`:
- `_raw_message` - The raw message that could not be parsed successfully.
- `_error` - The exception message produced during failed parsing.
Note: the `_raw_message` and `_error` virtual columns are filled only when an exception occurs during parsing; they are always empty when the message was parsed successfully.
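A minimal sketch of how this mode can be used; the broker address, topic, table, and column names below are illustrative assumptions, while the settings themselves follow the list above.
```sql
-- Keep consuming even when some messages cannot be parsed as JSONEachRow;
-- broken messages surface through the _error and _raw_message virtual columns.
CREATE TABLE kafka_events
(
    id UInt64,
    message String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',      -- hypothetical broker
    kafka_topic_list = 'events',           -- hypothetical topic
    kafka_group_name = 'clickhouse_events',
    kafka_format = 'JSONEachRow',
    kafka_handle_error_mode = 'stream';

-- Inspect messages that failed to parse. Note that a direct SELECT consumes messages;
-- in practice both good rows and errors are usually routed through materialized views.
SELECT _topic, _raw_message, _error
FROM kafka_events
WHERE _error != '';
```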
## Data formats support {#data-formats-support}
Kafka engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -25,7 +25,6 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
nats_url = 'host:port',
nats_subjects = 'subject1,subject2,...',
nats_format = 'data_format'[,]
[nats_row_delimiter = 'delimiter_symbol',]
[nats_schema = '',]
[nats_num_consumers = N,]
[nats_queue_group = 'group_name',]
@ -40,7 +39,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[nats_password = 'password',]
[nats_token = 'clickhouse',]
[nats_startup_connect_tries = '5']
[nats_max_rows_per_message = 1]
[nats_max_rows_per_message = 1,]
[nats_handle_error_mode = 'default']
```
Required parameters:
@ -51,7 +51,6 @@ Required parameters:
Optional parameters:
- `nats_row_delimiter` Delimiter character, which ends the message. **This setting is deprecated and is no longer used; it is left only for compatibility reasons.**
- `nats_schema` Parameter that must be used if the format requires a schema definition. For example, [Capn Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `nats_num_consumers` The number of consumers per table. Default: `1`. Specify more consumers if the throughput of one consumer is insufficient.
- `nats_queue_group` Name for queue group of NATS subscribers. Default is the table name.
@ -66,6 +65,7 @@ Optional parameters:
- `nats_token` - NATS auth token.
- `nats_startup_connect_tries` - Number of connect tries at startup. Default: `5`.
- `nats_max_rows_per_message` — The maximum number of rows written in one NATS message for row-based formats. (default : `1`).
- `nats_handle_error_mode` — How to handle errors for NATS engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
SSL connection:
@ -165,6 +165,14 @@ If you want to change the target table by using `ALTER`, we recommend disabling
- `_subject` - NATS message subject.
Additional virtual columns when `nats_handle_error_mode='stream'`:
- `_raw_message` - The raw message that could not be parsed successfully.
- `_error` - The exception message produced during failed parsing.
Note: the `_raw_message` and `_error` virtual columns are filled only when an exception occurs during parsing; they are always empty when the message was parsed successfully.
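A short sketch of routing parse failures into a separate table, assuming a table `nats_events` already created with `ENGINE = NATS` (using `nats_url`, `nats_subjects`, `nats_format`) and `nats_handle_error_mode = 'stream'`; all names are illustrative.
```sql
-- Capture broken messages in a dead-letter table so nothing is silently lost.
CREATE TABLE nats_parse_errors
(
    subject String,
    raw String,
    error String
)
ENGINE = MergeTree
ORDER BY subject;

CREATE MATERIALIZED VIEW nats_parse_errors_mv TO nats_parse_errors AS
SELECT _subject AS subject, _raw_message AS raw, _error AS error
FROM nats_events
WHERE _error != '';
```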
## Data formats support {#data-formats-support}
NATS engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -28,7 +28,6 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[rabbitmq_exchange_type = 'exchange_type',]
[rabbitmq_routing_key_list = 'key1,key2,...',]
[rabbitmq_secure = 0,]
[rabbitmq_row_delimiter = 'delimiter_symbol',]
[rabbitmq_schema = '',]
[rabbitmq_num_consumers = N,]
[rabbitmq_num_queues = N,]
@ -45,7 +44,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[rabbitmq_username = '',]
[rabbitmq_password = '',]
[rabbitmq_commit_on_select = false,]
[rabbitmq_max_rows_per_message = 1]
[rabbitmq_max_rows_per_message = 1,]
[rabbitmq_handle_error_mode = 'default']
```
Required parameters:
@ -58,7 +58,6 @@ Optional parameters:
- `rabbitmq_exchange_type` The type of RabbitMQ exchange: `direct`, `fanout`, `topic`, `headers`, `consistent_hash`. Default: `fanout`.
- `rabbitmq_routing_key_list` A comma-separated list of routing keys.
- `rabbitmq_row_delimiter` Delimiter character, which ends the message. **This setting is deprecated and is no longer used; it is kept only for backward compatibility.**
- `rabbitmq_schema` Parameter that must be used if the format requires a schema definition. For example, [Capn Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `rabbitmq_num_consumers` The number of consumers per table. Specify more consumers if the throughput of one consumer is insufficient. Default: `1`
- `rabbitmq_num_queues` Total number of queues. Increasing this number can significantly improve performance. Default: `1`.
@ -78,6 +77,7 @@ Optional parameters:
- `rabbitmq_max_rows_per_message` — The maximum number of rows written in one RabbitMQ message for row-based formats. Default : `1`.
- `rabbitmq_empty_queue_backoff_start` — A start backoff point to reschedule read if the rabbitmq queue is empty.
- `rabbitmq_empty_queue_backoff_end` — An end backoff point to reschedule read if the rabbitmq queue is empty.
- `rabbitmq_handle_error_mode` — How to handle errors for RabbitMQ engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
@ -191,6 +191,13 @@ Example:
- `_message_id` - messageID of the received message; non-empty if was set, when message was published.
- `_timestamp` - timestamp of the received message; non-empty if was set, when message was published.
Additional virtual columns when `rabbitmq_handle_error_mode='stream'`:
- `_raw_message` - Raw message that couldn't be parsed successfully.
- `_error` - Exception message raised during failed parsing.
Note: `_raw_message` and `_error` virtual columns are filled only in case of an exception during parsing; they are always empty when the message is parsed successfully.
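A sketch of a "dead letter" setup built on these columns, assuming a table created with `rabbitmq_handle_error_mode='stream'` (the host, exchange, and table names are placeholders):
``` sql
CREATE TABLE rabbitmq_events
(
    key UInt64,
    value String
)
ENGINE = RabbitMQ
SETTINGS rabbitmq_host_port = 'localhost:5672',
         rabbitmq_exchange_name = 'events',
         rabbitmq_format = 'JSONEachRow',
         rabbitmq_handle_error_mode = 'stream';

-- Store failed messages in a regular table for later inspection.
CREATE TABLE rabbitmq_failed_messages
(
    raw String,
    error String
)
ENGINE = MergeTree
ORDER BY tuple();

CREATE MATERIALIZED VIEW rabbitmq_failed_mv TO rabbitmq_failed_messages AS
SELECT _raw_message AS raw, _error AS error
FROM rabbitmq_events
WHERE _error != '';
```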
## Data formats support {#data-formats-support}
RabbitMQ engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -866,6 +866,7 @@ Tags:
- `prefer_not_to_merge` — Disables merging of data parts on this volume. When this setting is enabled, merging data on this volume is not allowed. This allows controlling how ClickHouse works with slow disks.
- `perform_ttl_move_on_insert` — Disables TTL move on data part INSERT. By default (if enabled), if we insert a data part that has already expired by the TTL move rule, it immediately goes to the volume/disk declared in the move rule. This can significantly slow down inserts if the destination volume/disk is slow (e.g. S3). If disabled, the already expired data part is written to the default volume and then moved to the TTL volume right after.
- `load_balancing` - Policy for disk balancing, `round_robin` or `least_used`.
- `least_used_ttl_ms` - Configure timeout (in milliseconds) for updating the available space on all disks (`0` - update always, `-1` - never update, default is `60000`). Note, if the disk can be used by ClickHouse only and is not subject to an online filesystem resize/shrink you can use `-1`; in all other cases it is not recommended, since eventually it will lead to incorrect space distribution.
Configuration examples:

View File

@ -48,61 +48,61 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
#### policy_name
`policy_name` - (optionally) policy name, it will be used to store temporary files for async send
`policy_name` - (optional) policy name; it will be used to store temporary files for background send
**See Also**
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) setting
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) for the examples
### Distributed Settings
#### fsync_after_insert
`fsync_after_insert` - do the `fsync` for the file data after asynchronous insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
`fsync_after_insert` - do the `fsync` for the file data after background insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
#### fsync_directories
`fsync_directories` - do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to asynchronous inserts on Distributed table (after insert, after sending the data to shard, etc).
`fsync_directories` - do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to background inserts on Distributed table (after insert, after sending the data to shard, etc).
#### bytes_to_throw_insert
`bytes_to_throw_insert` - if more than this number of compressed bytes will be pending for async INSERT, an exception will be thrown. 0 - do not throw. Default 0.
`bytes_to_throw_insert` - if more than this number of compressed bytes is pending for background INSERT, an exception will be thrown. 0 - do not throw. Default 0.
#### bytes_to_delay_insert
`bytes_to_delay_insert` - if more than this number of compressed bytes will be pending for async INSERT, the query will be delayed. 0 - do not delay. Default 0.
`bytes_to_delay_insert` - if more than this number of compressed bytes is pending for background INSERT, the query will be delayed. 0 - do not delay. Default 0.
#### max_delay_to_insert
`max_delay_to_insert` - max delay of inserting data into Distributed table in seconds, if there are a lot of pending bytes for async send. Default 60.
`max_delay_to_insert` - max delay of inserting data into a Distributed table in seconds, if there are a lot of pending bytes for background send. Default 60.
#### monitor_batch_inserts
#### background_insert_batch
`monitor_batch_inserts` - same as [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)
`background_insert_batch` - same as [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)
#### monitor_split_batch_on_failure
#### background_insert_split_batch_on_failure
`monitor_split_batch_on_failure` - same as [distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)
`background_insert_split_batch_on_failure` - same as [distributed_background_insert_split_batch_on_failure](../../../operations/settings/settings.md#distributed_background_insert_split_batch_on_failure)
#### monitor_sleep_time_ms
#### background_insert_sleep_time_ms
`monitor_sleep_time_ms` - same as [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)
`background_insert_sleep_time_ms` - same as [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)
#### monitor_max_sleep_time_ms
#### background_insert_max_sleep_time_ms
`monitor_max_sleep_time_ms` - same as [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)
`background_insert_max_sleep_time_ms` - same as [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)
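For illustration, a sketch that combines several of the settings above on a hypothetical `Distributed` table (the cluster, database, and table names are placeholders):
``` sql
CREATE TABLE dist_events AS default.events
ENGINE = Distributed(my_cluster, default, events, rand())
SETTINGS
    fsync_after_insert = 1,
    background_insert_batch = 1,
    background_insert_sleep_time_ms = 200,
    background_insert_max_sleep_time_ms = 5000;
```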
:::note
**Durability settings** (`fsync_...`):
- Affect only asynchronous INSERTs (i.e. `insert_distributed_sync=false`) when data first stored on the initiator node disk and later asynchronously send to shards.
- Affect only background INSERTs (i.e. `distributed_foreground_insert=false`) when data is first stored on the initiator node disk and later, in the background, sent to the shards.
- May significantly decrease the inserts' performance
- Affect writing the data stored inside Distributed table folder into the **node which accepted your insert**. If you need to have guarantees of writing data to underlying MergeTree tables - see durability settings (`...fsync...`) in `system.merge_tree_settings`
For **Insert limit settings** (`..._insert`) see also:
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) setting
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) setting
- `bytes_to_throw_insert` is handled before `bytes_to_delay_insert`, so you should not set it to a value less than `bytes_to_delay_insert`
:::
@ -232,7 +232,7 @@ You should be concerned about the sharding scheme in the following cases:
- Queries are used that require joining data (`IN` or `JOIN`) by a specific key. If data is sharded by this key, you can use local `IN` or `JOIN` instead of `GLOBAL IN` or `GLOBAL JOIN`, which is much more efficient.
- A large number of servers is used (hundreds or more) with a large number of small queries, for example, queries for data of individual clients (e.g. websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, you can set up bi-level sharding: divide the entire cluster into “layers”, where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. `Distributed` tables are created for each layer, and a single shared distributed table is created for global queries.
Data is written asynchronously. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
Data is written in the background. When inserted into the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms) and [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by the [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
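If needed, the pending background queue can be inspected and flushed explicitly; a sketch (the table name is a placeholder):
``` sql
-- What is still waiting to be sent to the shards.
SELECT database, table, data_files, data_compressed_bytes, broken_data_files
FROM system.distribution_queue;

-- Force pending data to be sent synchronously.
SYSTEM FLUSH DISTRIBUTED dist_events;
```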
If the server ceased to exist or had a rough restart (for example, due to a hardware failure) after an `INSERT` to a `Distributed` table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is transferred to the `broken` subdirectory and no longer used.

View File

@ -0,0 +1,105 @@
---
slug: /en/engines/table-engines/special/filelog
sidebar_position: 160
sidebar_label: FileLog
---
# FileLog Engine {#filelog-engine}
This engine allows to process application log files as a stream of records.
`FileLog` lets you:
- Subscribe to log files.
- Process new records as they are appended to subscribed log files.
## Creating a Table {#creating-a-table}
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = FileLog('path_to_logs', 'format_name') SETTINGS
[poll_timeout_ms = 0,]
[poll_max_batch_size = 0,]
[max_block_size = 0,]
[max_threads = 0,]
[poll_directory_watch_events_backoff_init = 500,]
[poll_directory_watch_events_backoff_max = 32000,]
[poll_directory_watch_events_backoff_factor = 2,]
[handle_error_mode = 'default']
```
Engine arguments:
- `path_to_logs` Path to log files to subscribe. It can be a path to a directory with log files or to a single log file. Note that ClickHouse allows only paths inside the `user_files` directory.
- `format_name` - Record format. Note that FileLog processes each line in a file as a separate record, and not all data formats are suitable for it.
Optional parameters:
- `poll_timeout_ms` - Timeout for single poll from log file. Default: [stream_poll_timeout_ms](../../../operations/settings/settings.md#stream_poll_timeout_ms).
- `poll_max_batch_size` — Maximum amount of records to be polled in a single poll. Default: [max_block_size](../../../operations/settings/settings.md#setting-max_block_size).
- `max_block_size` — The maximum batch size (in records) for poll. Default: [max_insert_block_size](../../../operations/settings/settings.md#setting-max_insert_block_size).
- `max_threads` - Maximum number of threads to parse files. Default: `0`, which means the number will be `max(1, physical_cpu_cores / 4)`.
- `poll_directory_watch_events_backoff_init` - The initial sleep value for watch directory thread. Default: `500`.
- `poll_directory_watch_events_backoff_max` - The max sleep value for watch directory thread. Default: `32000`.
- `poll_directory_watch_events_backoff_factor` - The speed of backoff, exponential by default. Default: `2`.
- `handle_error_mode` — How to handle errors for FileLog engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
## Description {#description}
The delivered records are tracked automatically, so each record in a log file is only counted once.
`SELECT` is not particularly useful for reading records (except for debugging), because each record can be read only once. It is more practical to create real-time threads using [materialized views](../../../sql-reference/statements/create/view.md). To do this:
1. Use the engine to create a FileLog table and consider it a data stream.
2. Create a table with the desired structure.
3. Create a materialized view that converts data from the engine and puts it into a previously created table.
When the `MATERIALIZED VIEW` joins the engine, it starts collecting data in the background. This allows you to continually receive records from log files and convert them to the required format using `SELECT`.
One FileLog table can have as many materialized views as you like; they do not read data from the table directly, but receive new records (in blocks). This way you can write to several tables with different levels of detail (with grouping/aggregation and without).
Example:
``` sql
CREATE TABLE logs (
timestamp UInt64,
level String,
message String
) ENGINE = FileLog('user_files/my_app/app.log', 'JSONEachRow');
CREATE TABLE daily (
day Date,
level String,
total UInt64
) ENGINE = SummingMergeTree(day, (day, level), 8192);
CREATE MATERIALIZED VIEW consumer TO daily
AS SELECT toDate(toDateTime(timestamp)) AS day, level, count() as total
  FROM logs GROUP BY day, level;
SELECT level, sum(total) FROM daily GROUP BY level;
```
To stop receiving stream data or to change the conversion logic, detach the materialized view:
``` sql
DETACH TABLE consumer;
ATTACH TABLE consumer;
```
If you want to change the target table by using `ALTER`, we recommend disabling the materialized view to avoid discrepancies between the target table and the data from the view.
## Virtual Columns {#virtual-columns}
- `_filename` - Name of the log file.
- `_offset` - Offset in the log file.
Additional virtual columns when `handle_error_mode='stream'`:
- `_raw_record` - Raw record that couldn't be parsed successfully.
- `_error` - Exception message raised during failed parsing.
Note: `_raw_record` and `_error` virtual columns are filled only in case of an exception during parsing; they are always empty when the record is parsed successfully.
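For example, a sketch of a table created with `handle_error_mode = 'stream'` and a query that inspects unparsable records (the path and column are placeholders):
``` sql
CREATE TABLE app_log
(
    message String
)
ENGINE = FileLog('user_files/my_app/app.log', 'JSONEachRow')
SETTINGS handle_error_mode = 'stream';

-- Records that failed to parse keep their text in _raw_record
-- and the exception message in _error.
SELECT _filename, _offset, _raw_record, _error
FROM app_log
WHERE _error != '';
```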

View File

@ -5,9 +5,7 @@ sidebar_label: Amazon customer reviews
# Amazon customer reviews dataset
[**Amazon Customer Reviews**](https://s3.amazonaws.com/amazon-reviews-pds/readme.html) (a.k.a. Product Reviews) is one of Amazons iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. By accessing the dataset, you agree to the [license terms](https://s3.amazonaws.com/amazon-reviews-pds/license.txt).
The data is in a tab-separated format in gzipped files are up in AWS S3. Let's walk through the steps to insert it into ClickHouse.
This dataset contains over 150M customer reviews of Amazon products. The data is in snappy-compressed Parquet files in AWS S3 that total 49GB in size (compressed). Let's walk through the steps to insert it into ClickHouse.
:::note
The queries below were executed on a **Production** instance of [ClickHouse Cloud](https://clickhouse.cloud).
@ -18,49 +16,28 @@ The queries below were executed on a **Production** instance of [ClickHouse Clou
```sql
SELECT *
FROM s3('https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Wireless_v1_00.tsv.gz',
'TabSeparatedWithNames',
'marketplace String,
customer_id Int64,
review_id String,
product_id String,
product_parent Int64,
product_title String,
product_category String,
star_rating Int64,
helpful_votes Int64,
total_votes Int64,
vine Bool,
verified_purchase Bool,
review_headline String,
review_body String,
review_date Date'
)
LIMIT 10;
FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/amazon_reviews/amazon_reviews_2015.snappy.parquet')
LIMIT 10
```
The rows look like:
```response
┌─marketplace─┬─customer_id─┬─review_id──────┬─product_id─┬─product_parent─┬─product_title──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─product_category─┬─star_rating─┬─helpful_votes─┬─total_votes─┬─vine──┬─verified_purchase─┬─review_headline───────────┬─review_body────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─review_date─┐
US │ 16414143 │ R3W4P9UBGNGH1U │ B00YL0EKWE │ 852431543 │ LG G4 Case Hard Transparent Slim Clear Cover for LG G4 │ Wireless │ 2 │ 1 │ 3 │ false │ true │ Looks good, functions meh │ 2 issues - Once I turned on the circle apps and installed this case, my battery drained twice as fast as usual. I ended up turning off the circle apps, which kind of makes the case just a case... with a hole in it. Second, the wireless charging doesn't work. I have a Motorola 360 watch and a Qi charging pad. The watch charges fine but this case doesn't. But hey, it looks nice. │ 2015-08-31
US │ 50800750 │ R15V54KBMTQWAY │ B00XK95RPQ │ 516894650 │ Selfie Stick Fiblastiq&trade; Extendable Wireless Bluetooth Selfie Stick with built-in Bluetooth Adjustable Phone Holder │ Wireless │ 4 │ 0 │ 0 │ false │ false │ A fun little gadget │ Im embarrassed to admit that until recently, I have had a very negative opinion about “selfie sticks” aka “monopods” aka “narcissticks.” But having reviewed a number of them recently, theyre growing on me. This one is pretty nice and simple to set up and with easy instructions illustrated on the back of the box (not sure why some reviewers have stated that there are no instructions when they are clearly printed on the box unless they received different packaging than I did). Once assembled, the pairing via bluetooth and use of the stick are easy and intuitive. Nothing to it.<br /><br />The stick comes with a USB charging cable but arrived with a charge so you can use it immediately, though its probably a good idea to charge it right away so that you have no interruption of use out of the box. Make sure the stick is switched to on (it will light up) and extend your stick to the length you desire up to about a yards length and snap away.<br /><br />The phone clamp held the phone sturdily so I wasnt worried about it slipping out. But the longer you extend the stick, the harder it is to maneuver. But that will happen with any stick and is not specific to this one in particular.<br /><br />Two things that could improve this: 1) add the option to clamp this in portrait orientation instead of having to try and hold the stick at the portrait angle, which makes it feel unstable; 2) add the opening for a tripod so that this can be used to sit upright on a table for skyping and facetime eliminating the need to hold the phone up with your hand, causing fatigue.<br /><br />But other than that, this is a nice quality monopod for a variety of picture taking opportunities.<br /><br />I received a sample in exchange for my honest opinion. │ 2015-08-31
US │ 15184378 │ RY8I449HNXSVF │ B00SXRXUKO │ 984297154 │ Tribe AB40 Water Resistant Sports Armband with Key Holder for 4.7-Inch iPhone 6S/6/5/5S/5C, Galaxy S4 + Screen Protector - Dark Pink │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Fits iPhone 6 well │ 2015-08-31
US │ 10203548 │ R18TLJYCKJFLSR │ B009V5X1CE │ 279912704 │ RAVPower® Element 10400mAh External Battery USB Portable Charger (Dual USB Outputs, Ultra Compact Design), Travel Charger for iPhone 6,iPhone 6 plus,iPhone 5, 5S, 5C, 4S, 4, iPad Air, 4, 3, 2, Mini 2 (Apple adapters not included); Samsung Galaxy S5, S4, S3, S2, Note 3, Note 2; HTC One, EVO, Thunderbolt, Incredible, Droid DNA, Motorola ATRIX, Droid, Moto X, Google Glass, Nexus 4, Nexus 5, Nexus 7, │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Great charger │ Great charger. I easily get 3+ charges on a Samsung Galaxy 3. Works perfectly for camping trips or long days on the boat. │ 2015-08-31
US │ 488280 │ R1NK26SWS53B8Q │ B00D93OVF0 │ 662791300 │ Fosmon Micro USB Value Pack Bundle for Samsung Galaxy Exhilarate - Includes Home / Travel Charger, Car / Vehicle Charger and USB Cable │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Great for the price :-) │ 2015-08-31
US │ 13334021 │ R11LOHEDYJALTN │ B00XVGJMDQ │ 421688488 │ iPhone 6 Case, Vofolen Impact Resistant Protective Shell iPhone 6S Wallet Cover Shockproof Rubber Bumper Case Anti-scratches Hard Cover Skin Card Slot Holder for iPhone 6 6S │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Great Case, better customer service! │ 2015-08-31
US │ 27520697 │ R3ALQVQB2P9LA7 │ B00KQW1X1C │ 554285554 │ Nokia Lumia 630 RM-978 White Factory Unlocked - International Version No Warranty │ Wireless │ 4 │ 0 │ 0 │ false │ true │ Four Stars │ Easy to set up and use. Great functions for the price │ 2015-08-31
US │ 48086021 │ R3MWLXLNO21PDQ │ B00IP1MQNK │ 488006702 │ Lumsing 10400mah external battery │ Wireless │ 5 │ 0 │ 0 │ false │ true │ Five Stars │ Works great │ 2015-08-31
US │ 12738196 │ R2L15IS24CX0LI │ B00HVORET8 │ 389677711 │ iPhone 5S Battery Case - iPhone 5 Battery Case , Maxboost Atomic S [MFI Certified] External Protective Battery Charging Case Power Bank Charger All Versions of Apple iPhone 5/5S [Juice Battery Pack] │ Wireless │ 5 │ 0 │ 0 │ false │ true │ So far so good │ So far so good. It is essentially identical to the one it replaced from another company. That one stopped working after 7 months so I am a bit apprehensive about this one. │ 2015-08-31
US │ 15867807 │ R1DJ8976WPWVZU │ B00HX3G6J6 │ 299654876 │ HTC One M8 Screen Protector, Skinomi TechSkin Full Coverage Screen Protector for HTC One M8 Clear HD Anti-Bubble Film │ Wireless │ 3 │ 0 │ 0 │ false │ true │ seems durable but these are always harder to get on ... │ seems durable but these are always harder to get on right than people make them out to be. also send to curl up at the edges after a while. with today's smartphones, you hardly need screen protectors anyway. │ 2015-08-31
└─────────────┴─────────────┴────────────────┴────────────┴────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────┴─────────────┴───────────────┴─────────────┴───────┴───────────────────┴─────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─review_date─┬─marketplace─┬─customer_id─┬─review_id──────┬─product_id─┬─product_parent─┬─product_title────────────────────────────────────────────────┬─product_category───────┬─star_rating─┬─helpful_votes─┬─total_votes─┬─vine──┬─verified_purchase─┬─review_headline─────────────────────────────────────────────────────────────┬─review_body────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 16452 │ US │ 21080196 │ R17NMVYCQXEEFW │ B00RSI5DJA │ 904397429 │ Pilot │ Digital_Video_Download │ 5 │ 0 │ 0 │ false │ false │ yes indeed │ OMG- i totally see myself get hook on that show if it happen- love it │
│ 16452 │ US │ 44158214 │ R3MAPJVO9D0ERG │ B00RSI61PU │ 475013967 │ Salem Rogers: Model of the Year 1998 │ Digital_Video_Download │ 5 │ 0 │ 0 │ false │ false │ Halarious show!! │ Loved this pilot episode!! Please pick this up Amazon!!! │
│ 16452 │ US │ 1944630 │ R1Q5YPRE84OVB6 │ B009IU6BIS │ 101502671 │ National Lampoon's Christmas Vacation │ Digital_Video_Download │ 5 │ 0 │ 0 │ false │ false │ Classic! │ This is a holiday classic. How can you not love it! │
│ 16452 │ US │ 49029010 │ RGDK35TBJJ2ZI │ B00RSI68V2 │ 639602030 │ Table 58 │ Digital_Video_Download │ 5 │ 2 │ 3 │ false │ false │ Fun for the whole family!! │ This show is fun! Our family really enjoyed watching the show. I can see this being one of the shows that we watch on Friday nights with our pizza and ice cream. I hope to see more of the show and the great cast of characters. │
│ 16452 │ US │ 52257958 │ R1R2SEOJT8M14Y │ B00RSGIMUE │ 196368495 │ Niko and the Sword of Light │ Digital_Video_Download │ 5 │ 1 │ 2 │ false │ false │ it's a new kind of avatar. great show. make more. │ My 7 year old son and my husband both loved it! It's like avatar the last air bender but with different magical powers. The characters are adorably well developed. The story is interesting. We hope amazon makes the whole season. We can't wait to see more! │
│ 16452 │ US │ 26927978 │ RN0JCPQ6Z4FUB │ B009ITL7UG │ 497741324 │ Lord of the Rings: The Return of the King (Extended Edition) │ Digital_Video_Download │ 5 │ 0 │ 0 │ false │ true │ Definite must-own for any Tolkien buff who has not yet upgraded to Blu-Ray! │ If you liked the theatrical release and are a fan of Middle-Earth then you should get this. │
│ 16452 │ US │ 19626887 │ R15LDVOU1S1DFB │ B00RSGHGB0 │ 576999592 │ Just Add Magic - Season 1 │ Digital_Video_Download │ 5 │ 1 │ 1 │ false │ false │ Great story! So good even my teenage boys said ... │ Great story! So good even my teenage boys said this is actually pretty good!!! Can't wait for the next episode. │
│ 16452 │ US │ 1439383 │ R2DJVLZM1MVFQH │ B002WEQJ3E │ 733651019 │ Six: The Mark Unleashed │ Digital_Video_Download │ 1 │ 0 │ 4 │ false │ false │ I am now less intelligent for having watched an entire 10 minutes of it │ I am now less intelligent for having watched an entire 10 minutes of it. God save my sole as I now must kick out the chair from which I am standing on so that the noose may do its job. Watch the movie at your own risk. The screen will suck your brain cells out of your body. │
│ 16452 │ US │ 46233181 │ R33W2NB9MCRUFV │ B00RSGFYQE │ 464741068 │ Point of Honor │ Digital_Video_Download │ 4 │ 0 │ 0 │ false │ false │ Give it a chance. │ Pilots are just what they are...pilots. A chance to see what works and what doesn't and a chance to smooth out the wrinkles. Point of Honor at least stands a fair chance. │
│ 16452 │ US │ 19537300 │ R2WGJYESHID0ZF │ B00RSGHQJM │ 374287214 │ Down Dog │ Digital_Video_Download │ 5 │ 1 │ 1 │ false │ false │ Five Stars │ great fun │
└─────────────┴─────────────┴─────────────┴────────────────┴────────────┴────────────────┴──────────────────────────────────────────────────────────────┴────────────────────────┴─────────────┴───────────────┴─────────────┴───────┴───────────────────┴─────────────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
:::note
Normally you would not need to pass in the schema into the `s3` table function - ClickHouse can infer the names and data types of the columns. However, this particular dataset uses a non-standard tab-separated format, but the `s3` function seems to work fine with this non-standard format if you include the schema.
:::
2. Let's define a new table named `amazon_reviews`. We'll optimize some of the column data types - and choose a primary key (the `ORDER BY` clause):
2. Let's define a new `MergeTree` table named `amazon_reviews` to store this data in ClickHouse:
```sql
CREATE TABLE amazon_reviews
@ -82,58 +59,38 @@ CREATE TABLE amazon_reviews
review_body String
)
ENGINE = MergeTree
ORDER BY (marketplace, review_date, product_category);
ORDER BY (review_date, product_category);
```
3. We are now ready to insert the data into ClickHouse. Before we do, check out the [list of files in the dataset](https://s3.amazonaws.com/amazon-reviews-pds/tsv/index.txt) and decide which ones you want to include.
4. We will insert all of the US reviews - which is about 151M rows. The following `INSERT` command uses the `s3Cluster` table function, which allows the processing of multiple S3 files in parallel using all the nodes of your cluster. We also use a wildcard to insert any file that starts with the name `https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_`:
3. The following `INSERT` command uses the `s3Cluster` table function, which allows the processing of multiple S3 files in parallel using all the nodes of your cluster. We also use a wildcard to insert any file that matches the pattern `https://datasets-documentation.s3.eu-west-3.amazonaws.com/amazon_reviews/amazon_reviews_*.snappy.parquet`:
```sql
INSERT INTO amazon_reviews
WITH
transform(vine, ['Y','N'],[true, false]) AS vine,
transform(verified_purchase, ['Y','N'],[true, false]) AS verified_purchase
SELECT
*
FROM s3Cluster(
'default',
'https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_*.tsv.gz',
'TSVWithNames',
'review_date Date,
marketplace LowCardinality(String),
customer_id UInt64,
review_id String,
product_id String,
product_parent UInt64,
product_title String,
product_category LowCardinality(String),
star_rating UInt8,
helpful_votes UInt32,
total_votes UInt32,
vine FixedString(1),
verified_purchase FixedString(1),
review_headline String,
review_body String'
)
SETTINGS input_format_allow_errors_num = 1000000;
'https://datasets-documentation.s3.eu-west-3.amazonaws.com/amazon_reviews/amazon_reviews_*.snappy.parquet'
);
```
:::tip
In ClickHouse Cloud, there is a cluster named `default`. Change `default` to the name of your cluster...or use the `s3` table function (instead of `s3Cluster`) if you do not have a cluster.
In ClickHouse Cloud, the name of the cluster is `default`. Change `default` to the name of your cluster...or use the `s3` table function (instead of `s3Cluster`) if you do not have a cluster.
:::
5. That query doesn't take long - within 5 minutes or so you should see all the rows inserted:
5. That query doesn't take long - averaging about 300,000 rows per second. Within 5 minutes or so you should see all the rows inserted:
```sql
SELECT formatReadableQuantity(count())
FROM amazon_reviews
FROM amazon_reviews;
```
```response
┌─formatReadableQuantity(count())─┐
│ 150.96 million │
└─────────────────────────────────┘
1 row in set. Elapsed: 0.005 sec.
```
6. Let's see how much space our data is using:
@ -155,11 +112,11 @@ The original data was about 70G, but compressed in ClickHouse it takes up about
```response
┌─disk_name─┬─compressed─┬─uncompressed─┬─compr_rate─┬──────rows─┬─part_count─┐
│ s3disk │ 30.00 GiB │ 70.61 GiB │ 2.35 │ 150957260 │ 9
│ s3disk │ 30.05 GiB │ 70.47 GiB │ 2.35 │ 150957260 │ 14
└───────────┴────────────┴──────────────┴────────────┴───────────┴────────────┘
```
7. Let's run some queries...here are the top 10 most-helpful reviews on Amazon:
7. Let's run some queries...here are the top 10 most-helpful reviews in the dataset:
```sql
SELECT
@ -170,7 +127,7 @@ ORDER BY helpful_votes DESC
LIMIT 10;
```
Notice the query has to process all 151M rows, and it takes about 17 seconds:
Notice the query has to process all 151M rows, but takes less than one second!
```response
┌─product_title────────────────────────────────────────────────────────────────────────────┬─review_headline───────────────────────────────────────────────────────┐
@ -186,7 +143,7 @@ Notice the query has to process all 151M rows, and it takes about 17 seconds:
│ Tuscan Dairy Whole Vitamin D Milk, Gallon, 128 oz │ Make this your only stock and store │
└──────────────────────────────────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────┘
10 rows in set. Elapsed: 17.595 sec. Processed 150.96 million rows, 15.36 GB (8.58 million rows/s., 872.89 MB/s.)
10 rows in set. Elapsed: 0.897 sec. Processed 150.96 million rows, 15.36 GB (168.36 million rows/s., 17.13 GB/s.)
```
8. Here are the top 10 products in Amazon with the most reviews:
@ -215,7 +172,7 @@ LIMIT 10;
│ Crossy Road │ 28111 │
└───────────────────────────────────────────────┴─────────┘
10 rows in set. Elapsed: 16.684 sec. Processed 195.05 million rows, 20.86 GB (11.69 million rows/s., 1.25 GB/s.)
10 rows in set. Elapsed: 20.059 sec. Processed 150.96 million rows, 12.78 GB (7.53 million rows/s., 637.25 MB/s.)
```
9. Here are the average review ratings per month for each product (an actual [Amazon job interview question](https://datalemur.com/questions/sql-avg-review-ratings)!):
@ -261,7 +218,8 @@ It calculates all the monthly averages for each product, but we only returned 20
│ 2015-08-01 │ The Birds of West Africa (Collins Field Guides) │ 4 │
└────────────┴─────────────────────────────────────────────────────────────────────────────────────────┴───────────┘
20 rows in set. Elapsed: 52.827 sec. Processed 251.46 million rows, 35.26 GB (4.76 million rows/s., 667.55 MB/s.)
20 rows in set. Elapsed: 43.055 sec. Processed 150.96 million rows, 13.24 GB (3.51 million rows/s., 307.41 MB/s.)
Peak memory usage: 41.73 GiB.
```
10. Here are the total number of votes per product category. This query is fast because `product_category` is in the primary key:
@ -272,7 +230,8 @@ SELECT
product_category
FROM amazon_reviews
GROUP BY product_category
ORDER BY 1 DESC;
ORDER BY 1 DESC
FORMAT PrettyCompactMonoBlock;
```
```response
@ -322,7 +281,7 @@ ORDER BY 1 DESC;
│ 72970 │ Gift Card │
└──────────────────┴──────────────────────────┘
43 rows in set. Elapsed: 0.423 sec. Processed 150.96 million rows, 756.20 MB (356.70 million rows/s., 1.79 GB/s.)
43 rows in set. Elapsed: 0.201 sec. Processed 150.96 million rows, 754.79 MB (750.85 million rows/s., 3.75 GB/s.)
```
11. Let's find the products with the word **"awful"** occurring most frequently in the review. This is a big task - over 151M strings have to be parsed looking for a single word:
@ -340,7 +299,7 @@ ORDER BY count DESC
LIMIT 50;
```
The query takes a couple of minutes, but the results are a fun read:
The query only takes 4 seconds - which is impressive - and the results are a fun read:
```response
@ -363,41 +322,42 @@ The query takes a couple of minutes, but the results are a fun read:
│ B00N28818A │ Amazon Prime Video │ 1.4305555555555556 │ 72 │
│ B007FTE2VW │ SimCity - Limited Edition │ 1.2794117647058822 │ 68 │
│ 0439023513 │ Mockingjay (The Hunger Games) │ 2.6417910447761193 │ 67 │
│ B00178630A │ Diablo III - PC/Mac │ 1.671875 │ 64 │
│ B000OCEWGW │ Liquid Ass │ 4.8125 │ 64 │
│ B00178630A │ Diablo III - PC/Mac │ 1.671875 │ 64 │
│ B005ZOBNOI │ The Fault in Our Stars │ 4.316666666666666 │ 60 │
│ B00L9B7IKE │ The Girl on the Train: A Novel │ 2.0677966101694913 │ 59 │
│ B007S6Y6VS │ Garden of Life Raw Organic Meal │ 2.8793103448275863 │ 58 │
│ B0064X7B4A │ Words With Friends │ 2.2413793103448274 │ 58 │
│ B003WUYPPG │ Unbroken: A World War II Story of Survival, Resilience, and Redemption │ 4.620689655172414 │ 58 │
│ B00006HBUJ │ Star Wars: Episode II - Attack of the Clones (Widescreen Edition) │ 2.2982456140350878 │ 57 │
│ B0064X7B4A │ Words With Friends │ 2.2413793103448274 │ 58 │
│ B007S6Y6VS │ Garden of Life Raw Organic Meal │ 2.8793103448275863 │ 58 │
│ B000XUBFE2 │ The Book Thief │ 4.526315789473684 │ 57 │
│ B00006HBUJ │ Star Wars: Episode II - Attack of the Clones (Widescreen Edition) │ 2.2982456140350878 │ 57 │
│ B0006399FS │ How to Dismantle an Atomic Bomb │ 1.9821428571428572 │ 56 │
│ B003ZSJ212 │ Star Wars: The Complete Saga (Episodes I-VI) (Packaging May Vary) [Blu-ray] │ 2.309090909090909 │ 55 │
│ 193700788X │ Dead Ever After (Sookie Stackhouse/True Blood) │ 1.5185185185185186 │ 54 │
│ B004FYEZMQ │ Mass Effect 3 │ 2.056603773584906 │ 53 │
│ B000CFYAMC │ The Room │ 3.9615384615384617 │ 52 │
│ B0031JK95S │ Garden of Life Raw Organic Meal │ 3.3137254901960786 │ 51 │
│ B0012JY4G4 │ Color Oops Hair Color Remover Extra Strength 1 Each │ 3.9019607843137254 │ 51 │
│ B007VTVRFA │ SimCity - Limited Edition │ 1.2040816326530612 │ 49
│ B0031JK95S │ Garden of Life Raw Organic Meal │ 3.3137254901960786 │ 51
│ B00CE18P0K │ Pilot │ 1.7142857142857142 │ 49 │
│ B007VTVRFA │ SimCity - Limited Edition │ 1.2040816326530612 │ 49 │
│ 0316015849 │ Twilight (The Twilight Saga, Book 1) │ 1.8979591836734695 │ 49 │
│ B00DR0PDNE │ Google Chromecast HDMI Streaming Media Player │ 2.5416666666666665 │ 48 │
│ B000056OWC │ The First Years: 4-Stage Bath System │ 1.2127659574468086 │ 47 │
│ B007IXWKUK │ Fifty Shades Darker (Fifty Shades, Book 2) │ 1.6304347826086956 │ 46 │
│ 1892112000 │ To Train Up a Child │ 1.4130434782608696 │ 46 │
│ 043935806X │ Harry Potter and the Order of the Phoenix (Book 5) │ 3.977272727272727 │ 44 │
│ B00BGO0Q9O │ Fitbit Flex Wireless Wristband with Sleep Function, Black │ 1.9318181818181819 │ 44 │
│ B003XF1XOQ │ Mockingjay (Hunger Games Trilogy, Book 3) │ 2.772727272727273 │ 44 │
│ B00DD2B52Y │ Spring Breakers │ 1.2093023255813953 │ 43
│ B00BGO0Q9O │ Fitbit Flex Wireless Wristband with Sleep Function, Black │ 1.9318181818181819 │ 44
│ B0064X7FVE │ The Weather Channel: Forecast, Radar & Alerts │ 1.5116279069767442 │ 43 │
│ B0083PWAPW │ Kindle Fire HD 7", Dolby Audio, Dual-Band Wi-Fi │ 2.627906976744186 │ 43 │
│ B00DD2B52Y │ Spring Breakers │ 1.2093023255813953 │ 43 │
│ B00192KCQ0 │ Death Magnetic │ 3.5714285714285716 │ 42 │
│ B007S6Y74O │ Garden of Life Raw Organic Meal │ 3.292682926829268 │ 41 │
│ B004CFA9RS │ Divergent (Divergent Trilogy, Book 1) │ 3.1219512195121952 │ 41 │
│ B0052QYLUM │ Infant Optics DXR-5 Portable Video Baby Monitor │ 2.1463414634146343 │ 41 │
└────────────┴──────────────────────────────────────────────────────────────────────────────────────────┴────────────────────┴───────┘
50 rows in set. Elapsed: 60.052 sec. Processed 150.96 million rows, 68.93 GB (2.51 million rows/s., 1.15 GB/s.)
50 rows in set. Elapsed: 4.072 sec. Processed 150.96 million rows, 68.93 GB (37.07 million rows/s., 16.93 GB/s.)
Peak memory usage: 1.82 GiB.
```
12. We can run the same query again, except this time we search for **awesome** in the reviews:
@ -415,8 +375,6 @@ ORDER BY count DESC
LIMIT 50;
```
It runs quite a bit faster - which means the cache is helping us out here:
```response
┌─product_id─┬─any(product_title)────────────────────────────────────────────────────┬───avg(star_rating)─┬─count─┐
@ -449,8 +407,8 @@ It runs quite a bit faster - which means the cache is helping us out here:
│ B008JK6W5K │ Logo Quiz │ 4.782106782106782 │ 693 │
│ B00EDTSKLU │ Geometry Dash │ 4.942028985507246 │ 690 │
│ B00CSR2J9I │ Hill Climb Racing │ 4.880059970014993 │ 667 │
│ B005ZXWMUS │ Netflix │ 4.722306525037936 │ 659 │
│ B00CRFAAYC │ Fab Tattoo Artist FREE │ 4.907435508345979 │ 659 │
│ B005ZXWMUS │ Netflix │ 4.722306525037936 │ 659 │
│ B00DHQHQCE │ Battle Beach │ 4.863287250384024 │ 651 │
│ B00BGA9WK2 │ PlayStation 4 500GB Console [Old Model] │ 4.688751926040061 │ 649 │
│ B008Y7SMQU │ Logo Quiz - Fun Plus Free │ 4.7888 │ 625 │
@ -472,5 +430,6 @@ It runs quite a bit faster - which means the cache is helping us out here:
│ B00G6ZTM3Y │ Terraria │ 4.728421052631579 │ 475 │
└────────────┴───────────────────────────────────────────────────────────────────────┴────────────────────┴───────┘
50 rows in set. Elapsed: 33.954 sec. Processed 150.96 million rows, 68.95 GB (4.45 million rows/s., 2.03 GB/s.)
50 rows in set. Elapsed: 4.079 sec. Processed 150.96 million rows, 68.95 GB (37.01 million rows/s., 16.90 GB/s.)
Peak memory usage: 2.18 GiB.
```

View File

@ -74,6 +74,7 @@ The supported formats are:
| [ArrowStream](#data-format-arrow-stream) | ✔ | ✔ |
| [ORC](#data-format-orc) | ✔ | ✔ |
| [One](#data-format-one) | ✔ | ✗ |
| [Npy](#data-format-npy) | ✔ | ✗ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNames](#rowbinarywithnamesandtypes) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
@ -2454,6 +2455,50 @@ Result:
└──────────────┘
```
## Npy {#data-format-npy}
This format is designed to load a NumPy array from a `.npy` file into ClickHouse. The NumPy file format is a binary format used for efficiently storing arrays of numerical data. During import, ClickHouse treats the top-level dimension as an array of rows with a single column. Supported Npy data types and their corresponding types in ClickHouse:
| Npy type | ClickHouse type |
|:--------:|:---------------:|
| b1 | UInt8 |
| i1 | Int8 |
| i2 | Int16 |
| i4 | Int32 |
| i8 | Int64 |
| u1 | UInt8 |
| u2 | UInt16 |
| u4 | UInt32 |
| u8 | UInt64 |
| f4 | Float32 |
| f8 | Float64 |
| S | String |
| U | String |
**Example of saving an array in .npy format using Python**
```Python
import numpy as np
arr = np.array([[[1],[2],[3]],[[4],[5],[6]]])
np.save('example_array.npy', arr)
```
**Example of reading a NumPy file in ClickHouse**
Query:
```sql
SELECT *
FROM file('example_array.npy', Npy)
```
Result:
```
┌─array─────────┐
│ [[1],[2],[3]] │
│ [[4],[5],[6]] │
└───────────────┘
```
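A sketch of loading such a file into a table instead of only reading it; the column type below assumes the default 64-bit integers from the Python example above:
```sql
CREATE TABLE npy_destination
(
    array Array(Array(Int64))
)
ENGINE = MergeTree
ORDER BY tuple();

-- The top-level dimension becomes the rows of the single column.
INSERT INTO npy_destination
SELECT * FROM file('example_array.npy', Npy);
```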
## LineAsString {#lineasstring}
In this format, every line of input data is interpreted as a single string value. This format can only be parsed for a table with a single field of type [String](/docs/en/sql-reference/data-types/string.md). The remaining columns must be set to [DEFAULT](/docs/en/sql-reference/statements/create/table.md/#default) or [MATERIALIZED](/docs/en/sql-reference/statements/create/table.md/#materialized), or omitted.

View File

@ -1862,6 +1862,10 @@ The trailing slash is mandatory.
## Prometheus {#prometheus}
:::note
ClickHouse Cloud does not currently support connecting to Prometheus. To be notified when this feature is supported, please contact support@clickhouse.com.
:::
Exposing metrics data for scraping from [Prometheus](https://prometheus.io).
Settings:
@ -2797,3 +2801,7 @@ Proxy settings are determined in the following order:
ClickHouse will check the highest priority resolver type for the request protocol. If it is not defined,
it will check the next highest priority resolver type, until it reaches the environment resolver.
This also allows a mix of resolver types to be used.
### disable_tunneling_for_https_requests_over_http_proxy {#disable_tunneling_for_https_requests_over_http_proxy}
By default, tunneling (i.e., `HTTP CONNECT`) is used to make `HTTPS` requests over an `HTTP` proxy. This setting can be used to disable it.

View File

@ -106,6 +106,15 @@ Possible values:
Default value: 0.
## max_bytes_before_external_sort {#settings-max_bytes_before_external_sort}
Enables or disables execution of `ORDER BY` clauses in external memory. See [ORDER BY Implementation Details](../../sql-reference/statements/select/order-by.md#implementation-details)
- Maximum volume of RAM (in bytes) that can be used by a single [ORDER BY](../../sql-reference/statements/select/order-by.md) operation. The recommended value is half of the available system memory.
- 0 — `ORDER BY` in external memory is disabled.
Default value: 0.
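For example, a per-query sketch that allows a large sort to spill to disk (the table, column, and the 10 GB threshold are placeholders):
``` sql
SELECT *
FROM large_table
ORDER BY sort_key
SETTINGS max_bytes_before_external_sort = 10000000000;
```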
## max_rows_to_sort {#max-rows-to-sort}
A maximum number of rows before sorting. This allows you to limit memory consumption when sorting.
@ -163,7 +172,27 @@ If you set `timeout_before_checking_execution_speed `to 0, ClickHouse will use c
## timeout_overflow_mode {#timeout-overflow-mode}
What to do if the query is run longer than max_execution_time: throw or break. By default, throw.
What to do if the query runs longer than `max_execution_time`: `throw` or `break`. By default, `throw`.
## max_execution_time_leaf {#max_execution_time_leaf}
Similar semantics to `max_execution_time`, but only applied on leaf nodes for distributed or remote queries.
For example, if we want to limit the execution time on the leaf nodes to `10s` but have no limit on the initial node, instead of having `max_execution_time` in the nested subquery settings:
``` sql
SELECT count() FROM cluster(cluster, view(SELECT * FROM t SETTINGS max_execution_time = 10));
```
We can use `max_execution_time_leaf` as a query setting instead:
``` sql
SELECT count() FROM cluster(cluster, view(SELECT * FROM t)) SETTINGS max_execution_time_leaf = 10;
```
## timeout_overflow_mode_leaf {#timeout_overflow_mode_leaf}
What to do when the query on a leaf node runs longer than `max_execution_time_leaf`: `throw` or `break`. By default, `throw`.
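A sketch combining both leaf settings so that a long-running leaf query is cut off instead of failing (reusing the hypothetical `cluster` and `t` from the example above):
``` sql
SELECT count()
FROM cluster(cluster, view(SELECT * FROM t))
SETTINGS max_execution_time_leaf = 10, timeout_overflow_mode_leaf = 'break';
```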
## min_execution_speed {#min-execution-speed}

View File

@ -2473,7 +2473,7 @@ See also:
- [distributed_replica_error_cap](#distributed_replica_error_cap)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
Base interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. The actual interval grows exponentially in the event of errors.
@ -2483,9 +2483,9 @@ Possible values:
Default value: 100 milliseconds.
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
Maximum interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. Limits exponential growth of the interval set in the [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) setting.
Maximum interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. Limits exponential growth of the interval set in the [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms) setting.
Possible values:
@ -2493,7 +2493,7 @@ Possible values:
Default value: 30000 milliseconds (30 seconds).
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
Enables/disables sending of inserted data in batches.
@ -2506,13 +2506,13 @@ Possible values:
Default value: 0.
## distributed_directory_monitor_split_batch_on_failure {#distributed_directory_monitor_split_batch_on_failure}
## distributed_background_insert_split_batch_on_failure {#distributed_background_insert_split_batch_on_failure}
Enables/disables splitting batches on failures.
Sometimes sending a particular batch to the remote shard may fail because of a complex pipeline after it (i.e. a `MATERIALIZED VIEW` with `GROUP BY`) due to a `Memory limit exceeded` or similar error. In this case, retrying will not help (and this will get distributed sends for the table stuck), but sending the files from that batch one by one may let the INSERT succeed.
So installing this setting to `1` will disable batching for such batches (i.e. temporary disables `distributed_directory_monitor_batch_inserts` for failed batches).
So setting this to `1` will disable batching for such batches (i.e. it temporarily disables `distributed_background_insert_batch` for failed batches).
Possible values:
@ -2695,15 +2695,15 @@ Possible values:
Default value: 0.
## insert_distributed_sync {#insert_distributed_sync}
## distributed_foreground_insert {#distributed_foreground_insert}
Enables or disables synchronous data insertion into a [Distributed](../../engines/table-engines/special/distributed.md/#distributed) table.
By default, when inserting data into a `Distributed` table, the ClickHouse server sends data to cluster nodes in asynchronous mode. When `insert_distributed_sync=1`, the data is processed synchronously, and the `INSERT` operation succeeds only after all the data is saved on all shards (at least one replica for each shard if `internal_replication` is true).
By default, when inserting data into a `Distributed` table, the ClickHouse server sends data to cluster nodes in background mode. When `distributed_foreground_insert=1`, the data is processed synchronously, and the `INSERT` operation succeeds only after all the data is saved on all shards (at least one replica for each shard if `internal_replication` is true).
Possible values:
- 0 — Data is inserted in asynchronous mode.
- 0 — Data is inserted in background mode.
- 1 — Data is inserted in synchronous mode.
Default value: `0`.
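For illustration, a sketch that forces a single `INSERT` into a hypothetical `Distributed` table to be processed in the foreground:
``` sql
INSERT INTO dist_events SETTINGS distributed_foreground_insert = 1
VALUES (1, 'first event');
```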
@ -2762,7 +2762,7 @@ Result:
## use_compact_format_in_distributed_parts_names {#use_compact_format_in_distributed_parts_names}
Uses compact format for storing blocks for async (`insert_distributed_sync`) INSERT into tables with `Distributed` engine.
Uses compact format for storing blocks for background (`distributed_foreground_insert`) INSERT into tables with `Distributed` engine.
Possible values:
@ -2772,7 +2772,7 @@ Possible values:
Default value: `1`.
:::note
- with `use_compact_format_in_distributed_parts_names=0` changes from cluster definition will not be applied for async INSERT.
- with `use_compact_format_in_distributed_parts_names=0` changes from the cluster definition will not be applied for background INSERT.
- with `use_compact_format_in_distributed_parts_names=1` changing the order of the nodes in the cluster definition will change the `shard_index`/`replica_index`, so be aware.
:::
@ -3310,22 +3310,11 @@ Possible values:
Default value: `0`.
## use_mysql_types_in_show_columns {#use_mysql_types_in_show_columns}
Show the names of MySQL data types corresponding to ClickHouse data types in [SHOW COLUMNS](../../sql-reference/statements/show.md#show_columns).
Possible values:
- 0 - Show names of native ClickHouse data types.
- 1 - Show names of MySQL data types corresponding to ClickHouse data types.
Default value: `0`.
## mysql_map_string_to_text_in_show_columns {#mysql_map_string_to_text_in_show_columns}
When enabled, [String](../../sql-reference/data-types/string.md) ClickHouse data type will be displayed as `TEXT` in [SHOW COLUMNS](../../sql-reference/statements/show.md#show_columns).
Has effect only when [use_mysql_types_in_show_columns](#use_mysql_types_in_show_columns) is enabled.
Has an effect only when the connection is made through the MySQL wire protocol.
- 0 - Use `BLOB`.
- 1 - Use `TEXT`.
@ -3336,7 +3325,7 @@ Default value: `0`.
When enabled, [FixedString](../../sql-reference/data-types/fixedstring.md) ClickHouse data type will be displayed as `TEXT` in [SHOW COLUMNS](../../sql-reference/statements/show.md#show_columns).
Has effect only when [use_mysql_types_in_show_columns](#use_mysql_types_in_show_columns) is enabled.
Has an effect only when the connection is made through the MySQL wire protocol.
- 0 - Use `BLOB`.
- 1 - Use `TEXT`.
@ -3944,6 +3933,16 @@ Possible values:
Default value: `0`.
## force_optimize_projection_name {#force-optimize-projection_name}
If it is set to a non-empty string, ClickHouse checks that this projection is used in the query at least once.
Possible values:
- string: name of the projection that is used in the query
Default value: `''`.
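A short sketch with a hypothetical table and projection name; the query fails if the named projection is not used:
``` sql
SELECT user_id, count()
FROM events
GROUP BY user_id
SETTINGS force_optimize_projection_name = 'by_user_id';
```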
## alter_sync {#alter-sync}
Allows to set up waiting for actions to be executed on replicas by [ALTER](../../sql-reference/statements/alter/index.md), [OPTIMIZE](../../sql-reference/statements/optimize.md) or [TRUNCATE](../../sql-reference/statements/truncate.md) queries.
@ -3956,6 +3955,10 @@ Possible values:
Default value: `1`.
:::note
`alter_sync` is applicable only to `Replicated` tables; it does nothing for alters of non-`Replicated` tables.
:::
## replication_wait_for_inactive_replica_timeout {#replication-wait-for-inactive-replica-timeout}
Specifies how long (in seconds) to wait for inactive replicas to execute [ALTER](../../sql-reference/statements/alter/index.md), [OPTIMIZE](../../sql-reference/statements/optimize.md) or [TRUNCATE](../../sql-reference/statements/truncate.md) queries.
@ -4780,6 +4783,10 @@ a Tuple(
)
```
## analyze_index_with_space_filling_curves
If a table has a space-filling curve in its index, e.g. `ORDER BY mortonEncode(x, y)`, and the query has conditions on its arguments, e.g. `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30`, use the space-filling curve for index analysis.
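A sketch of the table shape and query described above (table and column names are placeholders):
``` sql
CREATE TABLE points
(
    x UInt32,
    y UInt32
)
ENGINE = MergeTree
ORDER BY mortonEncode(x, y);

-- The range conditions on x and y can be analyzed through the Morton curve.
SELECT count()
FROM points
WHERE x >= 10 AND x <= 20 AND y >= 20 AND y <= 30
SETTINGS analyze_index_with_space_filling_curves = 1;
```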
## dictionary_use_async_executor {#dictionary_use_async_executor}
Execute a pipeline for reading dictionary source in several threads. It's supported only by dictionaries with local CLICKHOUSE source.
@ -4794,3 +4801,10 @@ LIFETIME(MIN 0 MAX 3600)
LAYOUT(COMPLEX_KEY_HASHED_ARRAY())
SETTINGS(dictionary_use_async_executor=1, max_threads=8);
```
## storage_metadata_write_full_object_key {#storage_metadata_write_full_object_key}
When set to `true`, the metadata files are written with the `VERSION_FULL_OBJECT_KEY` format version. With that format, full object storage key names are written to the metadata files.
When set to `false`, the metadata files are written with the previous format version, `VERSION_INLINE_DATA`. With that format, only the suffixes of object storage key names are written to the metadata files. The prefix for all object storage key names is set in the configuration files in the `storage_configuration.disks` section.
Default value: `false`.
View File
@ -78,6 +78,11 @@ If procfs is supported and enabled on the system, ClickHouse server collects the
- `OSReadBytes`
- `OSWriteBytes`
:::note
`OSIOWaitMicroseconds` is disabled by default in Linux kernels starting from 5.14.x.
You can enable it using `sudo sysctl kernel.task_delayacct=1` or by creating a `.conf` file in `/etc/sysctl.d/` with `kernel.task_delayacct = 1`.
:::
## Related content
- Blog: [System Tables and a window into the internals of ClickHouse](https://clickhouse.com/blog/clickhouse-debugging-issues-with-system-tables)
View File
@ -35,27 +35,25 @@ WITH arrayMap(x -> demangle(addressToSymbol(x)), trace) AS all SELECT thread_nam
``` text
Row 1:
──────
thread_name: clickhouse-serv
thread_id: 686
query_id: 1a11f70b-626d-47c1-b948-f9c7b206395d
res: sigqueue
DB::StorageSystemStackTrace::fillData(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::Context const&, DB::SelectQueryInfo const&) const
DB::IStorageSystemOneBlock<DB::StorageSystemStackTrace>::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int)
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPipeline&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)
DB::InterpreterSelectQuery::executeImpl(DB::QueryPipeline&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>)
DB::InterpreterSelectQuery::execute()
DB::InterpreterSelectWithUnionQuery::execute()
DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)
start_thread
__clone
thread_name: QueryPipelineEx
thread_id: 743490
query_id: dc55a564-febb-4e37-95bb-090ef182c6f1
res: memcpy
large_ralloc
arena_ralloc
do_rallocx
Allocator<true, true>::realloc(void*, unsigned long, unsigned long, unsigned long)
HashTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>::resize(unsigned long, unsigned long)
void DB::Aggregator::executeImplBatch<false, false, true, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>>(DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>&, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>::State&, DB::Arena*, unsigned long, unsigned long, DB::Aggregator::AggregateFunctionInstruction*, bool, char*) const
DB::Aggregator::executeImpl(DB::AggregatedDataVariants&, unsigned long, unsigned long, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, DB::Aggregator::AggregateFunctionInstruction*, bool, bool, char*) const
DB::Aggregator::executeOnBlock(std::__1::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>, unsigned long, unsigned long, DB::AggregatedDataVariants&, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, std::__1::vector<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>, std::__1::allocator<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>>>&, bool&) const
DB::AggregatingTransform::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__1::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
void* std::__1::__thread_proxy[abi:v15000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>>(void*)
```
Getting filenames and line numbers in ClickHouse source code:
View File
@ -115,7 +115,7 @@ Parameters:
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.
View File
@ -12,6 +12,7 @@ A client application to interact with clickhouse-keeper by its native protocol.
- `-q QUERY`, `--query=QUERY` — Query to execute. If this parameter is not passed, `clickhouse-keeper-client` will start in interactive mode.
- `-h HOST`, `--host=HOST` — Server host. Default value: `localhost`.
- `-p N`, `--port=N` — Server port. Default value: `9181`.
- `-c FILE_PATH`, `--config-file=FILE_PATH` — Set path of config file to get the connection string. Default value: `config.xml`.
- `--connection-timeout=TIMEOUT` — Set connection timeout in seconds. Default value: 10s.
- `--session-timeout=TIMEOUT` — Set session timeout in seconds. Default value: 10s.
- `--operation-timeout=TIMEOUT` — Set operation timeout in seconds. Default value: 10s.
View File
@ -44,7 +44,7 @@ INSERT INTO map_map VALUES
('2000-01-01', '2000-01-01 00:00:00', (['c', 'd', 'e'], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', (['d', 'e', 'f'], [10, 10, 10])),
('2000-01-01', '2000-01-01 00:01:00', (['f', 'g', 'g'], [10, 10, 10]));
SELECT
timeslot,
sumMap(status),
@ -317,6 +317,15 @@ FROM people
└────────┴───────────────────────────┘
```
## -ArgMin
The suffix -ArgMin can be appended to the name of any aggregate function. In this case, the aggregate function accepts an additional argument, which should be any comparable expression. The aggregate function processes only the rows that have the minimum value for the specified extra expression.
Examples: `sumArgMin(column, expr)`, `countArgMin(expr)`, `avgArgMin(x, expr)` and so on.
## -ArgMax
Similar to suffix -ArgMin but processes only the rows that have the maximum value for the specified extra expression.
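For illustration, a sketch against a hypothetical `orders` table with `amount` and `price` columns:
```sql
-- Sum `amount` only over the rows where `price` is minimal / maximal
SELECT
    sumArgMin(amount, price) AS amount_at_min_price,
    sumArgMax(amount, price) AS amount_at_max_price
FROM orders;
```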
## Related Content
View File
@ -60,7 +60,7 @@ SELECT largestTriangleThreeBuckets(4)(x, y) FROM largestTriangleThreeBuckets_tes
Result:
``` text
┌────────largestTriangleThreeBuckets(3)(x, y)───────────┐
┌────────largestTriangleThreeBuckets(4)(x, y)───────────┐
│ [(1,10),(3,15),(5,40),(10,70)] │
└───────────────────────────────────────────────────────┘
```
View File
@ -899,11 +899,11 @@ Other types are not supported yet. The function returns the attribute for the pr
Data must completely fit into RAM.
## Dictionary Updates {#dictionary-updates}
## Refreshing dictionary data using LIFETIME {#lifetime}
ClickHouse periodically updates the dictionaries. The update interval for fully downloaded dictionaries and the invalidation interval for cached dictionaries are defined in the `lifetime` tag in seconds.
ClickHouse periodically updates dictionaries based on the `LIFETIME` tag (defined in seconds). `LIFETIME` is the update interval for fully downloaded dictionaries and the invalidation interval for cached dictionaries.
Dictionary updates (other than loading for first use) do not block queries. During updates, the old version of a dictionary is used. If an error occurs during an update, the error is written to the server log, and queries continue using the old version of dictionaries.
During updates, the old version of a dictionary can still be queried. Dictionary updates (other than when loading the dictionary for first use) do not block queries. If an error occurs during an update, the error is written to the server log and queries can continue using the old version of the dictionary. If a dictionary update is successful, the old version of the dictionary is replaced atomically.
Example of settings:
View File
@ -2172,80 +2172,6 @@ Result:
└─────────────────────┘
```
## arrayRandomSample
Function `arrayRandomSample` returns a subset with `samples`-many random elements of an input array. If `samples` exceeds the size of the input array, the sample size is limited to the size of the array. In this case, all elements of the input array are returned, but the order is not guaranteed. The function can handle both flat arrays and nested arrays.
**Syntax**
```sql
arrayRandomSample(arr, samples)
```
**Arguments**
- `arr` — The input array from which to sample elements. This may be flat or nested arrays.
- `samples` — An unsigned integer specifying the number of elements to include in the random sample.
**Returned Value**
- An array containing a random sample of elements from the input array.
**Examples**
Query:
```sql
SELECT arrayRandomSample(['apple', 'banana', 'cherry', 'date'], 2) as res;
```
Result:
```
┌─res────────────────┐
│ ['banana','apple'] │
└────────────────────┘
```
Query:
```sql
SELECT arrayRandomSample([[1, 2], [3, 4], [5, 6]], 2) as res;
```
Result:
```
┌─res───────────┐
│ [[3,4],[5,6]] │
└───────────────┘
```
Query:
```sql
SELECT arrayRandomSample([1, 2, 3, 4, 5], 0) as res;
```
Result:
```
┌─res─┐
│ [] │
└─────┘
```
Query:
```sql
SELECT arrayRandomSample([1, 2, 3], 5) as res;
```
Result:
```
┌─res─────┐
│ [3,1,2] │
└─────────┘
```
## Distance functions
All supported functions are described in [distance functions documentation](../../sql-reference/functions/distance-functions.md).
View File
@ -1557,10 +1557,10 @@ Returns for a given date, the number of days passed since [1 January 0000](https
toDaysSinceYearZero(date[, time_zone])
```
Aliases: `TO_DAYS`
Alias: `TO_DAYS`
**Arguments**
- `date` — The date to calculate the number of days passed since year zero from. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `time_zone` — A constant String value or an expression representing the time zone. [String types](../../sql-reference/data-types/string.md)
@ -1584,6 +1584,56 @@ Result:
└────────────────────────────────────────────┘
```
**See Also**
- [fromDaysSinceYearZero](#fromDaysSinceYearZero)
## fromDaysSinceYearZero
Returns for a given number of days passed since [1 January 0000](https://en.wikipedia.org/wiki/Year_zero) the corresponding date in the [proleptic Gregorian calendar defined by ISO 8601](https://en.wikipedia.org/wiki/Gregorian_calendar#Proleptic_Gregorian_calendar). The calculation is the same as in MySQL's [`FROM_DAYS()`](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_from-days) function.
The result is undefined if it cannot be represented within the bounds of the [Date](../../sql-reference/data-types/date.md) type.
**Syntax**
``` sql
fromDaysSinceYearZero(days)
```
Alias: `FROM_DAYS`
**Arguments**
- `days` — The number of days passed since year zero.
**Returned value**
The date corresponding to the number of days passed since year zero.
Type: [Date](../../sql-reference/data-types/date.md).
**Example**
``` sql
SELECT fromDaysSinceYearZero(739136), fromDaysSinceYearZero(toDaysSinceYearZero(toDate('2023-09-08')));
```
Result:
``` text
┌─fromDaysSinceYearZero(739136)─┬─fromDaysSinceYearZero(toDaysSinceYearZero(toDate('2023-09-08')))─┐
│ 2023-09-08 │ 2023-09-08 │
└───────────────────────────────┴──────────────────────────────────────────────────────────────────┘
```
**See Also**
- [toDaysSinceYearZero](#toDaysSinceYearZero)
## fromDaysSinceYearZero32
Like [fromDaysSinceYearZero](#fromDaysSinceYearZero) but returns a [Date32](../../sql-reference/data-types/date32.md).
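For example, assuming it accepts the same argument as `fromDaysSinceYearZero`:
```sql
SELECT fromDaysSinceYearZero32(739136); -- expected to return 2023-09-08 as a Date32, mirroring the example above
```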
## age
Returns the `unit` component of the difference between `startdate` and `enddate`. The difference is calculated using a precision of 1 microsecond.
@ -2016,7 +2066,7 @@ Result:
## addDate
Adds the time interval or date interval to the provided date or date with time.
Adds the time interval to the provided date, date with time or String-encoded date / date with time.
If the addition results in a value outside the bounds of the data type, the result is undefined.
@ -2028,7 +2078,7 @@ addDate(date, interval)
**Arguments**
- `date` — The date or date with time to which `interval` is added. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `date` — The date or date with time to which `interval` is added. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md), [DateTime64](../../sql-reference/data-types/datetime64.md), or [String](../../sql-reference/data-types/string.md).
- `interval` — Interval to add. [Interval](../../sql-reference/data-types/special-data-types/interval.md).
**Returned value**
@ -2059,7 +2109,7 @@ Alias: `ADDDATE`
## subDate
Subtracts the time interval or date interval from the provided date or date with time.
Subtracts the time interval from the provided date, date with time or String-encoded date / date with time.
If the subtraction results in a value outside the bounds of the data type, the result is undefined.
@ -2071,7 +2121,7 @@ subDate(date, interval)
**Arguments**
- `date` — The date or date with time from which `interval` is subtracted. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `date` — The date or date with time from which `interval` is subtracted. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md), [DateTime64](../../sql-reference/data-types/datetime64.md), or [String](../../sql-reference/data-types/string.md).
- `interval` — Interval to subtract. [Interval](../../sql-reference/data-types/special-data-types/interval.md).
**Returned value**
View File
@ -68,45 +68,6 @@ WHERE macro = 'test';
└───────┴──────────────┘
```
## getHttpHeader
Returns the value of the specified HTTP header. If there is no such header or the request method is not HTTP, it returns an empty string.
**Syntax**
```sql
getHttpHeader(name);
```
**Arguments**
- `name` — HTTP header name. [String](../../sql-reference/data-types/string.md#string)
**Returned value**
Value of the specified header.
Type: [String](../../sql-reference/data-types/string.md#string).
When `clickhouse-client` is used to execute this function, it always returns an empty string, because the client does not use the HTTP protocol.
```sql
SELECT getHttpHeader('test')
```
result:
```text
┌─getHttpHeader('test')─┐
│ │
└───────────────────────┘
```
Try to use http request:
```shell
echo "select getHttpHeader('X-Clickhouse-User')" | curl -H 'X-ClickHouse-User: default' -H 'X-ClickHouse-Key: ' 'http://localhost:8123/' -d @-
#result
default
```
## FQDN
Returns the fully qualified domain name of the ClickHouse server.
@ -2799,10 +2760,13 @@ message Root
Returns a formatted, possibly multi-line, version of the given SQL query.
Throws an exception if the query is not well-formed. To return `NULL` instead, function `formatQueryOrNull()` may be used.
**Syntax**
```sql
formatQuery(query)
formatQueryOrNull(query)
```
**Arguments**
@ -2835,10 +2799,13 @@ WHERE (a > 3) AND (b < 3) │
Like formatQuery() but the returned formatted string contains no line breaks.
Throws an exception if the query is not well-formed. To return `NULL` instead, function `formatQuerySingleLineOrNull()` may be used.
**Syntax**
```sql
formatQuerySingleLine(query)
formatQuerySingleLineOrNull(query)
```
**Arguments**
View File
@ -689,47 +689,47 @@ Calculates the [hamming distance](https://en.wikipedia.org/wiki/Hamming_distance
**Syntax**
```sql
byteHammingDistance(string2, string2)
byteHammingDistance(string1, string2)
```
**Examples**
``` sql
SELECT byteHammingDistance('abc', 'ab') ;
SELECT byteHammingDistance('karolin', 'kathrin');
```
Result:
``` text
┌─byteHammingDistance('abc', 'ab')─┐
│                                1 │
└──────────────────────────────────┘
┌─byteHammingDistance('karolin', 'kathrin')─┐
│                                         3 │
└───────────────────────────────────────────┘
```
- Alias: mismatches
Alias: mismatches
## jaccardIndex
## stringJaccardIndex
Calculates the [Jaccard similarity index](https://en.wikipedia.org/wiki/Jaccard_index) between two byte strings.
**Syntax**
```sql
byteJaccardIndex(string1, string2)
stringJaccardIndex(string1, string2)
```
**Examples**
``` sql
SELECT jaccardIndex('clickhouse', 'mouse');
SELECT stringJaccardIndex('clickhouse', 'mouse');
```
Result:
``` text
┌─jaccardIndex('clickhouse', 'mouse')─┐
│ 0.4 │
└─────────────────────────────────────────┘
┌─stringJaccardIndex('clickhouse', 'mouse')─┐
│                                       0.4 │
└───────────────────────────────────────────┘
```
## editDistance
@ -752,8 +752,8 @@ Result:
``` text
┌─editDistance('clickhouse', 'mouse')─┐
6 │
└─────────────────────────────────────────
│ 6 │
└─────────────────────────────────────┘
```
- Alias: levenshteinDistance
Alias: levenshteinDistance
View File
@ -10,10 +10,10 @@ Attaches a table or a dictionary, for example, when moving a database to another
**Syntax**
``` sql
ATTACH TABLE|DICTIONARY [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] ...
ATTACH TABLE|DICTIONARY|DATABASE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] ...
```
The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table or the dictionary to the server. After executing the `ATTACH` query, the server will know about the existence of the table or the dictionary.
The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the specified table, dictionary or database to the server. After executing the `ATTACH` query, the server will know about the existence of the table, dictionary or database.
If a table was previously detached ([DETACH](../../sql-reference/statements/detach.md) query), meaning that its structure is known, you can use shorthand without defining the structure.
@ -79,3 +79,13 @@ Attaches a previously detached dictionary.
``` sql
ATTACH DICTIONARY [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
```
## Attach Existing Database
Attaches a previously detached database.
**Syntax**
``` sql
ATTACH DATABASE [IF NOT EXISTS] name [ENGINE=<database engine>] [ON CLUSTER cluster]
```
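For example (the database name is hypothetical):
```sql
-- Reattach a database that was previously detached
ATTACH DATABASE IF NOT EXISTS my_db;
```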
View File
@ -136,8 +136,30 @@ Output:
└──────────────┴───────────┴──────────────────────────────────────────┘
```
If the checksums.txt file is missing, it can be restored. It will be recalculated and rewritten during the execution of the CHECK TABLE command for the specific partition, and the status will still be reported as 'success.'"
If the checksums.txt file is missing, it can be restored. It will be recalculated and rewritten during the execution of the CHECK TABLE command for the specific partition, and the status will still be reported as 'is_passed = 1'.
You can check all existing `(Replicated)MergeTree` tables at once by using the `CHECK ALL TABLES` query.
```sql
CHECK ALL TABLES
FORMAT PrettyCompactMonoBlock
SETTINGS check_query_single_value_result = 0
```
```text
┌─database─┬─table────┬─part_path───┬─is_passed─┬─message─┐
│ default │ t2 │ all_1_95_3 │ 1 │ │
│ db1 │ table_01 │ all_39_39_0 │ 1 │ │
│ default │ t1 │ all_39_39_0 │ 1 │ │
│ db1 │ t1 │ all_39_39_0 │ 1 │ │
│ db1 │ table_01 │ all_1_6_1 │ 1 │ │
│ default │ t1 │ all_1_6_1 │ 1 │ │
│ db1 │ t1 │ all_1_6_1 │ 1 │ │
│ db1 │ table_01 │ all_7_38_2 │ 1 │ │
│ db1 │ t1 │ all_7_38_2 │ 1 │ │
│ default │ t1 │ all_7_38_2 │ 1 │ │
└──────────┴──────────┴─────────────┴───────────┴─────────┘
```
## If the Data Is Corrupted
View File
@ -228,7 +228,7 @@ hex(hexed): 5A90B714
Calculated columns (synonym). Columns of this type are not stored in the table and it is not possible to INSERT values into them.
When SELECT queries explicitly reference columns of this type, the value is computed at query time from `expr`. By default, `SELECT *` excludes ALIAS columns. This behavior can be disabled with setting `asteriks_include_alias_columns`.
When SELECT queries explicitly reference columns of this type, the value is computed at query time from `expr`. By default, `SELECT *` excludes ALIAS columns. This behavior can be disabled with setting `asterisk_include_alias_columns`.
When using the ALTER query to add new columns, old data for these columns is not written. Instead, when reading old data that does not have values for the new columns, expressions are computed on the fly by default. However, if running the expressions requires different columns that are not indicated in the query, these columns will additionally be read, but only for the blocks of data that need it.
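A minimal sketch of an `ALIAS` column (the table and column names are hypothetical):
```sql
-- `size_mb` is not stored on disk; it is computed from `size_bytes` at query time
CREATE TABLE files
(
    name String,
    size_bytes UInt64,
    size_mb Float64 ALIAS size_bytes / 1048576
)
ENGINE = MergeTree
ORDER BY name;

-- The ALIAS column has to be referenced explicitly; `SELECT *` skips it unless asterisk_include_alias_columns is enabled
SELECT name, size_mb FROM files;
```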
View File
@ -4,60 +4,88 @@ sidebar_position: 36
sidebar_label: DELETE
description: Lightweight deletes simplify the process of deleting data from the database.
keywords: [delete]
title: DELETE Statement
title: The Lightweight DELETE Statement
---
The lightweight `DELETE` statement removes rows from the table `[db.]table` that match the expression `expr`. It is only available for the *MergeTree table engine family.
``` sql
DELETE FROM [db.]table [ON CLUSTER cluster] WHERE expr
DELETE FROM [db.]table [ON CLUSTER cluster] WHERE expr;
```
`DELETE FROM` removes rows from the table `[db.]table` that match the expression `expr`. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries. Cleanup of data happens asynchronously in the background. This feature is only available for the MergeTree table engine family.
It is called "lightweight `DELETE`" to contrast it to the [ALTER table DELETE](/en/sql-reference/statements/alter/delete) command, which is a heavyweight process.
For example, the following query deletes all rows from the `hits` table where the `Title` column contains the text `hello`:
## Examples
```sql
-- Deletes all rows from the `hits` table where the `Title` column contains the text `hello`
DELETE FROM hits WHERE Title LIKE '%hello%';
```
Lightweight deletes are asynchronous by default. Set `mutations_sync` equal to 1 to wait for one replica to process the statement, and set `mutations_sync` to 2 to wait for all replicas.
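For example, a sketch that makes the delete synchronous across all replicas:
```sql
-- Wait for all replicas to process the lightweight delete
SET mutations_sync = 2;
DELETE FROM hits WHERE Title LIKE '%hello%';
```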
## Lightweight `DELETE` does not delete data from storage immediately
With lightweight `DELETE`, deleted rows are internally marked as deleted immediately and will be automatically filtered out of all subsequent queries. However, cleanup of data happens during the next merge. As a result, it is possible that for an unspecified period, data is not actually deleted from storage and is only marked as deleted.
If you need to guarantee that your data is deleted from storage in a predictable time, consider using the [ALTER table DELETE](/en/sql-reference/statements/alter/delete) command. Note that deleting data using `ALTER table DELETE` may consume significant resources as it recreates all affected parts.
## Deleting large amounts of data
Large deletes can negatively affect ClickHouse performance. If you are attempting to delete all rows from a table, consider using the [`TRUNCATE TABLE`](/en/sql-reference/statements/truncate) command.
If you anticipate frequent deletes, consider using a [custom partitioning key](/en/engines/table-engines/mergetree-family/custom-partitioning-key). You can then use the [`ALTER TABLE...DROP PARTITION`](/en/sql-reference/statements/alter/partition#drop-partitionpart) command to quickly drop all rows associated with that partition.
## Limitations of lightweight `DELETE`
### Lightweight `DELETE`s do not work with projections
Currently, `DELETE` does not work for tables with projections. This is because rows in a projection may be affected by a `DELETE` operation and may require the projection to be rebuilt, negatively affecting `DELETE` performance.
## Performance considerations when using lightweight `DELETE`
**Deleting large volumes of data with the lightweight `DELETE` statement can negatively affect SELECT query performance.**
The following can also negatively impact lightweight `DELETE` performance:
- A heavy `WHERE` condition in a `DELETE` query.
- If the mutations queue is filled with many other mutations, this can possibly lead to performance issues as all mutations on a table are executed sequentially.
- The affected table having a very large number of data parts.
- Having a lot of data in compact parts. In a Compact part, all columns are stored in one file.
## Delete permissions
`DELETE` requires the `ALTER DELETE` privilege. To enable `DELETE` statements on a specific table for a given user, run the following command:
:::note
`DELETE FROM` requires the `ALTER DELETE` privilege:
```sql
grant ALTER DELETE ON db.table to username;
GRANT ALTER DELETE ON db.table to username;
```
:::
## Lightweight Delete Internals
## How lightweight DELETEs work internally in ClickHouse
The idea behind Lightweight Delete is that when a `DELETE FROM table ...` query is executed ClickHouse only saves a mask where each row is marked as either “existing” or as “deleted”. Those “deleted” rows become invisible for subsequent queries, but physically the rows are removed only later by subsequent merges. Writing this mask is usually much more lightweight than what is done by `ALTER table DELETE ...` query.
1. A "mask" is applied to affected rows
### How it is implemented
The mask is implemented as a hidden `_row_exists` system column that stores True for all visible rows and False for deleted ones. This column is only present in a part if some rows in this part were deleted. In other words, the column is not persisted when it has all values equal to True.
When a `DELETE FROM table ...` query is executed, ClickHouse saves a mask where each row is marked as either “existing” or as “deleted”. Those “deleted” rows are omitted from subsequent queries. However, rows are actually only removed later by subsequent merges. Writing this mask is much more lightweight than what is done by an `ALTER table DELETE` query.
## SELECT query
When the column is present `SELECT ... FROM table WHERE condition` query internally is extended by an additional predicate on `_row_exists` and becomes similar to
The mask is implemented as a hidden `_row_exists` system column that stores `True` for all visible rows and `False` for deleted ones. This column is only present in a part if some rows in the part were deleted. This column does not exist when a part has all values equal to `True`.
2. `SELECT` queries are transformed to include the mask
When a masked column is used in a query, the `SELECT ... FROM table WHERE condition` query internally is extended by the predicate on `_row_exists` and is transformed to:
```sql
SELECT ... FROM table PREWHERE _row_exists WHERE condition
SELECT ... FROM table PREWHERE _row_exists WHERE condition
```
At execution time the column `_row_exists` is read to figure out which rows are not visible and if there are many deleted rows it can figure out which granules can be fully skipped when reading the rest of the columns.
At execution time, the column `_row_exists` is read to determine which rows should not be returned. If there are many deleted rows, ClickHouse can determine which granules can be fully skipped when reading the rest of the columns.
## DELETE query
`DELETE FROM table WHERE condition` is translated into `ALTER table UPDATE _row_exists = 0 WHERE condition` mutation. Internally this mutation is executed in 2 steps:
1. `SELECT count() FROM table WHERE condition` for each individual part to figure out if the part is affected.
2. Mutate affected parts, and make hardlinks for unaffected parts. Mutating a part in fact only writes `_row_exists` column and just hardlinks all other columns files in the case of Wide parts. But for Compact parts, all columns are rewritten because they all are stored together in one file.
3. `DELETE` queries are transformed to `ALTER table UPDATE` queries
So if we compare Lightweight Delete to `ALTER DELETE` in the first step they both do the same thing to figure out which parts are affected, but in the second step `ALTER DELETE` does much more work because it reads and rewrites all columns files for the affected parts.
The `DELETE FROM table WHERE condition` is translated into an `ALTER table UPDATE _row_exists = 0 WHERE condition` mutation.
With the described implementation now we can see what can negatively affect 'DELETE FROM' execution time:
- Heavy WHERE condition in DELETE query
- Mutations queue filled with other mutations, because all mutations on a table are executed sequentially
- Table having a very large number of data parts
- Having a lot of data in Compact parts—in a Compact part, all columns are stored in one file.
Internally, this mutation is executed in two steps:
:::note
Currently, Lightweight delete does not work for tables with projection as rows in projection may be affected and require the projection to be rebuilt. Rebuilding projection makes the deletion not lightweight, so this is not supported.
:::
1. A `SELECT count() FROM table WHERE condition` command is executed for each individual part to determine if the part is affected.
2. Based on the commands above, affected parts are then mutated, and hardlinks are created for unaffected parts. In the case of wide parts, the `_row_exists` column for each row is updated and all other columns' files are hardlinked. For compact parts, all columns are re-written because they are all stored together in one file.
From the steps above, we can see that lightweight deletes using the masking technique improve performance over traditional `ALTER table DELETE` commands because `ALTER table DELETE` reads and rewrites all the columns' files for the affected parts.
## Related content
View File
@ -5,17 +5,17 @@ sidebar_label: DETACH
title: "DETACH Statement"
---
Makes the server "forget" about the existence of a table, a materialized view, or a dictionary.
Makes the server "forget" about the existence of a table, a materialized view, a dictionary, or a database.
**Syntax**
``` sql
DETACH TABLE|VIEW|DICTIONARY [IF EXISTS] [db.]name [ON CLUSTER cluster] [PERMANENTLY] [SYNC]
DETACH TABLE|VIEW|DICTIONARY|DATABASE [IF EXISTS] [db.]name [ON CLUSTER cluster] [PERMANENTLY] [SYNC]
```
Detaching does not delete the data or metadata of a table, a materialized view or a dictionary. If an entity was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view/dictionary again. If an entity was detached `PERMANENTLY`, there will be no automatic recall.
Detaching does not delete the data or metadata of a table, a materialized view, a dictionary or a database. If an entity was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view/dictionary/database again. If an entity was detached `PERMANENTLY`, there will be no automatic recall.
Whether a table or a dictionary was detached permanently or not, in both cases you can reattach them using the [ATTACH](../../sql-reference/statements/attach.md) query.
Whether a table, a dictionary or a database was detached permanently or not, in both cases you can reattach them using the [ATTACH](../../sql-reference/statements/attach.md) query.
System log tables (e.g. `query_log`, `text_log`, etc.) can also be attached back. Other system tables can't be reattached. On the next server launch, the server will recall those tables again.
`ATTACH MATERIALIZED VIEW` does not work with short syntax (without `SELECT`), but you can attach it using the `ATTACH TABLE` query.
View File
@ -207,7 +207,7 @@ The optional keyword `FULL` causes the output to include the collation, comment
The statement produces a result table with the following structure:
- `field` - The name of the column (String)
- `type` - The column data type. If setting `[use_mysql_types_in_show_columns](../../operations/settings/settings.md#use_mysql_types_in_show_columns) = 1` (default: 0), then the equivalent type name in MySQL is shown. (String)
- `type` - The column data type. If the query was made through the MySQL wire protocol, then the equivalent type name in MySQL is shown. (String)
- `null` - `YES` if the column data type is Nullable, `NO` otherwise (String)
- `key` - `PRI` if the column is part of the primary key, `SOR` if the column is part of the sorting key, empty otherwise (String)
- `default` - Default expression of the column if it is of type `ALIAS`, `DEFAULT`, or `MATERIALIZED`, otherwise `NULL`. (Nullable(String))
@ -668,6 +668,15 @@ If either `LIKE` or `ILIKE` clause is specified, the query returns a list of sys
Returns a list of merges. All merges are listed in the [system.merges](../../operations/system-tables/merges.md) table.
- `table` -- Table name.
- `database` -- The name of the database the table is in.
- `estimate_complete` -- The estimated time to complete (in seconds).
- `elapsed` -- The time elapsed (in seconds) since the merge started.
- `progress` -- The percentage of completed work (0-100 percent).
- `is_mutation` -- 1 if this process is a part mutation.
- `size_compressed` -- The total size of the compressed data of the merged parts.
- `memory_usage` -- Memory consumption of the merge process.
**Syntax**
@ -686,10 +695,9 @@ SHOW MERGES;
Result:
```text
┌─table──────┬─database─┬─estimate_complete─┬─────elapsed─┬─progress─┬─is_mutation─┬─size─────┬─mem───────┐
│ your_table │ default │ 0.14 │ 0.365592338 │ 0.73 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────────┴──────────┴─────────────┴────────────┴─────────┘
┌─table──────┬─database─┬─estimate_complete─┬─elapsed─┬─progress─┬─is_mutation─┬─size_compressed─┬─memory_usage─┐
│ your_table │ default │ 0.14 │ 0.36 │ 73.01 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────┴──────────┴─────────────┴─────────────────┴──────────────┘
```
Query:
@ -701,9 +709,8 @@ SHOW MERGES LIKE 'your_t%' LIMIT 1;
Result:
```text
┌─table──────┬─database─┬─estimate_complete─┬─────elapsed─┬─progress─┬─is_mutation─┬─size─────┬─mem───────┐
│ your_table │ default │ 0.05 │ 1.727629065 │ 0.97 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────────┴──────────┴─────────────┴────────────┴─────────┘
┌─table──────┬─database─┬─estimate_complete─┬─elapsed─┬─progress─┬─is_mutation─┬─size_compressed─┬─memory_usage─┐
│ your_table │ default │ 0.14 │ 0.36 │ 73.01 │ 0 │ 5.40 MiB │ 10.25 MiB │
└────────────┴──────────┴───────────────────┴─────────┴──────────┴─────────────┴─────────────────┴──────────────┘
```
View File
@ -166,7 +166,7 @@ Aborts ClickHouse process (like `kill -9 {$ pid_clickhouse-server}`)
## Managing Distributed Tables
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into these tables, ClickHouse first creates a queue of the data that should be sent to cluster nodes, then asynchronously sends it. You can manage queue processing with the [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) queries. You can also synchronously insert distributed data with the [insert_distributed_sync](../../operations/settings/settings.md#insert_distributed_sync) setting.
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into these tables, ClickHouse first creates a queue of the data that should be sent to cluster nodes, then asynchronously sends it. You can manage queue processing with the [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) queries. You can also synchronously insert distributed data with the [distributed_foreground_insert](../../operations/settings/settings.md#distributed_foreground_insert) setting.
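As a hedged sketch for a hypothetical distributed table `dist_table`:
```sql
SYSTEM STOP DISTRIBUTED SENDS dist_table;   -- pause background sends
SYSTEM FLUSH DISTRIBUTED dist_table;        -- force queued data to be sent now
SYSTEM START DISTRIBUTED SENDS dist_table;  -- resume background sends
```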
### STOP DISTRIBUTED SENDS
View File
@ -7,7 +7,7 @@ keywords: [gcs, bucket]
# gcs Table Function
Provides a table-like interface to select/insert files in [Google Cloud Storage](https://cloud.google.com/storage/).
Provides a table-like interface to `SELECT` and `INSERT` data from [Google Cloud Storage](https://cloud.google.com/storage/). Requires the [`Storage Object User` IAM role](https://cloud.google.com/storage/docs/access-control/iam-roles).
**Syntax**
View File
@ -49,21 +49,9 @@ ClickHouse — полноценная столбцовая СУБД. Данны
Blocks are created for every processed chunk of data. Note that identical types of calculations, column names, and column types are reused across different blocks, and only the column data changes. It is better to separate the block data from the block header, because small block sizes have a high overhead of temporary strings for copying shared pointers (`shared_ptrs`) and column names.
## Block Streams {#block-streams}
## Processors
Block streams are for processing data. We use block streams to read data, to transform data, or to write data somewhere. `IBlockInputStream` has the `read` method to fetch the next block while it is available, and the `write` method to push a block somewhere.
Streams are responsible for:
1. Reading from and writing to a table. The table just returns a stream for reading or writing blocks.
2. Implementing data formats. For example, when outputting data to a terminal in the `Pretty` format, you create a block output stream that formats the blocks you push into it.
3. Performing data transformations. Say you have `IBlockInputStream` and want to create a filtered stream. You create `FilterBlockInputStream` and initialize it with your stream. Then, when you pull a block from `FilterBlockInputStream`, it pulls a block from the source stream, filters it, and returns the filtered block to you. Query execution pipelines are built this way.
There are also more sophisticated transformations. For example, when you pull from `AggregatingBlockInputStream`, it reads all data from its source, aggregates it, and then returns a stream of aggregated data to you. Another example: the `UnionBlockInputStream` constructor accepts many input sources and a number of threads. Such a stream works in multiple threads and reads data from its sources in parallel.
> Block streams use a "pull" approach to control flow: when you pull a block from the first stream, it consequently pulls the required blocks from the nested streams, and that is how the whole execution pipeline works. Neither "pull" nor "push" is the clear winner, because the control flow is implicit, and that limits the implementation of various features such as simultaneous execution of multiple queries (merging many pipelines together). This limitation could be overcome with coroutines or simply by running extra threads that wait for each other. We may have more possibilities if we make the control flow explicit: if we locate the logic for passing data from one calculation unit to another outside of those calculation units. Read this [article](http://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/) for more thoughts.
It should be noted that the query execution pipeline creates temporary data at every step. We try to keep the block size small enough that the temporary data fits into the CPU cache. With that assumption, writing and reading temporary data is almost free in comparison with other calculations. We could consider an alternative, which is to fuse many operations in the pipeline together. It could make the pipeline as short as possible and remove much of the temporary data, which could be an advantage, but it also has drawbacks. For example, a split pipeline makes it easy to implement caching of intermediate data, stealing of intermediate data from similar queries running at the same time, and merging of pipelines for similar queries.
See the description in the source file [src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h).
## Formats {#formats}
@ -81,13 +69,16 @@ ClickHouse — полноценная столбцовая СУБД. Данны
Read/write buffers deal only with bytes. Some functions are declared in the `ReadHelpers` and `WriteHelpers` header files to help with formatting input and output. For example, there are helpers for writing a number in decimal format.
Let's look at what happens when you want to write a result set in `JSON` format to stdout. You have a result set ready to be fetched from `IBlockInputStream`. You create `WriteBufferFromFileDescriptor(STDOUT_FILENO)` to write bytes to stdout. You create `JSONRowOutputStream`, initialized with that `WriteBuffer`, to write rows in `JSON` to stdout. You also create `BlockOutputStreamFromRowOutputStream`, implementing `IBlockOutputStream`. Then `copyData` is called to transfer data from `IBlockInputStream` to `IBlockOutputStream`, and everything works. Internally, `JSONRowOutputStream` writes the various JSON delimiters and calls the `IDataType::serializeTextJSON` method with a reference to an `IColumn` and the row number as arguments. Consequently, `IDataType::serializeTextJSON` calls a method from `WriteHelpers.h`: for example, `writeText` for numeric types and `writeJSONString` for `DataTypeString`.
Let's look at what happens when you want to write a result set in `JSON` format to stdout. You have a result set ready to be fetched from `QueryPipeline`. You create `WriteBufferFromFileDescriptor(STDOUT_FILENO)` to write bytes to stdout. You create `JSONRowOutputFormat`, initialized with that `WriteBuffer`, to write rows in `JSON` to stdout.
To connect the output of the `QueryPipeline` to the format, you can use the `complete` method, which turns the `QueryPipeline` into a completed `QueryPipeline`.
Internally, `JSONRowOutputStream` writes the various JSON delimiters and calls the `IDataType::serializeTextJSON` method with a reference to an `IColumn` and the row number as arguments. Consequently, `IDataType::serializeTextJSON` calls a method from `WriteHelpers.h`: for example, `writeText` for numeric types and `writeJSONString` for `DataTypeString`.
## Tables {#tables}
The `IStorage` interface represents a table. Different table engines are implementations of this interface, for example `StorageMergeTree`, `StorageMemory`, and so on. Instances of these classes are just tables.
The key `IStorage` methods are `read` and `write`. There are also `alter`, `rename`, `drop`, and so on. The `read` method accepts the following arguments: the set of columns to read from a table, the `AST` query, and the desired number of output streams. It returns one or more `IBlockInputStream` objects and information about the stage of data processing that was completed inside the table engine during query execution.
The key `IStorage` methods are `read` and `write`. There are also `alter`, `rename`, `drop`, and so on.
The `read` method accepts the following arguments: the set of columns to read from a table, the `AST` query, and the desired number of output streams, and it returns a `Pipe`.
In most cases, the `read` method is responsible only for reading the specified columns from a table, not for any further data processing. All further data processing is handled by the query interpreter and is outside the responsibility of `IStorage`.
@ -96,7 +87,9 @@ ClickHouse — полноценная столбцовая СУБД. Данны
- The AST query passed to the `read` method can be used by the table engine to determine whether an index can be used and to read less data from the table.
- Sometimes the table engine can itself process data up to a certain stage. For example, `StorageDistributed` can send a query to remote servers, ask them to process the data up to the stage where data from different remote servers can be merged, and return that pre-processed data. The query interpreter then finishes processing the data.
The `read` method can return multiple `IBlockInputStream` objects to allow parallel data processing. These multiple block input streams can read from a table in parallel. Then you can wrap these streams with various transformations (such as expression evaluation or filtering) that can be calculated independently, and create a `UnionBlockInputStream` on top of them to read from multiple streams in parallel.
The `read` method can return a `Pipe` consisting of multiple processors. Each of these processors can read data in parallel.
Then you can connect these processors with other transformations (such as expression evaluation or filtering), which can be calculated independently.
Then, with a `QueryPipeline` built on top of them, the pipeline can be executed with a `PipelineExecutor`.
There are other options as well. For example, `TableFunction` returns a temporary `IStorage` object that can be used in the `FROM` clause.
@ -112,10 +105,18 @@ ClickHouse — полноценная столбцовая СУБД. Данны
## Interpreters {#interpreters}
Interpreters are responsible for creating the query execution pipeline from an `AST`. There are simple interpreters, such as `InterpreterExistsQuery` and `InterpreterDropQuery`, and the more sophisticated `InterpreterSelectQuery`. The query execution pipeline is a combination of block input and output streams. For example, the result of interpreting a `SELECT` query is an `IBlockInputStream` to read the result set from; the result of interpreting an `INSERT` query is an `IBlockOutputStream` to write the data for insertion to; and the result of interpreting an `INSERT SELECT` query is an `IBlockInputStream` that returns an empty result set on the first read but copies data from the `SELECT` to the `INSERT`.
Interpreters are responsible for creating the query execution pipeline from an `AST`. There are simple interpreters, such as `InterpreterExistsQuery` and `InterpreterDropQuery`, and the more sophisticated `InterpreterSelectQuery`.
The query execution pipeline is a combination of processors that can consume and produce chunks (sets of columns with their types).
Processors communicate via ports and can have multiple input and output ports.
A more detailed description can be found in [src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h).
For example, the result of interpreting a `SELECT` query is a `QueryPipeline` with a special output port to read the result set from. The result of interpreting an `INSERT` query is a `QueryPipeline` with an input port to write the data for insertion to. The result of interpreting an `INSERT SELECT` query is a completed `QueryPipeline` that has no inputs or outputs but copies data from the `SELECT` to the `INSERT` at the same time.
`InterpreterSelectQuery` uses the `ExpressionAnalyzer` and `ExpressionActions` machinery for query analysis and transformations. This is where most rule-based query optimizations are performed. `ExpressionAnalyzer` is quite messy and should be rewritten: various query transformations and optimizations should be extracted into separate classes to allow for modular transformations of the query.
To address the problems that currently exist in the interpreters, a new `InterpreterSelectQueryAnalyzer` is being developed. It is a new version of `InterpreterSelectQuery` that does not use `ExpressionAnalyzer` and introduces an additional abstraction layer between the `AST` and the `QueryPipeline`, called the `QueryTree`. It is not yet ready for production use, but it can be tested with the `allow_experimental_analyzer` flag.
## Functions {#functions}
There are ordinary functions and aggregate functions. For aggregate functions, see the next section.
View File
@ -345,7 +345,7 @@ struct ExtractDomain
**7.** For abstract classes (interfaces), you can add the letter `I` to the beginning of the name.
``` cpp
class IBlockInputStream
class IProcessor
```
**8.** If a variable is used locally enough, you can use a short name.
View File
@ -22,7 +22,7 @@ sidebar_label: Distributed
See also:
- the `insert_distributed_sync` setting
- the `distributed_foreground_insert` setting
- [MergeTree](../mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) for an example
Example:
@ -131,7 +131,7 @@ logs - имя кластера в конфигурационном файле с
- queries are used that require joining data (IN, JOIN) by a particular key: if the data is sharded by that key, local IN or JOIN can be used instead of GLOBAL IN or GLOBAL JOIN, which is much more efficient.
- a large number of servers is used (hundreds or more) with a large number of small queries (queries from individual clients: websites, advertisers, partners): to keep small queries from affecting the entire cluster, it makes sense to locate the data for a single client on a single shard, or to set up two-level sharding: split the entire cluster into "layers", where a layer may consist of multiple shards; data for a single client is located on a single layer, but shards can be added to a layer as needed, and data is distributed arbitrarily within them; distributed tables are created for each layer, plus one shared distributed table for global queries.
Data is written completely asynchronously. On insertion into the table, a block of data is first written to the file system. It is then sent to the remote servers in the background as soon as possible. The sending interval is controlled by the [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` table engine sends each file with inserted data separately, but batch sending of files can be enabled with the [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance through better utilization of the sending server's resources and the network. You should verify that data was sent successfully by checking the list of files (data waiting to be sent) in the table directory `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set with the [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
Data is written completely asynchronously. On insertion into the table, a block of data is first written to the file system. It is then sent to the remote servers in the background as soon as possible. The sending interval is controlled by the [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms) and [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms) settings. The `Distributed` table engine sends each file with inserted data separately, but batch sending of files can be enabled with the [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch) setting. This setting improves cluster performance through better utilization of the sending server's resources and the network. You should verify that data was sent successfully by checking the list of files (data waiting to be sent) in the table directory `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set with the [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
If the server ceased to exist or was abruptly restarted (for example, due to a hardware failure) after an INSERT into a Distributed table, the inserted data may be lost. If a damaged data part is found in the table directory, it is moved to the `broken` subdirectory and no longer used.
View File
@ -8,9 +8,9 @@ sidebar_label: "Установка"
## System Requirements {#sistemnye-trebovaniia}
ClickHouse can run on any Linux, FreeBSD, or Mac OS X operating system with the x86_64, AArch64, or PowerPC64LE CPU architecture.
ClickHouse can run on any Linux, FreeBSD, or Mac OS X operating system with the x86-64, AArch64, or PowerPC64LE CPU architecture.
The prebuilt packages are compiled for x86_64 and use the SSE 4.2 instruction set, so, unless stated otherwise, support for it in the processor you use becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:
The prebuilt packages are compiled for x86-64 and use the SSE 4.2 instruction set, so, unless stated otherwise, support for it in the processor you use becomes an additional system requirement. Here is the command to check whether the current processor supports SSE 4.2:
``` bash
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
@ -20,9 +20,9 @@ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not su
## Available Installation Options {#dostupnye-varianty-ustanovki}
### From DEB Packages {#install-from-deb-packages}
### From deb packages {#install-from-deb-packages}
Yandex recommends using the official compiled `deb` packages for Debian or Ubuntu. To install the packages, run:
Yandex recommends using the official compiled `deb` packages for Debian or Ubuntu. To install the packages, run:
``` bash
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
@ -64,18 +64,18 @@ clickhouse-client # or "clickhouse-client --password" if you set up a password.
#### Пакеты {#packages}
- `clickhouse-common-static`Устанавливает исполняемые файлы ClickHouse.
- `clickhouse-server`Создает символические ссылки для `clickhouse-server` и устанавливает конфигурационные файлы.
- `clickhouse-client`Создает символические ссылки для `clickhouse-client` и других клиентских инструментов и устанавливает конфигурационные файлы `clickhouse-client`.
- `clickhouse-common-static-dbg`Устанавливает исполняемые файлы ClickHouse собранные с отладочной информацией.
- `clickhouse-common-static`устанавливает исполняемые файлы ClickHouse.
- `clickhouse-server`создаёт символические ссылки для `clickhouse-server` и устанавливает конфигурационные файлы.
- `clickhouse-client`создаёт символические ссылки для `clickhouse-client` и других клиентских инструментов и устанавливает конфигурационные файлы `clickhouse-client`.
- `clickhouse-common-static-dbg` — устанавливает исполняемые файлы ClickHouse, собранные с отладочной информацией.
:::note Внимание
Если вам нужно установить ClickHouse определенной версии, вы должны установить все пакеты одной версии:
Если вам нужно установить ClickHouse определённой версии, вы должны установить все пакеты одной версии:
`sudo apt-get install clickhouse-server=21.8.5.7 clickhouse-client=21.8.5.7 clickhouse-common-static=21.8.5.7`
:::
### Из RPM пакетов {#from-rpm-packages}
### Из rpm-пакетов {#from-rpm-packages}
Команда ClickHouse в Яндексе рекомендует использовать официальные предкомпилированные `rpm` пакеты для CentOS, RedHat и всех остальных дистрибутивов Linux, основанных на rpm.
Команда ClickHouse в Яндексе рекомендует использовать официальные предкомпилированные `rpm`-пакеты для CentOS, RedHat и всех остальных дистрибутивов Linux, основанных на rpm.
#### Установка официального репозитория
@ -128,7 +128,7 @@ clickhouse-client # or "clickhouse-client --password" if you set up a password.
Для использования наиболее свежих версий нужно заменить `stable` на `testing` (рекомендуется для тестовых окружений). Также иногда доступен `prestable`.
Для, собственно, установки пакетов необходимо выполнить следующие команды:
Для непосредственной установки пакетов необходимо выполнить следующие команды:
``` bash
sudo yum install clickhouse-server clickhouse-client
@ -136,9 +136,9 @@ sudo yum install clickhouse-server clickhouse-client
Также есть возможность установить пакеты вручную, скачав отсюда: https://packages.clickhouse.com/rpm/stable.
### Из Tgz архивов {#from-tgz-archives}
### Из tgz-архивов {#from-tgz-archives}
Команда ClickHouse в Яндексе рекомендует использовать предкомпилированные бинарники из `tgz` архивов для всех дистрибутивов, где невозможна установка `deb` и `rpm` пакетов.
Команда ClickHouse в Яндексе рекомендует использовать предкомпилированные бинарники из `tgz`-архивов для всех дистрибутивов, где невозможна установка `deb`- и `rpm`-пакетов.
Интересующую версию архивов можно скачать вручную с помощью `curl` или `wget` из репозитория https://packages.clickhouse.com/tgz/.
После этого архивы нужно распаковать и воспользоваться скриптами установки. Пример установки самой свежей версии:
@ -205,11 +205,11 @@ sudo clickhouse-client-$LATEST_VERSION/install/doinst.sh
```
</details>
Для production окружений рекомендуется использовать последнюю `stable`-версию. Её номер также можно найти на github с на вкладке https://github.com/ClickHouse/ClickHouse/tags c постфиксом `-stable`.
Для продуктивных окружений рекомендуется использовать последнюю `stable`-версию. Её номер также можно найти на GitHub на вкладке https://github.com/ClickHouse/ClickHouse/tags с постфиксом `-stable`.
### Из Docker образа {#from-docker-image}
Для запуска ClickHouse в Docker нужно следовать инструкции на [Docker Hub](https://hub.docker.com/r/clickhouse/clickhouse-server/). Внутри образов используются официальные `deb` пакеты.
Для запуска ClickHouse в Docker нужно следовать инструкции на [Docker Hub](https://hub.docker.com/r/clickhouse/clickhouse-server/). Внутри образов используются официальные `deb`-пакеты.
### Из единого бинарного файла {#from-single-binary}
@ -230,20 +230,20 @@ sudo ./clickhouse install
После скачивания можно воспользоваться `clickhouse client` для подключения к серверу или `clickhouse local` для обработки локальных данных.
Чтобы установить ClickHouse в рамках всей системы (с необходимыми конфигурационными файлами, настройками пользователей и т.д.), выполните `sudo ./clickhouse install`. Затем выполните команды `clickhouse start` (чтобы запустить сервер) и `clickhouse-client` (чтобы подключиться к нему).
Чтобы установить ClickHouse в рамках всей системы (с необходимыми конфигурационными файлами, настройками пользователей и т. д.), выполните `sudo ./clickhouse install`. Затем выполните команды `clickhouse start` (чтобы запустить сервер) и `clickhouse-client` (чтобы подключиться к нему).
Данные сборки не рекомендуются для использования в рабочей среде, так как они недостаточно тщательно протестированы. Также в них присутствуют не все возможности ClickHouse.
### Из исходного кода {#from-sources}
Для компиляции ClickHouse вручную, используйте инструкцию для [Linux](../development/build.mdx) или [Mac OS X](../development/build-osx.md).
Для компиляции ClickHouse вручную используйте инструкцию для [Linux](../development/build.mdx) или [Mac OS X](../development/build-osx.md).
Можно скомпилировать пакеты и установить их, либо использовать программы без установки пакетов. Также при ручной сборке можно отключить необходимость поддержки набора инструкций SSE 4.2 или собрать под процессоры архитектуры AArch64.
Client: programs/clickhouse-client
Server: programs/clickhouse-server
Для работы собранного вручную сервера необходимо создать директории для данных и метаданных, а также сделать их `chown` для желаемого пользователя. Пути к этим директориям могут быть изменены в конфигурационном файле сервера (src/programs/server/config.xml), по умолчанию используются следующие:
Для работы собранного вручную сервера необходимо создать каталоги для данных и метаданных, а также сменить их владельца на желаемого пользователя (`chown`). Пути к этим каталогам могут быть изменены в конфигурационном файле сервера (src/programs/server/config.xml), по умолчанию используются следующие:
/opt/clickhouse/data/default/
/opt/clickhouse/metadata/default/
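Примерный набросок этих действий (предполагается, что сервер будет работать от пользователя `clickhouse`; имя пользователя и пути условные):
``` bash
# создаём каталоги для данных и метаданных и меняем их владельца
sudo mkdir -p /opt/clickhouse/data/default/ /opt/clickhouse/metadata/default/
sudo chown -R clickhouse:clickhouse /opt/clickhouse
```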
@ -252,13 +252,13 @@ sudo ./clickhouse install
## Запуск {#zapusk}
Для запуска сервера в качестве демона, выполните:
Для запуска сервера в режиме демона, выполните:
``` bash
sudo service clickhouse-server start
```
Смотрите логи в директории `/var/log/clickhouse-server/`.
Смотрите журналы в каталоге `/var/log/clickhouse-server/`.
Если сервер не стартует, проверьте корректность конфигурации в файле `/etc/clickhouse-server/config.xml`
@ -268,16 +268,16 @@ sudo service clickhouse-server start
clickhouse-server --config-file=/etc/clickhouse-server/config.xml
```
При этом, лог будет выводиться в консоль, что удобно для разработки.
Если конфигурационный файл лежит в текущей директории, то указывать параметр `--config-file` не требуется, по умолчанию будет использован файл `./config.xml`.
При этом журнал будет выводиться в консоль, что удобно для разработки.
Если конфигурационный файл лежит в текущем каталоге, то указывать параметр `--config-file` не требуется, по умолчанию будет использован файл `./config.xml`.
После запуска сервера, соединиться с ним можно с помощью клиента командной строки:
После запуска сервера соединиться с ним можно с помощью клиента командной строки:
``` bash
clickhouse-client
```
По умолчанию он соединяется с localhost:9000, от имени пользователя `default` без пароля. Также клиент может быть использован для соединения с удалённым сервером с помощью аргумента `--host`.
По умолчанию он соединяется с localhost:9000 от имени пользователя `default` без пароля. Также клиент может быть использован для соединения с удалённым сервером с помощью аргумента `--host`.
Терминал должен использовать кодировку UTF-8.

View File

@ -4,17 +4,17 @@ sidebar_position: 12
sidebar_label: Tutorial
---
# ClickHouse Tutorial {#clickhouse-tutorial}
# Руководство {#clickhouse-tutorial}
## What to Expect from This Tutorial? {#what-to-expect-from-this-tutorial}
## Что вы получите, пройдя это руководство? {#what-to-expect-from-this-tutorial}
By going through this tutorial, youll learn how to set up a simple ClickHouse cluster. Itll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries.
Пройдя это руководство, вы научитесь разворачивать простой кластер ClickHouse. Он будет небольшим, но отказоустойчивым и масштабируемым. Далее мы воспользуемся одним из готовых наборов данных для наполнения кластера данными и выполнения над ними нескольких демонстрационных запросов.
## Single Node Setup {#single-node-setup}
## Установка на одном узле {#single-node-setup}
To postpone the complexities of a distributed environment, well start with deploying ClickHouse on a single server or virtual machine. ClickHouse is usually installed from [deb](../getting-started/install.md#install-from-deb-packages) or [rpm](../getting-started/install.md#from-rpm-packages) packages, but there are [alternatives](../getting-started/install.md#from-docker-image) for the operating systems that do no support them.
Чтобы не погружаться сразу в сложности распределённого окружения, мы начнём с развёртывания ClickHouse на одном сервере или одной виртуальной машине. ClickHouse обычно устанавливается из [deb](../getting-started/install.md#install-from-deb-packages)- или [rpm](../getting-started/install.md#from-rpm-packages)-пакетов, но есть и [альтернативы](../getting-started/install.md#from-docker-image) для операционных систем без соответствующих пакетных менеджеров.
For example, you have chosen `deb` packages and executed:
Например, выбираем нужные `deb`-пакеты и выполняем:
``` bash
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
@ -30,49 +30,49 @@ sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you've set up a password.
```
What do we have in the packages that got installed:
Что мы получим по результатам установки этих пакетов:
- `clickhouse-client` package contains [clickhouse-client](../interfaces/cli.md) application, interactive ClickHouse console client.
- `clickhouse-common` package contains a ClickHouse executable file.
- `clickhouse-server` package contains configuration files to run ClickHouse as a server.
- с пакетом `clickhouse-client` будет установлена программа [clickhouse-client](../interfaces/cli.md) — интерактивный консольный клиент ClickHouse.
- пакет `clickhouse-common` включает исполняемый файл ClickHouse.
- пакет `clickhouse-server` содержит конфигурационные файлы для запуска ClickHouse в качестве сервера.
Server config files are located in `/etc/clickhouse-server/`. Before going further, please notice the `<path>` element in `config.xml`. Path determines the location for data storage, so it should be located on volume with large disk capacity; the default value is `/var/lib/clickhouse/`. If you want to adjust the configuration, its not handy to directly edit `config.xml` file, considering it might get rewritten on future package updates. The recommended way to override the config elements is to create [files in config.d directory](../operations/configuration-files.md) which serve as “patches” to config.xml.
Файлы конфигурации сервера располагаются в каталоге `/etc/clickhouse-server/`. Прежде чем идти дальше, обратите внимание на элемент `<path>` в файле `config.xml`. Путь, задаваемый этим элементом, определяет местоположение данных, таким образом, он должен быть расположен на томе большой ёмкости; значение по умолчанию — `/var/lib/clickhouse/`. Если вы хотите изменить конфигурацию, то лучше не редактировать вручную файл `config.xml`, поскольку он может быть переписан будущими пакетными обновлениями; рекомендуется создать файлы с необходимыми конфигурационными элементами [в каталоге config.d](../operations/configuration-files.md), которые рассматриваются как «патчи» к config.xml.
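Для иллюстрации ниже приведён условный пример такого «патча», переопределяющего элемент `<path>` (имя файла и путь к данным выбраны произвольно):
``` bash
# создаём файл-«патч» в каталоге config.d; основной config.xml при этом не редактируется
sudo tee /etc/clickhouse-server/config.d/data_path.xml <<'EOF'
<clickhouse>
    <path>/data/clickhouse/</path>
</clickhouse>
EOF
```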
As you might have noticed, `clickhouse-server` is not launched automatically after package installation. It wont be automatically restarted after updates, either. The way you start the server depends on your init system, usually, it is:
Вы могли заметить, что `clickhouse-server` не запускается автоматически после установки пакетов. Также сервер не будет автоматически перезапускаться после обновлений. Способ запуска сервера зависит от используемой подсистемы инициализации, обычно это делается так:
``` bash
sudo service clickhouse-server start
```
or
или
``` bash
sudo /etc/init.d/clickhouse-server start
```
The default location for server logs is `/var/log/clickhouse-server/`. The server is ready to handle client connections once it logs the `Ready for connections` message.
Журналы сервера по умолчанию ведутся в `/var/log/clickhouse-server/`. Как только в журнале появится сообщение `Ready for connections` — сервер готов принимать клиентские соединения.
Once the `clickhouse-server` is up and running, we can use `clickhouse-client` to connect to the server and run some test queries like `SELECT "Hello, world!";`.
Теперь, когда `clickhouse-server` запущен, можно подключиться к нему с использованием `clickhouse-client` и выполнить тестовый запрос, например, `SELECT 'Hello, world!';`.
<details markdown="1">
<summary>Quick tips for clickhouse-client</summary>
<summary>Советы по использованию clickhouse-client</summary>
Interactive mode:
Интерактивный режим:
``` bash
clickhouse-client
clickhouse-client --host=... --port=... --user=... --password=...
```
Enable multiline queries:
Включить многострочный режим запросов:
``` bash
clickhouse-client -m
clickhouse-client --multiline
```
Run queries in batch-mode:
Включить пакетный режим запуска запросов:
``` bash
clickhouse-client --query='SELECT 1'
@ -80,7 +80,7 @@ echo 'SELECT 1' | clickhouse-client
clickhouse-client <<< 'SELECT 1'
```
Insert data from a file in specified format:
Вставить данные из файла заданного формата:
``` bash
clickhouse-client --query='INSERT INTO table VALUES' < data.txt
@ -89,39 +89,39 @@ clickhouse-client --query='INSERT INTO table FORMAT TabSeparated' < data.tsv
</details>
## Import Sample Dataset {#import-sample-dataset}
## Загрузка набора данных из примеров {#import-sample-dataset}
Now its time to fill our ClickHouse server with some sample data. In this tutorial, well use some anonymized metric data. There are [multiple ways to import the dataset](../getting-started/example-datasets/metrica.md), and for the sake of the tutorial, well go with the most realistic one.
Настало время загрузить в ClickHouse данные из примеров. В этом руководстве мы используем анонимизированные данные посещений сайтов (веб-метрики). Существует [множество способов импортировать набор данных](../getting-started/example-datasets/metrica.md), но для целей данного руководства мы используем наиболее практичный из них.
### Download and Extract Table Data {#download-and-extract-table-data}
### Загрузка и извлечение табличных данных {#download-and-extract-table-data}
``` bash
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
```
The extracted files are about 10GB in size.
Распакованные файлы занимают около 10 ГБ.
### Create Tables {#create-tables}
### Создание таблиц {#create-tables}
As in most databases management systems, ClickHouse logically groups tables into “databases”. Theres a `default` database, but well create a new one named `tutorial`:
ClickHouse, как и большинство СУБД, логически объединяет таблицы в «базы данных». Существует база данных по умолчанию — `default`, но мы создадим новую, дав ей наименование `tutorial`:
``` bash
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
```
Syntax for creating tables is way more complicated compared to databases (see [reference](../sql-reference/statements/create/table.md). In general `CREATE TABLE` statement has to specify three key things:
Синтаксис для создания таблиц заметно сложнее по сравнению с созданием баз данных (см. [руководство по SQL](../sql-reference/statements/create/table.md)). Оператор `CREATE TABLE` должен задавать три ключевых момента (небольшой схематичный пример приведён после списка):
1. Name of table to create.
2. Table schema, i.e. list of columns and their [data types](../sql-reference/data-types/index.md).
3. [Table engine](../engines/table-engines/index.md) and its settings, which determines all the details on how queries to this table will be physically executed.
1. Имя создаваемой таблицы.
2. Схему таблицы, то есть список столбцов и их [типы данных](../sql-reference/data-types/index.md).
3. [Движок таблицы](../engines/table-engines/index.md) и его параметры, которые определяют все детали того, как запросы к данной таблице будут физически исполняться.
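Небольшой схематичный пример, иллюстрирующий эти три момента (имя таблицы и столбцы условные и не относятся к набору данных руководства):
``` bash
clickhouse-client --query "
CREATE TABLE tutorial.example
(
    EventDate Date,
    UserID UInt64,
    URL String
)
ENGINE = MergeTree()
ORDER BY (EventDate, UserID)"
```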
There are only two tables to create:
Мы создадим всего лишь две таблицы:
- `hits` is a table with each action done by all users on all websites covered by the service.
- `visits` is a table that contains pre-built sessions instead of individual actions.
- таблицу `hits` с действиями, осуществлёнными всеми пользователями на всех сайтах, обслуживаемых сервисом;
- таблицу `visits`, содержащую посещения — заранее собранные сессии вместо отдельных действий.
Lets see and execute the real create table queries for these tables:
Выполним операторы `CREATE TABLE` для создания этих таблиц:
``` sql
CREATE TABLE tutorial.hits_v1
@ -462,22 +462,22 @@ ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
```
You can execute those queries using the interactive mode of `clickhouse-client` (just launch it in a terminal without specifying a query in advance) or try some [alternative interface](../interfaces/index.md) if you want.
Эти операторы можно выполнить с использованием интерактивного режима в `clickhouse-client` (запустите его в терминале, не указывая запрос заранее) или, при желании, воспользоваться [альтернативным интерфейсом](../interfaces/index.md).
As we can see, `hits_v1` uses the [basic MergeTree engine](../engines/table-engines/mergetree-family/mergetree.md), while the `visits_v1` uses the [Collapsing](../engines/table-engines/mergetree-family/collapsingmergetree.md) variant.
Как вы можете видеть, `hits_v1` использует [базовый вариант движка MergeTree](../engines/table-engines/mergetree-family/mergetree.md), тогда как `visits_v1` использует вариант [Collapsing](../engines/table-engines/mergetree-family/collapsingmergetree.md).
### Import Data {#import-data}
### Импорт данных {#import-data}
Data import to ClickHouse is done via [INSERT INTO](../sql-reference/statements/insert-into.md) query like in many other SQL databases. However, data is usually provided in one of the [supported serialization formats](../interfaces/formats.md) instead of `VALUES` clause (which is also supported).
Импорт данных в ClickHouse выполняется оператором [INSERT INTO](../sql-reference/statements/insert-into.md) как в большинстве SQL-систем. Однако данные для вставки в таблицы ClickHouse обычно предоставляются в одном из [поддерживаемых форматов](../interfaces/formats.md) вместо их непосредственного указания в предложении `VALUES` (хотя и этот способ поддерживается).
The files we downloaded earlier are in tab-separated format, so heres how to import them via console client:
В нашем случае файлы были загружены ранее в формате со значениями, разделёнными знаком табуляции; импортируем их, указав соответствующие запросы в аргументах командной строки:
``` bash
clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
```
ClickHouse has a lot of [settings to tune](../operations/settings/index.md) and one way to specify them in console client is via arguments, as we can see with `--max_insert_block_size`. The easiest way to figure out what settings are available, what do they mean and what the defaults are is to query the `system.settings` table:
ClickHouse оснащён множеством [изменяемых настроек](../operations/settings/index.md), и один из способов их указать — передать их в качестве аргументов при запуске консольного клиента, как в этом примере с `--max_insert_block_size`. Простейший способ узнать, какие настройки доступны, что они означают и какие у них значения по умолчанию — запросить содержимое таблицы `system.settings`:
``` sql
SELECT name, value, changed, description
@ -488,23 +488,23 @@ FORMAT TSV
max_insert_block_size 1048576 0 "The maximum block size for insertion, if we control the creation of blocks for insertion."
```
Optionally you can [OPTIMIZE](../sql-reference/statements/optimize.md) the tables after import. Tables that are configured with an engine from MergeTree-family always do merges of data parts in the background to optimize data storage (or at least check if it makes sense). These queries force the table engine to do storage optimization right now instead of some time later:
Можно также применить оператор [OPTIMIZE](../sql-reference/statements/optimize.md) к таблицам после импорта. Для таблиц, созданных с движками семейства MergeTree, слияние частей загруженных данных выполняется в фоновом режиме (по крайней мере проверяется, имеет ли смысл его осуществить); этот оператор принудительно запускает соответствующие процессы слияния вместо того, чтобы эти действия были выполнены в фоне когда-нибудь позже.
``` bash
clickhouse-client --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
clickhouse-client --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
```
These queries start an I/O and CPU intensive operation, so if the table consistently receives new data, its better to leave it alone and let merges run in the background.
Эти запросы запускают интенсивные по отношению к вводу-выводу и процессорным ресурсам операции, поэтому, если таблица всё ещё получает новые данные, лучше дать возможность слияниям выполняться в фоне.
Now we can check if the table import was successful:
Проверим, успешно ли загрузились данные:
``` bash
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
```
## Example Queries {#example-queries}
## Примеры запросов {#example-queries}
``` sql
SELECT
@ -526,18 +526,18 @@ FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403)
```
## Cluster Deployment {#cluster-deployment}
## Кластерное развёртывание {#cluster-deployment}
ClickHouse cluster is a homogenous cluster. Steps to set up:
Кластер ClickHouse — гомогенный, то есть все узлы в нём равны, ведущие и ведомые не выделяются. Шаги по установке кластера:
1. Install ClickHouse server on all machines of the cluster
2. Set up cluster configs in configuration files
3. Create local tables on each instance
4. Create a [Distributed table](../engines/table-engines/special/distributed.md)
1. Установить сервер ClickHouse на всех узлах будущего кластера.
2. Прописать кластерные конфигурации в конфигурационных файлах.
3. Создать локальные таблицы на каждом экземпляре.
4. Создать [распределённую таблицу](../engines/table-engines/special/distributed.md).
[Distributed table](../engines/table-engines/special/distributed.md) is actually a kind of “view” to local tables of ClickHouse cluster. SELECT query from a distributed table executes using resources of all clusters shards. You may specify configs for multiple clusters and create multiple distributed tables providing views to different clusters.
[Распределённая таблица](../engines/table-engines/special/distributed.md) — в некотором смысле «представление» над локальными таблицами кластера ClickHouse. Запрос SELECT к распределённой таблице выполняется с использованием ресурсов всех сегментов кластера. Вы можете указать конфигурации для нескольких кластеров и создать множество распределённых таблиц, «смотрящих» на разные кластеры.
Example config for a cluster with three shards, one replica each:
Пример конфигурации кластера с тремя сегментами и одной репликой для каждой:
``` xml
<remote_servers>
@ -564,38 +564,38 @@ Example config for a cluster with three shards, one replica each:
</remote_servers>
```
For further demonstration, lets create a new local table with the same `CREATE TABLE` query that we used for `hits_v1`, but different table name:
Далее создадим новую локальную таблицу с помощью того же запроса `CREATE TABLE`, что использовался для таблицы `hits_v1`, но с другим именем:
``` sql
CREATE TABLE tutorial.hits_local (...) ENGINE = MergeTree() ...
```
Creating a distributed table providing a view into local tables of the cluster:
Создадим распределённую таблицу, обеспечивающую представление над локальными таблицами кластера:
``` sql
CREATE TABLE tutorial.hits_all AS tutorial.hits_local
ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand());
```
A common practice is to create similar Distributed tables on all machines of the cluster. It allows running distributed queries on any machine of the cluster. Also theres an alternative option to create temporary distributed table for a given SELECT query using [remote](../sql-reference/table-functions/remote.md) table function.
Стандартная практика — создание одинаковых распределённых таблиц на всех узлах кластера. Это позволит запускать распределённые запросы с любого узла. Альтернативой может быть создание временной распределённой таблицы для заданного отдельно взятого запроса с использованием табличной функции [remote](../sql-reference/table-functions/remote.md).
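Условный пример такого временного запроса через табличную функцию `remote` (адреса серверов являются заглушками):
``` bash
clickhouse-client --query "
SELECT count()
FROM remote('example-host-1,example-host-2', tutorial.hits_local)"
```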
Lets run [INSERT SELECT](../sql-reference/statements/insert-into.md) into the Distributed table to spread the table to multiple servers.
Выполним [INSERT SELECT](../sql-reference/statements/insert-into.md) в распределённую таблицу, чтобы распределить данные по нескольким узлам.
``` sql
INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1;
```
:::danger Notice
This approach is not suitable for the sharding of large tables. Theres a separate tool [clickhouse-copier](../operations/utilities/clickhouse-copier.md) that can re-shard arbitrary large tables.
:::danger Внимание!
Этот подход не годится для сегментирования больших таблиц. Есть инструмент [clickhouse-copier](../operations/utilities/clickhouse-copier.md), специально предназначенный для перераспределения любых больших таблиц.
:::
As you could expect, computationally heavy queries run N times faster if they utilize 3 servers instead of one.
Как и следовало ожидать, вычислительно сложные запросы работают втрое быстрее, если они выполняются на трёх серверах, а не на одном.
In this case, we have used a cluster with 3 shards, and each contains a single replica.
В данном случае мы использовали кластер из трёх сегментов с одной репликой для каждого.
To provide resilience in a production environment, we recommend that each shard should contain 2-3 replicas spread between multiple availability zones or datacenters (or at least racks). Note that ClickHouse supports an unlimited number of replicas.
В продуктивных окружениях для обеспечения надёжности мы рекомендуем, чтобы каждый сегмент был защищён 2—3 репликами, разнесёнными на разные зоны отказоустойчивости или разные центры обработки данных (или хотя бы разные стойки). Особо отметим, что ClickHouse поддерживает неограниченное количество реплик.
Example config for a cluster of one shard containing three replicas:
Пример конфигурации кластера с одним сегментом и тремя репликами:
``` xml
<remote_servers>
@ -619,13 +619,13 @@ Example config for a cluster of one shard containing three replicas:
</remote_servers>
```
To enable native replication [ZooKeeper](http://zookeeper.apache.org/) is required. ClickHouse takes care of data consistency on all replicas and runs restore procedure after failure automatically. Its recommended to deploy the ZooKeeper cluster on separate servers (where no other processes including ClickHouse are running).
Для работы встроенной репликации необходимо использовать [ZooKeeper](http://zookeeper.apache.org/). ClickHouse заботится о согласованности данных на всех репликах и автоматически запускает процедуры восстановления в случае сбоев. Рекомендуется развёртывание кластера ZooKeeper на отдельных серверах (на которых не запущено других процессов, в том числе ClickHouse).
:::note Note
ZooKeeper is not a strict requirement: in some simple cases, you can duplicate the data by writing it into all the replicas from your application code. This approach is **not** recommended, in this case, ClickHouse wont be able to guarantee data consistency on all replicas. Thus it becomes the responsibility of your application.
:::note Примечание
ZooKeeper не является строгим требованием: в некоторых простых случаях можно продублировать данные, записывая их непосредственно из приложения на несколько реплик. Но этот подход **не рекомендуется** в общем случае, поскольку ClickHouse не сможет гарантировать согласованность данных на всех репликах; обеспечение согласованности станет заботой вашего приложения.
:::
ZooKeeper locations are specified in the configuration file:
Адреса узлов ZooKeeper указываются в файле конфигурации:
``` xml
<zookeeper>
@ -644,7 +644,7 @@ ZooKeeper locations are specified in the configuration file:
</zookeeper>
```
Also, we need to set macros for identifying each shard and replica which are used on table creation:
Также необходимо в секции `macros` указать идентификаторы сегмента и реплики, которые понадобятся при создании таблиц:
``` xml
<macros>
@ -653,7 +653,7 @@ Also, we need to set macros for identifying each shard and replica which are use
</macros>
```
If there are no replicas at the moment on replicated table creation, a new first replica is instantiated. If there are already live replicas, the new replica clones data from existing ones. You have an option to create all replicated tables first, and then insert data to it. Another option is to create some replicas and add the others after or during data insertion.
Если в момент создания реплицированной таблицы ни одной реплики ещё нет, то будет создана первая из них. Если уже есть работающие реплики, то в новые реплики данные будут склонированы из существующих. Есть возможность вначале создать реплицируемые таблицы, а затем вставить в них данные. Но можно вначале создать только часть реплик, а остальные добавить после вставки или в процессе вставки данных.
``` sql
CREATE TABLE tutorial.hits_replica (...)
@ -664,10 +664,10 @@ ENGINE = ReplicatedMergeTree(
...
```
Here we use [ReplicatedMergeTree](../engines/table-engines/mergetree-family/replication.md) table engine. In parameters we specify ZooKeeper path containing shard and replica identifiers.
Здесь мы используем движок [ReplicatedMergeTree](../engines/table-engines/mergetree-family/replication.md). В параметрах указывается путь в ZooKeeper, содержащий идентификаторы сегмента и реплики.
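Для наглядности ниже приведён условный пример использования макросов `{shard}` и `{replica}` (путь в ZooKeeper, имя таблицы и столбцы выбраны произвольно и не повторяют опущенные выше параметры):
``` bash
clickhouse-client --query "
CREATE TABLE tutorial.hits_replica_example (EventDate Date, UserID UInt64)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/hits_replica_example', '{replica}')
ORDER BY (EventDate, UserID)"
```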
``` sql
INSERT INTO tutorial.hits_replica SELECT * FROM tutorial.hits_local;
```
Replication operates in multi-master mode. Data can be loaded into any replica, and the system then syncs it with other instances automatically. Replication is asynchronous so at a given moment, not all replicas may contain recently inserted data. At least one replica should be up to allow data ingestion. Others will sync up data and repair consistency once they will become active again. Note that this approach allows for the low possibility of a loss of recently inserted data.
Репликация работает в режиме мультимастера. Это означает, что данные могут быть загружены на любую из реплик, и система автоматически синхронизирует их с остальными репликами. Репликация асинхронна, то есть в конкретный момент времени не все реплики могут содержать недавно добавленные данные. Для приёма данных как минимум одна реплика должна быть в строю. Прочие реплики синхронизируются и восстановят согласованное состояние, как только снова станут активными. Заметим, что при таком подходе существует небольшая вероятность утраты недавно добавленных данных.

View File

@ -6,24 +6,24 @@ sidebar_label: "Производительность"
# Производительность {#proizvoditelnost}
По результатам внутреннего тестирования в Яндексе, ClickHouse обладает наиболее высокой производительностью (как наиболее высокой пропускной способностью на длинных запросах, так и наиболее низкой задержкой на коротких запросах), при соответствующем сценарии работы, среди доступных для тестирования систем подобного класса. Результаты тестирования можно посмотреть на [отдельной странице](https://clickhouse.com/benchmark/dbms/).
По результатам внутреннего тестирования в Яндексе ClickHouse обладает наиболее высокой производительностью (как наиболее высокой пропускной способностью на длинных запросах, так и наиболее низкой задержкой на коротких запросах) при соответствующем сценарии работы среди доступных для тестирования систем подобного класса. Результаты тестирования можно посмотреть на [отдельной странице](https://clickhouse.com/benchmark/dbms/).
Также это подтверждают многочисленные независимые бенчмарки. Их не сложно найти в Интернете самостоятельно, либо можно воспользоваться [небольшой коллекцией ссылок по теме](https://clickhouse.com/#independent-benchmarks).
Также это подтверждают многочисленные независимые бенчмарки. Их несложно найти в Интернете самостоятельно, либо можно воспользоваться [небольшой коллекцией ссылок по теме](https://clickhouse.com/#independent-benchmarks).
## Пропускная способность при обработке одного большого запроса {#propusknaia-sposobnost-pri-obrabotke-odnogo-bolshogo-zaprosa}
Пропускную способность можно измерять в строчках в секунду и в мегабайтах в секунду. При условии, что данные помещаются в page cache, не слишком сложный запрос обрабатывается на современном железе со скоростью около 2-10 GB/sec. несжатых данных на одном сервере (в простейшем случае скорость может достигать 30 GB/sec). Если данные не помещаются в page cache, то скорость работы зависит от скорости дисковой подсистемы и коэффициента сжатия данных. Например, если дисковая подсистема позволяет читать данные со скоростью 400 MB/sec., а коэффициент сжатия данных составляет 3, то скорость будет около 1.2GB/sec. Для получения скорости в строчках в секунду, следует поделить скорость в байтах в секунду на суммарный размер используемых в запросе столбцов. Например, если вынимаются столбцы на 10 байт, то скорость будет в районе 100-200 млн. строк в секунду.
Пропускную способность можно измерять в строчках в секунду и в мегабайтах в секунду. При условии, что данные помещаются в страничный кэш (page cache), не слишком сложный запрос обрабатывается на современном железе со скоростью около 2—10 ГБ/с несжатых данных на одном сервере (в простейшем случае скорость может достигать 30 ГБ/с). Если данные не помещаются в страничный кэш, то скорость работы зависит от скорости подсистемы ввода-вывода и коэффициента сжатия данных. Например, если подсистема ввода-вывода позволяет читать данные со скоростью 400 МБ/с, а коэффициент сжатия данных составляет 3, то скорость будет около 1,2 ГБ/с. Для получения скорости в строках в секунду следует поделить скорость в байтах в секунду на суммарный размер используемых в запросе столбцов. Например, если запрашиваются столбцы на 10 байт, то скорость будет в районе 100—200 млн строк в секунду.
При распределённой обработке запроса, скорость обработки запроса растёт почти линейно, но только при условии, что в результате агрегации или при сортировке получается не слишком большое множество строчек.
При распределённой обработке запроса скорость обработки запроса растёт почти линейно, но только при условии, что в результате агрегации или при сортировке получается не слишком большое множество строк.
## Задержки при обработке коротких запросов {#zaderzhki-pri-obrabotke-korotkikh-zaprosov}
Если запрос использует первичный ключ, и выбирает для обработки не слишком большое количество строчек (сотни тысяч), и использует не слишком большое количество столбцов, то вы можете рассчитывать на latency менее 50 миллисекунд (от единиц миллисекунд в лучшем случае), при условии, что данные помещаются в page cache. Иначе latency вычисляется из количества seek-ов. Если вы используйте вращающиеся диски, то на не слишком сильно нагруженной системе, latency вычисляется по формуле: seek time (10 мс.) \* количество столбцов в запросе \* количество кусков с данными.
Если запрос использует первичный ключ и выбирает для обработки не слишком большое количество строк (сотни тысяч), и при этом использует не слишком большое количество столбцов, то вы можете рассчитывать на задержку менее 50 миллисекунд (от единиц миллисекунд в лучшем случае) при условии, что данные помещаются в страничный кэш. В иных случаях задержка зависит от количества операций поиска (seek). Если вы используете вращающиеся диски, то на не слишком сильно нагруженной системе задержка вычисляется по формуле: seek time (10 мс) \* количество столбцов в запросе \* количество кусков с данными.
## Пропускная способность при обработке многочисленных коротких запросов {#propusknaia-sposobnost-pri-obrabotke-mnogochislennykh-korotkikh-zaprosov}
При тех же условиях, ClickHouse может обработать несколько сотен (до нескольких тысяч в лучшем случае) запросов в секунду на одном сервере. Так как такой сценарий работы не является типичным для аналитических СУБД, рекомендуется рассчитывать не более чем на 100 запросов в секунду.
При тех же условиях ClickHouse может обработать несколько сотен (до нескольких тысяч в лучшем случае) запросов в секунду на одном сервере. Так как такой сценарий работы не является типичным для аналитических СУБД, рекомендуется рассчитывать не более чем на 100 запросов в секунду.
## Производительность при вставке данных {#proizvoditelnost-pri-vstavke-dannykh}
Данные рекомендуется вставлять пачками не менее 1000 строк или не более одного запроса в секунду. При вставке в таблицу типа MergeTree из tab-separated дампа, скорость вставки будет в районе 50-200 МБ/сек. Если вставляются строчки размером около 1 КБ, то скорость будет в районе 50 000 - 200 000 строчек в секунду. Если строчки маленькие - производительность в строчках в секунду будет выше (на данных БК - `>` 500 000 строк в секунду, на данных Graphite - `>` 1 000 000 строк в секунду). Для увеличения производительности, можно производить несколько запросов INSERT параллельно - при этом производительность растёт линейно.
Данные рекомендуется вставлять пачками не менее 1000 строк или не более одного запроса в секунду. При вставке в таблицу типа MergeTree из текстового файла с табуляцией в качестве разделителя скорость вставки будет в районе 50—200 МБ/с. Если вставляются строки размером около 1 КБ, то скорость будет в районе 50 000 — 200 000 строк в секунду. Если строки маленькие — производительность в строках в секунду будет выше (на данных БК - `>` 500 000 строк в секунду, на данных Graphite - `>` 1 000 000 строк в секунду). Для увеличения производительности можно делать несколько запросов INSERT параллельно — при этом производительность растёт линейно.

View File

@ -108,6 +108,15 @@ sidebar_label: "Ограничения на сложность запроса"
Значение по умолчанию — 0.
## max_bytes_before_external_sort {#settings-max_bytes_before_external_sort}
Включает или отключает выполнение `ORDER BY` во внешней памяти. См. [Детали реализации ORDER BY](../../sql-reference/statements/select/order-by.md#implementation-details).
Возможные значения:
- Максимальный объём оперативной памяти (в байтах), который может использоваться одной операцией [ORDER BY](../../sql-reference/statements/select/order-by.md). Рекомендуемое значение — половина доступной системной памяти.
- 0 — `ORDER BY` во внешней памяти отключён.
Значение по умолчанию: 0.
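Условный пример включения внешней сортировки для одного запроса (порог в 10 ГБ выбран произвольно; настройка передаётся аргументом командной строки, как и другие настройки клиента):
``` bash
# имена my_table и my_column являются заглушками
clickhouse-client --max_bytes_before_external_sort=10000000000 --query "SELECT * FROM my_table ORDER BY my_column FORMAT Null"
```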
## max_rows_to_sort {#max-rows-to-sort}
Максимальное количество строк до сортировки. Позволяет ограничить потребление оперативной памяти при сортировке.

View File

@ -2136,7 +2136,7 @@ SELECT * FROM test_table
- [distributed_replica_error_cap](#settings-distributed_replica_error_cap)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
Основной интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Фактический интервал растёт экспоненциально при возникновении ошибок.
@ -2146,9 +2146,9 @@ SELECT * FROM test_table
Значение по умолчанию: 100 миллисекунд.
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
Максимальный интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Ограничивает экпоненциальный рост интервала, установленого настройкой [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms).
Максимальный интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Ограничивает экспоненциальный рост интервала, установленного настройкой [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms).
Возможные значения:
@ -2156,7 +2156,7 @@ SELECT * FROM test_table
Значение по умолчанию: 30000 миллисекунд (30 секунд).
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
Включает/выключает пакетную отправку вставленных данных.
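Текущие значения этих настроек можно посмотреть, например, через таблицу `system.settings` (набросок):
``` bash
clickhouse-client --query "SELECT name, value, changed FROM system.settings WHERE name LIKE 'distributed_background_insert%'"
```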
@ -2323,11 +2323,11 @@ SELECT * FROM test_table
Значение по умолчанию: 0.
## insert_distributed_sync {#insert_distributed_sync}
## distributed_foreground_insert {#distributed_foreground_insert}
Включает или отключает режим синхронного добавления данных в распределенные таблицы (таблицы с движком [Distributed](../../engines/table-engines/special/distributed.md#distributed)).
По умолчанию ClickHouse вставляет данные в распределённую таблицу в асинхронном режиме. Если `insert_distributed_sync=1`, то данные вставляются сихронно, а запрос `INSERT` считается выполненным успешно, когда данные записаны на все шарды (по крайней мере на одну реплику для каждого шарда, если `internal_replication = true`).
По умолчанию ClickHouse вставляет данные в распределённую таблицу в асинхронном режиме. Если `distributed_foreground_insert=1`, то данные вставляются синхронно, а запрос `INSERT` считается выполненным успешно, когда данные записаны на все шарды (по крайней мере на одну реплику для каждого шарда, если `internal_replication = true`).
Возможные значения:

View File

@ -31,27 +31,25 @@ WITH arrayMap(x -> demangle(addressToSymbol(x)), trace) AS all SELECT thread_nam
``` text
Row 1:
──────
thread_name: clickhouse-serv
thread_id: 686
query_id: 1a11f70b-626d-47c1-b948-f9c7b206395d
res: sigqueue
DB::StorageSystemStackTrace::fillData(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::Context const&, DB::SelectQueryInfo const&) const
DB::IStorageSystemOneBlock<DB::StorageSystemStackTrace>::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int)
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPipeline&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)
DB::InterpreterSelectQuery::executeImpl(DB::QueryPipeline&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>)
DB::InterpreterSelectQuery::execute()
DB::InterpreterSelectWithUnionQuery::execute()
DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)
start_thread
__clone
thread_name: QueryPipelineEx
thread_id: 743490
query_id: dc55a564-febb-4e37-95bb-090ef182c6f1
res: memcpy
large_ralloc
arena_ralloc
do_rallocx
Allocator<true, true>::realloc(void*, unsigned long, unsigned long, unsigned long)
HashTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>::resize(unsigned long, unsigned long)
void DB::Aggregator::executeImplBatch<false, false, true, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>>(DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>&, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>::State&, DB::Arena*, unsigned long, unsigned long, DB::Aggregator::AggregateFunctionInstruction*, bool, char*) const
DB::Aggregator::executeImpl(DB::AggregatedDataVariants&, unsigned long, unsigned long, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, DB::Aggregator::AggregateFunctionInstruction*, bool, bool, char*) const
DB::Aggregator::executeOnBlock(std::__1::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>, unsigned long, unsigned long, DB::AggregatedDataVariants&, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, std::__1::vector<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>, std::__1::allocator<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>>>&, bool&) const
DB::AggregatingTransform::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__1::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
void* std::__1::__thread_proxy[abi:v15000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>>(void*)
```
Получение имен файлов и номеров строк в исходном коде ClickHouse:

View File

@ -111,7 +111,7 @@ $ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --bas
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.

View File

@ -128,7 +128,7 @@ SYSTEM RELOAD CONFIG [ON CLUSTER cluster_name]
## Управление распределёнными таблицами {#query-language-system-distributed}
ClickHouse может оперировать [распределёнными](../../sql-reference/statements/system.md) таблицами. Когда пользователь вставляет данные в эти таблицы, ClickHouse сначала формирует очередь из данных, которые должны быть отправлены на узлы кластера, а затем асинхронно отправляет подготовленные данные. Вы можете управлять очередью с помощью запросов [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) и [FLUSH DISTRIBUTED](#query_language-system-flush-distributed). Также есть возможность синхронно вставлять распределенные данные с помощью настройки [insert_distributed_sync](../../operations/settings/settings.md#insert_distributed_sync).
ClickHouse может оперировать [распределёнными](../../sql-reference/statements/system.md) таблицами. Когда пользователь вставляет данные в эти таблицы, ClickHouse сначала формирует очередь из данных, которые должны быть отправлены на узлы кластера, а затем асинхронно отправляет подготовленные данные. Вы можете управлять очередью с помощью запросов [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) и [FLUSH DISTRIBUTED](#query_language-system-flush-distributed). Также есть возможность синхронно вставлять распределенные данные с помощью настройки [distributed_foreground_insert](../../operations/settings/settings.md#distributed_foreground_insert).
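Условный пример управления очередью отправки (имена базы данных и таблицы являются заглушками):
``` bash
clickhouse-client --query "SYSTEM STOP DISTRIBUTED SENDS db.distributed_table"
clickhouse-client --query "SYSTEM FLUSH DISTRIBUTED db.distributed_table"
clickhouse-client --query "SYSTEM START DISTRIBUTED SENDS db.distributed_table"
```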
### STOP DISTRIBUTED SENDS {#query_language-system-stop-distributed-sends}

View File

@ -43,7 +43,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
**详见**
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) 设置
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) 设置
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) 查看示例
**分布式设置**
@ -58,24 +58,24 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
- `max_delay_to_insert` - 最大延迟多少秒插入数据到分布式表如果有很多挂起字节异步发送。默认值60。
- `monitor_batch_inserts` - 等同于 [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)
- `background_insert_batch` - 等同于 [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)
- `monitor_split_batch_on_failure` - 等同于[distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)
- `background_insert_split_batch_on_failure` - 等同于[distributed_background_insert_split_batch_on_failure](../../../operations/settings/settings.md#distributed_background_insert_split_batch_on_failure)
- `monitor_sleep_time_ms` - 等同于 [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)
- `background_insert_sleep_time_ms` - 等同于 [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)
- `monitor_max_sleep_time_ms` - 等同于 [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)
- `background_insert_max_sleep_time_ms` - 等同于 [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)
:::note
**稳定性设置** (`fsync_...`):
- 只影响异步插入(例如:`insert_distributed_sync=false`), 当数据首先存储在启动节点磁盘上然后再异步发送到shard。
- 只影响异步插入(例如:`distributed_foreground_insert=false`), 当数据首先存储在启动节点磁盘上然后再异步发送到shard。
— 可能会显著降低`insert`的性能
- 影响将存储在分布式表文件夹中的数据写入 **接受您插入的节点** 。如果你需要保证写入数据到底层的MergeTree表中请参阅 `system.merge_tree_settings` 中的持久性设置(`...fsync...`)
**插入限制设置** (`..._insert`) 请见:
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) 设置
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) 设置
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) 设置
- `bytes_to_throw_insert``bytes_to_delay_insert` 之前处理,所以你不应该设置它的值小于 `bytes_to_delay_insert`
:::
@ -209,7 +209,7 @@ SELECT 查询会被发送到所有分片,并且无论数据在分片中如何
- 使用需要特定键连接数据( IN 或 JOIN )的查询。如果数据是用该键进行分片,则应使用本地 IN 或 JOIN 而不是 GLOBAL IN 或 GLOBAL JOIN这样效率更高。
- 使用大量服务器(上百或更多),但有大量小查询(个别客户的查询 - 网站,广告商或合作伙伴)。为了使小查询不影响整个集群,让单个客户的数据处于单个分片上是有意义的。或者 你可以配置两级分片:将整个集群划分为«层»,一个层可以包含多个分片。单个客户的数据位于单个层上,根据需要将分片添加到层中,层中的数据随机分布。然后给每层创建分布式表,再创建一个全局的分布式表用于全局的查询。
数据是异步写入的。对于分布式表的 INSERT数据块只写本地文件系统。之后会尽快地在后台发送到远程服务器。发送数据的周期性是由[distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)和[distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)设置。分布式引擎会分别发送每个插入数据的文件,但是你可以使用[distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)设置启用批量发送文件。该设置通过更好地利用本地服务器和网络资源来提高集群性能。你应该检查表目录`/var/lib/clickhouse/data/database/table/`中的文件列表(等待发送的数据)来检查数据是否发送成功。执行后台任务的线程数可以通过[background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size)设置。
数据是异步写入的。对于分布式表的 INSERT数据块只写本地文件系统。之后会尽快地在后台发送到远程服务器。发送数据的周期性是由[distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)和[distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)设置。分布式引擎会分别发送每个插入数据的文件,但是你可以使用[distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)设置启用批量发送文件。该设置通过更好地利用本地服务器和网络资源来提高集群性能。你应该检查表目录`/var/lib/clickhouse/data/database/table/`中的文件列表(等待发送的数据)来检查数据是否发送成功。执行后台任务的线程数可以通过[background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size)设置。
如果在 INSERT 到分布式表时服务器节点丢失或重启设备故障则插入的数据可能会丢失。如果在表目录中检测到损坏的数据分片则会将其转移到«broken»子目录并不再使用。
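下面是一个简单的示意例子,用于查看等待后台发送的数据(database/table 为占位符,并假设可以查询 `system.distribution_queue` 系统表):
``` bash
ls /var/lib/clickhouse/data/database/table/
clickhouse-client --query "SELECT database, table, data_files, last_exception FROM system.distribution_queue"
```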

View File

@ -1088,7 +1088,7 @@ ClickHouse生成异常
- [表引擎分布式](../../engines/table-engines/special/distributed.md)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
[分布式](../../engines/table-engines/special/distributed.md)表引擎发送数据的基本间隔。发生错误时,实际间隔会呈指数增长。
@ -1098,9 +1098,9 @@ ClickHouse生成异常
默认值100毫秒。
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
的最大间隔 [分布](../../engines/table-engines/special/distributed.md) 表引擎发送数据。 限制在设置的区间的指数增长 [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) 设置。
[分布式](../../engines/table-engines/special/distributed.md)表引擎发送数据的最大间隔。它限制了 [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms) 设置的间隔的指数增长。
可能的值:
@ -1108,7 +1108,7 @@ ClickHouse生成异常
默认值30000毫秒30秒
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
启用/禁用批量发送插入的数据。

View File

@ -100,7 +100,7 @@ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --base-
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.

View File

@ -93,7 +93,7 @@ SYSTEM RELOAD CONFIG [ON CLUSTER cluster_name]
## Managing Distributed Tables {#query-language-system-distributed}
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into such a table, ClickHouse first creates a queue of data that needs to be sent to the cluster nodes, and then sends it asynchronously. You can manage queue processing with [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends). You can also insert distributed data synchronously by enabling the `insert_distributed_sync` setting.
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into such a table, ClickHouse first creates a queue of data that needs to be sent to the cluster nodes, and then sends it asynchronously. You can manage queue processing with [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends). You can also insert distributed data synchronously by enabling the `distributed_foreground_insert` setting.
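As a minimal sketch (the table name `dist_table` and its values are hypothetical), queue processing can be paused, flushed synchronously, and resumed, or skipped entirely with a foreground insert:

```sql
SYSTEM STOP DISTRIBUTED SENDS dist_table;   -- stop background sending
SYSTEM FLUSH DISTRIBUTED dist_table;        -- send queued data synchronously
SYSTEM START DISTRIBUTED SENDS dist_table;  -- resume background sending

-- Insert synchronously, without going through the background queue.
INSERT INTO dist_table SETTINGS distributed_foreground_insert = 1 VALUES (1, 'a');
```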
### STOP DISTRIBUTED SENDS {#query_language-system-stop-distributed-sends}

View File

@ -4,6 +4,7 @@
#include <iostream>
#include <fstream>
#include <iomanip>
#include <optional>
#include <random>
#include <string_view>
#include <pcg_random.hpp>
@ -28,10 +29,12 @@
#include <Interpreters/Context.h>
#include <Client/Connection.h>
#include <Common/InterruptListener.h>
#include <Common/Config/configReadClient.h>
#include <Common/Config/ConfigProcessor.h>
#include <Common/Config/getClientConfigPath.h>
#include <Common/TerminalSize.h>
#include <Common/StudentTTest.h>
#include <Common/CurrentMetrics.h>
#include <Common/ErrorCodes.h>
#include <filesystem>
@ -156,7 +159,17 @@ public:
if (home_path_cstr)
home_path = home_path_cstr;
configReadClient(config(), home_path);
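/// Load the client configuration: an explicit --config-file takes precedence; otherwise fall back to the default client config location, if one exists.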
std::optional<std::string> config_path;
if (config().has("config-file"))
config_path.emplace(config().getString("config-file"));
else
config_path = getClientConfigPath(home_path);
if (config_path.has_value())
{
ConfigProcessor config_processor(*config_path);
auto loaded_config = config_processor.loadConfig();
config().add(loaded_config.configuration);
}
}
int main(const std::vector<std::string> &) override

View File

@ -25,7 +25,8 @@
#include <Common/Exception.h>
#include <Common/formatReadable.h>
#include <Common/TerminalSize.h>
#include <Common/Config/configReadClient.h>
#include <Common/Config/ConfigProcessor.h>
#include <Common/Config/getClientConfigPath.h>
#include <Core/QueryProcessingStage.h>
#include <Columns/ColumnString.h>
@ -131,69 +132,64 @@ void Client::showWarnings()
}
}
void Client::parseConnectionsCredentials()
void Client::parseConnectionsCredentials(Poco::Util::AbstractConfiguration & config, const std::string & connection_name)
{
/// It is not possible to correctly handle multiple --host --port options.
if (hosts_and_ports.size() >= 2)
return;
std::optional<String> host;
std::optional<String> default_connection_name;
if (hosts_and_ports.empty())
{
if (config().has("host"))
host = config().getString("host");
if (config.has("host"))
default_connection_name = config.getString("host");
}
else
{
host = hosts_and_ports.front().host;
default_connection_name = hosts_and_ports.front().host;
}
String connection;
if (config().has("connection"))
connection = config().getString("connection");
if (!connection_name.empty())
connection = connection_name;
else
connection = host.value_or("localhost");
connection = default_connection_name.value_or("localhost");
Strings keys;
config().keys("connections_credentials", keys);
config.keys("connections_credentials", keys);
bool connection_found = false;
for (const auto & key : keys)
{
const String & prefix = "connections_credentials." + key;
const String & connection_name = config().getString(prefix + ".name", "");
if (connection_name != connection)
const String & name = config.getString(prefix + ".name", "");
if (name != connection)
continue;
connection_found = true;
String connection_hostname;
if (config().has(prefix + ".hostname"))
connection_hostname = config().getString(prefix + ".hostname");
if (config.has(prefix + ".hostname"))
connection_hostname = config.getString(prefix + ".hostname");
else
connection_hostname = connection_name;
connection_hostname = name;
if (hosts_and_ports.empty())
config().setString("host", connection_hostname);
if (config().has(prefix + ".port") && hosts_and_ports.empty())
config().setInt("port", config().getInt(prefix + ".port"));
if (config().has(prefix + ".secure") && !config().has("secure"))
config().setBool("secure", config().getBool(prefix + ".secure"));
if (config().has(prefix + ".user") && !config().has("user"))
config().setString("user", config().getString(prefix + ".user"));
if (config().has(prefix + ".password") && !config().has("password"))
config().setString("password", config().getString(prefix + ".password"));
if (config().has(prefix + ".database") && !config().has("database"))
config().setString("database", config().getString(prefix + ".database"));
if (config().has(prefix + ".history_file") && !config().has("history_file"))
config.setString("host", connection_hostname);
if (config.has(prefix + ".port"))
config.setInt("port", config.getInt(prefix + ".port"));
if (config.has(prefix + ".secure"))
config.setBool("secure", config.getBool(prefix + ".secure"));
if (config.has(prefix + ".user"))
config.setString("user", config.getString(prefix + ".user"));
if (config.has(prefix + ".password"))
config.setString("password", config.getString(prefix + ".password"));
if (config.has(prefix + ".database"))
config.setString("database", config.getString(prefix + ".database"));
if (config.has(prefix + ".history_file"))
{
String history_file = config().getString(prefix + ".history_file");
String history_file = config.getString(prefix + ".history_file");
if (history_file.starts_with("~") && !home_path.empty())
history_file = home_path + "/" + history_file.substr(1);
config().setString("history_file", history_file);
config.setString("history_file", history_file);
}
}
if (config().has("connection") && !connection_found)
if (!connection_name.empty() && !connection_found)
throw Exception(ErrorCodes::NO_ELEMENTS_IN_CONFIG, "No such connection '{}' in connections_credentials", connection);
}
@ -263,7 +259,20 @@ void Client::initialize(Poco::Util::Application & self)
if (home_path_cstr)
home_path = home_path_cstr;
configReadClient(config(), home_path);
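/// Load the client configuration (an explicit --config-file wins over the default lookup) and resolve the named connection, if any, from it.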
std::optional<std::string> config_path;
if (config().has("config-file"))
config_path.emplace(config().getString("config-file"));
else
config_path = getClientConfigPath(home_path);
if (config_path.has_value())
{
ConfigProcessor config_processor(*config_path);
auto loaded_config = config_processor.loadConfig();
parseConnectionsCredentials(*loaded_config.configuration, config().getString("connection", ""));
config().add(loaded_config.configuration);
}
else if (config().has("connection"))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "--connection was specified, but config does not exist");
/** getenv is thread-safe in Linux glibc and in all sane libc implementations.
* But the standard does not guarantee that subsequent calls will not rewrite the value by returned pointer.
@ -286,8 +295,6 @@ void Client::initialize(Poco::Util::Application & self)
if (env_password && !config().has("password"))
config().setString("password", env_password);
parseConnectionsCredentials();
// global_context->setApplicationType(Context::ApplicationType::CLIENT);
global_context->setQueryParameters(query_parameters);
@ -1454,7 +1461,9 @@ int mainEntryClickHouseClient(int argc, char ** argv)
try
{
DB::Client client;
/// Initialize command line options
client.init(argc, argv);
/// Initialize config file
return client.run();
}
catch (const DB::Exception & e)

View File

@ -47,7 +47,7 @@ protected:
private:
void printChangedSettings() const;
void showWarnings();
void parseConnectionsCredentials();
void parseConnectionsCredentials(Poco::Util::AbstractConfiguration & config, const std::string & connection_name);
std::vector<String> loadWarningMessages();
};
}

View File

@ -58,7 +58,7 @@ void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & c
/// Override important settings
settings_pull.readonly = 1;
settings_pull.prefer_localhost_replica = false;
settings_push.insert_distributed_sync = true;
settings_push.distributed_foreground_insert = true;
settings_push.prefer_localhost_replica = false;
set_default_value(settings_pull.load_balancing, LoadBalancing::NEAREST_HOSTNAME);
@ -66,7 +66,7 @@ void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & c
set_default_value(settings_pull.max_block_size, 8192UL);
set_default_value(settings_pull.preferred_block_size_bytes, 0);
set_default_value(settings_push.insert_distributed_timeout, 0);
set_default_value(settings_push.distributed_background_insert_timeout, 0);
set_default_value(settings_push.alter_sync, 2);
}

View File

@ -537,6 +537,16 @@ static void sanityChecks(Server & server)
{
}
try
{
const char * filename = "/proc/sys/kernel/task_delayacct";
if (readNumber(filename) == 0)
server.context()->addWarningMessage("Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. Check " + String(filename));
}
catch (...) // NOLINT(bugprone-empty-catch)
{
}
std::string dev_id = getBlockDeviceId(data_path);
if (getBlockDeviceType(dev_id) == BlockDeviceType::ROT && getBlockDeviceReadAheadBytes(dev_id) == 0)
server.context()->addWarningMessage("Rotational disk with disabled readahead is in use. Performance can be degraded. Used for data: " + String(data_path));

View File

@ -1479,6 +1479,15 @@
<!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
<!-- You can also configure rocksdb like this: -->
<!-- Full list of options:
- options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/options.h#L1452
- column_family_options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/options.h#L66
- block_based_table_options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/table/block_based/block_based_table_factory.cc#L228
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/table.h#L129
-->
<!--
<rocksdb>
<options>
@ -1487,6 +1496,9 @@
<column_family_options>
<num_levels>2</num_levels>
</column_family_options>
<block_based_table_options>
<block_size>1024</block_size>
</block_based_table_options>
<tables>
<table>
<name>TABLE</name>
@ -1496,6 +1508,9 @@
<column_family_options>
<num_levels>2</num_levels>
</column_family_options>
<block_based_table_options>
<block_size>1024</block_size>
</block_based_table_options>
</table>
</tables>
</rocksdb>

View File

@ -1,5 +1,5 @@
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <AggregateFunctions/Combinators/AggregateFunctionCombinatorFactory.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include <DataTypes/DataTypeNullable.h>

View File

@ -151,10 +151,12 @@ public:
}
else if (BitmapKind::Bitmap == kind)
{
auto size = roaring_bitmap->getSizeInBytes();
std::unique_ptr<RoaringBitmap> bitmap = std::make_unique<RoaringBitmap>(*roaring_bitmap);
bitmap->runOptimize();
auto size = bitmap->getSizeInBytes();
writeVarUInt(size, out);
std::unique_ptr<char[]> buf(new char[size]);
roaring_bitmap->write(buf.get());
bitmap->write(buf.get());
out.write(buf.get(), size);
}
}

View File

@ -95,7 +95,7 @@ public:
size_t size = set.size();
writeVarUInt(size, buf);
for (const auto & elem : set)
writeIntBinary(elem, buf);
writeBinaryLittleEndian(elem.key, buf);
}
void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, std::optional<size_t> /* version */, Arena *) const override

View File

@ -2,7 +2,7 @@
#include <unordered_set>
#include <AggregateFunctions/AggregateFunctionNull.h>
#include <AggregateFunctions/Combinators/AggregateFunctionNull.h>
#include <Columns/ColumnsNumber.h>

View File

@ -1,138 +0,0 @@
#include "AggregateFunctionMap.h"
#include "AggregateFunctions/AggregateFunctionCombinatorFactory.h"
#include "Functions/FunctionHelpers.h"
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}
class AggregateFunctionCombinatorMap final : public IAggregateFunctionCombinator
{
public:
String getName() const override { return "Map"; }
DataTypes transformArguments(const DataTypes & arguments) const override
{
if (arguments.empty())
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Incorrect number of arguments for aggregate function with {} suffix", getName());
const auto * map_type = checkAndGetDataType<DataTypeMap>(arguments[0].get());
if (map_type)
{
if (arguments.size() > 1)
throw Exception(ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH, "{} combinator takes only one map argument", getName());
return DataTypes({map_type->getValueType()});
}
// we need this part just to pass to redirection for mapped arrays
auto check_func = [](DataTypePtr t) { return t->getTypeId() == TypeIndex::Array; };
const auto * tup_type = checkAndGetDataType<DataTypeTuple>(arguments[0].get());
if (tup_type)
{
const auto & types = tup_type->getElements();
bool arrays_match = arguments.size() == 1 && types.size() >= 2 && std::all_of(types.begin(), types.end(), check_func);
if (arrays_match)
{
const auto * val_array_type = assert_cast<const DataTypeArray *>(types[1].get());
return DataTypes({val_array_type->getNestedType()});
}
}
else
{
bool arrays_match = arguments.size() >= 2 && std::all_of(arguments.begin(), arguments.end(), check_func);
if (arrays_match)
{
const auto * val_array_type = assert_cast<const DataTypeArray *>(arguments[1].get());
return DataTypes({val_array_type->getNestedType()});
}
}
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregate function {} requires map as argument", getName());
}
AggregateFunctionPtr transformAggregateFunction(
const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties &,
const DataTypes & arguments,
const Array & params) const override
{
const auto * map_type = checkAndGetDataType<DataTypeMap>(arguments[0].get());
if (map_type)
{
const auto & key_type = map_type->getKeyType();
switch (key_type->getTypeId())
{
case TypeIndex::Enum8:
case TypeIndex::Int8:
return std::make_shared<AggregateFunctionMap<Int8>>(nested_function, arguments);
case TypeIndex::Enum16:
case TypeIndex::Int16:
return std::make_shared<AggregateFunctionMap<Int16>>(nested_function, arguments);
case TypeIndex::Int32:
return std::make_shared<AggregateFunctionMap<Int32>>(nested_function, arguments);
case TypeIndex::Int64:
return std::make_shared<AggregateFunctionMap<Int64>>(nested_function, arguments);
case TypeIndex::Int128:
return std::make_shared<AggregateFunctionMap<Int128>>(nested_function, arguments);
case TypeIndex::Int256:
return std::make_shared<AggregateFunctionMap<Int256>>(nested_function, arguments);
case TypeIndex::UInt8:
return std::make_shared<AggregateFunctionMap<UInt8>>(nested_function, arguments);
case TypeIndex::Date:
case TypeIndex::UInt16:
return std::make_shared<AggregateFunctionMap<UInt16>>(nested_function, arguments);
case TypeIndex::DateTime:
case TypeIndex::UInt32:
return std::make_shared<AggregateFunctionMap<UInt32>>(nested_function, arguments);
case TypeIndex::UInt64:
return std::make_shared<AggregateFunctionMap<UInt64>>(nested_function, arguments);
case TypeIndex::UInt128:
return std::make_shared<AggregateFunctionMap<UInt128>>(nested_function, arguments);
case TypeIndex::UInt256:
return std::make_shared<AggregateFunctionMap<UInt256>>(nested_function, arguments);
case TypeIndex::UUID:
return std::make_shared<AggregateFunctionMap<UUID>>(nested_function, arguments);
case TypeIndex::IPv4:
return std::make_shared<AggregateFunctionMap<IPv4>>(nested_function, arguments);
case TypeIndex::IPv6:
return std::make_shared<AggregateFunctionMap<IPv6>>(nested_function, arguments);
case TypeIndex::FixedString:
case TypeIndex::String:
return std::make_shared<AggregateFunctionMap<String>>(nested_function, arguments);
default:
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Map key type {} is not is not supported by combinator {}", key_type->getName(), getName());
}
}
else
{
// in case of tuple of arrays or just arrays (checked in transformArguments), try to redirect to sum/min/max-MappedArrays to implement old behavior
auto nested_func_name = nested_function->getName();
if (nested_func_name == "sum" || nested_func_name == "min" || nested_func_name == "max")
{
AggregateFunctionProperties out_properties;
auto & aggr_func_factory = AggregateFunctionFactory::instance();
return aggr_func_factory.get(nested_func_name + "MappedArrays", arguments, params, out_properties);
}
else
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Aggregation '{}Map' is not implemented for mapped arrays",
nested_func_name);
}
}
};
void registerAggregateFunctionCombinatorMap(AggregateFunctionCombinatorFactory & factory)
{
factory.registerCombinator(std::make_shared<AggregateFunctionCombinatorMap>());
}
}

View File

@ -72,14 +72,14 @@ public:
{
writeBinary(has(), buf);
if (has())
writeBinary(value, buf);
writeBinaryLittleEndian(value, buf);
}
void read(ReadBuffer & buf, const ISerialization & /*serialization*/, Arena *)
{
readBinary(has_value, buf);
if (has())
readBinary(value, buf);
readBinaryLittleEndian(value, buf);
}
@ -1275,13 +1275,13 @@ struct AggregateFunctionAnyHeavyData : Data
void write(WriteBuffer & buf, const ISerialization & serialization) const
{
Data::write(buf, serialization);
writeBinary(counter, buf);
writeBinaryLittleEndian(counter, buf);
}
void read(ReadBuffer & buf, const ISerialization & serialization, Arena * arena)
{
Data::read(buf, serialization, arena);
readBinary(counter, buf);
readBinaryLittleEndian(counter, buf);
}
static const char * name() { return "anyHeavy"; }

View File

@ -19,7 +19,6 @@
#include <Common/assert_cast.h>
#include <AggregateFunctions/IAggregateFunction.h>
#include <AggregateFunctions/AggregateFunctionNull.h>
#include <type_traits>
#include <bitset>

View File

@ -34,7 +34,7 @@ struct TheilsUData : CrossTabData
for (const auto & [key, value] : count_ab)
{
Float64 value_ab = value;
Float64 value_b = count_b.at(key.items[1]);
Float64 value_b = count_b.at(key.items[UInt128::_impl::little(1)]);
dep += (value_ab / count) * log(value_ab / value_b);
}

View File

@ -8,8 +8,6 @@
#include <IO/WriteHelpers.h>
#include <Common/assert_cast.h>
#include <AggregateFunctions/AggregateFunctionNull.h>
namespace DB
{
struct Settings;

View File

@ -1,20 +1,21 @@
include("${ClickHouse_SOURCE_DIR}/cmake/dbms_glob_sources.cmake")
add_headers_and_sources(clickhouse_aggregate_functions .)
add_headers_and_sources(clickhouse_aggregate_functions Combinators)
extract_into_parent_list(clickhouse_aggregate_functions_sources dbms_sources
IAggregateFunction.cpp
AggregateFunctionFactory.cpp
AggregateFunctionCombinatorFactory.cpp
AggregateFunctionState.cpp
Combinators/AggregateFunctionCombinatorFactory.cpp
Combinators/AggregateFunctionState.cpp
AggregateFunctionCount.cpp
parseAggregateFunctionParameters.cpp
)
extract_into_parent_list(clickhouse_aggregate_functions_headers dbms_headers
IAggregateFunction.h
IAggregateFunctionCombinator.h
Combinators/IAggregateFunctionCombinator.h
AggregateFunctionFactory.h
AggregateFunctionCombinatorFactory.h
AggregateFunctionState.h
Combinators/AggregateFunctionCombinatorFactory.h
Combinators/AggregateFunctionState.h
AggregateFunctionCount.cpp
FactoryHelpers.h
parseAggregateFunctionParameters.h

View File

@ -0,0 +1,93 @@
#include "AggregateFunctionArgMinMax.h"
#include "AggregateFunctionCombinatorFactory.h"
#include <AggregateFunctions/AggregateFunctionMinMaxAny.h>
#include <DataTypes/DataTypeDate.h>
#include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypeString.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}
namespace
{
template <template <typename> class Data>
class AggregateFunctionCombinatorArgMinMax final : public IAggregateFunctionCombinator
{
public:
String getName() const override { return Data<SingleValueDataGeneric<>>::name(); }
DataTypes transformArguments(const DataTypes & arguments) const override
{
if (arguments.empty())
throw Exception(
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH,
"Incorrect number of arguments for aggregate function with {} suffix",
getName());
return DataTypes(arguments.begin(), arguments.end() - 1);
}
AggregateFunctionPtr transformAggregateFunction(
const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties &,
const DataTypes & arguments,
const Array & params) const override
{
const DataTypePtr & argument_type = arguments.back();
WhichDataType which(argument_type);
#define DISPATCH(TYPE) \
if (which.idx == TypeIndex::TYPE) \
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<TYPE>>>>(nested_function, arguments, params); /// NOLINT
FOR_NUMERIC_TYPES(DISPATCH)
#undef DISPATCH
if (which.idx == TypeIndex::Date)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<DataTypeDate::FieldType>>>>(
nested_function, arguments, params);
if (which.idx == TypeIndex::DateTime)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<DataTypeDateTime::FieldType>>>>(
nested_function, arguments, params);
if (which.idx == TypeIndex::DateTime64)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<DateTime64>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal32)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal32>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal64)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal64>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal128)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal128>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::Decimal256)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataFixed<Decimal256>>>>(nested_function, arguments, params);
if (which.idx == TypeIndex::String)
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataString>>>(nested_function, arguments, params);
return std::make_shared<AggregateFunctionArgMinMax<Data<SingleValueDataGeneric<>>>>(nested_function, arguments, params);
}
};
template <typename Data>
struct AggregateFunctionArgMinDataCapitalized : AggregateFunctionMinData<Data>
{
static const char * name() { return "ArgMin"; }
};
template <typename Data>
struct AggregateFunctionArgMaxDataCapitalized : AggregateFunctionMaxData<Data>
{
static const char * name() { return "ArgMax"; }
};
}
void registerAggregateFunctionCombinatorMinMax(AggregateFunctionCombinatorFactory & factory)
{
factory.registerCombinator(std::make_shared<AggregateFunctionCombinatorArgMinMax<AggregateFunctionArgMinDataCapitalized>>());
factory.registerCombinator(std::make_shared<AggregateFunctionCombinatorArgMinMax<AggregateFunctionArgMaxDataCapitalized>>());
}
}

View File

@ -0,0 +1,111 @@
#pragma once
#include <AggregateFunctions/IAggregateFunction.h>
namespace DB
{
template <typename Key>
class AggregateFunctionArgMinMax final : public IAggregateFunctionHelper<AggregateFunctionArgMinMax<Key>>
{
private:
AggregateFunctionPtr nested_function;
SerializationPtr serialization;
size_t key_col;
size_t key_offset;
Key & key(AggregateDataPtr __restrict place) const { return *reinterpret_cast<Key *>(place + key_offset); }
const Key & key(ConstAggregateDataPtr __restrict place) const { return *reinterpret_cast<const Key *>(place + key_offset); }
public:
AggregateFunctionArgMinMax(AggregateFunctionPtr nested_function_, const DataTypes & arguments, const Array & params)
: IAggregateFunctionHelper<AggregateFunctionArgMinMax<Key>>{arguments, params, nested_function_->getResultType()}
, nested_function{nested_function_}
, serialization(arguments.back()->getDefaultSerialization())
, key_col{arguments.size() - 1}
, key_offset{(nested_function->sizeOfData() + alignof(Key) - 1) / alignof(Key) * alignof(Key)}
{
}
String getName() const override { return nested_function->getName() + Key::name(); }
bool isState() const override { return nested_function->isState(); }
bool isVersioned() const override { return nested_function->isVersioned(); }
size_t getVersionFromRevision(size_t revision) const override { return nested_function->getVersionFromRevision(revision); }
size_t getDefaultVersion() const override { return nested_function->getDefaultVersion(); }
bool allocatesMemoryInArena() const override { return nested_function->allocatesMemoryInArena() || Key::allocatesMemoryInArena(); }
bool hasTrivialDestructor() const override { return nested_function->hasTrivialDestructor(); }
size_t sizeOfData() const override { return key_offset + sizeof(Key); }
size_t alignOfData() const override { return nested_function->alignOfData(); }
void create(AggregateDataPtr __restrict place) const override
{
nested_function->create(place);
new (place + key_offset) Key;
}
void destroy(AggregateDataPtr __restrict place) const noexcept override { nested_function->destroy(place); }
void destroyUpToState(AggregateDataPtr __restrict place) const noexcept override { nested_function->destroyUpToState(place); }
void add(AggregateDataPtr __restrict place, const IColumn ** columns, size_t row_num, Arena * arena) const override
{
if (key(place).changeIfBetter(*columns[key_col], row_num, arena))
{
nested_function->destroy(place);
nested_function->create(place);
nested_function->add(place, columns, row_num, arena);
}
else if (key(place).isEqualTo(*columns[key_col], row_num))
{
nested_function->add(place, columns, row_num, arena);
}
}
void merge(AggregateDataPtr __restrict place, ConstAggregateDataPtr rhs, Arena * arena) const override
{
if (key(place).changeIfBetter(key(rhs), arena))
{
nested_function->destroy(place);
nested_function->create(place);
nested_function->merge(place, rhs, arena);
}
else if (key(place).isEqualTo(key(rhs)))
{
nested_function->merge(place, rhs, arena);
}
}
void serialize(ConstAggregateDataPtr __restrict place, WriteBuffer & buf, std::optional<size_t> version) const override
{
nested_function->serialize(place, buf, version);
key(place).write(buf, *serialization);
}
void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, std::optional<size_t> version, Arena * arena) const override
{
nested_function->deserialize(place, buf, version, arena);
key(place).read(buf, *serialization, arena);
}
void insertResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
{
nested_function->insertResultInto(place, to, arena);
}
void insertMergeResultInto(AggregateDataPtr __restrict place, IColumn & to, Arena * arena) const override
{
nested_function->insertMergeResultInto(place, to, arena);
}
AggregateFunctionPtr getNestedFunction() const override { return nested_function; }
};
}

Some files were not shown because too many files have changed in this diff.