Merge remote-tracking branch 'origin/master' into fs-cache-improvement

This commit is contained in:
kssenii 2023-11-08 11:53:54 +01:00
commit dbea50738b
555 changed files with 9848 additions and 2969 deletions

View File

@ -33,7 +33,12 @@ jobs:
- name: Python unit tests
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 -m unittest discover -s . -p '*_test.py'
echo "Testing the main ci directory"
python3 -m unittest discover -s . -p 'test_*.py'
for dir in *_lambda/; do
echo "Testing $dir"
python3 -m unittest discover -s "$dir" -p 'test_*.py'
done
DockerHubPushAarch64:
runs-on: [self-hosted, style-checker-aarch64]
needs: CheckLabels
@ -69,7 +74,7 @@ jobs:
name: changed_images_amd64
path: ${{ runner.temp }}/docker_images_check/changed_images_amd64.json
DockerHubPush:
needs: [DockerHubPushAmd64, DockerHubPushAarch64]
needs: [DockerHubPushAmd64, DockerHubPushAarch64, PythonUnitTests]
runs-on: [self-hosted, style-checker]
steps:
- name: Check out repository code

View File

@ -19,7 +19,12 @@ jobs:
- name: Python unit tests
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
python3 -m unittest discover -s . -p '*_test.py'
echo "Testing the main ci directory"
python3 -m unittest discover -s . -p 'test_*.py'
for dir in *_lambda/; do
echo "Testing $dir"
python3 -m unittest discover -s "$dir" -p 'test_*.py'
done
DockerHubPushAarch64:
runs-on: [self-hosted, style-checker-aarch64]
steps:

View File

@ -47,10 +47,10 @@ jobs:
run: |
cd "$GITHUB_WORKSPACE/tests/ci"
echo "Testing the main ci directory"
python3 -m unittest discover -s . -p '*_test.py'
python3 -m unittest discover -s . -p 'test_*.py'
for dir in *_lambda/; do
echo "Testing $dir"
python3 -m unittest discover -s "$dir" -p '*_test.py'
python3 -m unittest discover -s "$dir" -p 'test_*.py'
done
DockerHubPushAarch64:
needs: CheckLabels

View File

@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v23.10, 2023-11-02](#2310)**<br/>
**[ClickHouse release v23.9, 2023-09-28](#239)**<br/>
**[ClickHouse release v23.8 LTS, 2023-08-31](#238)**<br/>
**[ClickHouse release v23.7, 2023-07-27](#237)**<br/>
@ -12,6 +13,184 @@
# 2023 Changelog
### ClickHouse release 23.10, 2023-11-02
#### Backward Incompatible Change
* There is no longer an option to automatically remove broken data parts. This closes [#55174](https://github.com/ClickHouse/ClickHouse/issues/55174). [#55184](https://github.com/ClickHouse/ClickHouse/pull/55184) ([Alexey Milovidov](https://github.com/alexey-milovidov)). [#55557](https://github.com/ClickHouse/ClickHouse/pull/55557) ([Jihyuk Bok](https://github.com/tomahawk28)).
* The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. [#55186](https://github.com/ClickHouse/ClickHouse/pull/55186) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to return it back. [#55189](https://github.com/ClickHouse/ClickHouse/pull/55189) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rename directory monitor concept into background INSERT. All the settings `*directory_monitor*` had been renamed to `distributed_background_insert*`. *Backward compatibility should be preserved* (since old settings had been added as an alias). [#55978](https://github.com/ClickHouse/ClickHouse/pull/55978) ([Azat Khuzhin](https://github.com/azat)).
* Do not interpret the `send_timeout` set on the client side as the `receive_timeout` on the server side and vise-versa. [#56035](https://github.com/ClickHouse/ClickHouse/pull/56035) ([Azat Khuzhin](https://github.com/azat)).
* Comparison of time intervals with different units will throw an exception. This closes [#55942](https://github.com/ClickHouse/ClickHouse/issues/55942). You might have occasionally rely on the previous behavior when the underlying numeric values were compared regardless of the units. [#56090](https://github.com/ClickHouse/ClickHouse/pull/56090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rewrited the experimental `S3Queue` table engine completely: changed the way we keep information in zookeeper which allows to make less zookeeper requests, added caching of zookeeper state in cases when we know the state will not change, improved the polling from s3 process to make it less aggressive, changed the way ttl and max set for trached files is maintained, now it is a background process. Added `system.s3queue` and `system.s3queue_log` tables. Closes [#54998](https://github.com/ClickHouse/ClickHouse/issues/54998). [#54422](https://github.com/ClickHouse/ClickHouse/pull/54422) ([Kseniia Sumarokova](https://github.com/kssenii)).
#### New Feature
* Add function `arrayFold(accumulator, x1, ..., xn -> expression, initial, array1, ..., arrayn)` which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. [#49794](https://github.com/ClickHouse/ClickHouse/pull/49794) ([Lirikl](https://github.com/Lirikl)).
* Support for `Npy` format. `SELECT * FROM file('example_array.npy', Npy)`. [#55982](https://github.com/ClickHouse/ClickHouse/pull/55982) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* If a table has a space-filling curve in its key, e.g., `ORDER BY mortonEncode(x, y)`, the conditions on its arguments, e.g., `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30` can be used for indexing. A setting `analyze_index_with_space_filling_curves` is added to enable or disable this analysis. This closes [#41195](https://github.com/ClickHouse/ClickHouse/issue/41195). Continuation of [#4538](https://github.com/ClickHouse/ClickHouse/pull/4538). Continuation of [#6286](https://github.com/ClickHouse/ClickHouse/pull/6286). Continuation of [#28130](https://github.com/ClickHouse/ClickHouse/pull/28130). Continuation of [#41753](https://github.com/ClickHouse/ClickHouse/pull/#41753). [#55642](https://github.com/ClickHouse/ClickHouse/pull/55642) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* A new setting called `force_optimize_projection_name`, it takes a name of projection as an argument. If it's value set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes [#55331](https://github.com/ClickHouse/ClickHouse/issues/55331). [#56134](https://github.com/ClickHouse/ClickHouse/pull/56134) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Support asynchronous inserts with external data via native protocol. Previously it worked only if data is inlined into query. [#54730](https://github.com/ClickHouse/ClickHouse/pull/54730) ([Anton Popov](https://github.com/CurtizJ)).
* Added aggregation function `lttb` which uses the [Largest-Triangle-Three-Buckets](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm for downsampling data for visualization. [#53145](https://github.com/ClickHouse/ClickHouse/pull/53145) ([Sinan](https://github.com/sinsinan)).
* Query`CHECK TABLE` has better performance and usability (sends progress updates, cancellable). Support checking particular part with `CHECK TABLE ... PART 'part_name'`. [#53404](https://github.com/ClickHouse/ClickHouse/pull/53404) ([vdimir](https://github.com/vdimir)).
* Added function `jsonMergePatch`. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object. [#54364](https://github.com/ClickHouse/ClickHouse/pull/54364) ([Memo](https://github.com/Joeywzr)).
* The second part of Kusto Query Language dialect support. [Phase 1 implementation ](https://github.com/ClickHouse/ClickHouse/pull/37961) has been merged. [#42510](https://github.com/ClickHouse/ClickHouse/pull/42510) ([larryluogit](https://github.com/larryluogit)).
* Added a new SQL function, `arrayRandomSample(arr, k)` which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. "SELECT arrayReduce('groupArraySample(3)', range(10))". [#54391](https://github.com/ClickHouse/ClickHouse/pull/54391) ([itayisraelov](https://github.com/itayisraelov)).
* Introduce `-ArgMin`/`-ArgMax` aggregate combinators which allow to aggregate by min/max values only. One use case can be found in [#54818](https://github.com/ClickHouse/ClickHouse/issues/54818). This PR also reorganize combinators into dedicated folder. [#54947](https://github.com/ClickHouse/ClickHouse/pull/54947) ([Amos Bird](https://github.com/amosbird)).
* Allow to drop cache for Protobuf format with `SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]`. [#55064](https://github.com/ClickHouse/ClickHouse/pull/55064) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Add external HTTP Basic authenticator. [#55199](https://github.com/ClickHouse/ClickHouse/pull/55199) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Added function `byteSwap` which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally such as IPv4. [#55211](https://github.com/ClickHouse/ClickHouse/pull/55211) ([Priyansh Agrawal](https://github.com/Priyansh121096)).
* Added function `formatQuery()` which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function `formatQuerySingleLine()` which does the same but the returned string will not contain linebreaks. [#55239](https://github.com/ClickHouse/ClickHouse/pull/55239) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Added `DWARF` input format that reads debug symbols from an ELF executable/library/object file. [#55450](https://github.com/ClickHouse/ClickHouse/pull/55450) ([Michael Kolupaev](https://github.com/al13n321)).
* Allow to save unparsed records and errors in RabbitMQ, NATS and FileLog engines. Add virtual columns `_error` and `_raw_message`(for NATS and RabbitMQ), `_raw_record` (for FileLog) that are filled when ClickHouse fails to parse new record. The behaviour is controlled under storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ, `handle_error_mode` for FileLog similar to `kafka_handle_error_mode`. If it's set to `default`, en exception will be thrown when ClickHouse fails to parse a record, if it's set to `stream`, erorr and raw record will be saved into virtual columns. Closes [#36035](https://github.com/ClickHouse/ClickHouse/issues/36035). [#55477](https://github.com/ClickHouse/ClickHouse/pull/55477) ([Kruglov Pavel](https://github.com/Avogar)).
* Keeper client improvement: add `get_all_children_number command` that returns number of all children nodes under a specific path. [#55485](https://github.com/ClickHouse/ClickHouse/pull/55485) ([guoxiaolong](https://github.com/guoxiaolongzte)).
* Keeper client improvement: add `get_direct_children_number` command that returns number of direct children nodes under a path. [#55898](https://github.com/ClickHouse/ClickHouse/pull/55898) ([xuzifu666](https://github.com/xuzifu666)).
* Add statement `SHOW SETTING setting_name` which is a simpler version of existing statement `SHOW SETTINGS`. [#55979](https://github.com/ClickHouse/ClickHouse/pull/55979) ([Maksim Kita](https://github.com/kitaisreal)).
* Added fields `substreams` and `filenames` to the `system.parts_columns` table. [#55108](https://github.com/ClickHouse/ClickHouse/pull/55108) ([Anton Popov](https://github.com/CurtizJ)).
* Add support for `SHOW MERGES` query. [#55815](https://github.com/ClickHouse/ClickHouse/pull/55815) ([megao](https://github.com/jetgm)).
* Introduce a setting `create_table_empty_primary_key_by_default` for default `ORDER BY ()`. [#55899](https://github.com/ClickHouse/ClickHouse/pull/55899) ([Srikanth Chekuri](https://github.com/srikanthccv)).
#### Performance Improvement
* Add option `query_plan_preserve_num_streams_after_window_functions` to preserve the number of streams after evaluating window functions to allow parallel stream processing. [#50771](https://github.com/ClickHouse/ClickHouse/pull/50771) ([frinkr](https://github.com/frinkr)).
* Release more streams if data is small. [#53867](https://github.com/ClickHouse/ClickHouse/pull/53867) ([Jiebin Sun](https://github.com/jiebinn)).
* RoaringBitmaps being optimized before serialization. [#55044](https://github.com/ClickHouse/ClickHouse/pull/55044) ([UnamedRus](https://github.com/UnamedRus)).
* Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. [#55069](https://github.com/ClickHouse/ClickHouse/pull/55069) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fix contention on Context lock, this significantly improves performance for a lot of short-running concurrent queries. [#55121](https://github.com/ClickHouse/ClickHouse/pull/55121) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved the performance of inverted index creation by 30%. This was achieved by replacing `std::unordered_map` with `absl::flat_hash_map`. [#55210](https://github.com/ClickHouse/ClickHouse/pull/55210) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support ORC filter push down (rowgroup level). [#55330](https://github.com/ClickHouse/ClickHouse/pull/55330) ([李扬](https://github.com/taiyang-li)).
* Improve performance of external aggregation with a lot of temporary files. [#55489](https://github.com/ClickHouse/ClickHouse/pull/55489) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. [#55654](https://github.com/ClickHouse/ClickHouse/pull/55654) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses [#55653](https://github.com/ClickHouse/ClickHouse/issues/55653#issuecomment-1763766009). [#55683](https://github.com/ClickHouse/ClickHouse/pull/55683) ([Amos Bird](https://github.com/amosbird)).
* Cache CAST function in set during execution to improve the performance of function `IN` when set element type doesn't exactly match column type. [#55712](https://github.com/ClickHouse/ClickHouse/pull/55712) ([Duc Canh Le](https://github.com/canhld94)).
* Performance improvement for `ColumnVector::insertMany` and `ColumnVector::insertManyFrom`. [#55714](https://github.com/ClickHouse/ClickHouse/pull/55714) ([frinkr](https://github.com/frinkr)).
* Optimized Map subscript operations by predicting the next row's key position and reduce the comparisons. [#55929](https://github.com/ClickHouse/ClickHouse/pull/55929) ([lgbo](https://github.com/lgbo-ustc)).
* Support struct fields pruning in Parquet (in previous versions it didn't work in some cases). [#56117](https://github.com/ClickHouse/ClickHouse/pull/56117) ([lgbo](https://github.com/lgbo-ustc)).
* Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. [#51692](https://github.com/ClickHouse/ClickHouse/pull/51692) ([Raúl Marín](https://github.com/Algunenano)).
* Optimized external aggregation memory consumption in case many temporary files were generated. [#54798](https://github.com/ClickHouse/ClickHouse/pull/54798) ([Nikita Taranov](https://github.com/nickitat)).
* Distributed queries executed in `async_socket_for_remote` mode (default) now respect `max_threads` limit. Previously, some queries could create excessive threads (up to `max_distributed_connections`), causing server performance issues. [#53504](https://github.com/ClickHouse/ClickHouse/pull/53504) ([filimonov](https://github.com/filimonov)).
* Caching skip-able entries while executing DDL from Zookeeper distributed DDL queue. [#54828](https://github.com/ClickHouse/ClickHouse/pull/54828) ([Duc Canh Le](https://github.com/canhld94)).
* Experimental inverted indexes do not store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristics (`density` parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristics based on parameter `max_rows_per_postings_list` (default: 64k) is introduced which directly controls the maximum allowed number of row ids in a postings list. [#55616](https://github.com/ClickHouse/ClickHouse/pull/55616) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Improve write performance to `EmbeddedRocksDB` tables. [#55732](https://github.com/ClickHouse/ClickHouse/pull/55732) ([Duc Canh Le](https://github.com/canhld94)).
* Improved overall resilience for ClickHouse in case of many parts within partition (more than 1000). It might reduce the number of `TOO_MANY_PARTS` errors. [#55526](https://github.com/ClickHouse/ClickHouse/pull/55526) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Reduced memory consumption during loading of hierarchical dictionaries. [#55838](https://github.com/ClickHouse/ClickHouse/pull/55838) ([Nikita Taranov](https://github.com/nickitat)).
* All dictionaries support setting `dictionary_use_async_executor`. [#55839](https://github.com/ClickHouse/ClickHouse/pull/55839) ([vdimir](https://github.com/vdimir)).
* Prevent excesive memory usage when deserializing AggregateFunctionTopKGenericData. [#55947](https://github.com/ClickHouse/ClickHouse/pull/55947) ([Raúl Marín](https://github.com/Algunenano)).
* On a Keeper with lots of watches AsyncMetrics threads can consume 100% of CPU for noticable time in `DB::KeeperStorage::getSessionsWithWatchesCount()`. The fix is to avoid traversing heavy `watches` and `list_watches` sets. [#56054](https://github.com/ClickHouse/ClickHouse/pull/56054) ([Alexander Gololobov](https://github.com/davenger)).
* Add setting `optimize_trivial_approximate_count_query` to use `count()` approximation for storage EmbeddedRocksDB. Enable trivial count for StorageJoin. [#55806](https://github.com/ClickHouse/ClickHouse/pull/55806) ([Duc Canh Le](https://github.com/canhld94)).
#### Improvement
* Functions `toDayOfWeek()` (MySQL alias: `DAYOFWEEK()`), `toYearWeek()` (`YEARWEEK()`) and `toWeek()` (`WEEK()`) now supports `String` arguments. This makes its behavior consistent with MySQL's behavior. [#55589](https://github.com/ClickHouse/ClickHouse/pull/55589) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduced setting `date_time_overflow_behavior` with possible values `ignore`, `throw`, `saturate` that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64. [#55696](https://github.com/ClickHouse/ClickHouse/pull/55696) ([Andrey Zvonov](https://github.com/zvonand)).
* Implement query parameters support for `ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}`. Merges [#49516](https://github.com/ClickHouse/ClickHouse/issues/49516). Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#55604](https://github.com/ClickHouse/ClickHouse/pull/55604) ([alesapin](https://github.com/alesapin)).
* Print processor ids in a prettier manner in EXPLAIN. [#48852](https://github.com/ClickHouse/ClickHouse/pull/48852) ([Vlad Seliverstov](https://github.com/behebot)).
* Creating a direct dictionary with a lifetime field will be rejected at create time (as the lifetime does not make sense for direct dictionaries). Fixes: [#27861](https://github.com/ClickHouse/ClickHouse/issues/27861). [#49043](https://github.com/ClickHouse/ClickHouse/pull/49043) ([Rory Crispin](https://github.com/RoryCrispin)).
* Allow parameters in queries with partitions like `ALTER TABLE t DROP PARTITION`. Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#49516](https://github.com/ClickHouse/ClickHouse/pull/49516) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add a new column `xid` for `system.zookeeper_connection`. [#50702](https://github.com/ClickHouse/ClickHouse/pull/50702) ([helifu](https://github.com/helifu)).
* Display the correct server settings in `system.server_settings` after configuration reload. [#53774](https://github.com/ClickHouse/ClickHouse/pull/53774) ([helifu](https://github.com/helifu)).
* Add support for mathematical minus `` character in queries, similar to `-`. [#54100](https://github.com/ClickHouse/ClickHouse/pull/54100) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add replica groups to the experimental `Replicated` database engine. Closes [#53620](https://github.com/ClickHouse/ClickHouse/issues/53620). [#54421](https://github.com/ClickHouse/ClickHouse/pull/54421) ([Nikolay Degterinsky](https://github.com/evillique)).
* It is better to retry retriable s3 errors than totally fail the query. Set bigger value to the s3_retry_attempts by default. [#54770](https://github.com/ClickHouse/ClickHouse/pull/54770) ([Sema Checherinda](https://github.com/CheSema)).
* Add load balancing mode `hostname_levenshtein_distance`. [#54826](https://github.com/ClickHouse/ClickHouse/pull/54826) ([JackyWoo](https://github.com/JackyWoo)).
* Improve hiding secrets in logs. [#55089](https://github.com/ClickHouse/ClickHouse/pull/55089) ([Vitaly Baranov](https://github.com/vitlibar)).
* For now the projection analysis will be performed only on top of query plan. The setting `query_plan_optimize_projection` became obsolete (it was enabled by default long time ago). [#55112](https://github.com/ClickHouse/ClickHouse/pull/55112) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* When function `untuple` is now called on a tuple with named elements and itself has an alias (e.g. `select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias`), then the result column name is now generated from the untuple alias and the tuple element alias (in the example: "untuple_alias.element_alias"). [#55123](https://github.com/ClickHouse/ClickHouse/pull/55123) ([garcher22](https://github.com/garcher22)).
* Added setting `describe_include_virtual_columns`, which allows to include virtual columns of table into result of `DESCRIBE` query. Added setting `describe_compact_output`. If it is set to `true`, `DESCRIBE` query returns only names and types of columns without extra information. [#55129](https://github.com/ClickHouse/ClickHouse/pull/55129) ([Anton Popov](https://github.com/CurtizJ)).
* Sometimes `OPTIMIZE` with `optimize_throw_if_noop=1` may fail with an error `unknown reason` while the real cause of it - different projections in different parts. This behavior is fixed. [#55130](https://github.com/ClickHouse/ClickHouse/pull/55130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow to have several `MaterializedPostgreSQL` tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is a backward-incompatible change), but can be turned on with setting `materialized_postgresql_use_unique_replication_consumer_identifier`. Closes [#54918](https://github.com/ClickHouse/ClickHouse/issues/54918). [#55145](https://github.com/ClickHouse/ClickHouse/pull/55145) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow to parse negative `DateTime64` and `DateTime` with fractional part from short strings. [#55146](https://github.com/ClickHouse/ClickHouse/pull/55146) ([Andrey Zvonov](https://github.com/zvonand)).
* To improve compatibility with MySQL, 1. `information_schema.tables` now includes the new field `table_rows`, and 2. `information_schema.columns` now includes the new field `extra`. [#55215](https://github.com/ClickHouse/ClickHouse/pull/55215) ([Robert Schulze](https://github.com/rschu1ze)).
* Clickhouse-client won't show "0 rows in set" if it is zero and if exception was thrown. [#55240](https://github.com/ClickHouse/ClickHouse/pull/55240) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Support rename table without keyword `TABLE` like `RENAME db.t1 to db.t2`. [#55373](https://github.com/ClickHouse/ClickHouse/pull/55373) ([凌涛](https://github.com/lingtaolf)).
* Add `internal_replication` to `system.clusters`. [#55377](https://github.com/ClickHouse/ClickHouse/pull/55377) ([Konstantin Morozov](https://github.com/k-morozov)).
* Select remote proxy resolver based on request protocol, add proxy feature docs and remove `DB::ProxyConfiguration::Protocol::ANY`. [#55430](https://github.com/ClickHouse/ClickHouse/pull/55430) ([Arthur Passos](https://github.com/arthurpassos)).
* Avoid retrying keeper operations on INSERT after table shutdown. [#55519](https://github.com/ClickHouse/ClickHouse/pull/55519) ([Azat Khuzhin](https://github.com/azat)).
* `SHOW COLUMNS` now correctly reports type `FixedString` as `BLOB` if setting `use_mysql_types_in_show_columns` is on. Also added two new settings, `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` to switch the output for types `String` and `FixedString` as `TEXT` or `BLOB`. [#55617](https://github.com/ClickHouse/ClickHouse/pull/55617) ([Serge Klochkov](https://github.com/slvrtrn)).
* During ReplicatedMergeTree tables startup clickhouse server checks set of parts for unexpected parts (exists locally, but not in zookeeper). All unexpected parts move to detached directory and instead of them server tries to restore some ancestor (covered) parts. Now server tries to restore closest ancestors instead of random covered parts. [#55645](https://github.com/ClickHouse/ClickHouse/pull/55645) ([alesapin](https://github.com/alesapin)).
* The advanced dashboard now supports draggable charts on touch devices. This closes [#54206](https://github.com/ClickHouse/ClickHouse/issues/54206). [#55649](https://github.com/ClickHouse/ClickHouse/pull/55649) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Use the default query format if declared when outputting exception with `http_write_exception_in_output_format`. [#55739](https://github.com/ClickHouse/ClickHouse/pull/55739) ([Raúl Marín](https://github.com/Algunenano)).
* Provide a better message for common MATERIALIZED VIEW pitfalls. [#55826](https://github.com/ClickHouse/ClickHouse/pull/55826) ([Raúl Marín](https://github.com/Algunenano)).
* If you dropped the current database, you will still be able to run some queries in `clickhouse-local` and switch to another database. This makes the behavior consistent with `clickhouse-client`. This closes [#55834](https://github.com/ClickHouse/ClickHouse/issues/55834). [#55853](https://github.com/ClickHouse/ClickHouse/pull/55853) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Functions `(add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond)` now support string-encoded date arguments, e.g. `SELECT addDays('2023-10-22', 1)`. This increases compatibility with MySQL and is needed by Tableau Online. [#55869](https://github.com/ClickHouse/ClickHouse/pull/55869) ([Robert Schulze](https://github.com/rschu1ze)).
* The setting `apply_deleted_mask` when disabled allows to read rows that where marked as deleted by lightweight DELETE queries. This is useful for debugging. [#55952](https://github.com/ClickHouse/ClickHouse/pull/55952) ([Alexander Gololobov](https://github.com/davenger)).
* Allow skipping `null` values when serailizing Tuple to json objects, which makes it possible to keep compatiability with Spark's `to_json` function, which is also useful for gluten. [#55956](https://github.com/ClickHouse/ClickHouse/pull/55956) ([李扬](https://github.com/taiyang-li)).
* Functions `(add|sub)Date()` now support string-encoded date arguments, e.g. `SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE)`. The same support for string-encoded date arguments is added to the plus and minus operators, e.g. `SELECT '2023-10-23' + INTERVAL 1 DAY`. This increases compatibility with MySQL and is needed by Tableau Online. [#55960](https://github.com/ClickHouse/ClickHouse/pull/55960) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow unquoted strings with CR (`\r`) in CSV format. Closes [#39930](https://github.com/ClickHouse/ClickHouse/issues/39930). [#56046](https://github.com/ClickHouse/ClickHouse/pull/56046) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow to run `clickhouse-keeper` using embedded config. [#56086](https://github.com/ClickHouse/ClickHouse/pull/56086) ([Maksim Kita](https://github.com/kitaisreal)).
* Set limit of the maximum configuration value for `queued.min.messages` to avoid problem with start fetching data with Kafka. [#56121](https://github.com/ClickHouse/ClickHouse/pull/56121) ([Stas Morozov](https://github.com/r3b-fish)).
* Fixed a typo in SQL function `minSampleSizeContinous` (renamed `minSampleSizeContinuous`). Old name is preserved for backward compatibility. This closes: [#56139](https://github.com/ClickHouse/ClickHouse/issues/56139). [#56143](https://github.com/ClickHouse/ClickHouse/pull/56143) ([Dorota Szeremeta](https://github.com/orotaday)).
* Print path for broken parts on disk before shutting down the server. Before this change if a part is corrupted on disk and server cannot start, it was almost impossible to understand which part is broken. This is fixed. [#56181](https://github.com/ClickHouse/ClickHouse/pull/56181) ([Duc Canh Le](https://github.com/canhld94)).
#### Build/Testing/Packaging Improvement
* If the database in Docker is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checking out the submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* An error was occuring when building ClickHouse with the AVX series of instructions enabled globally (which isn't recommended). The reason is that snappy does not enable `SNAPPY_HAVE_X86_CRC32`. [#55049](https://github.com/ClickHouse/ClickHouse/pull/55049) ([monchickey](https://github.com/monchickey)).
* Solve issue with launching standalone `clickhouse-keeper` from `clickhouse-server` package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* In the tests, RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. [#55424](https://github.com/ClickHouse/ClickHouse/pull/55424) ([Ilya Yatsishin](https://github.com/qoega)).
* Modified the error message difference between openssl and boringssl to fix the functional test. [#55975](https://github.com/ClickHouse/ClickHouse/pull/55975) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Use upstream repo for apache datasketches. [#55787](https://github.com/ClickHouse/ClickHouse/pull/55787) ([Nikita Taranov](https://github.com/nickitat)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Skip hardlinking inverted index files in mutation [#47663](https://github.com/ClickHouse/ClickHouse/pull/47663) ([cangyin](https://github.com/cangyin)).
* Fixed bug of `match` function (regex) with pattern containing alternation produces incorrect key condition. Closes #53222. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN [#51746](https://github.com/ClickHouse/ClickHouse/pull/51746) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support missed experimental `Object(Nullable(json))` subcolumns in query. [#54052](https://github.com/ClickHouse/ClickHouse/pull/54052) ([zps](https://github.com/VanDarkholme7)).
* Re-add fix for `accurateCastOrNull()` [#54629](https://github.com/ClickHouse/ClickHouse/pull/54629) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix detecting `DEFAULT` for columns of a Distributed table created without AS [#55060](https://github.com/ClickHouse/ClickHouse/pull/55060) ([Vitaly Baranov](https://github.com/vitlibar)).
* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Suppress error statistics update for internal exceptions [#55128](https://github.com/ClickHouse/ClickHouse/pull/55128) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix deadlock in backups [#55132](https://github.com/ClickHouse/ClickHouse/pull/55132) ([alesapin](https://github.com/alesapin)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning of extra columns in set. [#55172](https://github.com/ClickHouse/ClickHouse/pull/55172) ([Amos Bird](https://github.com/amosbird)).
* Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity [#55202](https://github.com/ClickHouse/ClickHouse/pull/55202) ([Duc Canh Le](https://github.com/canhld94)).
* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Avoid possible memory leaks in compressors in case of missing buffer finalization [#55262](https://github.com/ClickHouse/ClickHouse/pull/55262) ([Azat Khuzhin](https://github.com/azat)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree [#55276](https://github.com/ClickHouse/ClickHouse/pull/55276) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix a crash in MergeSortingPartialResultTransform (due to zero chunks after `remerge`) [#55335](https://github.com/ClickHouse/ClickHouse/pull/55335) ([Azat Khuzhin](https://github.com/azat)).
* Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception [#55338](https://github.com/ClickHouse/ClickHouse/pull/55338) ([Azat Khuzhin](https://github.com/azat)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix leak in StorageHDFS [#55370](https://github.com/ClickHouse/ClickHouse/pull/55370) ([Azat Khuzhin](https://github.com/azat)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix MySQL interface boolean representation [#55427](https://github.com/ClickHouse/ClickHouse/pull/55427) ([Serge Klochkov](https://github.com/slvrtrn)).
* Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting [#55479](https://github.com/ClickHouse/ClickHouse/pull/55479) ([Serge Klochkov](https://github.com/slvrtrn)).
* Make `use_mysql_types_in_show_columns` affect only `SHOW COLUMNS` [#55481](https://github.com/ClickHouse/ClickHouse/pull/55481) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix stack symbolizer parsing `DW_FORM_ref_addr` incorrectly and sometimes crashing [#55483](https://github.com/ClickHouse/ClickHouse/pull/55483) ([Michael Kolupaev](https://github.com/al13n321)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix Query Parameters not working with custom HTTP handlers [#55521](https://github.com/ClickHouse/ClickHouse/pull/55521) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix checking of non handled data for Values format [#55527](https://github.com/ClickHouse/ClickHouse/pull/55527) ([Azat Khuzhin](https://github.com/azat)).
* Fix 'Invalid cursor state' in odbc interacting with MS SQL Server [#55558](https://github.com/ClickHouse/ClickHouse/pull/55558) ([vdimir](https://github.com/vdimir)).
* Fix max execution time and 'break' overflow mode [#55577](https://github.com/ClickHouse/ClickHouse/pull/55577) ([Alexander Gololobov](https://github.com/davenger)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Disable wrong optimization and add a test [#55609](https://github.com/ClickHouse/ClickHouse/pull/55609) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#52352](https://github.com/ClickHouse/ClickHouse/issues/52352) [#55621](https://github.com/ClickHouse/ClickHouse/pull/55621) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test to avoid incorrect decimal sorting [#55662](https://github.com/ClickHouse/ClickHouse/pull/55662) ([Amos Bird](https://github.com/amosbird)).
* Fix progress bar for s3 and azure Cluster functions with url without globs [#55666](https://github.com/ClickHouse/ClickHouse/pull/55666) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fixes and improvements for Iceberg storage [#55695](https://github.com/ClickHouse/ClickHouse/pull/55695) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix data race in CreatingSetsTransform (v2) [#55786](https://github.com/ClickHouse/ClickHouse/pull/55786) ([Azat Khuzhin](https://github.com/azat)).
* Throw exception when parsing illegal string as float if precise_float_parsing is true [#55861](https://github.com/ClickHouse/ClickHouse/pull/55861) ([李扬](https://github.com/taiyang-li)).
* Disable predicate pushdown if the CTE contains stateful functions [#55871](https://github.com/ClickHouse/ClickHouse/pull/55871) ([Raúl Marín](https://github.com/Algunenano)).
* Fix normalize ASTSelectWithUnionQuery, as it was stripping `FORMAT` from the query [#55887](https://github.com/ClickHouse/ClickHouse/pull/55887) ([flynn](https://github.com/ucasfl)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* fix: StorageNull supports subcolumns [#55912](https://github.com/ClickHouse/ClickHouse/pull/55912) ([FFish](https://github.com/wxybear)).
* Do not write retriable errors for Replicated mutate/merge into error log [#55944](https://github.com/ClickHouse/ClickHouse/pull/55944) ([Azat Khuzhin](https://github.com/azat)).
* Fix `SHOW DATABASES LIMIT <N>` [#55962](https://github.com/ClickHouse/ClickHouse/pull/55962) ([Raúl Marín](https://github.com/Algunenano)).
* Fix autogenerated Protobuf schema with fields with underscore [#55974](https://github.com/ClickHouse/ClickHouse/pull/55974) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix dateTime64ToSnowflake64() with non-default scale [#55983](https://github.com/ClickHouse/ClickHouse/pull/55983) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix output/input of Arrow dictionary column [#55989](https://github.com/ClickHouse/ClickHouse/pull/55989) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix fetching schema from schema registry in AvroConfluent [#55991](https://github.com/ClickHouse/ClickHouse/pull/55991) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#55995](https://github.com/ClickHouse/ClickHouse/pull/55995) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix incorrect free space accounting for least_used JBOD policy [#56030](https://github.com/ClickHouse/ClickHouse/pull/56030) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing scalar issue when evaluating subqueries inside table functions [#56057](https://github.com/ClickHouse/ClickHouse/pull/56057) ([Amos Bird](https://github.com/amosbird)).
* Fix wrong query result when http_write_exception_in_output_format=1 [#56135](https://github.com/ClickHouse/ClickHouse/pull/56135) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix schema cache for fallback JSON->JSONEachRow with changed settings [#56172](https://github.com/ClickHouse/ClickHouse/pull/56172) ([Kruglov Pavel](https://github.com/Avogar)).
* Add error handler to odbc-bridge [#56185](https://github.com/ClickHouse/ClickHouse/pull/56185) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
### ClickHouse release 23.9, 2023-09-28
#### Backward Incompatible Change

View File

@ -22,8 +22,6 @@ curl https://clickhouse.com/ | sh
## Upcoming Events
* [**v23.9 Community Call**]([https://clickhouse.com/company/events/v23-8-community-release-call](https://clickhouse.com/company/events/v23-9-community-release-call)?utm_source=github&utm_medium=social&utm_campaign=release-webinar-2023-08) - Sep 28 - 23.9 is rapidly approaching. Original creator, co-founder, and CTO of ClickHouse Alexey Milovidov will walk us through the highlights of the release.
* [**ClickHouse Meetup in Amsterdam**](https://www.meetup.com/clickhouse-netherlands-user-group/events/296334590/) - Oct 31
* [**ClickHouse Meetup in Beijing**](https://www.meetup.com/clickhouse-beijing-user-group/events/296334856/) - Nov 4
* [**ClickHouse Meetup in San Francisco**](https://www.meetup.com/clickhouse-silicon-valley-meetup-group/events/296334923/) - Nov 8
* [**ClickHouse Meetup in Singapore**](https://www.meetup.com/clickhouse-singapore-meetup-group/events/296334976/) - Nov 15
@ -45,4 +43,4 @@ We are a globally diverse and distributed team, united behind a common goal of c
Check out our **current openings** here: https://clickhouse.com/company/careers
Cant find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!
Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email careers@clickhouse.com!

View File

@ -13,9 +13,10 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 23.10 | ✔️ |
| 23.9 | ✔️ |
| 23.8 | ✔️ |
| 23.7 | ✔️ |
| 23.7 | |
| 23.6 | ❌ |
| 23.5 | ❌ |
| 23.4 | ❌ |

View File

@ -45,6 +45,10 @@ else ()
target_compile_definitions(common PUBLIC WITH_COVERAGE=0)
endif ()
if (TARGET ch_contrib::crc32_s390x)
target_link_libraries(common PUBLIC ch_contrib::crc32_s390x)
endif()
target_include_directories(common PUBLIC .. "${CMAKE_CURRENT_BINARY_DIR}/..")
target_link_libraries (common

View File

@ -35,6 +35,10 @@
#pragma clang diagnostic ignored "-Wreserved-identifier"
#endif
#if defined(__s390x__)
#include <base/crc32c_s390x.h>
#define CRC_INT s390x_crc32c
#endif
/**
* The std::string_view-like container to avoid creating strings to find substrings in the hash table.
@ -264,8 +268,8 @@ inline size_t hashLessThan8(const char * data, size_t size)
if (size >= 4)
{
UInt64 a = unalignedLoad<uint32_t>(data);
return hashLen16(size + (a << 3), unalignedLoad<uint32_t>(data + size - 4));
UInt64 a = unalignedLoadLittleEndian<uint32_t>(data);
return hashLen16(size + (a << 3), unalignedLoadLittleEndian<uint32_t>(data + size - 4));
}
if (size > 0)
@ -285,8 +289,8 @@ inline size_t hashLessThan16(const char * data, size_t size)
{
if (size > 8)
{
UInt64 a = unalignedLoad<UInt64>(data);
UInt64 b = unalignedLoad<UInt64>(data + size - 8);
UInt64 a = unalignedLoadLittleEndian<UInt64>(data);
UInt64 b = unalignedLoadLittleEndian<UInt64>(data + size - 8);
return hashLen16(a, rotateByAtLeast1(b + size, static_cast<UInt8>(size))) ^ b;
}
@ -315,13 +319,13 @@ struct CRC32Hash
do
{
UInt64 word = unalignedLoad<UInt64>(pos);
UInt64 word = unalignedLoadLittleEndian<UInt64>(pos);
res = static_cast<unsigned>(CRC_INT(res, word));
pos += 8;
} while (pos + 8 < end);
UInt64 word = unalignedLoad<UInt64>(end - 8); /// I'm not sure if this is normal.
UInt64 word = unalignedLoadLittleEndian<UInt64>(end - 8); /// I'm not sure if this is normal.
res = static_cast<unsigned>(CRC_INT(res, word));
return res;

26
base/base/crc32c_s390x.h Normal file
View File

@ -0,0 +1,26 @@
#pragma once
#include <crc32-s390x.h>
inline uint32_t s390x_crc32c_u8(uint32_t crc, uint8_t v)
{
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32c_u16(uint32_t crc, uint16_t v)
{
v = __builtin_bswap16(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32c_u32(uint32_t crc, uint32_t v)
{
v = __builtin_bswap32(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint64_t s390x_crc32c(uint64_t crc, uint64_t v)
{
v = __builtin_bswap64(v);
return crc32c_le_vx(static_cast<uint32_t>(crc), reinterpret_cast<unsigned char *>(&v), sizeof(uint64_t));
}

View File

@ -68,7 +68,7 @@ namespace Net
struct ProxyConfig
/// HTTP proxy server configuration.
{
ProxyConfig() : port(HTTP_PORT), protocol("http"), tunnel(true) { }
ProxyConfig() : port(HTTP_PORT), protocol("http"), tunnel(true), originalRequestProtocol("http") { }
std::string host;
/// Proxy server host name or IP address.
@ -87,6 +87,9 @@ namespace Net
/// A regular expression defining hosts for which the proxy should be bypassed,
/// e.g. "localhost|127\.0\.0\.1|192\.168\.0\.\d+". Can also be an empty
/// string to disable proxy bypassing.
std::string originalRequestProtocol;
/// Original request protocol (http or https).
/// Required in the case of: HTTPS request over HTTP proxy with tunneling (CONNECT) off.
};
HTTPClientSession();

View File

@ -418,7 +418,7 @@ void HTTPClientSession::reconnect()
std::string HTTPClientSession::proxyRequestPrefix() const
{
std::string result("http://");
std::string result(_proxyConfig.originalRequestProtocol + "://");
result.append(_host);
/// Do not append default by default, since this may break some servers.
/// One example of such server is GCS (Google Cloud Storage).

View File

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54479)
SET(VERSION_REVISION 54480)
SET(VERSION_MAJOR 23)
SET(VERSION_MINOR 10)
SET(VERSION_MINOR 11)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH 8f9a227de1f530cdbda52c145d41a6b0f1d29961)
SET(VERSION_DESCRIBE v23.10.1.1-testing)
SET(VERSION_STRING 23.10.1.1)
SET(VERSION_GITHASH 13adae0e42fd48de600486fc5d4b64d39f80c43e)
SET(VERSION_DESCRIBE v23.11.1.1-testing)
SET(VERSION_STRING 23.11.1.1)
# end of autochange

View File

@ -3,4 +3,4 @@ It allows to integrate JEMalloc into CMake project.
- Remove JEMALLOC_HAVE_ATTR_FORMAT_GNU_PRINTF because it's non standard.
- Added JEMALLOC_CONFIG_MALLOC_CONF substitution
- Add musl support (USE_MUSL)
- Also note, that darwin build requires JEMALLOC_PREFIX, while others don not
- Also note, that darwin build requires JEMALLOC_PREFIX, while others do not

2
contrib/libhdfs3 vendored

@ -1 +1 @@
Subproject commit 377220ef351ae24994a5fcd2b5fa3930d00c4db0
Subproject commit bdcb91354b1c05b21e73043a112a6f1e3b013497

View File

@ -1,4 +1,4 @@
if(NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE AND NOT ARCH_S390X)
if(NOT OS_FREEBSD AND NOT APPLE AND NOT ARCH_PPC64LE)
option(ENABLE_HDFS "Enable HDFS" ${ENABLE_LIBRARIES})
elseif(ENABLE_HDFS)
message (${RECONFIGURE_MESSAGE_LEVEL} "Cannot use HDFS3 with current configuration")

View File

@ -1,3 +1,10 @@
option (ENABLE_SSH "Enable support for SSH keys and protocol" ON)
if (NOT ENABLE_SSH)
message(STATUS "Not using SSH")
return()
endif()
set(LIB_SOURCE_DIR "${ClickHouse_SOURCE_DIR}/contrib/libssh")
set(LIB_BINARY_DIR "${ClickHouse_BINARY_DIR}/contrib/libssh")
# Specify search path for CMake modules to be loaded by include()

2
contrib/orc vendored

@ -1 +1 @@
Subproject commit f31c271110a2f0dac908a152f11708193ae209ee
Subproject commit e24f2c2a3ca0769c96704ab20ad6f512a83ea2ad

View File

@ -34,7 +34,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.9.3.12"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-keeper"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.9.3.12"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# user/group precreated explicitly with fixed uid/gid on purpose.

View File

@ -30,7 +30,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="23.9.3.12"
ARG VERSION="23.10.1.1976"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image

View File

@ -19,7 +19,7 @@ For more information and documentation see https://clickhouse.com/.
### Compatibility
- The amd64 image requires support for [SSE3 instructions](https://en.wikipedia.org/wiki/SSE3). Virtually all x86 CPUs after 2005 support SSE3.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A). Most ARM CPUs after 2017 support ARMv8.2-A. A notable exception is Raspberry Pi 4 from 2019 whose CPU only supports ARMv8.0-A.
- The arm64 image requires support for the [ARMv8.2-A architecture](https://en.wikipedia.org/wiki/AArch64#ARMv8.2-A) and additionally the Load-Acquire RCpc register. The register is optional in version ARMv8.2-A and mandatory in [ARMv8.3-A](https://en.wikipedia.org/wiki/AArch64#ARMv8.3-A). Supported in Graviton >=2, Azure and GCP instances. Examples for unsupported devices are Raspberry Pi 4 (ARMv8.0-A) and Jetson AGX Xavier/Orin (ARMv8.2-A).
## How to use this image

View File

@ -191,6 +191,7 @@ rg -Fav -e "Code: 236. DB::Exception: Cancelled merging parts" \
-e "ZooKeeperClient" \
-e "KEEPER_EXCEPTION" \
-e "DirectoryMonitor" \
-e "DistributedInsertQueue" \
-e "TABLE_IS_READ_ONLY" \
-e "Code: 1000, e.code() = 111, Connection refused" \
-e "UNFINISHED" \

View File

@ -75,7 +75,7 @@ sidebar_label: 2022
* Fix usage of nested columns with non-array columns with the same prefix [2] [#28762](https://github.com/ClickHouse/ClickHouse/pull/28762) ([Anton Popov](https://github.com/CurtizJ)).
* Lower compiled_expression_cache_size to 128MB [#28816](https://github.com/ClickHouse/ClickHouse/pull/28816) ([Maksim Kita](https://github.com/kitaisreal)).
* Column default dictGet identifier fix [#28863](https://github.com/ClickHouse/ClickHouse/pull/28863) ([Maksim Kita](https://github.com/kitaisreal)).
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix arcadia (pre)stable 21.10 build [#29250](https://github.com/ClickHouse/ClickHouse/pull/29250) ([DimasKovas](https://github.com/DimasKovas)).

View File

@ -29,7 +29,7 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* May be fix s3 tests [#29762](https://github.com/ClickHouse/ClickHouse/pull/29762) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix ca-bundle.crt in kerberized_hadoop/Dockerfile [#30358](https://github.com/ClickHouse/ClickHouse/pull/30358) ([Vladimir C](https://github.com/vdimir)).

View File

@ -14,6 +14,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -15,6 +15,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -13,6 +13,6 @@ sidebar_label: 2022
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Don not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Do not add const group by key for query with only having. [#28975](https://github.com/ClickHouse/ClickHouse/pull/28975) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Merging [#27963](https://github.com/ClickHouse/ClickHouse/issues/27963) [#29063](https://github.com/ClickHouse/ClickHouse/pull/29063) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix terminate on uncaught exception [#29216](https://github.com/ClickHouse/ClickHouse/pull/29216) ([Alexander Tokmakov](https://github.com/tavplubix)).

View File

@ -0,0 +1,406 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.10.1.1976-stable (13adae0e42f) FIXME as compared to v23.9.1.1854-stable (8f9a227de1f)
#### Backward Incompatible Change
* Rewrited storage S3Queue completely: changed the way we keep information in zookeeper which allows to make less zookeeper requests, added caching of zookeeper state in cases when we know the state will not change, improved the polling from s3 process to make it less aggressive, changed the way ttl and max set for trached files is maintained, now it is a background process. Added `system.s3queue` and `system.s3queue_log` tables. Closes [#54998](https://github.com/ClickHouse/ClickHouse/issues/54998). [#54422](https://github.com/ClickHouse/ClickHouse/pull/54422) ([Kseniia Sumarokova](https://github.com/kssenii)).
* There is no longer an option to automatically remove broken data parts. This closes [#55174](https://github.com/ClickHouse/ClickHouse/issues/55174). [#55184](https://github.com/ClickHouse/ClickHouse/pull/55184) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The obsolete in-memory data parts can no longer be read from the write-ahead log. If you have configured in-memory parts before, they have to be removed before the upgrade. [#55186](https://github.com/ClickHouse/ClickHouse/pull/55186) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove the integration with Meilisearch. Reason: it was compatible only with the old version 0.18. The recent version of Meilisearch changed the protocol and does not work anymore. Note: we would appreciate it if you help to return it back. [#55189](https://github.com/ClickHouse/ClickHouse/pull/55189) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Rename directory monitor concept into background INSERT. All settings `*directory_monitor*` had been renamed to `distributed_background_insert*`. **Backward compatibility should be preserved** (since old settings had been added as an alias). [#55978](https://github.com/ClickHouse/ClickHouse/pull/55978) ([Azat Khuzhin](https://github.com/azat)).
* Do not mix-up send_timeout and receive_timeout. [#56035](https://github.com/ClickHouse/ClickHouse/pull/56035) ([Azat Khuzhin](https://github.com/azat)).
* Comparison of time intervals with different units will throw an exception. This closes [#55942](https://github.com/ClickHouse/ClickHouse/issues/55942). You might have occasionally rely on the previous behavior when the underlying numeric values were compared regardless of the units. [#56090](https://github.com/ClickHouse/ClickHouse/pull/56090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### New Feature
* Add function "arrayFold(x1, ..., xn, accum -> expression, array1, ..., arrayn, init_accum)" which applies a lambda function to multiple arrays of the same cardinality and collects the result in an accumulator. [#49794](https://github.com/ClickHouse/ClickHouse/pull/49794) ([Lirikl](https://github.com/Lirikl)).
* Added aggregation function lttb which uses the [Largest-Triangle-Three-Buckets](https://skemman.is/bitstream/1946/15343/3/SS_MSthesis.pdf) algorithm for downsampling data for visualization. [#53145](https://github.com/ClickHouse/ClickHouse/pull/53145) ([Sinan](https://github.com/sinsinan)).
* Query`CHECK TABLE` has better performance and usability (sends progress updates, cancellable). Support checking particular part with `CHECK TABLE ... PART 'part_name'`. [#53404](https://github.com/ClickHouse/ClickHouse/pull/53404) ([vdimir](https://github.com/vdimir)).
* Added function `jsonMergePatch`. When working with JSON data as strings, it provides a way to merge these strings (of JSON objects) together to form a single string containing a single JSON object. [#54364](https://github.com/ClickHouse/ClickHouse/pull/54364) ([Memo](https://github.com/Joeywzr)).
* Added a new SQL function, "arrayRandomSample(arr, k)" which returns a sample of k elements from the input array. Similar functionality could previously be achieved only with less convenient syntax, e.g. "SELECT arrayReduce('groupArraySample(3)', range(10))". [#54391](https://github.com/ClickHouse/ClickHouse/pull/54391) ([itayisraelov](https://github.com/itayisraelov)).
* Added new function `getHttpHeader` to get HTTP request header value used for a request to ClickHouse server. Return empty string if the request is not done over HTTP protocol or there is no such header. [#54813](https://github.com/ClickHouse/ClickHouse/pull/54813) ([凌涛](https://github.com/lingtaolf)).
* Introduce -ArgMin/-ArgMax aggregate combinators which allow to aggregate by min/max values only. One use case can be found in [#54818](https://github.com/ClickHouse/ClickHouse/issues/54818). This PR also reorganize combinators into dedicated folder. [#54947](https://github.com/ClickHouse/ClickHouse/pull/54947) ([Amos Bird](https://github.com/amosbird)).
* Allow to drop cache for Protobuf format with `SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf]`. [#55064](https://github.com/ClickHouse/ClickHouse/pull/55064) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Add external HTTP Basic authenticator. [#55199](https://github.com/ClickHouse/ClickHouse/pull/55199) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Added function `byteSwap` which reverses the bytes of unsigned integers. This is particularly useful for reversing values of types which are represented as unsigned integers internally such as IPv4. [#55211](https://github.com/ClickHouse/ClickHouse/pull/55211) ([Priyansh Agrawal](https://github.com/Priyansh121096)).
* Added function `formatQuery()` which returns a formatted version (possibly spanning multiple lines) of a SQL query string. Also added function `formatQuerySingleLine()` which does the same but the returned string will not contain linebreaks. [#55239](https://github.com/ClickHouse/ClickHouse/pull/55239) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Added DWARF input format that reads debug symbols from an ELF executable/library/object file. [#55450](https://github.com/ClickHouse/ClickHouse/pull/55450) ([Michael Kolupaev](https://github.com/al13n321)).
* Allow to save unparsed records and errors in RabbitMQ, NATS and FileLog engines. Add virtual columns `_error` and `_raw_message`(for NATS and RabbitMQ), `_raw_record` (for FileLog) that are filled when ClickHouse fails to parse new record. The behaviour is controlled under storage settings `nats_handle_error_mode` for NATS, `rabbitmq_handle_error_mode` for RabbitMQ, `handle_error_mode` for FileLog similar to `kafka_handle_error_mode`. If it's set to `default`, en exception will be thrown when ClickHouse fails to parse a record, if it's set to `stream`, erorr and raw record will be saved into virtual columns. Closes [#36035](https://github.com/ClickHouse/ClickHouse/issues/36035). [#55477](https://github.com/ClickHouse/ClickHouse/pull/55477) ([Kruglov Pavel](https://github.com/Avogar)).
* Keeper client improvement: add get_all_children_number command that returns number of all children nodes under a specific path. [#55485](https://github.com/ClickHouse/ClickHouse/pull/55485) ([guoxiaolong](https://github.com/guoxiaolongzte)).
* If a table has a space-filling curve in its key, e.g., `ORDER BY mortonEncode(x, y)`, the conditions on its arguments, e.g., `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30` can be used for indexing. A setting `analyze_index_with_space_filling_curves` is added to enable or disable this analysis. This closes [#41195](https://github.com/ClickHouse/ClickHouse/issues/41195). Continuation of [#4538](https://github.com/ClickHouse/ClickHouse/issues/4538). Continuation of [#6286](https://github.com/ClickHouse/ClickHouse/issues/6286). Continuation of [#28130](https://github.com/ClickHouse/ClickHouse/issues/28130). Continuation of [#41753](https://github.com/ClickHouse/ClickHouse/issues/41753). [#55642](https://github.com/ClickHouse/ClickHouse/pull/55642) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add setting `optimize_trivial_approximate_count_query` to use `count()` approximation for storage EmbeddedRocksDB. Enable trivial count for StorageJoin. [#55806](https://github.com/ClickHouse/ClickHouse/pull/55806) ([Duc Canh Le](https://github.com/canhld94)).
* Keeper client improvement: add get_direct_children_number command that returns number of direct children nodes under a path. [#55898](https://github.com/ClickHouse/ClickHouse/pull/55898) ([xuzifu666](https://github.com/xuzifu666)).
* Add statement `SHOW SETTING setting_name` which is a simpler version of existing statement `SHOW SETTINGS`. [#55979](https://github.com/ClickHouse/ClickHouse/pull/55979) ([Maksim Kita](https://github.com/kitaisreal)).
* This pr gives possibility to pass data in Npy format to Clickhouse. ``` SELECT * FROM file('example_array.npy', Npy). [#55982](https://github.com/ClickHouse/ClickHouse/pull/55982) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* This PR impleents a new setting called `force_optimize_projection_name`, it takes a name of projection as an argument. If it's value set to a non-empty string, ClickHouse checks that this projection is used in the query at least once. Closes [#55331](https://github.com/ClickHouse/ClickHouse/issues/55331). [#56134](https://github.com/ClickHouse/ClickHouse/pull/56134) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
#### Performance Improvement
* Add option `query_plan_preserve_num_streams_after_window_functions` to preserve the number of streams after evaluating window functions to allow parallel stream processing. [#50771](https://github.com/ClickHouse/ClickHouse/pull/50771) ([frinkr](https://github.com/frinkr)).
* Release more num_streams if data is small. [#53867](https://github.com/ClickHouse/ClickHouse/pull/53867) ([Jiebin Sun](https://github.com/jiebinn)).
* RoaringBitmaps being optimized before serialization. [#55044](https://github.com/ClickHouse/ClickHouse/pull/55044) ([UnamedRus](https://github.com/UnamedRus)).
* Posting lists in inverted indexes are now optimized to use the smallest possible representation for internal bitmaps. Depending on the repetitiveness of the data, this may significantly reduce the space consumption of inverted indexes. [#55069](https://github.com/ClickHouse/ClickHouse/pull/55069) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fix contention on Context lock, this significantly improves performance for a lot of short-running concurrent queries. [#55121](https://github.com/ClickHouse/ClickHouse/pull/55121) ([Maksim Kita](https://github.com/kitaisreal)).
* Improved the performance of inverted index creation by 30%. This was achieved by replacing `std::unordered_map` with `absl::flat_hash_map`. [#55210](https://github.com/ClickHouse/ClickHouse/pull/55210) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Support orc filter push down (rowgroup level). [#55330](https://github.com/ClickHouse/ClickHouse/pull/55330) ([李扬](https://github.com/taiyang-li)).
* Improve performance of external aggregation with a lot of temporary files. [#55489](https://github.com/ClickHouse/ClickHouse/pull/55489) ([Maksim Kita](https://github.com/kitaisreal)).
* Set a reasonable size for the marks cache for secondary indices by default to avoid loading the marks over and over again. [#55654](https://github.com/ClickHouse/ClickHouse/pull/55654) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Avoid unnecessary reconstruction of index granules when reading skip indexes. This addresses [#55653](https://github.com/ClickHouse/ClickHouse/issues/55653)#issuecomment-1763766009 . [#55683](https://github.com/ClickHouse/ClickHouse/pull/55683) ([Amos Bird](https://github.com/amosbird)).
* Cache cast function in set during execution to improve the performance of function `IN` when set element type doesn't exactly match column type. [#55712](https://github.com/ClickHouse/ClickHouse/pull/55712) ([Duc Canh Le](https://github.com/canhld94)).
* Performance improvement for `ColumnVector::insertMany` and `ColumnVector::insertManyFrom`. [#55714](https://github.com/ClickHouse/ClickHouse/pull/55714) ([frinkr](https://github.com/frinkr)).
* ... Getting values from a map is widely used. In practice, the key structrues are usally the same in the same map column, we could try to predict the next row's key position and reduce the comparisons. [#55929](https://github.com/ClickHouse/ClickHouse/pull/55929) ([lgbo](https://github.com/lgbo-ustc)).
* ... Fix an issue that struct field prune doesn't work in some cases. For example ```sql INSERT INTO FUNCTION file('test_parquet_struct', Parquet, 'x Tuple(a UInt32, b UInt32, c String)') SELECT tuple(number, rand(), concat('testxxxxxxx' toString(number))) FROM numbers(10);. [#56117](https://github.com/ClickHouse/ClickHouse/pull/56117) ([lgbo](https://github.com/lgbo-ustc)).
#### Improvement
* This is the second part of Kusto Query Language dialect support. [Phase 1 implementation ](https://github.com/ClickHouse/ClickHouse/pull/37961) has been merged. [#42510](https://github.com/ClickHouse/ClickHouse/pull/42510) ([larryluogit](https://github.com/larryluogit)).
* Op processors IDs are raw ptrs casted to UInt64. Print it in a prettier manner:. [#48852](https://github.com/ClickHouse/ClickHouse/pull/48852) ([Vlad Seliverstov](https://github.com/behebot)).
* Creating a direct dictionary with a lifetime field set will be rejected at create time. Fixes: [#27861](https://github.com/ClickHouse/ClickHouse/issues/27861). [#49043](https://github.com/ClickHouse/ClickHouse/pull/49043) ([Rory Crispin](https://github.com/RoryCrispin)).
* Allow parameters in queries with partitions like `ALTER TABLE t DROP PARTITION`. Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#49516](https://github.com/ClickHouse/ClickHouse/pull/49516) ([Nikolay Degterinsky](https://github.com/evillique)).
* 1.Refactor the code about zookeeper_connection 2.Add a new column xid for zookeeper_connection. [#50702](https://github.com/ClickHouse/ClickHouse/pull/50702) ([helifu](https://github.com/helifu)).
* Add the ability to tune the number of parallel replicas used in a query execution based on the estimation of rows to read. [#51692](https://github.com/ClickHouse/ClickHouse/pull/51692) ([Raúl Marín](https://github.com/Algunenano)).
* Distributed queries executed in `async_socket_for_remote` mode (default) now respect `max_threads` limit. Previously, some queries could create excessive threads (up to `max_distributed_connections`), causing server performance issues. [#53504](https://github.com/ClickHouse/ClickHouse/pull/53504) ([filimonov](https://github.com/filimonov)).
* Display the correct server settings after reload. [#53774](https://github.com/ClickHouse/ClickHouse/pull/53774) ([helifu](https://github.com/helifu)).
* Add support for mathematical minus `` character in queries, similar to `-`. [#54100](https://github.com/ClickHouse/ClickHouse/pull/54100) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add replica groups to the Replicated database engine. Closes [#53620](https://github.com/ClickHouse/ClickHouse/issues/53620). [#54421](https://github.com/ClickHouse/ClickHouse/pull/54421) ([Nikolay Degterinsky](https://github.com/evillique)).
* This PR will fix UBsan test error [here](https://github.com/ClickHouse/ClickHouse/pull/54566). [#54568](https://github.com/ClickHouse/ClickHouse/pull/54568) ([JackyWoo](https://github.com/JackyWoo)).
* Support asynchronous inserts with external data via native protocol. Previously it worked only if data is inlined into query. [#54730](https://github.com/ClickHouse/ClickHouse/pull/54730) ([Anton Popov](https://github.com/CurtizJ)).
* It is better to retry retriable s3 errors than totally fail the query. Set bigger value to the s3_retry_attempts by default. [#54770](https://github.com/ClickHouse/ClickHouse/pull/54770) ([Sema Checherinda](https://github.com/CheSema)).
* Optimised external aggregation memory consumption in case many temporary files were generated. [#54798](https://github.com/ClickHouse/ClickHouse/pull/54798) ([Nikita Taranov](https://github.com/nickitat)).
* Add load balancing test_hostname_levenshtein_distance. [#54826](https://github.com/ClickHouse/ClickHouse/pull/54826) ([JackyWoo](https://github.com/JackyWoo)).
* Caching skip-able entries while executing DDL from Zookeeper distributed DDL queue. [#54828](https://github.com/ClickHouse/ClickHouse/pull/54828) ([Duc Canh Le](https://github.com/canhld94)).
* Improve hiding secrets in logs. [#55089](https://github.com/ClickHouse/ClickHouse/pull/55089) ([Vitaly Baranov](https://github.com/vitlibar)).
* Added fields `substreams` and `filenames` to the `system.parts_columns` table. [#55108](https://github.com/ClickHouse/ClickHouse/pull/55108) ([Anton Popov](https://github.com/CurtizJ)).
* For now the projection analysis will be performed only on top of query plan. The setting `query_plan_optimize_projection` became obsolete (it was enabled by default long time ago). [#55112](https://github.com/ClickHouse/ClickHouse/pull/55112) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* When function "untuple()" is now called on a tuple with named elements and itself has an alias (e.g. "select untuple(tuple(1)::Tuple(element_alias Int)) AS untuple_alias"), then the result column name is now generated from the untuple alias and the tuple element alias (in the example: "untuple_alias.element_alias"). [#55123](https://github.com/ClickHouse/ClickHouse/pull/55123) ([garcher22](https://github.com/garcher22)).
* Added setting `describe_include_virtual_columns`, which allows to include virtual columns of table into result of `DESCRIBE` query. Added setting `describe_compact_output`. If it is set to `true`, `DESCRIBE` query returns only names and types of columns without extra information. [#55129](https://github.com/ClickHouse/ClickHouse/pull/55129) ([Anton Popov](https://github.com/CurtizJ)).
* Sometimes `OPTIMIZE` with `optimize_throw_if_noop=1` may fail with an error `unknown reason` while the real cause of it - different projections in different parts. This behavior is fixed. [#55130](https://github.com/ClickHouse/ClickHouse/pull/55130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Allow to have several MaterializedPostgreSQL tables following the same Postgres table. By default this behaviour is not enabled (for compatibility, because it is backward-incompatible change), but can be turned on with setting `materialized_postgresql_use_unique_replication_consumer_identifier`. Closes [#54918](https://github.com/ClickHouse/ClickHouse/issues/54918). [#55145](https://github.com/ClickHouse/ClickHouse/pull/55145) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Allow to parse negative DateTime64 and DateTime with fractional part from short strings. [#55146](https://github.com/ClickHouse/ClickHouse/pull/55146) ([Andrey Zvonov](https://github.com/zvonand)).
* To improve compatibility with MySQL, 1. "information_schema.tables" now includes the new field "table_rows", and 2. "information_schema.columns" now includes the new field "extra". [#55215](https://github.com/ClickHouse/ClickHouse/pull/55215) ([Robert Schulze](https://github.com/rschu1ze)).
* Clickhouse-client won't show "0 rows in set" if it is zero and if exception was thrown. [#55240](https://github.com/ClickHouse/ClickHouse/pull/55240) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Support rename table without keyword `TABLE` like `RENAME db.t1 to db.t2`. [#55373](https://github.com/ClickHouse/ClickHouse/pull/55373) ([凌涛](https://github.com/lingtaolf)).
* Add internal_replication to system.clusters. [#55377](https://github.com/ClickHouse/ClickHouse/pull/55377) ([Konstantin Morozov](https://github.com/k-morozov)).
* Select remote proxy resolver based on request protocol, add proxy feature docs and remove `DB::ProxyConfiguration::Protocol::ANY`. [#55430](https://github.com/ClickHouse/ClickHouse/pull/55430) ([Arthur Passos](https://github.com/arthurpassos)).
* Avoid retrying keeper operations on INSERT after table shutdown. [#55519](https://github.com/ClickHouse/ClickHouse/pull/55519) ([Azat Khuzhin](https://github.com/azat)).
* Improved overall resilience for ClickHouse in case of many parts within partition (more than 1000). It might reduce the number of `TOO_MANY_PARTS` errors. [#55526](https://github.com/ClickHouse/ClickHouse/pull/55526) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Follow up https://github.com/ClickHouse/ClickHouse/pull/55184 to not to fall into `UNKNOWN_SETTING` error when the user uses the deleted MergeTree settings. [#55557](https://github.com/ClickHouse/ClickHouse/pull/55557) ([Jihyuk Bok](https://github.com/tomahawk28)).
* Updated `dashboard.html()` added a `if` to check if there is one chart, then "maximize" and "drag" buttons are not shown. [#55581](https://github.com/ClickHouse/ClickHouse/pull/55581) ([bhavuk2002](https://github.com/bhavuk2002)).
* Functions `toDayOfWeek()` (MySQL alias: `DAYOFWEEK()`), `toYearWeek()` (`YEARWEEK()`) and `toWeek()` (`WEEK()`) now supports `String` arguments. This makes its behavior consistent with MySQL's behavior. [#55589](https://github.com/ClickHouse/ClickHouse/pull/55589) ([Robert Schulze](https://github.com/rschu1ze)).
* Implement query parameters support for `ALTER TABLE ... ACTION PARTITION [ID] {parameter_name:ParameterType}`. Merges [#49516](https://github.com/ClickHouse/ClickHouse/issues/49516). Closes [#49449](https://github.com/ClickHouse/ClickHouse/issues/49449). [#55604](https://github.com/ClickHouse/ClickHouse/pull/55604) ([alesapin](https://github.com/alesapin)).
* Inverted indexes do not store tokens with too many matches (i.e. row ids in the posting list). This saves space and avoids ineffective index lookups when sequential scans would be equally fast or faster. The previous heuristics (`density` parameter passed to the index definition) that controlled when tokens would not be stored was too confusing for users. A much simpler heuristics based on parameter `max_rows_per_postings_list` (default: 64k) is introduced which directly controls the maximum allowed number of row ids in a postings list. [#55616](https://github.com/ClickHouse/ClickHouse/pull/55616) ([Harry Lee](https://github.com/HarryLeeIBM)).
* `SHOW COLUMNS` now correctly reports type `FixedString` as `BLOB` if setting `use_mysql_types_in_show_columns` is on. Also added two new settings, `mysql_map_string_to_text_in_show_columns` and `mysql_map_fixed_string_to_text_in_show_columns` to switch the output for types `String` and `FixedString` as `TEXT` or `BLOB`. [#55617](https://github.com/ClickHouse/ClickHouse/pull/55617) ([Serge Klochkov](https://github.com/slvrtrn)).
* During ReplicatedMergeTree tables startup clickhouse server checks set of parts for unexpected parts (exists locally, but not in zookeeper). All unexpected parts move to detached directory and instead of them server tries to restore some ancestor (covered) parts. Now server tries to restore closest ancestors instead of random covered parts. [#55645](https://github.com/ClickHouse/ClickHouse/pull/55645) ([alesapin](https://github.com/alesapin)).
* The advanced dashboard now supports draggable charts on touch devices. This closes [#54206](https://github.com/ClickHouse/ClickHouse/issues/54206). [#55649](https://github.com/ClickHouse/ClickHouse/pull/55649) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Introduced setting `date_time_overflow_behavior` with possible values `ignore`, `throw`, `saturate` that controls the overflow behavior when converting from Date, Date32, DateTime64, Integer or Float to Date, Date32, DateTime or DateTime64. [#55696](https://github.com/ClickHouse/ClickHouse/pull/55696) ([Andrey Zvonov](https://github.com/zvonand)).
* Improve write performance to rocksdb. [#55732](https://github.com/ClickHouse/ClickHouse/pull/55732) ([Duc Canh Le](https://github.com/canhld94)).
* Use the default query format if declared when outputting exception with http_write_exception_in_output_format. [#55739](https://github.com/ClickHouse/ClickHouse/pull/55739) ([Raúl Marín](https://github.com/Algunenano)).
* Use upstream repo for apache datasketches. [#55787](https://github.com/ClickHouse/ClickHouse/pull/55787) ([Nikita Taranov](https://github.com/nickitat)).
* Add support for SHOW MERGES query. [#55815](https://github.com/ClickHouse/ClickHouse/pull/55815) ([megao](https://github.com/jetgm)).
* Provide a better message for common MV pitfalls. [#55826](https://github.com/ClickHouse/ClickHouse/pull/55826) ([Raúl Marín](https://github.com/Algunenano)).
* Reduced memory consumption during loading of hierarchical dictionaries. [#55838](https://github.com/ClickHouse/ClickHouse/pull/55838) ([Nikita Taranov](https://github.com/nickitat)).
* All dictionaries support setting `dictionary_use_async_executor`. [#55839](https://github.com/ClickHouse/ClickHouse/pull/55839) ([vdimir](https://github.com/vdimir)).
* If you dropped the current database, you will still be able to run some queries in `clickhouse-local` and switch to another database. This makes the behavior consistent with `clickhouse-client`. This closes [#55834](https://github.com/ClickHouse/ClickHouse/issues/55834). [#55853](https://github.com/ClickHouse/ClickHouse/pull/55853) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Functions `(add|subtract)(Year|Quarter|Month|Week|Day|Hour|Minute|Second|Millisecond|Microsecond|Nanosecond)` now support string-encoded date arguments, e.g. `SELECT addDays('2023-10-22', 1)`. This increases compatibility with MySQL and is needed by Tableau Online. [#55869](https://github.com/ClickHouse/ClickHouse/pull/55869) ([Robert Schulze](https://github.com/rschu1ze)).
* Introduce setting `create_table_empty_primary_key_by_default` for default `ORDER BY ()`. [#55899](https://github.com/ClickHouse/ClickHouse/pull/55899) ([Srikanth Chekuri](https://github.com/srikanthccv)).
* Prevent excesive memory usage when deserializing AggregateFunctionTopKGenericData. [#55947](https://github.com/ClickHouse/ClickHouse/pull/55947) ([Raúl Marín](https://github.com/Algunenano)).
* The setting `apply_deleted_mask` when disabled allows to read rows that where marked as deleted by lightweight DELETE queries. This is useful for debugging. [#55952](https://github.com/ClickHouse/ClickHouse/pull/55952) ([Alexander Gololobov](https://github.com/davenger)).
* Allow skip null values when serailize tuple to json objects, which makes it possible to keep compatiability with spark to_json function, which is also useful for gluten. [#55956](https://github.com/ClickHouse/ClickHouse/pull/55956) ([李扬](https://github.com/taiyang-li)).
* Functions `(add|sub)Date()` now support string-encoded date arguments, e.g. `SELECT addDate('2023-10-22 11:12:13', INTERVAL 5 MINUTE)`. The same support for string-encoded date arguments is added to the plus and minus operators, e.g. `SELECT '2023-10-23' + INTERVAL 1 DAY`. This increases compatibility with MySQL and is needed by Tableau Online. [#55960](https://github.com/ClickHouse/ClickHouse/pull/55960) ([Robert Schulze](https://github.com/rschu1ze)).
* Allow unquoted strings with CR in CSV format. Closes [#39930](https://github.com/ClickHouse/ClickHouse/issues/39930). [#56046](https://github.com/ClickHouse/ClickHouse/pull/56046) ([Kruglov Pavel](https://github.com/Avogar)).
* On a Keeper with lots of watches AsyncMetrics threads can consume 100% of CPU for noticable time in `DB::KeeperStorage::getSessionsWithWatchesCount()`. The fix is to avoid traversing heavy `watches` and `list_watches` sets. [#56054](https://github.com/ClickHouse/ClickHouse/pull/56054) ([Alexander Gololobov](https://github.com/davenger)).
* Allow to run `clickhouse-keeper` using embedded config. [#56086](https://github.com/ClickHouse/ClickHouse/pull/56086) ([Maksim Kita](https://github.com/kitaisreal)).
* Set limit of the maximum configuration value for queued.min.messages to avoid problem with start fetching data with Kafka. [#56121](https://github.com/ClickHouse/ClickHouse/pull/56121) ([Stas Morozov](https://github.com/r3b-fish)).
* Fixed typo in SQL function `minSampleSizeContinous` (renamed `minSampleSizeContinuous`). Old name is preserved for backward compatibility. This closes: [#56139](https://github.com/ClickHouse/ClickHouse/issues/56139). [#56143](https://github.com/ClickHouse/ClickHouse/pull/56143) ([Dorota Szeremeta](https://github.com/orotaday)).
* Print corrupted part path on disk before shutdown server. Before this change if a part is corrupted on disk and server cannot start, it was almost impossible to understand which part is broken. This is fixed. [#56181](https://github.com/ClickHouse/ClickHouse/pull/56181) ([Duc Canh Le](https://github.com/canhld94)).
#### Build/Testing/Packaging Improvement
* If the database is already initialized, it doesn't need to be initialized again upon subsequent launches. This can potentially fix the issue of infinite container restarts when the database fails to load within 1000 attempts (relevant for very large databases and multi-node setups). [#50724](https://github.com/ClickHouse/ClickHouse/pull/50724) ([Alexander Nikolaev](https://github.com/AlexNik)).
* Resource with source code including submodules is built in Darwin special build task. It may be used to build ClickHouse without checkouting submodules. [#51435](https://github.com/ClickHouse/ClickHouse/pull/51435) ([Ilya Yatsishin](https://github.com/qoega)).
* An error will occur when building ClickHouse with the avx series of instructions enabled. CMake command ```shell cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX=ON -DENABLE_AVX2=ON -DENABLE_AVX2_FOR_SPEC_OP=ON ``` Failed message ``` [1558/11630] Building CXX object contrib/snappy-cmake/CMakeFiles/_snappy.dir/__/snappy/snappy.cc.o FAILED: contrib/snappy-cmake/CMakeFiles/_snappy.dir/__/snappy/snappy.cc.o /usr/bin/ccache /usr/bin/clang++-17 --target=x86_64-linux-gnu --sysroot=/opt/ClickHouse/cmake/linux/../../contrib/sysroot/linux-x86_64/x86_64-linux-gnu/libc -DHAVE_CONFIG_H -DSTD_EXCEPTION_HAS_STACK_TRACE=1 -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS -I/opt/ClickHouse/base/glibc-compatibility/memcpy -isystem /opt/ClickHouse/contrib/snappy -isystem /opt/ClickHouse/build/contrib/snappy-cmake -isystem /opt/ClickHouse/contrib/llvm-project/libcxx/include -isystem /opt/ClickHouse/contrib/llvm-project/libcxxabi/include -isystem /opt/ClickHouse/contrib/libunwind/include --gcc-toolchain=/opt/ClickHouse/cmake/linux/../../contrib/sysroot/linux-x86_64 -fdiagnostics-color=always -Wno-enum-constexpr-conversion -fsized-deallocation -pipe -mssse3 -msse4.1 -msse4.2 -mpclmul -mpopcnt -mavx -mavx2 -fasynchronous-unwind-tables -ffile-prefix-map=/opt/ClickHouse=. -falign-functions=32 -mbranches-within-32B-boundaries -fdiagnostics-absolute-paths -fstrict-vtable-pointers -w -O3 -DNDEBUG -D OS_LINUX -Werror -nostdinc++ -std=c++2b -MD -MT contrib/snappy-cmake/CMakeFiles/_snappy.dir/__/snappy/snappy.cc.o -MF contrib/snappy-cmake/CMakeFiles/_snappy.dir/__/snappy/snappy.cc.o.d -o contrib/snappy-cmake/CMakeFiles/_snappy.dir/__/snappy/snappy.cc.o -c /opt/ClickHouse/contrib/snappy/snappy.cc /opt/ClickHouse/contrib/snappy/snappy.cc:1061:3: error: unknown type name '__m256i' 1061 | __m256i data = _mm256_lddqu_si256(static_cast<const __m256i *>(src)); | ^ /opt/ClickHouse/contrib/snappy/snappy.cc:1061:55: error: unknown type name '__m256i' 1061 | __m256i data = _mm256_lddqu_si256(static_cast<const __m256i *>(src)); | ^ /opt/ClickHouse/contrib/snappy/snappy.cc:1061:18: error: use of undeclared identifier '_mm256_lddqu_si256'; did you mean '_mm_lddqu_si128'? 1061 | __m256i data = _mm256_lddqu_si256(static_cast<const __m256i *>(src)); | ^~~~~~~~~~~~~~~~~~ | _mm_lddqu_si128 /usr/lib/llvm-17/lib/clang/17/include/pmmintrin.h:38:1: note: '_mm_lddqu_si128' declared here 38 | _mm_lddqu_si128(__m128i_u const *__p) | ^ /opt/ClickHouse/contrib/snappy/snappy.cc:1061:66: error: cannot initialize a parameter of type 'const __m128i_u *' with an lvalue of type 'const void *' 1061 | __m256i data = _mm256_lddqu_si256(static_cast<const __m256i *>(src)); | ^~~ /usr/lib/llvm-17/lib/clang/17/include/pmmintrin.h:38:34: note: passing argument to parameter '__p' here 38 | _mm_lddqu_si128(__m128i_u const *__p) | ^ /opt/ClickHouse/contrib/snappy/snappy.cc:1062:40: error: unknown type name '__m256i' 1062 | _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst), data); | ^ /opt/ClickHouse/contrib/snappy/snappy.cc:1065:49: error: unknown type name '__m256i' 1065 | data = _mm256_lddqu_si256(static_cast<const __m256i *>(src) + 1); | ^ /opt/ClickHouse/contrib/snappy/snappy.cc:1065:65: error: arithmetic on a pointer to void 1065 | data = _mm256_lddqu_si256(static_cast<const __m256i *>(src) + 1); | ~~~ ^ /opt/ClickHouse/contrib/snappy/snappy.cc:1066:42: error: unknown type name '__m256i' 1066 | _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst) + 1, data); | ^ 8 errors generated. [1567/11630] Building CXX object contrib/rocksdb-cma...rocksdb.dir/__/rocksdb/db/arena_wrapped_db_iter.cc.o ninja: build stopped: subcommand failed. ``` The reason is that snappy does not enable SNAPPY_HAVE_X86_CRC32. [#55049](https://github.com/ClickHouse/ClickHouse/pull/55049) ([monchickey](https://github.com/monchickey)).
* Add `instance_env_variables` option to integration tests. [#55208](https://github.com/ClickHouse/ClickHouse/pull/55208) ([Arthur Passos](https://github.com/arthurpassos)).
* Solve issue with launching standalone clickhouse-keeper from clickhouse-server package. [#55226](https://github.com/ClickHouse/ClickHouse/pull/55226) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* In tests RabbitMQ version is updated to 3.12.6. Improved logs collection for RabbitMQ tests. [#55424](https://github.com/ClickHouse/ClickHouse/pull/55424) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix integration check python script to use gh api url - Add Readme for CI tests. [#55476](https://github.com/ClickHouse/ClickHouse/pull/55476) ([Max K.](https://github.com/mkaynov)).
* Fix integration check python script to use gh api url - Add Readme for CI tests. [#55716](https://github.com/ClickHouse/ClickHouse/pull/55716) ([Max K.](https://github.com/mkaynov)).
* Check sha512 for tgz; use a proper repository for keeper; write only filenames to TGZ.sha512 files for tarball packages. Prerequisite for [#31473](https://github.com/ClickHouse/ClickHouse/issues/31473). [#55717](https://github.com/ClickHouse/ClickHouse/pull/55717) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Updated to get free port for azurite. [#55796](https://github.com/ClickHouse/ClickHouse/pull/55796) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Reduce the ouput info. [#55938](https://github.com/ClickHouse/ClickHouse/pull/55938) ([helifu](https://github.com/helifu)).
* Modified the error message difference between openssl and boringssl to fix the functional test. [#55975](https://github.com/ClickHouse/ClickHouse/pull/55975) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Changes to support the HDFS for s390x. [#56128](https://github.com/ClickHouse/ClickHouse/pull/56128) ([MeenaRenganathan22](https://github.com/MeenaRenganathan22)).
* Fix flaky test of jbod balancer by relaxing the Gini coefficient and introducing more determinism in insertions. [#56175](https://github.com/ClickHouse/ClickHouse/pull/56175) ([Amos Bird](https://github.com/amosbird)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* skip hardlinking inverted index files in mutation [#47663](https://github.com/ClickHouse/ClickHouse/pull/47663) ([cangyin](https://github.com/cangyin)).
* Fix 'Cannot find column' in read-in-order optimization with ARRAY JOIN [#51746](https://github.com/ClickHouse/ClickHouse/pull/51746) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support missed Object(Nullable(json)) subcolumns in query. [#54052](https://github.com/ClickHouse/ClickHouse/pull/54052) ([zps](https://github.com/VanDarkholme7)).
* Re-add fix for `accurateCastOrNull()` [#54629](https://github.com/ClickHouse/ClickHouse/pull/54629) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix detecting DEFAULT for columns of a Distributed table created without AS [#55060](https://github.com/ClickHouse/ClickHouse/pull/55060) ([Vitaly Baranov](https://github.com/vitlibar)).
* Proper cleanup in case of exception in ctor of ShellCommandSource [#55103](https://github.com/ClickHouse/ClickHouse/pull/55103) ([Alexander Gololobov](https://github.com/davenger)).
* Fix deadlock in LDAP assigned role update [#55119](https://github.com/ClickHouse/ClickHouse/pull/55119) ([Julian Maicher](https://github.com/jmaicher)).
* Suppress error statistics update for internal exceptions [#55128](https://github.com/ClickHouse/ClickHouse/pull/55128) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix deadlock in backups [#55132](https://github.com/ClickHouse/ClickHouse/pull/55132) ([alesapin](https://github.com/alesapin)).
* Fix storage Iceberg files retrieval [#55144](https://github.com/ClickHouse/ClickHouse/pull/55144) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix partition pruning of extra columns in set. [#55172](https://github.com/ClickHouse/ClickHouse/pull/55172) ([Amos Bird](https://github.com/amosbird)).
* Fix recalculation of skip indexes in ALTER UPDATE queries when table has adaptive granularity [#55202](https://github.com/ClickHouse/ClickHouse/pull/55202) ([Duc Canh Le](https://github.com/canhld94)).
* Fix for background download in fs cache [#55252](https://github.com/ClickHouse/ClickHouse/pull/55252) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Avoid possible memory leaks in compressors in case of missing buffer finalization [#55262](https://github.com/ClickHouse/ClickHouse/pull/55262) ([Azat Khuzhin](https://github.com/azat)).
* Fix functions execution over sparse columns [#55275](https://github.com/ClickHouse/ClickHouse/pull/55275) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect merging of Nested for SELECT FINAL FROM SummingMergeTree [#55276](https://github.com/ClickHouse/ClickHouse/pull/55276) ([Azat Khuzhin](https://github.com/azat)).
* Fix bug with inability to drop detached partition in replicated merge tree on top of S3 without zero copy [#55309](https://github.com/ClickHouse/ClickHouse/pull/55309) ([alesapin](https://github.com/alesapin)).
* Fix SIGSEGV in MergeSortingPartialResultTransform (due to zero chunks after remerge()) [#55335](https://github.com/ClickHouse/ClickHouse/pull/55335) ([Azat Khuzhin](https://github.com/azat)).
* Fix data-race in CreatingSetsTransform (on errors) due to throwing shared exception [#55338](https://github.com/ClickHouse/ClickHouse/pull/55338) ([Azat Khuzhin](https://github.com/azat)).
* Fix trash optimization (up to a certain extent) [#55353](https://github.com/ClickHouse/ClickHouse/pull/55353) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix leak in StorageHDFS [#55370](https://github.com/ClickHouse/ClickHouse/pull/55370) ([Azat Khuzhin](https://github.com/azat)).
* Fix parsing of arrays in cast operator [#55417](https://github.com/ClickHouse/ClickHouse/pull/55417) ([Anton Popov](https://github.com/CurtizJ)).
* Fix filtering by virtual columns with OR filter in query [#55418](https://github.com/ClickHouse/ClickHouse/pull/55418) ([Azat Khuzhin](https://github.com/azat)).
* Fix MongoDB connection issues [#55419](https://github.com/ClickHouse/ClickHouse/pull/55419) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix MySQL interface boolean representation [#55427](https://github.com/ClickHouse/ClickHouse/pull/55427) ([Serge Klochkov](https://github.com/slvrtrn)).
* Fix MySQL text protocol DateTime formatting and LowCardinality(Nullable(T)) types reporting [#55479](https://github.com/ClickHouse/ClickHouse/pull/55479) ([Serge Klochkov](https://github.com/slvrtrn)).
* Make `use_mysql_types_in_show_columns` affect only `SHOW COLUMNS` [#55481](https://github.com/ClickHouse/ClickHouse/pull/55481) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix stack symbolizer parsing DW_FORM_ref_addr incorrectly and sometimes crashing [#55483](https://github.com/ClickHouse/ClickHouse/pull/55483) ([Michael Kolupaev](https://github.com/al13n321)).
* Destroy fiber in case of exception in cancelBefore in AsyncTaskExecutor [#55516](https://github.com/ClickHouse/ClickHouse/pull/55516) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix Query Parameters not working with custom HTTP handlers [#55521](https://github.com/ClickHouse/ClickHouse/pull/55521) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Fix checking of non handled data for Values format [#55527](https://github.com/ClickHouse/ClickHouse/pull/55527) ([Azat Khuzhin](https://github.com/azat)).
* Fix 'Invalid cursor state' in odbc interacting with MS SQL Server [#55558](https://github.com/ClickHouse/ClickHouse/pull/55558) ([vdimir](https://github.com/vdimir)).
* Fix max execution time and 'break' overflow mode [#55577](https://github.com/ClickHouse/ClickHouse/pull/55577) ([Alexander Gololobov](https://github.com/davenger)).
* Fix crash in QueryNormalizer with cyclic aliases [#55602](https://github.com/ClickHouse/ClickHouse/pull/55602) ([vdimir](https://github.com/vdimir)).
* Disable wrong optimization and add a test [#55609](https://github.com/ClickHouse/ClickHouse/pull/55609) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#52352](https://github.com/ClickHouse/ClickHouse/issues/52352) [#55621](https://github.com/ClickHouse/ClickHouse/pull/55621) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test to avoid incorrect decimal sorting [#55662](https://github.com/ClickHouse/ClickHouse/pull/55662) ([Amos Bird](https://github.com/amosbird)).
* Fix progress bar for s3 and azure Cluster functions with url without globs [#55666](https://github.com/ClickHouse/ClickHouse/pull/55666) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix filtering by virtual columns with OR filter in query (resubmit) [#55678](https://github.com/ClickHouse/ClickHouse/pull/55678) ([Azat Khuzhin](https://github.com/azat)).
* Fixes and improvements for Iceberg storage [#55695](https://github.com/ClickHouse/ClickHouse/pull/55695) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix data race in CreatingSetsTransform (v2) [#55786](https://github.com/ClickHouse/ClickHouse/pull/55786) ([Azat Khuzhin](https://github.com/azat)).
* Throw exception when parsing illegal string as float if precise_float_parsing is true [#55861](https://github.com/ClickHouse/ClickHouse/pull/55861) ([李扬](https://github.com/taiyang-li)).
* Disable predicate pushdown if the CTE contains stateful functions [#55871](https://github.com/ClickHouse/ClickHouse/pull/55871) ([Raúl Marín](https://github.com/Algunenano)).
* Fix normalize ASTSelectWithUnionQuery strip FORMAT of the query [#55887](https://github.com/ClickHouse/ClickHouse/pull/55887) ([flynn](https://github.com/ucasfl)).
* Try to fix possible segfault in Native ORC input format [#55891](https://github.com/ClickHouse/ClickHouse/pull/55891) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix window functions in case of sparse columns. [#55895](https://github.com/ClickHouse/ClickHouse/pull/55895) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* fix: StorageNull supports subcolumns [#55912](https://github.com/ClickHouse/ClickHouse/pull/55912) ([FFish](https://github.com/wxybear)).
* Do not write retriable errors for Replicated mutate/merge into error log [#55944](https://github.com/ClickHouse/ClickHouse/pull/55944) ([Azat Khuzhin](https://github.com/azat)).
* Fix `SHOW DATABASES LIMIT <N>` [#55962](https://github.com/ClickHouse/ClickHouse/pull/55962) ([Raúl Marín](https://github.com/Algunenano)).
* Fix autogenerated Protobuf schema with fields with underscore [#55974](https://github.com/ClickHouse/ClickHouse/pull/55974) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix dateTime64ToSnowflake64() with non-default scale [#55983](https://github.com/ClickHouse/ClickHouse/pull/55983) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix output/input of Arrow dictionary column [#55989](https://github.com/ClickHouse/ClickHouse/pull/55989) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix fetching schema from schema registry in AvroConfluent [#55991](https://github.com/ClickHouse/ClickHouse/pull/55991) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#55995](https://github.com/ClickHouse/ClickHouse/pull/55995) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix incorrect free space accounting for least_used JBOD policy [#56030](https://github.com/ClickHouse/ClickHouse/pull/56030) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing scalar issue when evaluating subqueries inside table functions [#56057](https://github.com/ClickHouse/ClickHouse/pull/56057) ([Amos Bird](https://github.com/amosbird)).
* Fix wrong query result when http_write_exception_in_output_format=1 [#56135](https://github.com/ClickHouse/ClickHouse/pull/56135) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix schema cache for fallback JSON->JSONEachRow with changed settings [#56172](https://github.com/ClickHouse/ClickHouse/pull/56172) ([Kruglov Pavel](https://github.com/Avogar)).
* Add error handler to odbc-bridge [#56185](https://github.com/ClickHouse/ClickHouse/pull/56185) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Fix libssh+openssl3 & s390x (part 2)"'. [#55188](https://github.com/ClickHouse/ClickHouse/pull/55188) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Support SAMPLE BY for VIEW"'. [#55357](https://github.com/ClickHouse/ClickHouse/pull/55357) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Revert "refine error code of duplicated index in create query""'. [#55467](https://github.com/ClickHouse/ClickHouse/pull/55467) ([Han Fei](https://github.com/hanfei1991)).
* NO CL ENTRY: 'Update mysql.md - Remove the Private Preview Note'. [#55486](https://github.com/ClickHouse/ClickHouse/pull/55486) ([Ryadh DAHIMENE](https://github.com/Ryado)).
* NO CL ENTRY: 'Revert "Removed "maximize" and "drag" buttons from `dashboard` in case of single chart"'. [#55623](https://github.com/ClickHouse/ClickHouse/pull/55623) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix filtering by virtual columns with OR filter in query"'. [#55657](https://github.com/ClickHouse/ClickHouse/pull/55657) ([Antonio Andelic](https://github.com/antonio2368)).
* NO CL ENTRY: 'Revert "Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort"'. [#55682](https://github.com/ClickHouse/ClickHouse/pull/55682) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Integration check script fix ups"'. [#55694](https://github.com/ClickHouse/ClickHouse/pull/55694) ([alesapin](https://github.com/alesapin)).
* NO CL ENTRY: 'Revert "Fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table"'. [#56103](https://github.com/ClickHouse/ClickHouse/pull/56103) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Add function getHttpHeader"'. [#56109](https://github.com/ClickHouse/ClickHouse/pull/56109) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix output/input of Arrow dictionary column"'. [#56150](https://github.com/ClickHouse/ClickHouse/pull/56150) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* Set defaults_for_omitted_fields to true for hive text format [#49486](https://github.com/ClickHouse/ClickHouse/pull/49486) ([李扬](https://github.com/taiyang-li)).
* Fixing join tests with analyzer [#49555](https://github.com/ClickHouse/ClickHouse/pull/49555) ([vdimir](https://github.com/vdimir)).
* Make exception about `ALTER TABLE ... DROP COLUMN|INDEX|PROJECTION` more clear [#50181](https://github.com/ClickHouse/ClickHouse/pull/50181) ([Alexander Gololobov](https://github.com/davenger)).
* ANTI JOIN: Invalid number of rows in Chunk [#50944](https://github.com/ClickHouse/ClickHouse/pull/50944) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Analyzer: fix row policy [#53170](https://github.com/ClickHouse/ClickHouse/pull/53170) ([Dmitry Novik](https://github.com/novikd)).
* Add a test with Block structure mismatch in grace hash join. [#53278](https://github.com/ClickHouse/ClickHouse/pull/53278) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Support skip_unused_shards in Analyzer [#53282](https://github.com/ClickHouse/ClickHouse/pull/53282) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Revert "Revert "Planner prepare filters for analysis"" [#53792](https://github.com/ClickHouse/ClickHouse/pull/53792) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add setting allow_experimental_partial_result [#54514](https://github.com/ClickHouse/ClickHouse/pull/54514) ([vdimir](https://github.com/vdimir)).
* Fix CI skip build and skip tests checks [#54532](https://github.com/ClickHouse/ClickHouse/pull/54532) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* `MergeTreeData::forcefullyMovePartToDetachedAndRemoveFromMemory` does not respect mutations [#54653](https://github.com/ClickHouse/ClickHouse/pull/54653) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix azure test by using unique names [#54738](https://github.com/ClickHouse/ClickHouse/pull/54738) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Use `--filter` to reduce checkout time [#54857](https://github.com/ClickHouse/ClickHouse/pull/54857) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Remove tests [#54873](https://github.com/ClickHouse/ClickHouse/pull/54873) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Remove useless path from lockSharedData in StorageReplicatedMergeTree [#54989](https://github.com/ClickHouse/ClickHouse/pull/54989) ([Mike Kot](https://github.com/myrrc)).
* Fix broken test [#55002](https://github.com/ClickHouse/ClickHouse/pull/55002) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update version_date.tsv and changelogs after v23.8.3.48-lts [#55063](https://github.com/ClickHouse/ClickHouse/pull/55063) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Use `source` instead of `bash` for pre-build script [#55071](https://github.com/ClickHouse/ClickHouse/pull/55071) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Update s3queue.md to add experimental flag [#55093](https://github.com/ClickHouse/ClickHouse/pull/55093) ([Peignon Melvyn](https://github.com/melvynator)).
* Clean data dir and always start an old server version in aggregate functions compatibility test. [#55105](https://github.com/ClickHouse/ClickHouse/pull/55105) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update version_date.tsv and changelogs after v23.9.1.1854-stable [#55118](https://github.com/ClickHouse/ClickHouse/pull/55118) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Add links to check reports in status comment [#55122](https://github.com/ClickHouse/ClickHouse/pull/55122) ([vdimir](https://github.com/vdimir)).
* Bump croaring to 2.0.2 [#55127](https://github.com/ClickHouse/ClickHouse/pull/55127) ([Robert Schulze](https://github.com/rschu1ze)).
* check if block is empty after async insert retries [#55143](https://github.com/ClickHouse/ClickHouse/pull/55143) ([Han Fei](https://github.com/hanfei1991)).
* Improve linker detection on macOS [#55147](https://github.com/ClickHouse/ClickHouse/pull/55147) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix libssh+openssl3 & s390x [#55154](https://github.com/ClickHouse/ClickHouse/pull/55154) ([Boris Kuschel](https://github.com/bkuschel)).
* Fix file cache temporary file segment range in FileSegment::reserve [#55164](https://github.com/ClickHouse/ClickHouse/pull/55164) ([vdimir](https://github.com/vdimir)).
* Fix libssh+openssl3 & s390x (part 2) [#55187](https://github.com/ClickHouse/ClickHouse/pull/55187) ([Boris Kuschel](https://github.com/bkuschel)).
* Fix wrong test name [#55190](https://github.com/ClickHouse/ClickHouse/pull/55190) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix libssh+openssl3 & s390x (Part 2.1) [#55192](https://github.com/ClickHouse/ClickHouse/pull/55192) ([Boris Kuschel](https://github.com/bkuschel)).
* Reset signals caught by clickhouse-client if a pager is in use [#55193](https://github.com/ClickHouse/ClickHouse/pull/55193) ([Azat Khuzhin](https://github.com/azat)).
* Update README.md [#55209](https://github.com/ClickHouse/ClickHouse/pull/55209) ([Tyler Hannan](https://github.com/tylerhannan)).
* remove the blocker to grow the metadata file version [#55218](https://github.com/ClickHouse/ClickHouse/pull/55218) ([Sema Checherinda](https://github.com/CheSema)).
* Fix syntax highlight in client for spaceship operator [#55224](https://github.com/ClickHouse/ClickHouse/pull/55224) ([Azat Khuzhin](https://github.com/azat)).
* Fix mypy errors [#55228](https://github.com/ClickHouse/ClickHouse/pull/55228) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Upgrade MinIO to support accepting non signed requests [#55245](https://github.com/ClickHouse/ClickHouse/pull/55245) ([Azat Khuzhin](https://github.com/azat)).
* tests: switch test_throttling to S3 over https to make it more production like [#55247](https://github.com/ClickHouse/ClickHouse/pull/55247) ([Azat Khuzhin](https://github.com/azat)).
* Evaluate defaults during async insert safer [#55253](https://github.com/ClickHouse/ClickHouse/pull/55253) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix data race in context [#55260](https://github.com/ClickHouse/ClickHouse/pull/55260) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do not allow tests with state ERROR be overwritten by PASSED [#55261](https://github.com/ClickHouse/ClickHouse/pull/55261) ([Azat Khuzhin](https://github.com/azat)).
* Fix query formatting for SYSTEM queries [#55277](https://github.com/ClickHouse/ClickHouse/pull/55277) ([Azat Khuzhin](https://github.com/azat)).
* Context added TSA [#55278](https://github.com/ClickHouse/ClickHouse/pull/55278) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve logging in query cache [#55296](https://github.com/ClickHouse/ClickHouse/pull/55296) ([Robert Schulze](https://github.com/rschu1ze)).
* MaterializedPostgreSQL: remove back check [#55297](https://github.com/ClickHouse/ClickHouse/pull/55297) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Make `use_mysql_types_in_show_columns` independent from the connection [#55298](https://github.com/ClickHouse/ClickHouse/pull/55298) ([Robert Schulze](https://github.com/rschu1ze)).
* Make `HandlingRuleHTTPHandlerFactory` more stupid, but less error prone [#55307](https://github.com/ClickHouse/ClickHouse/pull/55307) ([alesapin](https://github.com/alesapin)).
* Fix tsan issue in croaring [#55311](https://github.com/ClickHouse/ClickHouse/pull/55311) ([Robert Schulze](https://github.com/rschu1ze)).
* Refactorings and better documentation for `toStartOfInterval()` [#55327](https://github.com/ClickHouse/ClickHouse/pull/55327) ([Robert Schulze](https://github.com/rschu1ze)).
* Review [#51946](https://github.com/ClickHouse/ClickHouse/issues/51946) and partially revert it [#55336](https://github.com/ClickHouse/ClickHouse/pull/55336) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update README.md [#55339](https://github.com/ClickHouse/ClickHouse/pull/55339) ([Tyler Hannan](https://github.com/tylerhannan)).
* Context locks small fixes [#55352](https://github.com/ClickHouse/ClickHouse/pull/55352) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix bad test `01605_dictinct_two_level` [#55354](https://github.com/ClickHouse/ClickHouse/pull/55354) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test_max_rows_to_read_leaf_with_view flakiness (due to prefer_localhost_replica) [#55355](https://github.com/ClickHouse/ClickHouse/pull/55355) ([Azat Khuzhin](https://github.com/azat)).
* Better exception messages [#55356](https://github.com/ClickHouse/ClickHouse/pull/55356) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update CHANGELOG.md [#55359](https://github.com/ClickHouse/ClickHouse/pull/55359) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Better recursion depth check [#55361](https://github.com/ClickHouse/ClickHouse/pull/55361) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merging [#43085](https://github.com/ClickHouse/ClickHouse/issues/43085) [#55362](https://github.com/ClickHouse/ClickHouse/pull/55362) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix data-race in web disk [#55372](https://github.com/ClickHouse/ClickHouse/pull/55372) ([Azat Khuzhin](https://github.com/azat)).
* Disable skim under TSan (Rust does not supports ThreadSanitizer) [#55378](https://github.com/ClickHouse/ClickHouse/pull/55378) ([Azat Khuzhin](https://github.com/azat)).
* Fix missing thread accounting for insert_distributed_sync=1 [#55392](https://github.com/ClickHouse/ClickHouse/pull/55392) ([Azat Khuzhin](https://github.com/azat)).
* Improve tests for untuple() [#55425](https://github.com/ClickHouse/ClickHouse/pull/55425) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix test that never worked test_rabbitmq_random_detach [#55453](https://github.com/ClickHouse/ClickHouse/pull/55453) ([Ilya Yatsishin](https://github.com/qoega)).
* Fix out of bound error in system.remote_data_paths + disk web [#55468](https://github.com/ClickHouse/ClickHouse/pull/55468) ([alesapin](https://github.com/alesapin)).
* Remove existing moving/ dir if allow_remove_stale_moving_parts is off [#55480](https://github.com/ClickHouse/ClickHouse/pull/55480) ([Mike Kot](https://github.com/myrrc)).
* Bump curl to 8.4 [#55492](https://github.com/ClickHouse/ClickHouse/pull/55492) ([Robert Schulze](https://github.com/rschu1ze)).
* Minor fixes for 02882_replicated_fetch_checksums_doesnt_match.sql [#55493](https://github.com/ClickHouse/ClickHouse/pull/55493) ([vdimir](https://github.com/vdimir)).
* AggregatingTransform initGenerate race condition fix [#55495](https://github.com/ClickHouse/ClickHouse/pull/55495) ([Maksim Kita](https://github.com/kitaisreal)).
* HashTable resize exception handle fix [#55497](https://github.com/ClickHouse/ClickHouse/pull/55497) ([Maksim Kita](https://github.com/kitaisreal)).
* fix lots of 'Structure does not match' warnings in ci [#55503](https://github.com/ClickHouse/ClickHouse/pull/55503) ([Han Fei](https://github.com/hanfei1991)).
* Cleanup: parallel replica coordinator usage [#55515](https://github.com/ClickHouse/ClickHouse/pull/55515) ([Igor Nikonov](https://github.com/devcrafter)).
* add k-morozov to trusted contributors [#55523](https://github.com/ClickHouse/ClickHouse/pull/55523) ([Mike Kot](https://github.com/myrrc)).
* Forbid create inverted index if setting not enabled [#55529](https://github.com/ClickHouse/ClickHouse/pull/55529) ([flynn](https://github.com/ucasfl)).
* Better exception messages but without SEGFAULT [#55541](https://github.com/ClickHouse/ClickHouse/pull/55541) ([Antonio Andelic](https://github.com/antonio2368)).
* Avoid setting same promise twice [#55553](https://github.com/ClickHouse/ClickHouse/pull/55553) ([Antonio Andelic](https://github.com/antonio2368)).
* Better error message in case when merge selecting task failed. [#55554](https://github.com/ClickHouse/ClickHouse/pull/55554) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Add a test [#55564](https://github.com/ClickHouse/ClickHouse/pull/55564) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Added healthcheck for LDAP [#55571](https://github.com/ClickHouse/ClickHouse/pull/55571) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix global context for tests with --gtest_filter [#55583](https://github.com/ClickHouse/ClickHouse/pull/55583) ([Azat Khuzhin](https://github.com/azat)).
* Fix replica groups for Replicated database engine [#55587](https://github.com/ClickHouse/ClickHouse/pull/55587) ([Azat Khuzhin](https://github.com/azat)).
* Remove unused protobuf includes [#55590](https://github.com/ClickHouse/ClickHouse/pull/55590) ([Raúl Marín](https://github.com/Algunenano)).
* Apply Context changes to standalone Keeper [#55591](https://github.com/ClickHouse/ClickHouse/pull/55591) ([Antonio Andelic](https://github.com/antonio2368)).
* Do not fail if label-to-remove does not exists in PR [#55592](https://github.com/ClickHouse/ClickHouse/pull/55592) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* CI: cast extra column expression `pull_request_number` to Int32 [#55599](https://github.com/ClickHouse/ClickHouse/pull/55599) ([Han Fei](https://github.com/hanfei1991)).
* Add back a test that was removed by mistake [#55605](https://github.com/ClickHouse/ClickHouse/pull/55605) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Bump croaring to v2.0.4 [#55606](https://github.com/ClickHouse/ClickHouse/pull/55606) ([Robert Schulze](https://github.com/rschu1ze)).
* byteswap: Add 16/32-byte integer support [#55607](https://github.com/ClickHouse/ClickHouse/pull/55607) ([Robert Schulze](https://github.com/rschu1ze)).
* Revert [#54421](https://github.com/ClickHouse/ClickHouse/issues/54421) [#55613](https://github.com/ClickHouse/ClickHouse/pull/55613) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix: race condition in kusto implementation [#55615](https://github.com/ClickHouse/ClickHouse/pull/55615) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Remove passed tests from `analyzer_tech_debt.txt` [#55618](https://github.com/ClickHouse/ClickHouse/pull/55618) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable 02161_addressToLineWithInlines [#55622](https://github.com/ClickHouse/ClickHouse/pull/55622) ([Michael Kolupaev](https://github.com/al13n321)).
* KeyCondition: preparation [#55625](https://github.com/ClickHouse/ClickHouse/pull/55625) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix flakiness of test_system_merges (by increasing sleep interval properly) [#55627](https://github.com/ClickHouse/ClickHouse/pull/55627) ([Azat Khuzhin](https://github.com/azat)).
* fix `structure does not match` logs again [#55628](https://github.com/ClickHouse/ClickHouse/pull/55628) ([Han Fei](https://github.com/hanfei1991)).
* KeyCondition: small changes [#55640](https://github.com/ClickHouse/ClickHouse/pull/55640) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Resubmit [#54421](https://github.com/ClickHouse/ClickHouse/issues/54421) [#55641](https://github.com/ClickHouse/ClickHouse/pull/55641) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix some typos [#55646](https://github.com/ClickHouse/ClickHouse/pull/55646) ([alesapin](https://github.com/alesapin)).
* Show move/maximize only if there is more than a single chart [#55648](https://github.com/ClickHouse/ClickHouse/pull/55648) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Enable test_query_is_lock_free[detach table] for the analyzer [#55668](https://github.com/ClickHouse/ClickHouse/pull/55668) ([Raúl Marín](https://github.com/Algunenano)).
* Allow FINAL with parallel replicas with custom key [#55679](https://github.com/ClickHouse/ClickHouse/pull/55679) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix StorageMaterializedView::isRemote [#55681](https://github.com/ClickHouse/ClickHouse/pull/55681) ([vdimir](https://github.com/vdimir)).
* Bump gRPC to 1.34.1 [#55693](https://github.com/ClickHouse/ClickHouse/pull/55693) ([Robert Schulze](https://github.com/rschu1ze)).
* Randomize block_number column setting in ci [#55713](https://github.com/ClickHouse/ClickHouse/pull/55713) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Use pool for proxied S3 disk http sessions [#55718](https://github.com/ClickHouse/ClickHouse/pull/55718) ([Aleksei Filatov](https://github.com/aalexfvk)).
* Analyzer: fix block stucture mismatch in matview with engine distributed [#55741](https://github.com/ClickHouse/ClickHouse/pull/55741) ([vdimir](https://github.com/vdimir)).
* Use diff object again, since JSON API limits the files [#55750](https://github.com/ClickHouse/ClickHouse/pull/55750) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Big endian platform max intersection fix [#55756](https://github.com/ClickHouse/ClickHouse/pull/55756) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Remove temporary debug logging in MultiplexedConnections [#55764](https://github.com/ClickHouse/ClickHouse/pull/55764) ([Michael Kolupaev](https://github.com/al13n321)).
* Check if partition ID is `nullptr` [#55765](https://github.com/ClickHouse/ClickHouse/pull/55765) ([Antonio Andelic](https://github.com/antonio2368)).
* Control Keeper feature flag randomization with env [#55766](https://github.com/ClickHouse/ClickHouse/pull/55766) ([Antonio Andelic](https://github.com/antonio2368)).
* Added test to check CapnProto cache [#55769](https://github.com/ClickHouse/ClickHouse/pull/55769) ([Aleksandr Musorin](https://github.com/AVMusorin)).
* Query Cache: Only cache initial query [#55771](https://github.com/ClickHouse/ClickHouse/pull/55771) ([zhongyuankai](https://github.com/zhongyuankai)).
* Temporarily disable flaky test [#55772](https://github.com/ClickHouse/ClickHouse/pull/55772) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix test test_postgresql_replica_database_engine_2/test.py::test_replica_consumer [#55774](https://github.com/ClickHouse/ClickHouse/pull/55774) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improve ColumnDecimal, ColumnVector getPermutation performance using pdqsort with RadixSort fix [#55775](https://github.com/ClickHouse/ClickHouse/pull/55775) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix black check [#55779](https://github.com/ClickHouse/ClickHouse/pull/55779) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Correctly grep fuzzer.log [#55780](https://github.com/ClickHouse/ClickHouse/pull/55780) ([Antonio Andelic](https://github.com/antonio2368)).
* Parallel replicas: cleanup, less copying during announcement [#55781](https://github.com/ClickHouse/ClickHouse/pull/55781) ([Igor Nikonov](https://github.com/devcrafter)).
* Enable test_mutation_simple with the analyzer [#55791](https://github.com/ClickHouse/ClickHouse/pull/55791) ([Raúl Marín](https://github.com/Algunenano)).
* Improve enrich image [#55793](https://github.com/ClickHouse/ClickHouse/pull/55793) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Bump gRPC to v1.36.4 [#55811](https://github.com/ClickHouse/ClickHouse/pull/55811) ([Robert Schulze](https://github.com/rschu1ze)).
* Attemp to fix test_dictionaries_redis flakiness [#55813](https://github.com/ClickHouse/ClickHouse/pull/55813) ([Raúl Marín](https://github.com/Algunenano)).
* Add diagnostic checks for issue [#55041](https://github.com/ClickHouse/ClickHouse/issues/55041) [#55835](https://github.com/ClickHouse/ClickHouse/pull/55835) ([Robert Schulze](https://github.com/rschu1ze)).
* Correct aggregate functions ser/deserialization to be endianness-independent. [#55837](https://github.com/ClickHouse/ClickHouse/pull/55837) ([Austin Kothig](https://github.com/kothiga)).
* Bump gRPC to v1.37.1 [#55840](https://github.com/ClickHouse/ClickHouse/pull/55840) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix 00002_log_and_exception_messages_formatting [#55844](https://github.com/ClickHouse/ClickHouse/pull/55844) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix caching objects in pygithub, and changelogs [#55845](https://github.com/ClickHouse/ClickHouse/pull/55845) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v23.3.14.78-lts [#55847](https://github.com/ClickHouse/ClickHouse/pull/55847) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.9.2.56-stable [#55848](https://github.com/ClickHouse/ClickHouse/pull/55848) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.8.4.69-lts [#55849](https://github.com/ClickHouse/ClickHouse/pull/55849) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* fix node setting in the test [#55850](https://github.com/ClickHouse/ClickHouse/pull/55850) ([Sema Checherinda](https://github.com/CheSema)).
* Add load_metadata_threads to describe filesystem cache [#55863](https://github.com/ClickHouse/ClickHouse/pull/55863) ([Jordi Villar](https://github.com/jrdi)).
* One final leftover in diff_urls of PRInfo [#55874](https://github.com/ClickHouse/ClickHouse/pull/55874) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix digest check in replicated ddl worker [#55877](https://github.com/ClickHouse/ClickHouse/pull/55877) ([Sergei Trifonov](https://github.com/serxa)).
* Test parallel replicas with rollup [#55886](https://github.com/ClickHouse/ClickHouse/pull/55886) ([Raúl Marín](https://github.com/Algunenano)).
* Fix some tests with Replicated database [#55889](https://github.com/ClickHouse/ClickHouse/pull/55889) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Update stress.py [#55890](https://github.com/ClickHouse/ClickHouse/pull/55890) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Revert "Revert "Revert "Add settings for real-time updates during query execution""" [#55893](https://github.com/ClickHouse/ClickHouse/pull/55893) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Make test `system_zookeeper_connection` better [#55900](https://github.com/ClickHouse/ClickHouse/pull/55900) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* A test `01019_alter_materialized_view_consistent` is unstable with Analyzer [#55901](https://github.com/ClickHouse/ClickHouse/pull/55901) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove C++ templates, because they are stupid [#55910](https://github.com/ClickHouse/ClickHouse/pull/55910) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Bump gRPC to v1.39.1 [#55914](https://github.com/ClickHouse/ClickHouse/pull/55914) ([Robert Schulze](https://github.com/rschu1ze)).
* Add sanity check to RPNBuilderFunctionTreeNode [#55915](https://github.com/ClickHouse/ClickHouse/pull/55915) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump gRPC to v1.42.0 [#55916](https://github.com/ClickHouse/ClickHouse/pull/55916) ([Robert Schulze](https://github.com/rschu1ze)).
* Set storage.has_lightweight_delete_parts flag when a part has been loaded [#55935](https://github.com/ClickHouse/ClickHouse/pull/55935) ([Alexander Gololobov](https://github.com/davenger)).
* Include information about supported versions in bug report issue template [#55937](https://github.com/ClickHouse/ClickHouse/pull/55937) ([Nikita Taranov](https://github.com/nickitat)).
* arrayFold: Switch accumulator and array arguments [#55948](https://github.com/ClickHouse/ClickHouse/pull/55948) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix overrides via connections_credentials in case of root directives exists [#55949](https://github.com/ClickHouse/ClickHouse/pull/55949) ([Azat Khuzhin](https://github.com/azat)).
* Test for Bug 43644 [#55955](https://github.com/ClickHouse/ClickHouse/pull/55955) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump protobuf to v3.19.6 [#55963](https://github.com/ClickHouse/ClickHouse/pull/55963) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix possible performance test error [#55964](https://github.com/ClickHouse/ClickHouse/pull/55964) ([Azat Khuzhin](https://github.com/azat)).
* Avoid counting lost parts twice [#55987](https://github.com/ClickHouse/ClickHouse/pull/55987) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Convert unnecessary std::scoped_lock usage to std::lock_guard [#56006](https://github.com/ClickHouse/ClickHouse/pull/56006) ([Robert Schulze](https://github.com/rschu1ze)).
* Stress tests: Try to wait until server is responsive after gdb detach [#56009](https://github.com/ClickHouse/ClickHouse/pull/56009) ([Raúl Marín](https://github.com/Algunenano)).
* test_storage_s3_queue - add debug info [#56011](https://github.com/ClickHouse/ClickHouse/pull/56011) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Bump protobuf to v21.9 [#56014](https://github.com/ClickHouse/ClickHouse/pull/56014) ([Robert Schulze](https://github.com/rschu1ze)).
* Correct the implementation of function `jsonMergePatch` [#56020](https://github.com/ClickHouse/ClickHouse/pull/56020) ([Anton Popov](https://github.com/CurtizJ)).
* Fix 02438_sync_replica_lightweight [#56023](https://github.com/ClickHouse/ClickHouse/pull/56023) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix some bad code by making it worse [#56026](https://github.com/ClickHouse/ClickHouse/pull/56026) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix bash completion for mawk (and update format list and add one more delimiter) [#56050](https://github.com/ClickHouse/ClickHouse/pull/56050) ([Azat Khuzhin](https://github.com/azat)).
* Analyzer: Fix crash on window resolve [#56055](https://github.com/ClickHouse/ClickHouse/pull/56055) ([Dmitry Novik](https://github.com/novikd)).
* Fix function_json_value_return_type_allow_nullable setting name in doc [#56056](https://github.com/ClickHouse/ClickHouse/pull/56056) ([vdimir](https://github.com/vdimir)).
* Force shutdown in upgrade test [#56074](https://github.com/ClickHouse/ClickHouse/pull/56074) ([Raúl Marín](https://github.com/Algunenano)).
* Try enable `01154_move_partition_long` with s3 [#56080](https://github.com/ClickHouse/ClickHouse/pull/56080) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix race condition between DROP_RANGE and committing existing block [#56083](https://github.com/ClickHouse/ClickHouse/pull/56083) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix flakiness of 02263_lazy_mark_load [#56087](https://github.com/ClickHouse/ClickHouse/pull/56087) ([Michael Kolupaev](https://github.com/al13n321)).
* Make the code less bloated [#56091](https://github.com/ClickHouse/ClickHouse/pull/56091) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Maybe smaller binary [#56112](https://github.com/ClickHouse/ClickHouse/pull/56112) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove old trash from unit tests [#56113](https://github.com/ClickHouse/ClickHouse/pull/56113) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove some bloat [#56114](https://github.com/ClickHouse/ClickHouse/pull/56114) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix test_format_schema_on_server flakiness [#56116](https://github.com/ClickHouse/ClickHouse/pull/56116) ([Azat Khuzhin](https://github.com/azat)).
* Fix: schedule delayed part checks correctly [#56123](https://github.com/ClickHouse/ClickHouse/pull/56123) ([Igor Nikonov](https://github.com/devcrafter)).
* Beautify `show merges` [#56124](https://github.com/ClickHouse/ClickHouse/pull/56124) ([Denny Crane](https://github.com/den-crane)).
* Better options for disabling frame pointer omitting [#56130](https://github.com/ClickHouse/ClickHouse/pull/56130) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix: incorrect brace style in clang-format [#56133](https://github.com/ClickHouse/ClickHouse/pull/56133) ([Igor Nikonov](https://github.com/devcrafter)).
* Do not try to activate covered parts when handilng unexpected parts [#56137](https://github.com/ClickHouse/ClickHouse/pull/56137) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Re-fix 'Block structure mismatch' on concurrent ALTER and INSERTs in Buffer table [#56140](https://github.com/ClickHouse/ClickHouse/pull/56140) ([Michael Kolupaev](https://github.com/al13n321)).
* Update version_date.tsv and changelogs after v23.3.15.29-lts [#56145](https://github.com/ClickHouse/ClickHouse/pull/56145) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.8.5.16-lts [#56146](https://github.com/ClickHouse/ClickHouse/pull/56146) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.9.3.12-stable [#56147](https://github.com/ClickHouse/ClickHouse/pull/56147) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Update version_date.tsv and changelogs after v23.7.6.111-stable [#56148](https://github.com/ClickHouse/ClickHouse/pull/56148) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fasttest timeout setting [#56160](https://github.com/ClickHouse/ClickHouse/pull/56160) ([Max K.](https://github.com/mkaynov)).
* Use monotonic clock for part check scheduling [#56162](https://github.com/ClickHouse/ClickHouse/pull/56162) ([Igor Nikonov](https://github.com/devcrafter)).
* More metrics for fs cache [#56165](https://github.com/ClickHouse/ClickHouse/pull/56165) ([Kseniia Sumarokova](https://github.com/kssenii)).
* FileCache minor changes [#56168](https://github.com/ClickHouse/ClickHouse/pull/56168) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update 01414_mutations_and_errors_zookeeper.sh [#56176](https://github.com/ClickHouse/ClickHouse/pull/56176) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Shard fs cache keys [#56194](https://github.com/ClickHouse/ClickHouse/pull/56194) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do less work when there lots of read requests and watches for same paths [#56197](https://github.com/ClickHouse/ClickHouse/pull/56197) ([Alexander Gololobov](https://github.com/davenger)).
* Remove skip_unused_shards tests from analyzer skiplist [#56200](https://github.com/ClickHouse/ClickHouse/pull/56200) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Easy tests fix for analyzer [#56211](https://github.com/ClickHouse/ClickHouse/pull/56211) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Add a warning if delayed accounting is not enabled (breaks OSIOWaitMicroseconds) [#56227](https://github.com/ClickHouse/ClickHouse/pull/56227) ([Azat Khuzhin](https://github.com/azat)).
* Do not remove part if `Too many open files` is thrown [#56238](https://github.com/ClickHouse/ClickHouse/pull/56238) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix ORC commit [#56261](https://github.com/ClickHouse/ClickHouse/pull/56261) ([Raúl Marín](https://github.com/Algunenano)).
* Fix typo in largestTriangleThreeBuckets.md [#56263](https://github.com/ClickHouse/ClickHouse/pull/56263) ([Nikita Taranov](https://github.com/nickitat)).

View File

@ -67,22 +67,30 @@ Implementations of `ReadBuffer`/`WriteBuffer` are used for working with files an
Read/WriteBuffers only deal with bytes. There are functions from `ReadHelpers` and `WriteHelpers` header files to help with formatting input/output. For example, there are helpers to write a number in decimal format.
Lets look at what happens when you want to write a result set in `JSON` format to stdout. You have a result set ready to be fetched from `IBlockInputStream`. You create `WriteBufferFromFileDescriptor(STDOUT_FILENO)` to write bytes to stdout. You create `JSONRowOutputStream`, initialized with that `WriteBuffer`, to write rows in `JSON` to stdout. You create `BlockOutputStreamFromRowOutputStream` on top of it, to represent it as `IBlockOutputStream`. Then you call `copyData` to transfer data from `IBlockInputStream` to `IBlockOutputStream`, and everything works. Internally, `JSONRowOutputStream` will write various JSON delimiters and call the `IDataType::serializeTextJSON` method with a reference to `IColumn` and the row number as arguments. Consequently, `IDataType::serializeTextJSON` will call a method from `WriteHelpers.h`: for example, `writeText` for numeric types and `writeJSONString` for `DataTypeString`.
Let's examine what happens when you want to write a result set in `JSON` format to stdout.
You have a result set ready to be fetched from a pulling `QueryPipeline`.
First, you create a `WriteBufferFromFileDescriptor(STDOUT_FILENO)` to write bytes to stdout.
Next, you connect the result from the query pipeline to `JSONRowOutputFormat`, which is initialized with that `WriteBuffer`, to write rows in `JSON` format to stdout.
This can be done via the `complete` method, which turns a pulling `QueryPipeline` into a completed `QueryPipeline`.
Internally, `JSONRowOutputFormat` will write various JSON delimiters and call the `IDataType::serializeTextJSON` method with a reference to `IColumn` and the row number as arguments. Consequently, `IDataType::serializeTextJSON` will call a method from `WriteHelpers.h`: for example, `writeText` for numeric types and `writeJSONString` for `DataTypeString`.
## Tables {#tables}
The `IStorage` interface represents tables. Different implementations of that interface are different table engines. Examples are `StorageMergeTree`, `StorageMemory`, and so on. Instances of these classes are just tables.
The key `IStorage` methods are `read` and `write`. There are also `alter`, `rename`, `drop`, and so on. The `read` method accepts the following arguments: the set of columns to read from a table, the `AST` query to consider, and the desired number of streams to return. It returns one or multiple `IBlockInputStream` objects and information about the stage of data processing that was completed inside a table engine during query execution.
The key methods in `IStorage` are `read` and `write`, along with others such as `alter`, `rename`, and `drop`. The `read` method accepts the following arguments: a set of columns to read from a table, the `AST` query to consider, and the desired number of streams. It returns a `Pipe`.
In most cases, the read method is only responsible for reading the specified columns from a table, not for any further data processing. All further data processing is done by the query interpreter and is outside the responsibility of `IStorage`.
In most cases, the read method is responsible only for reading the specified columns from a table, not for any further data processing.
All subsequent data processing is handled by another part of the pipeline, which falls outside the responsibility of `IStorage`.
But there are notable exceptions:
- The AST query is passed to the `read` method, and the table engine can use it to derive index usage and to read fewer data from a table.
- Sometimes the table engine can process data itself to a specific stage. For example, `StorageDistributed` can send a query to remote servers, ask them to process data to a stage where data from different remote servers can be merged, and return that preprocessed data. The query interpreter then finishes processing the data.
The tables `read` method can return multiple `IBlockInputStream` objects to allow parallel data processing. These multiple block input streams can read from a table in parallel. Then you can wrap these streams with various transformations (such as expression evaluation or filtering) that can be calculated independently and create a `UnionBlockInputStream` on top of them, to read from multiple streams in parallel.
The tables `read` method can return a `Pipe` consisting of multiple `Processors`. These `Processors` can read from a table in parallel.
Then, you can connect these processors with various other transformations (such as expression evaluation or filtering), which can be calculated independently.
And then, create a `QueryPipeline` on top of them, and execute it via `PipelineExecutor`.
There are also `TableFunction`s. These are functions that return a temporary `IStorage` object to use in the `FROM` clause of a query.
@ -98,9 +106,19 @@ A hand-written recursive descent parser parses a query. For example, `ParserSele
## Interpreters {#interpreters}
Interpreters are responsible for creating the query execution pipeline from an `AST`. There are simple interpreters, such as `InterpreterExistsQuery` and `InterpreterDropQuery`, or the more sophisticated `InterpreterSelectQuery`. The query execution pipeline is a combination of block input or output streams. For example, the result of interpreting the `SELECT` query is the `IBlockInputStream` to read the result set from; the result of the `INSERT` query is the `IBlockOutputStream` to write data for insertion to, and the result of interpreting the `INSERT SELECT` query is the `IBlockInputStream` that returns an empty result set on the first read, but that copies data from `SELECT` to `INSERT` at the same time.
Interpreters are responsible for creating the query execution pipeline from an AST. There are simple interpreters, such as `InterpreterExistsQuery` and `InterpreterDropQuery`, as well as the more sophisticated `InterpreterSelectQuery`.
`InterpreterSelectQuery` uses `ExpressionAnalyzer` and `ExpressionActions` machinery for query analysis and transformations. This is where most rule-based query optimizations are done. `ExpressionAnalyzer` is quite messy and should be rewritten: various query transformations and optimizations should be extracted to separate classes to allow modular transformations of query.
The query execution pipeline is a combination of processors that can consume and produce chunks (sets of columns with specific types).
A processor communicates via ports and can have multiple input ports and multiple output ports.
A more detailed description can be found in [src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h).
For example, the result of interpreting the `SELECT` query is a "pulling" `QueryPipeline` which has a special output port to read the result set from.
The result of the `INSERT` query is a "pushing" `QueryPipeline` with an input port to write data for insertion.
And the result of interpreting the `INSERT SELECT` query is a "completed" `QueryPipeline` that has no inputs or outputs but copies data from `SELECT` to `INSERT` simultaneously.
`InterpreterSelectQuery` uses `ExpressionAnalyzer` and `ExpressionActions` machinery for query analysis and transformations. This is where most rule-based query optimizations are performed. `ExpressionAnalyzer` is quite messy and should be rewritten: various query transformations and optimizations should be extracted into separate classes to allow for modular transformations of the query.
To address current problems that exist in interpreters, a new `InterpreterSelectQueryAnalyzer` is being developed. It is a new version of `InterpreterSelectQuery` that does not use `ExpressionAnalyzer` and introduces an additional abstraction level between `AST` and `QueryPipeline` called `QueryTree`. It is not production-ready yet, but it can be tested with the `allow_experimental_analyzer` flag.
## Functions {#functions}

View File

@ -11,6 +11,8 @@ This is intended for continuous integration checks that run on Linux servers. If
The cross-build for macOS is based on the [Build instructions](../development/build.md), follow them first.
The following sections provide a walk-through for building ClickHouse for `x86_64` macOS. If youre targeting ARM architecture, simply substitute all occurrences of `x86_64` with `aarch64`. For example, replace `x86_64-apple-darwin` with `aarch64-apple-darwin` throughout the steps.
## Install Clang-17
Follow the instructions from https://apt.llvm.org/ for your Ubuntu or Debian setup.
@ -30,13 +32,13 @@ export CCTOOLS=$(cd ~/cctools && pwd)
mkdir ${CCTOOLS}
cd ${CCTOOLS}
git clone https://github.com/tpoechtrager/apple-libtapi.git
git clone --depth=1 https://github.com/tpoechtrager/apple-libtapi.git
cd apple-libtapi
INSTALLPREFIX=${CCTOOLS} ./build.sh
./install.sh
cd ..
git clone https://github.com/tpoechtrager/cctools-port.git
git clone --depth=1 https://github.com/tpoechtrager/cctools-port.git
cd cctools-port/cctools
./configure --prefix=$(readlink -f ${CCTOOLS}) --with-libtapi=$(readlink -f ${CCTOOLS}) --target=x86_64-apple-darwin
make install
@ -46,7 +48,7 @@ Also, we need to download macOS X SDK into the working tree.
``` bash
cd ClickHouse/cmake/toolchain/darwin-x86_64
curl -L 'https://github.com/phracker/MacOSX-SDKs/releases/download/10.15/MacOSX10.15.sdk.tar.xz' | tar xJ --strip-components=1
curl -L 'https://github.com/phracker/MacOSX-SDKs/releases/download/11.3/MacOSX11.0.sdk.tar.xz' | tar xJ --strip-components=1
```
## Build ClickHouse {#build-clickhouse}

View File

@ -23,7 +23,7 @@ sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
``` bash
cd ClickHouse
mkdir build-riscv64
CC=clang-16 CXX=clang++-16 cmake . -Bbuild-riscv64 -G Ninja -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-riscv64.cmake -DGLIBC_COMPATIBILITY=OFF -DENABLE_LDAP=OFF -DOPENSSL_NO_ASM=ON -DENABLE_JEMALLOC=ON -DENABLE_PARQUET=OFF -DENABLE_GRPC=OFF -DENABLE_HDFS=OFF -DENABLE_MYSQL=OFF
CC=clang-17 CXX=clang++-17 cmake . -Bbuild-riscv64 -G Ninja -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-riscv64.cmake -DGLIBC_COMPATIBILITY=OFF -DENABLE_LDAP=OFF -DOPENSSL_NO_ASM=ON -DENABLE_JEMALLOC=ON -DENABLE_PARQUET=OFF -DENABLE_GRPC=OFF -DENABLE_HDFS=OFF -DENABLE_MYSQL=OFF
ninja -C build-riscv64
```

View File

@ -345,7 +345,7 @@ struct ExtractDomain
**7.** For abstract classes (interfaces) you can add the `I` prefix.
``` cpp
class IBlockInputStream
class IProcessor
```
**8.** If you use a variable locally, you can use the short name.

View File

@ -28,7 +28,6 @@ SETTINGS
kafka_topic_list = 'topic1,topic2,...',
kafka_group_name = 'group_name',
kafka_format = 'data_format'[,]
[kafka_row_delimiter = 'delimiter_symbol',]
[kafka_schema = '',]
[kafka_num_consumers = N,]
[kafka_max_block_size = 0,]
@ -53,7 +52,6 @@ Required parameters:
Optional parameters:
- `kafka_row_delimiter` — Delimiter character, which ends the message. **This setting is deprecated and is no longer used, not left for compatibility reasons.**
- `kafka_schema` — Parameter that must be used if the format requires a schema definition. For example, [Capn Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `kafka_num_consumers` — The number of consumers per table. Specify more consumers if the throughput of one consumer is insufficient. The total number of consumers should not exceed the number of partitions in the topic, since only one consumer can be assigned per partition, and must not be greater than the number of physical cores on the server where ClickHouse is deployed. Default: `1`.
- `kafka_max_block_size` — The maximum batch size (in messages) for poll. Default: [max_insert_block_size](../../../operations/settings/settings.md#setting-max_insert_block_size).
@ -64,7 +62,7 @@ Optional parameters:
- `kafka_poll_max_batch_size` — Maximum amount of messages to be polled in a single Kafka poll. Default: [max_block_size](../../../operations/settings/settings.md#setting-max_block_size).
- `kafka_flush_interval_ms` — Timeout for flushing data from Kafka. Default: [stream_flush_interval_ms](../../../operations/settings/settings.md#stream-flush-interval-ms).
- `kafka_thread_per_consumer` — Provide independent thread for each consumer. When enabled, every consumer flush the data independently, in parallel (otherwise — rows from several consumers squashed to form one block). Default: `0`.
- `kafka_handle_error_mode` — How to handle errors for Kafka engine. Possible values: default, stream.
- `kafka_handle_error_mode` — How to handle errors for Kafka engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
- `kafka_commit_on_select` — Commit messages when select query is made. Default: `false`.
- `kafka_max_rows_per_message` — The maximum number of rows written in one kafka message for row-based formats. Default : `1`.
@ -249,6 +247,13 @@ Example:
- `_headers.name` — Array of message's headers keys.
- `_headers.value` — Array of message's headers values.
Additional virtual columns when `kafka_handle_error_mode='stream'`:
- `_raw_message` - Raw message that couldn't be parsed successfully.
- `_error` - Exception message happened during failed parsing.
Note: `_raw_message` and `_error` virtual columns are filled only in case of exception during parsing, they are always empty when message was parsed successfully.
## Data formats support {#data-formats-support}
Kafka engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -25,7 +25,6 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
nats_url = 'host:port',
nats_subjects = 'subject1,subject2,...',
nats_format = 'data_format'[,]
[nats_row_delimiter = 'delimiter_symbol',]
[nats_schema = '',]
[nats_num_consumers = N,]
[nats_queue_group = 'group_name',]
@ -40,7 +39,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[nats_password = 'password',]
[nats_token = 'clickhouse',]
[nats_startup_connect_tries = '5']
[nats_max_rows_per_message = 1]
[nats_max_rows_per_message = 1,]
[nats_handle_error_mode = 'default']
```
Required parameters:
@ -51,7 +51,6 @@ Required parameters:
Optional parameters:
- `nats_row_delimiter` Delimiter character, which ends the message. **This setting is deprecated and is no longer used, not left for compatibility reasons.**
- `nats_schema` Parameter that must be used if the format requires a schema definition. For example, [Capn Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `nats_num_consumers` The number of consumers per table. Default: `1`. Specify more consumers if the throughput of one consumer is insufficient.
- `nats_queue_group` Name for queue group of NATS subscribers. Default is the table name.
@ -66,6 +65,7 @@ Optional parameters:
- `nats_token` - NATS auth token.
- `nats_startup_connect_tries` - Number of connect tries at startup. Default: `5`.
- `nats_max_rows_per_message` — The maximum number of rows written in one NATS message for row-based formats. (default : `1`).
- `nats_handle_error_mode` — How to handle errors for RabbitMQ engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
SSL connection:
@ -165,6 +165,14 @@ If you want to change the target table by using `ALTER`, we recommend disabling
- `_subject` - NATS message subject.
Additional virtual columns when `kafka_handle_error_mode='stream'`:
- `_raw_message` - Raw message that couldn't be parsed successfully.
- `_error` - Exception message happened during failed parsing.
Note: `_raw_message` and `_error` virtual columns are filled only in case of exception during parsing, they are always empty when message was parsed successfully.
## Data formats support {#data-formats-support}
NATS engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -28,7 +28,6 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[rabbitmq_exchange_type = 'exchange_type',]
[rabbitmq_routing_key_list = 'key1,key2,...',]
[rabbitmq_secure = 0,]
[rabbitmq_row_delimiter = 'delimiter_symbol',]
[rabbitmq_schema = '',]
[rabbitmq_num_consumers = N,]
[rabbitmq_num_queues = N,]
@ -45,7 +44,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
[rabbitmq_username = '',]
[rabbitmq_password = '',]
[rabbitmq_commit_on_select = false,]
[rabbitmq_max_rows_per_message = 1]
[rabbitmq_max_rows_per_message = 1,]
[rabbitmq_handle_error_mode = 'default']
```
Required parameters:
@ -58,7 +58,6 @@ Optional parameters:
- `rabbitmq_exchange_type` The type of RabbitMQ exchange: `direct`, `fanout`, `topic`, `headers`, `consistent_hash`. Default: `fanout`.
- `rabbitmq_routing_key_list` A comma-separated list of routing keys.
- `rabbitmq_row_delimiter` Delimiter character, which ends the message. **This setting is deprecated and is no longer used, not left for compatibility reasons.**
- `rabbitmq_schema` Parameter that must be used if the format requires a schema definition. For example, [Capn Proto](https://capnproto.org/) requires the path to the schema file and the name of the root `schema.capnp:Message` object.
- `rabbitmq_num_consumers` The number of consumers per table. Specify more consumers if the throughput of one consumer is insufficient. Default: `1`
- `rabbitmq_num_queues` Total number of queues. Increasing this number can significantly improve performance. Default: `1`.
@ -78,6 +77,7 @@ Optional parameters:
- `rabbitmq_max_rows_per_message` — The maximum number of rows written in one RabbitMQ message for row-based formats. Default : `1`.
- `rabbitmq_empty_queue_backoff_start` — A start backoff point to reschedule read if the rabbitmq queue is empty.
- `rabbitmq_empty_queue_backoff_end` — An end backoff point to reschedule read if the rabbitmq queue is empty.
- `rabbitmq_handle_error_mode` — How to handle errors for RabbitMQ engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
@ -191,6 +191,13 @@ Example:
- `_message_id` - messageID of the received message; non-empty if was set, when message was published.
- `_timestamp` - timestamp of the received message; non-empty if was set, when message was published.
Additional virtual columns when `kafka_handle_error_mode='stream'`:
- `_raw_message` - Raw message that couldn't be parsed successfully.
- `_error` - Exception message happened during failed parsing.
Note: `_raw_message` and `_error` virtual columns are filled only in case of exception during parsing, they are always empty when message was parsed successfully.
## Data formats support {#data-formats-support}
RabbitMQ engine supports all [formats](../../../interfaces/formats.md) supported in ClickHouse.

View File

@ -48,61 +48,61 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
#### policy_name
`policy_name` - (optionally) policy name, it will be used to store temporary files for async send
`policy_name` - (optionally) policy name, it will be used to store temporary files for background send
**See Also**
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) setting
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) for the examples
### Distributed Settings
#### fsync_after_insert
`fsync_after_insert` - do the `fsync` for the file data after asynchronous insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
`fsync_after_insert` - do the `fsync` for the file data after background insert to Distributed. Guarantees that the OS flushed the whole inserted data to a file **on the initiator node** disk.
#### fsync_directories
`fsync_directories` - do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to asynchronous inserts on Distributed table (after insert, after sending the data to shard, etc).
`fsync_directories` - do the `fsync` for directories. Guarantees that the OS refreshed directory metadata after operations related to background inserts on Distributed table (after insert, after sending the data to shard, etc).
#### bytes_to_throw_insert
`bytes_to_throw_insert` - if more than this number of compressed bytes will be pending for async INSERT, an exception will be thrown. 0 - do not throw. Default 0.
`bytes_to_throw_insert` - if more than this number of compressed bytes will be pending for background INSERT, an exception will be thrown. 0 - do not throw. Default 0.
#### bytes_to_delay_insert
`bytes_to_delay_insert` - if more than this number of compressed bytes will be pending for async INSERT, the query will be delayed. 0 - do not delay. Default 0.
`bytes_to_delay_insert` - if more than this number of compressed bytes will be pending for background INSERT, the query will be delayed. 0 - do not delay. Default 0.
#### max_delay_to_insert
`max_delay_to_insert` - max delay of inserting data into Distributed table in seconds, if there are a lot of pending bytes for async send. Default 60.
`max_delay_to_insert` - max delay of inserting data into Distributed table in seconds, if there are a lot of pending bytes for background send. Default 60.
#### monitor_batch_inserts
#### background_insert_batch
`monitor_batch_inserts` - same as [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)
`background_insert_batch` - same as [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)
#### monitor_split_batch_on_failure
#### background_insert_split_batch_on_failure
`monitor_split_batch_on_failure` - same as [distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)
`background_insert_split_batch_on_failure` - same as [distributed_background_insert_split_batch_on_failure](../../../operations/settings/settings.md#distributed_background_insert_split_batch_on_failure)
#### monitor_sleep_time_ms
#### background_insert_sleep_time_ms
`monitor_sleep_time_ms` - same as [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)
`background_insert_sleep_time_ms` - same as [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)
#### monitor_max_sleep_time_ms
#### background_insert_max_sleep_time_ms
`monitor_max_sleep_time_ms` - same as [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)
`background_insert_max_sleep_time_ms` - same as [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)
:::note
**Durability settings** (`fsync_...`):
- Affect only asynchronous INSERTs (i.e. `insert_distributed_sync=false`) when data first stored on the initiator node disk and later asynchronously send to shards.
- Affect only background INSERTs (i.e. `distributed_foreground_insert=false`) when data first stored on the initiator node disk and later, in background, send to shards.
- May significantly decrease the inserts' performance
- Affect writing the data stored inside Distributed table folder into the **node which accepted your insert**. If you need to have guarantees of writing data to underlying MergeTree tables - see durability settings (`...fsync...`) in `system.merge_tree_settings`
For **Insert limit settings** (`..._insert`) see also:
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) setting
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) setting
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) setting
- `bytes_to_throw_insert` handled before `bytes_to_delay_insert`, so you should not set it to the value less then `bytes_to_delay_insert`
:::
@ -232,7 +232,7 @@ You should be concerned about the sharding scheme in the following cases:
- Queries are used that require joining data (`IN` or `JOIN`) by a specific key. If data is sharded by this key, you can use local `IN` or `JOIN` instead of `GLOBAL IN` or `GLOBAL JOIN`, which is much more efficient.
- A large number of servers is used (hundreds or more) with a large number of small queries, for example, queries for data of individual clients (e.g. websites, advertisers, or partners). In order for the small queries to not affect the entire cluster, it makes sense to locate data for a single client on a single shard. Alternatively, you can set up bi-level sharding: divide the entire cluster into “layers”, where a layer may consist of multiple shards. Data for a single client is located on a single layer, but shards can be added to a layer as necessary, and data is randomly distributed within them. `Distributed` tables are created for each layer, and a single shared distributed table is created for global queries.
Data is written asynchronously. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) and [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
Data is written in background. When inserted in the table, the data block is just written to the local file system. The data is sent to the remote servers in the background as soon as possible. The periodicity for sending data is managed by the [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms) and [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms) settings. The `Distributed` engine sends each file with inserted data separately, but you can enable batch sending of files with the [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch) setting. This setting improves cluster performance by better utilizing local server and network resources. You should check whether data is sent successfully by checking the list of files (data waiting to be sent) in the table directory: `/var/lib/clickhouse/data/database/table/`. The number of threads performing background tasks can be set by [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size) setting.
If the server ceased to exist or had a rough restart (for example, due to a hardware failure) after an `INSERT` to a `Distributed` table, the inserted data might be lost. If a damaged data part is detected in the table directory, it is transferred to the `broken` subdirectory and no longer used.

View File

@ -0,0 +1,105 @@
---
slug: /en/engines/table-engines/special/filelog
sidebar_position: 160
sidebar_label: FileLog
---
# FileLog Engine {#filelog-engine}
This engine allows to process application log files as a stream of records.
`FileLog` lets you:
- Subscribe to log files.
- Process new records as they are appended to subscribed log files.
## Creating a Table {#creating-a-table}
``` sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = FileLog('path_to_logs', 'format_name') SETTINGS
[poll_timeout_ms = 0,]
[poll_max_batch_size = 0,]
[max_block_size = 0,]
[max_threads = 0,]
[poll_directory_watch_events_backoff_init = 500,]
[poll_directory_watch_events_backoff_max = 32000,]
[poll_directory_watch_events_backoff_factor = 2,]
[handle_error_mode = 'default']
```
Engine arguments:
- `path_to_logs` Path to log files to subscribe. It can be path to a directory with log files or to a single log file. Note that ClickHouse allows only paths inside `user_files` directory.
- `format_name` - Record format. Note that FileLog process each line in a file as a separate record and not all data formats are suitable for it.
Optional parameters:
- `poll_timeout_ms` - Timeout for single poll from log file. Default: [stream_poll_timeout_ms](../../../operations/settings/settings.md#stream_poll_timeout_ms).
- `poll_max_batch_size` — Maximum amount of records to be polled in a single poll. Default: [max_block_size](../../../operations/settings/settings.md#setting-max_block_size).
- `max_block_size` — The maximum batch size (in records) for poll. Default: [max_insert_block_size](../../../operations/settings/settings.md#setting-max_insert_block_size).
- `max_threads` - Number of max threads to parse files, default is 0, which means the number will be max(1, physical_cpu_cores / 4).
- `poll_directory_watch_events_backoff_init` - The initial sleep value for watch directory thread. Default: `500`.
- `poll_directory_watch_events_backoff_max` - The max sleep value for watch directory thread. Default: `32000`.
- `poll_directory_watch_events_backoff_factor` - The speed of backoff, exponential by default. Default: `2`.
- `handle_error_mode` — How to handle errors for FileLog engine. Possible values: default (the exception will be thrown if we fail to parse a message), stream (the exception message and raw message will be saved in virtual columns `_error` and `_raw_message`).
## Description {#description}
The delivered records are tracked automatically, so each record in a log file is only counted once.
`SELECT` is not particularly useful for reading records (except for debugging), because each record can be read only once. It is more practical to create real-time threads using [materialized views](../../../sql-reference/statements/create/view.md). To do this:
1. Use the engine to create a FileLog table and consider it a data stream.
2. Create a table with the desired structure.
3. Create a materialized view that converts data from the engine and puts it into a previously created table.
When the `MATERIALIZED VIEW` joins the engine, it starts collecting data in the background. This allows you to continually receive records from log files and convert them to the required format using `SELECT`.
One FileLog table can have as many materialized views as you like, they do not read data from the table directly, but receive new records (in blocks), this way you can write to several tables with different detail level (with grouping - aggregation and without).
Example:
``` sql
CREATE TABLE logs (
timestamp UInt64,
level String,
message String
) ENGINE = FileLog('user_files/my_app/app.log', 'JSONEachRow');
CREATE TABLE daily (
day Date,
level String,
total UInt64
) ENGINE = SummingMergeTree(day, (day, level), 8192);
CREATE MATERIALIZED VIEW consumer TO daily
AS SELECT toDate(toDateTime(timestamp)) AS day, level, count() as total
FROM queue GROUP BY day, level;
SELECT level, sum(total) FROM daily GROUP BY level;
```
To stop receiving streams data or to change the conversion logic, detach the materialized view:
``` sql
DETACH TABLE consumer;
ATTACH TABLE consumer;
```
If you want to change the target table by using `ALTER`, we recommend disabling the material view to avoid discrepancies between the target table and the data from the view.
## Virtual Columns {#virtual-columns}
- `_filename` - Name of the log file.
- `_offset` - Offset in the log file.
Additional virtual columns when `kafka_handle_error_mode='stream'`:
- `_raw_record` - Raw record that couldn't be parsed successfully.
- `_error` - Exception message happened during failed parsing.
Note: `_raw_record` and `_error` virtual columns are filled only in case of exception during parsing, they are always empty when message was parsed successfully.

View File

@ -74,6 +74,7 @@ The supported formats are:
| [ArrowStream](#data-format-arrow-stream) | ✔ | ✔ |
| [ORC](#data-format-orc) | ✔ | ✔ |
| [One](#data-format-one) | ✔ | ✗ |
| [Npy](#data-format-npy) | ✔ | ✗ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNames](#rowbinarywithnamesandtypes) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |
@ -2454,6 +2455,50 @@ Result:
└──────────────┘
```
## Npy {#data-format-npy}
This function is designed to load a NumPy array from a .npy file into ClickHouse. The NumPy file format is a binary format used for efficiently storing arrays of numerical data. During import, ClickHouse treats top level dimension as an array of rows with single column. Supported Npy data types and their corresponding type in ClickHouse:
| Npy type | ClickHouse type |
|:--------:|:---------------:|
| b1 | UInt8 |
| i1 | Int8 |
| i2 | Int16 |
| i4 | Int32 |
| i8 | Int64 |
| u1 | UInt8 |
| u2 | UInt16 |
| u4 | UInt32 |
| u8 | UInt64 |
| f4 | Float32 |
| f8 | Float64 |
| S | String |
| U | String |
**Example of saving an array in .npy format using Python**
```Python
import numpy as np
arr = np.array([[[1],[2],[3]],[[4],[5],[6]]])
np.save('example_array.npy', arr)
```
**Example of reading a NumPy file in ClickHouse**
Query:
```sql
SELECT *
FROM file('example_array.npy', Npy)
```
Result:
```
┌─array─────────┐
│ [[1],[2],[3]] │
│ [[4],[5],[6]] │
└───────────────┘
```
## LineAsString {#lineasstring}
In this format, every line of input data is interpreted as a single string value. This format can only be parsed for table with a single field of type [String](/docs/en/sql-reference/data-types/string.md). The remaining columns must be set to [DEFAULT](/docs/en/sql-reference/statements/create/table.md/#default) or [MATERIALIZED](/docs/en/sql-reference/statements/create/table.md/#materialized), or omitted.

View File

@ -2762,3 +2762,7 @@ Proxy settings are determined in the following order:
ClickHouse will check the highest priority resolver type for the request protocol. If it is not defined,
it will check the next highest priority resolver type, until it reaches the environment resolver.
This also allows a mix of resolver types can be used.
### disable_tunneling_for_https_requests_over_http_proxy {#disable_tunneling_for_https_requests_over_http_proxy}
By default, tunneling (i.e, `HTTP CONNECT`) is used to make `HTTPS` requests over `HTTP` proxy. This setting can be used to disable it.

View File

@ -106,6 +106,15 @@ Possible values:
Default value: 0.
## max_bytes_before_external_sort {#settings-max_bytes_before_external_sort}
Enables or disables execution of `ORDER BY` clauses in external memory. See [ORDER BY Implementation Details](../../sql-reference/statements/select/order-by.md#implementation-details)
- Maximum volume of RAM (in bytes) that can be used by the single [ORDER BY](../../sql-reference/statements/select/order-by.md) operation. Recommended value is half of available system memory
- 0 — `ORDER BY` in external memory disabled.
Default value: 0.
## max_rows_to_sort {#max-rows-to-sort}
A maximum number of rows before sorting. This allows you to limit memory consumption when sorting.
@ -163,7 +172,27 @@ If you set `timeout_before_checking_execution_speed `to 0, ClickHouse will use c
## timeout_overflow_mode {#timeout-overflow-mode}
What to do if the query is run longer than max_execution_time: throw or break. By default, throw.
What to do if the query is run longer than `max_execution_time`: `throw` or `break`. By default, `throw`.
# max_execution_time_leaf
Similar semantic to `max_execution_time` but only apply on leaf node for distributed or remote queries.
For example, if we want to limit execution time on leaf node to `10s` but no limit on the initial node, instead of having `max_execution_time` in the nested subquery settings:
``` sql
SELECT count() FROM cluster(cluster, view(SELECT * FROM t SETTINGS max_execution_time = 10));
```
We can use `max_execution_time_leaf` as the query settings:
``` sql
SELECT count() FROM cluster(cluster, view(SELECT * FROM t)) SETTINGS max_execution_time_leaf = 10;
```
# timeout_overflow_mode_leaf
What to do when the query in leaf node run longer than `max_execution_time_leaf`: `throw` or `break`. By default, `throw`.
## min_execution_speed {#min-execution-speed}

View File

@ -897,6 +897,12 @@ Use DOS/Windows-style line separator (CRLF) in CSV instead of Unix style (LF).
Disabled by default.
### input_format_csv_allow_cr_end_of_line {#input_format_csv_allow_cr_end_of_line}
If it is set true, CR(\\r) will be allowed at end of line not followed by LF(\\n)
Disabled by default.
### input_format_csv_enum_as_number {#input_format_csv_enum_as_number}
When enabled, always treat enum values as enum ids for CSV input format. It's recommended to enable this setting if data contains only enum ids to optimize enum parsing.

View File

@ -2473,7 +2473,7 @@ See also:
- [distributed_replica_error_cap](#distributed_replica_error_cap)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
Base interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. The actual interval grows exponentially in the event of errors.
@ -2483,9 +2483,9 @@ Possible values:
Default value: 100 milliseconds.
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
Maximum interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. Limits exponential growth of the interval set in the [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) setting.
Maximum interval for the [Distributed](../../engines/table-engines/special/distributed.md) table engine to send data. Limits exponential growth of the interval set in the [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms) setting.
Possible values:
@ -2493,7 +2493,7 @@ Possible values:
Default value: 30000 milliseconds (30 seconds).
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
Enables/disables inserted data sending in batches.
@ -2506,13 +2506,13 @@ Possible values:
Default value: 0.
## distributed_directory_monitor_split_batch_on_failure {#distributed_directory_monitor_split_batch_on_failure}
## distributed_background_insert_split_batch_on_failure {#distributed_background_insert_split_batch_on_failure}
Enables/disables splitting batches on failures.
Sometimes sending particular batch to the remote shard may fail, because of some complex pipeline after (i.e. `MATERIALIZED VIEW` with `GROUP BY`) due to `Memory limit exceeded` or similar errors. In this case, retrying will not help (and this will stuck distributed sends for the table) but sending files from that batch one by one may succeed INSERT.
So installing this setting to `1` will disable batching for such batches (i.e. temporary disables `distributed_directory_monitor_batch_inserts` for failed batches).
So installing this setting to `1` will disable batching for such batches (i.e. temporary disables `distributed_background_insert_batch` for failed batches).
Possible values:
@ -2695,15 +2695,15 @@ Possible values:
Default value: 0.
## insert_distributed_sync {#insert_distributed_sync}
## distributed_foreground_insert {#distributed_foreground_insert}
Enables or disables synchronous data insertion into a [Distributed](../../engines/table-engines/special/distributed.md/#distributed) table.
By default, when inserting data into a `Distributed` table, the ClickHouse server sends data to cluster nodes in asynchronous mode. When `insert_distributed_sync=1`, the data is processed synchronously, and the `INSERT` operation succeeds only after all the data is saved on all shards (at least one replica for each shard if `internal_replication` is true).
By default, when inserting data into a `Distributed` table, the ClickHouse server sends data to cluster nodes in background mode. When `distributed_foreground_insert=1`, the data is processed synchronously, and the `INSERT` operation succeeds only after all the data is saved on all shards (at least one replica for each shard if `internal_replication` is true).
Possible values:
- 0 — Data is inserted in asynchronous mode.
- 0 — Data is inserted in background mode.
- 1 — Data is inserted in synchronous mode.
Default value: `0`.
@ -2762,7 +2762,7 @@ Result:
## use_compact_format_in_distributed_parts_names {#use_compact_format_in_distributed_parts_names}
Uses compact format for storing blocks for async (`insert_distributed_sync`) INSERT into tables with `Distributed` engine.
Uses compact format for storing blocks for background (`distributed_foreground_insert`) INSERT into tables with `Distributed` engine.
Possible values:
@ -2772,7 +2772,7 @@ Possible values:
Default value: `1`.
:::note
- with `use_compact_format_in_distributed_parts_names=0` changes from cluster definition will not be applied for async INSERT.
- with `use_compact_format_in_distributed_parts_names=0` changes from cluster definition will not be applied for background INSERT.
- with `use_compact_format_in_distributed_parts_names=1` changing the order of the nodes in the cluster definition, will change the `shard_index`/`replica_index` so be aware.
:::
@ -3310,22 +3310,11 @@ Possible values:
Default value: `0`.
## use_mysql_types_in_show_columns {#use_mysql_types_in_show_columns}
Show the names of MySQL data types corresponding to ClickHouse data types in [SHOW COLUMNS](../../sql-reference/statements/show.md#show_columns).
Possible values:
- 0 - Show names of native ClickHouse data types.
- 1 - Show names of MySQL data types corresponding to ClickHouse data types.
Default value: `0`.
## mysql_map_string_to_text_in_show_columns {#mysql_map_string_to_text_in_show_columns}
When enabled, [String](../../sql-reference/data-types/string.md) ClickHouse data type will be displayed as `TEXT` in [SHOW COLUMNS](../../sql-reference/statements/show.md#show_columns).
Has effect only when [use_mysql_types_in_show_columns](#use_mysql_types_in_show_columns) is enabled.
Has an effect only when the connection is made through the MySQL wire protocol.
- 0 - Use `BLOB`.
- 1 - Use `TEXT`.
@ -3336,7 +3325,7 @@ Default value: `0`.
When enabled, [FixedString](../../sql-reference/data-types/fixedstring.md) ClickHouse data type will be displayed as `TEXT` in [SHOW COLUMNS](../../sql-reference/statements/show.md#show_columns).
Has effect only when [use_mysql_types_in_show_columns](#use_mysql_types_in_show_columns) is enabled.
Has an effect only when the connection is made through the MySQL wire protocol.
- 0 - Use `BLOB`.
- 1 - Use `TEXT`.
@ -3944,6 +3933,16 @@ Possible values:
Default value: `0`.
## force_optimize_projection_name {#force-optimize-projection_name}
If it is set to a non-empty string, check that this projection is used in the query at least once.
Possible values:
- string: name of projection that used in a query
Default value: `''`.
## alter_sync {#alter-sync}
Allows to set up waiting for actions to be executed on replicas by [ALTER](../../sql-reference/statements/alter/index.md), [OPTIMIZE](../../sql-reference/statements/optimize.md) or [TRUNCATE](../../sql-reference/statements/truncate.md) queries.
@ -3956,6 +3955,10 @@ Possible values:
Default value: `1`.
:::note
`alter_sync` is applicable to `Replicated` tables only, it does nothing to alters of not `Replicated` tables.
:::
## replication_wait_for_inactive_replica_timeout {#replication-wait-for-inactive-replica-timeout}
Specifies how long (in seconds) to wait for inactive replicas to execute [ALTER](../../sql-reference/statements/alter/index.md), [OPTIMIZE](../../sql-reference/statements/optimize.md) or [TRUNCATE](../../sql-reference/statements/truncate.md) queries.
@ -4780,6 +4783,10 @@ a Tuple(
)
```
## analyze_index_with_space_filling_curves
If a table has a space-filling curve in its index, e.g. `ORDER BY mortonEncode(x, y)`, and the query has conditions on its arguments, e.g. `x >= 10 AND x <= 20 AND y >= 20 AND y <= 30`, use the space-filling curve for index analysis.
## dictionary_use_async_executor {#dictionary_use_async_executor}
Execute a pipeline for reading dictionary source in several threads. It's supported only by dictionaries with local CLICKHOUSE source.
@ -4794,3 +4801,10 @@ LIFETIME(MIN 0 MAX 3600)
LAYOUT(COMPLEX_KEY_HASHED_ARRAY())
SETTINGS(dictionary_use_async_executor=1, max_threads=8);
```
## storage_metadata_write_full_object_key {#storage_metadata_write_full_object_key}
When set to `true` the metadata files are written with `VERSION_FULL_OBJECT_KEY` format version. With that format full object storage key names are written to the metadata files.
When set to `false` the metadata files are written with the previous format version, `VERSION_INLINE_DATA`. With that format only suffixes of object storage key names are are written to the metadata files. The prefix for all of object storage key names is set in configurations files at `storage_configuration.disks` section.
Default value: `false`.

View File

@ -78,6 +78,11 @@ If procfs is supported and enabled on the system, ClickHouse server collects the
- `OSReadBytes`
- `OSWriteBytes`
:::note
`OSIOWaitMicroseconds` is disabled by default in Linux kernels starting from 5.14.x.
You can enable it using `sudo sysctl kernel.task_delayacct=1` or by creating a `.conf` file in `/etc/sysctl.d/` with `kernel.task_delayacct = 1`
:::
## Related content
- Blog: [System Tables and a window into the internals of ClickHouse](https://clickhouse.com/blog/clickhouse-debugging-issues-with-system-tables)

View File

@ -18,12 +18,14 @@ SHOW TABLES FROM information_schema;
│ KEY_COLUMN_USAGE │
│ REFERENTIAL_CONSTRAINTS │
│ SCHEMATA │
| STATISTICS |
│ TABLES │
│ VIEWS │
│ columns │
│ key_column_usage │
│ referential_constraints │
│ schemata │
| statistics |
│ tables │
│ views │
└─────────────────────────┘
@ -32,11 +34,12 @@ SHOW TABLES FROM information_schema;
`INFORMATION_SCHEMA` contains the following views:
- [COLUMNS](#columns)
- [SCHEMATA](#schemata)
- [TABLES](#tables)
- [VIEWS](#views)
- [KEY_COLUMN_USAGE](#key_column_usage)
- [REFERENTIAL_CONSTRAINTS](#referential_constraints)
- [SCHEMATA](#schemata)
- [STATISTICS](#statistics)
- [TABLES](#tables)
- [VIEWS](#views)
Case-insensitive equivalent views, e.g. `INFORMATION_SCHEMA.columns` are provided for reasons of compatibility with other databases. The same applies to all the columns in these views - both lowercase (for example, `table_name`) and uppercase (`TABLE_NAME`) variants are provided.
@ -372,3 +375,28 @@ Columns:
- `delete_rule` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `table_name` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `referenced_table_name` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
## STATISTICS {#statistics}
Provides information about table indexes. Currently returns an empty result (no rows) which is just enough to provide compatibility with 3rd party tools like Tableau Online.
Columns:
- `table_catalog` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `table_schema` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `table_name` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `non_unique` ([Int32](../../sql-reference/data-types/int-uint.md)) — Currently unused.
- `index_schema` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `index_name` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Currently unused.
- `seq_in_index` ([UInt32](../../sql-reference/data-types/int-uint.md)) — Currently unused.
- `column_name` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Currently unused.
- `collation` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Currently unused.
- `cardinality` ([Nullable](../../sql-reference/data-types/nullable.md)([Int64](../../sql-reference/data-types/int-uint.md))) — Currently unused.
- `sub_part` ([Nullable](../../sql-reference/data-types/nullable.md)([Int64](../../sql-reference/data-types/int-uint.md))) — Currently unused.
- `packed` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Currently unused.
- `nullable` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `index_type` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `comment` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `index_comment` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `is_visible` ([String](../../sql-reference/data-types/string.md)) — Currently unused.
- `expression` ([Nullable](../../sql-reference/data-types/nullable.md)([String](../../sql-reference/data-types/string.md))) — Currently unused.

View File

@ -35,27 +35,25 @@ WITH arrayMap(x -> demangle(addressToSymbol(x)), trace) AS all SELECT thread_nam
``` text
Row 1:
──────
thread_name: clickhouse-serv
thread_id: 686
query_id: 1a11f70b-626d-47c1-b948-f9c7b206395d
res: sigqueue
DB::StorageSystemStackTrace::fillData(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::Context const&, DB::SelectQueryInfo const&) const
DB::IStorageSystemOneBlock<DB::StorageSystemStackTrace>::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int)
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPipeline&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)
DB::InterpreterSelectQuery::executeImpl(DB::QueryPipeline&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>)
DB::InterpreterSelectQuery::execute()
DB::InterpreterSelectWithUnionQuery::execute()
DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)
start_thread
__clone
thread_name: QueryPipelineEx
thread_id: 743490
query_id: dc55a564-febb-4e37-95bb-090ef182c6f1
res: memcpy
large_ralloc
arena_ralloc
do_rallocx
Allocator<true, true>::realloc(void*, unsigned long, unsigned long, unsigned long)
HashTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>::resize(unsigned long, unsigned long)
void DB::Aggregator::executeImplBatch<false, false, true, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>>(DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>&, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>::State&, DB::Arena*, unsigned long, unsigned long, DB::Aggregator::AggregateFunctionInstruction*, bool, char*) const
DB::Aggregator::executeImpl(DB::AggregatedDataVariants&, unsigned long, unsigned long, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, DB::Aggregator::AggregateFunctionInstruction*, bool, bool, char*) const
DB::Aggregator::executeOnBlock(std::__1::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>, unsigned long, unsigned long, DB::AggregatedDataVariants&, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, std::__1::vector<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>, std::__1::allocator<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>>>&, bool&) const
DB::AggregatingTransform::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__1::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
void* std::__1::__thread_proxy[abi:v15000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>>(void*)
```
Getting filenames and line numbers in ClickHouse source code:

View File

@ -115,7 +115,7 @@ Parameters:
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.

View File

@ -12,6 +12,7 @@ A client application to interact with clickhouse-keeper by its native protocol.
- `-q QUERY`, `--query=QUERY` — Query to execute. If this parameter is not passed, `clickhouse-keeper-client` will start in interactive mode.
- `-h HOST`, `--host=HOST` — Server host. Default value: `localhost`.
- `-p N`, `--port=N` — Server port. Default value: 9181
- `-c FILE_PATH`, `--config-file=FILE_PATH` — Set path of config file to get the connection string. Default value: `config.xml`.
- `--connection-timeout=TIMEOUT` — Set connection timeout in seconds. Default value: 10s.
- `--session-timeout=TIMEOUT` — Set session timeout in seconds. Default value: 10s.
- `--operation-timeout=TIMEOUT` — Set operation timeout in seconds. Default value: 10s.

View File

@ -60,7 +60,7 @@ SELECT largestTriangleThreeBuckets(4)(x, y) FROM largestTriangleThreeBuckets_tes
Result:
``` text
┌────────largestTriangleThreeBuckets(3)(x, y)───────────┐
┌────────largestTriangleThreeBuckets(4)(x, y)───────────┐
│ [(1,10),(3,15),(5,40),(10,70)] │
└───────────────────────────────────────────────────────┘
```

View File

@ -2175,7 +2175,7 @@ Result:
## arrayRandomSample
Function `arrayRandomSample` returns a subset with `samples`-many random elements of an input array. If `samples` exceeds the size of the input array, the sample size is limited to the size of the array. In this case, all elements of the input array are returned, but the order is not guaranteed. The function can handle both flat arrays and nested arrays.
Function `arrayRandomSample` returns a subset with `samples`-many random elements of an input array. If `samples` exceeds the size of the input array, the sample size is limited to the size of the array, i.e. all array elements are returned but their order is not guaranteed. The function can handle both flat arrays and nested arrays.
**Syntax**
@ -2185,13 +2185,15 @@ arrayRandomSample(arr, samples)
**Arguments**
- `arr` — The input array from which to sample elements. This may be flat or nested arrays.
- `samples`An unsigned integer specifying the number of elements to include in the random sample.
- `arr` — The input array from which to sample elements. ([Array(T)](../data-types/array.md))
- `samples`The number of elements to include in the random sample ([UInt*](../data-types/int-uint.md))
**Returned Value**
- An array containing a random sample of elements from the input array.
Type: [Array](../data-types/array.md).
**Examples**
Query:
@ -2201,9 +2203,10 @@ SELECT arrayRandomSample(['apple', 'banana', 'cherry', 'date'], 2) as res;
```
Result:
```
┌─res────────────────┐
│ ['banana','apple'] │
│ ['cherry','apple'] │
└────────────────────┘
```
@ -2214,6 +2217,7 @@ SELECT arrayRandomSample([[1, 2], [3, 4], [5, 6]], 2) as res;
```
Result:
```
┌─res───────────┐
│ [[3,4],[5,6]] │
@ -2222,24 +2226,12 @@ Result:
Query:
```sql
SELECT arrayRandomSample([1, 2, 3, 4, 5], 0) as res;
```
Result:
```
┌─res─┐
│ [] │
└─────┘
```
Query:
```sql
SELECT arrayRandomSample([1, 2, 3], 5) as res;
```
Result:
```
┌─res─────┐
│ [3,1,2] │

View File

@ -1557,10 +1557,10 @@ Returns for a given date, the number of days passed since [1 January 0000](https
toDaysSinceYearZero(date[, time_zone])
```
Aliases: `TO_DAYS`
Alias: `TO_DAYS`
**Arguments**
- `date` — The date to calculate the number of days passed since year zero from. [Date](../../sql-reference/data-types/date.md), [Date32](../../sql-reference/data-types/date32.md), [DateTime](../../sql-reference/data-types/datetime.md) or [DateTime64](../../sql-reference/data-types/datetime64.md).
- `time_zone` — A String type const value or a expression represent the time zone. [String types](../../sql-reference/data-types/string.md)
@ -1584,6 +1584,56 @@ Result:
└────────────────────────────────────────────┘
```
**See Also**
- [fromDaysSinceYearZero](#fromDaysSinceYearZero)
## fromDaysSinceYearZero
Returns for a given number of days passed since [1 January 0000](https://en.wikipedia.org/wiki/Year_zero) the corresponding date in the [proleptic Gregorian calendar defined by ISO 8601](https://en.wikipedia.org/wiki/Gregorian_calendar#Proleptic_Gregorian_calendar). The calculation is the same as in MySQL's [`FROM_DAYS()`](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_from-days) function.
The result is undefined if it cannot be represented within the bounds of the [Date](../../sql-reference/data-types/date.md) type.
**Syntax**
``` sql
fromDaysSinceYearZero(days)
```
Alias: `FROM_DAYS`
**Arguments**
- `days` — The number of days passed since year zero.
**Returned value**
The date corresponding to the number of days passed since year zero.
Type: [Date](../../sql-reference/data-types/date.md).
**Example**
``` sql
SELECT fromDaysSinceYearZero(739136), fromDaysSinceYearZero(toDaysSinceYearZero(toDate('2023-09-08')));
```
Result:
``` text
┌─fromDaysSinceYearZero(739136)─┬─fromDaysSinceYearZero(toDaysSinceYearZero(toDate('2023-09-08')))─┐
│ 2023-09-08 │ 2023-09-08 │
└───────────────────────────────┴──────────────────────────────────────────────────────────────────┘
```
**See Also**
- [toDaysSinceYearZero](#toDaysSinceYearZero)
## fromDaysSinceYearZero32
Like [fromDaysSinceYearZero](#fromDaysSinceYearZero) but returns a [Date32](../../sql-reference/data-types/date32.md).
## age
Returns the `unit` component of the difference between `startdate` and `enddate`. The difference is calculated using a precision of 1 microsecond.

View File

@ -2760,10 +2760,13 @@ message Root
Returns a formatted, possibly multi-line, version of the given SQL query.
Throws an exception if the query is not well-formed. To return `NULL` instead, function `formatQueryOrNull()` may be used.
**Syntax**
```sql
formatQuery(query)
formatQueryOrNull(query)
```
**Arguments**
@ -2796,10 +2799,13 @@ WHERE (a > 3) AND (b < 3) │
Like formatQuery() but the returned formatted string contains no line breaks.
Throws an exception if the query is not well-formed. To return `NULL` instead, function `formatQuerySingleLineOrNull()` may be used.
**Syntax**
```sql
formatQuerySingleLine(query)
formatQuerySingleLineOrNull(query)
```
**Arguments**

View File

@ -1371,6 +1371,86 @@ Result:
└──────────────────┘
```
## byteHammingDistance
Calculates the [hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between two byte strings.
**Syntax**
```sql
byteHammingDistance(string1, string2)
```
**Examples**
``` sql
SELECT byteHammingDistance('karolin', 'kathrin');
```
Result:
``` text
┌─byteHammingDistance('karolin', 'kathrin')─┐
│ 3 │
└───────────────────────────────────────────┘
```
Alias: mismatches
## stringJaccardIndex
Calculates the [Jaccard similarity index](https://en.wikipedia.org/wiki/Jaccard_index) between two byte strings.
**Syntax**
```sql
stringJaccardIndex(string1, string2)
```
**Examples**
``` sql
SELECT stringJaccardIndex('clickhouse', 'mouse');
```
Result:
``` text
┌─stringJaccardIndex('clickhouse', 'mouse')─┐
│ 0.4 │
└───────────────────────────────────────────┘
```
## stringJaccardIndexUTF8
Like [stringJaccardIndex](#stringJaccardIndex) but for UTF8-encoded strings.
## editDistance
Calculates the [edit distance](https://en.wikipedia.org/wiki/Edit_distance) between two byte strings.
**Syntax**
```sql
editDistance(string1, string2)
```
**Examples**
``` sql
SELECT editDistance('clickhouse', 'mouse');
```
Result:
``` text
┌─editDistance('clickhouse', 'mouse')─┐
│ 6 │
└─────────────────────────────────────┘
```
Alias: levenshteinDistance
## initcap
Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.

View File

@ -681,79 +681,3 @@ Like [hasSubsequence](#hasSubsequence) but assumes `haystack` and `needle` are U
## hasSubsequenceCaseInsensitiveUTF8
Like [hasSubsequenceUTF8](#hasSubsequenceUTF8) but searches case-insensitively.
## byteHammingDistance
Calculates the [hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between two byte strings.
**Syntax**
```sql
byteHammingDistance(string2, string2)
```
**Examples**
``` sql
SELECT byteHammingDistance('abc', 'ab') ;
```
Result:
``` text
┌─byteHammingDistance('abc', 'ab')─┐
│ 1 │
└──────────────────────────────────┘
```
- Alias: mismatches
## jaccardIndex
Calculates the [Jaccard similarity index](https://en.wikipedia.org/wiki/Jaccard_index) between two byte strings.
**Syntax**
```sql
byteJaccardIndex(string1, string2)
```
**Examples**
``` sql
SELECT jaccardIndex('clickhouse', 'mouse');
```
Result:
``` text
┌─jaccardIndex('clickhouse', 'mouse')─┐
│ 0.4 │
└─────────────────────────────────────────┘
```
## editDistance
Calculates the [edit distance](https://en.wikipedia.org/wiki/Edit_distance) between two byte strings.
**Syntax**
```sql
editDistance(string1, string2)
```
**Examples**
``` sql
SELECT editDistance('clickhouse', 'mouse');
```
Result:
``` text
┌─editDistance('clickhouse', 'mouse')─┐
│ 6 │
└─────────────────────────────────────────┘
```
- Alias: levenshteinDistance

View File

@ -10,10 +10,10 @@ Attaches a table or a dictionary, for example, when moving a database to another
**Syntax**
``` sql
ATTACH TABLE|DICTIONARY [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] ...
ATTACH TABLE|DICTIONARY|DATABASE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] ...
```
The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table or the dictionary to the server. After executing the `ATTACH` query, the server will know about the existence of the table or the dictionary.
The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the specified table, dictionary or database to the server. After executing the `ATTACH` query, the server will know about the existence of the table, dictionary or database.
If a table was previously detached ([DETACH](../../sql-reference/statements/detach.md) query), meaning that its structure is known, you can use shorthand without defining the structure.
@ -79,3 +79,13 @@ Attaches a previously detached dictionary.
``` sql
ATTACH DICTIONARY [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
```
## Attach Existing Database
Attaches a previously detached database.
**Syntax**
``` sql
ATTACH DATABASE [IF NOT EXISTS] name [ENGINE=<database engine>] [ON CLUSTER cluster]
```

View File

@ -136,8 +136,30 @@ Output:
└──────────────┴───────────┴──────────────────────────────────────────┘
```
If the checksums.txt file is missing, it can be restored. It will be recalculated and rewritten during the execution of the CHECK TABLE command for the specific partition, and the status will still be reported as 'success.'"
If the checksums.txt file is missing, it can be restored. It will be recalculated and rewritten during the execution of the CHECK TABLE command for the specific partition, and the status will still be reported as 'is_passed = 1'.
You can check all existing `(Replicated)MergeTree` tables at once by using the `CHECK ALL TABLES` query.
```sql
CHECK ALL TABLES
FORMAT PrettyCompactMonoBlock
SETTINGS check_query_single_value_result = 0
```
```text
┌─database─┬─table────┬─part_path───┬─is_passed─┬─message─┐
│ default │ t2 │ all_1_95_3 │ 1 │ │
│ db1 │ table_01 │ all_39_39_0 │ 1 │ │
│ default │ t1 │ all_39_39_0 │ 1 │ │
│ db1 │ t1 │ all_39_39_0 │ 1 │ │
│ db1 │ table_01 │ all_1_6_1 │ 1 │ │
│ default │ t1 │ all_1_6_1 │ 1 │ │
│ db1 │ t1 │ all_1_6_1 │ 1 │ │
│ db1 │ table_01 │ all_7_38_2 │ 1 │ │
│ db1 │ t1 │ all_7_38_2 │ 1 │ │
│ default │ t1 │ all_7_38_2 │ 1 │ │
└──────────┴──────────┴─────────────┴───────────┴─────────┘
```
## If the Data Is Corrupted

View File

@ -228,7 +228,7 @@ hex(hexed): 5A90B714
Calculated columns (synonym). Column of this type are not stored in the table and it is not possible to INSERT values into them.
When SELECT queries explicitly reference columns of this type, the value is computed at query time from `expr`. By default, `SELECT *` excludes ALIAS columns. This behavior can be disabled with setting `asteriks_include_alias_columns`.
When SELECT queries explicitly reference columns of this type, the value is computed at query time from `expr`. By default, `SELECT *` excludes ALIAS columns. This behavior can be disabled with setting `asterisk_include_alias_columns`.
When using the ALTER query to add new columns, old data for these columns is not written. Instead, when reading old data that does not have values for the new columns, expressions are computed on the fly by default. However, if running the expressions requires different columns that are not indicated in the query, these columns will additionally be read, but only for the blocks of data that need it.

View File

@ -5,17 +5,17 @@ sidebar_label: DETACH
title: "DETACH Statement"
---
Makes the server "forget" about the existence of a table, a materialized view, or a dictionary.
Makes the server "forget" about the existence of a table, a materialized view, a dictionary, or a database.
**Syntax**
``` sql
DETACH TABLE|VIEW|DICTIONARY [IF EXISTS] [db.]name [ON CLUSTER cluster] [PERMANENTLY] [SYNC]
DETACH TABLE|VIEW|DICTIONARY|DATABASE [IF EXISTS] [db.]name [ON CLUSTER cluster] [PERMANENTLY] [SYNC]
```
Detaching does not delete the data or metadata of a table, a materialized view or a dictionary. If an entity was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view/dictionary again. If an entity was detached `PERMANENTLY`, there will be no automatic recall.
Detaching does not delete the data or metadata of a table, a materialized view, a dictionary or a database. If an entity was not detached `PERMANENTLY`, on the next server launch the server will read the metadata and recall the table/view/dictionary/database again. If an entity was detached `PERMANENTLY`, there will be no automatic recall.
Whether a table or a dictionary was detached permanently or not, in both cases you can reattach them using the [ATTACH](../../sql-reference/statements/attach.md) query.
Whether a table, a dictionary or a database was detached permanently or not, in both cases you can reattach them using the [ATTACH](../../sql-reference/statements/attach.md) query.
System log tables can be also attached back (e.g. `query_log`, `text_log`, etc). Other system tables can't be reattached. On the next server launch the server will recall those tables again.
`ATTACH MATERIALIZED VIEW` does not work with short syntax (without `SELECT`), but you can attach it using the `ATTACH TABLE` query.

View File

@ -207,7 +207,7 @@ The optional keyword `FULL` causes the output to include the collation, comment
The statement produces a result table with the following structure:
- `field` - The name of the column (String)
- `type` - The column data type. If setting `[use_mysql_types_in_show_columns](../../operations/settings/settings.md#use_mysql_types_in_show_columns) = 1` (default: 0), then the equivalent type name in MySQL is shown. (String)
- `type` - The column data type. If the query was made through the MySQL wire protocol, then the equivalent type name in MySQL is shown. (String)
- `null` - `YES` if the column data type is Nullable, `NO` otherwise (String)
- `key` - `PRI` if the column is part of the primary key, `SOR` if the column is part of the sorting key, empty otherwise (String)
- `default` - Default expression of the column if it is of type `ALIAS`, `DEFAULT`, or `MATERIALIZED`, otherwise `NULL`. (Nullable(String))

View File

@ -166,7 +166,7 @@ Aborts ClickHouse process (like `kill -9 {$ pid_clickhouse-server}`)
## Managing Distributed Tables
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into these tables, ClickHouse first creates a queue of the data that should be sent to cluster nodes, then asynchronously sends it. You can manage queue processing with the [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) queries. You can also synchronously insert distributed data with the [insert_distributed_sync](../../operations/settings/settings.md#insert_distributed_sync) setting.
ClickHouse can manage [distributed](../../engines/table-engines/special/distributed.md) tables. When a user inserts data into these tables, ClickHouse first creates a queue of the data that should be sent to cluster nodes, then asynchronously sends it. You can manage queue processing with the [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), and [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) queries. You can also synchronously insert distributed data with the [distributed_foreground_insert](../../operations/settings/settings.md#distributed_foreground_insert) setting.
### STOP DISTRIBUTED SENDS

View File

@ -7,7 +7,7 @@ keywords: [gcs, bucket]
# gcs Table Function
Provides a table-like interface to select/insert files in [Google Cloud Storage](https://cloud.google.com/storage/).
Provides a table-like interface to `SELECT` and `INSERT` data from [Google Cloud Storage](https://cloud.google.com/storage/). Requires the [`Storage Object User` IAM role](https://cloud.google.com/storage/docs/access-control/iam-roles).
**Syntax**

View File

@ -49,21 +49,9 @@ ClickHouse — полноценная столбцовая СУБД. Данны
Блоки создаются для всех обработанных фрагментов данных. Напоминаем, что одни и те же типы вычислений, имена столбцов и типы переиспользуются в разных блоках и только данные колонок изменяются. Лучше разделить данные и заголовок блока потому, что в блоках маленького размера мы имеем большой оверхэд по временным строкам при копировании умных указателей (`shared_ptrs`) и имен столбцов.
## Потоки блоков (Block Streams) {#block-streams}
## Процессоры
Потоки блоков обрабатывают данные. Мы используем потоки блоков для чтения данных, трансформации или записи данных куда-либо. `IBlockInputStream` предоставляет метод `read` для получения следующего блока, пока это возможно, и метод `write`, чтобы продвигать (push) блок куда-либо.
Потоки отвечают за:
1. Чтение и запись в таблицу. Таблица лишь возвращает поток для чтения или записи блоков.
2. Реализацию форматов данных. Например, при выводе данных в терминал в формате `Pretty`, вы создаете выходной поток блоков, который форматирует поступающие в него блоки.
3. Трансформацию данных. Допустим, у вас есть `IBlockInputStream` и вы хотите создать отфильтрованный поток. Вы создаете `FilterBlockInputStream` и инициализируете его вашим потоком. Затем вы тянете (pull) блоки из `FilterBlockInputStream`, а он тянет блоки исходного потока, фильтрует их и возвращает отфильтрованные блоки вам. Таким образом построены конвейеры выполнения запросов.
Имеются и более сложные трансформации. Например, когда вы тянете блоки из `AggregatingBlockInputStream`, он считывает все данные из своего источника, агрегирует их, и возвращает поток агрегированных данных вам. Другой пример: конструктор `UnionBlockInputStream` принимает множество источников входных данных и число потоков. Такой `Stream` работает в несколько потоков и читает данные источников параллельно.
> Потоки блоков используют «втягивающий» (pull) подход к управлению потоком выполнения: когда вы вытягиваете блок из первого потока, он, следовательно, вытягивает необходимые блоки из вложенных потоков, так и работает весь конвейер выполнения. Ни «pull» ни «push» не имеют явного преимущества, потому что поток управления неявный, и это ограничивает в реализации различных функций, таких как одновременное выполнение нескольких запросов (слияние нескольких конвейеров вместе). Это ограничение можно преодолеть с помощью сопрограмм (coroutines) или просто запуском дополнительных потоков, которые ждут друг друга. У нас может быть больше возможностей, если мы сделаем поток управления явным: если мы локализуем логику для передачи данных из одной расчетной единицы в другую вне этих расчетных единиц. Читайте эту [статью](http://journal.stuffwithstuff.com/2013/01/13/iteration-inside-and-out/) для углубленного изучения.
Следует отметить, что конвейер выполнения запроса создает временные данные на каждом шаге. Мы стараемся сохранить размер блока достаточно маленьким, чтобы временные данные помещались в кэш процессора. При таком допущении запись и чтение временных данных практически бесплатны по сравнению с другими расчетами. Мы могли бы рассмотреть альтернативу, которая заключается в том, чтобы объединить многие операции в конвейере вместе. Это может сделать конвейер как можно короче и удалить большую часть временных данных, что может быть преимуществом, но у такого подхода также есть недостатки. Например, разделенный конвейер позволяет легко реализовать кэширование промежуточных данных, использование промежуточных данных из аналогичных запросов, выполняемых одновременно, и объединение конвейеров для аналогичных запросов.
Смотрите описание в файле [src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h) исходного кода.
## Форматы {#formats}
@ -81,13 +69,16 @@ ClickHouse — полноценная столбцовая СУБД. Данны
Буферы чтения-записи имеют дело только с байтами. В заголовочных файлах `ReadHelpers` и `WriteHelpers` объявлены некоторые функции, чтобы помочь с форматированием ввода-вывода. Например, есть помощники для записи числа в десятичном формате.
Давайте посмотрим, что происходит, когда вы хотите вывести результат в `JSON` формате в стандартный вывод (stdout). У вас есть результирующий набор данных, готовый к извлечению из `IBlockInputStream`. Вы создаете `WriteBufferFromFileDescriptor(STDOUT_FILENO)` чтобы записать байты в stdout. Вы создаете `JSONRowOutputStream`, инициализируете с этим `WriteBuffer`'ом, чтобы записать строки `JSON` в stdout. Кроме того вы создаете `BlockOutputStreamFromRowOutputStream`, реализуя `IBlockOutputStream`. Затем вызывается `copyData` для передачи данных из `IBlockInputStream` в `IBlockOutputStream` и все работает. Внутренний `JSONRowOutputStream` будет писать в формате `JSON` различные разделители и вызвать `IDataType::serializeTextJSON` метод со ссылкой на `IColumn` и номер строки в качестве аргументов. Следовательно, `IDataType::serializeTextJSON` вызовет метод из `WriteHelpers.h`: например, `writeText` для числовых типов и `writeJSONString` для `DataTypeString`.
Давайте посмотрим, что происходит, когда вы хотите вывести результат в `JSON` формате в стандартный вывод (stdout). У вас есть результирующий набор данных, готовый к извлечению из `QueryPipeline`. Вы создаете `WriteBufferFromFileDescriptor(STDOUT_FILENO)` чтобы записать байты в stdout. Вы создаете `JSONRowOutputFormat`, инициализируете с этим `WriteBuffer`'ом, чтобы записать строки `JSON` в stdout.
Чтобы соеденить выход `QueryPipeline` с форматом, можно использовать метод `complete`, который превращает `QueryPipeline` в завершенный `QueryPipeline`.
Внутренний `JSONRowOutputStream` будет писать в формате `JSON` различные разделители и вызвать `IDataType::serializeTextJSON` метод со ссылкой на `IColumn` и номер строки в качестве аргументов. Следовательно, `IDataType::serializeTextJSON` вызовет метод из `WriteHelpers.h`: например, `writeText` для числовых типов и `writeJSONString` для `DataTypeString`.
## Таблицы {#tables}
Интерфейс `IStorage` служит для отображения таблицы. Различные движки таблиц являются реализациями этого интерфейса. Примеры `StorageMergeTree`, `StorageMemory` и так далее. Экземпляры этих классов являются просто таблицами.
Ключевые методы `IStorage` это `read` и `write`. Есть и другие варианты — `alter`, `rename`, `drop` и так далее. Метод `read` принимает следующие аргументы: набор столбцов для чтения из таблицы, `AST` запрос и желаемое количество потоков для вывода. Он возвращает один или несколько объектов `IBlockInputStream` и информацию о стадии обработки данных, которая была завершена внутри табличного движка во время выполнения запроса.
Ключевые методы `IStorage` это `read` и `write`. Есть и другие варианты — `alter`, `rename`, `drop` и так далее.
Метод `read` принимает следующие аргументы: набор столбцов для чтения из таблицы, `AST` запрос и желаемое количество потоков для вывода и возвращает `Pipe`.
В большинстве случаев метод read отвечает только за чтение указанных столбцов из таблицы, а не за дальнейшую обработку данных. Вся дальнейшая обработка данных осуществляется интерпретатором запросов и не входит в сферу ответственности `IStorage`.
@ -96,7 +87,9 @@ ClickHouse — полноценная столбцовая СУБД. Данны
- AST-запрос, передающийся в метод `read`, может использоваться движком таблицы для получения информации о возможности использования индекса и считывания меньшего количества данных из таблицы.
- Иногда движок таблиц может сам обрабатывать данные до определенного этапа. Например, `StorageDistributed` можно отправить запрос на удаленные серверы, попросить их обработать данные до этапа, когда данные с разных удаленных серверов могут быть объединены, и вернуть эти предварительно обработанные данные. Затем интерпретатор запросов завершает обработку данных.
Метод `read` может возвращать несколько объектов `IBlockInputStream`, позволяя осуществлять параллельную обработку данных. Эти несколько блочных входных потоков могут считываться из таблицы параллельно. Затем вы можете обернуть эти потоки различными преобразованиями (такими как вычисление выражений или фильтрация), которые могут быть вычислены независимо, и создать `UnionBlockInputStream` поверх них, чтобы читать из нескольких потоков параллельно.
Метод `read` может возвращать `Pipe`, состоящий из нескольких процессоров. Каждый их этих процессоров может читать данные параллельно.
Затем, вы можете соеденить эти просессоры с другими преобразованиями (такими как вычисление выражений или фильтрация), которые могут быть вычислены независимо.
Далее, создан `QueryPipeline` поверх них, можно выполнить пайплайн с помощью `PipelineExecutor`.
Есть и другие варианты. Например, `TableFunction` возвращает временный объект `IStorage`, который можно подставить во `FROM`.
@ -112,10 +105,18 @@ ClickHouse — полноценная столбцовая СУБД. Данны
## Интерпретаторы {#interpreters}
Интерпретаторы отвечают за создание конвейера выполнения запроса из `AST`. Есть простые интерпретаторы, такие как `InterpreterExistsQuery` и `InterpreterDropQuery` или более сложный `InterpreterSelectQuery`. Конвейер выполнения запроса представляет собой комбинацию входных и выходных потоков блоков. Например, результатом интерпретации `SELECT` запроса является `IBlockInputStream` для чтения результирующего набора данных; результат интерпретации `INSERT` запроса — это `IBlockOutputStream`, для записи данных, предназначенных для вставки; результат интерпретации `INSERT SELECT` запроса — это `IBlockInputStream`, который возвращает пустой результирующий набор при первом чтении, но копирует данные из `SELECT` к `INSERT`.
Интерпретаторы отвечают за создание конвейера выполнения запроса из `AST`. Есть простые интерпретаторы, такие как `InterpreterExistsQuery` и `InterpreterDropQuery` или более сложный `InterpreterSelectQuery`.
Конвейер выполнения запроса представляет собой комбинацию процессоров, которые могут принимать на вход и также возвращать чанки (набор колонок с их типами)
Процессоры обмениваются данными через порты и могут иметь несколько входных и выходных портов.
Более подробное описание можно найти в файле [src/Processors/IProcessor.h](https://github.com/ClickHouse/ClickHouse/blob/master/src/Processors/IProcessor.h).
Например, результатом интерпретации `SELECT` запроса является `QueryPipeline`, который имеет специальный выходной порт для чтения результирующего набора данных. Результатом интерпретации `INSERT` запроса является `QueryPipeline` с входным портом для записи данных для вставки. Результатом интерпретации `INSERT SELECT` запроса является завершенный `QueryPipeline`, который не имеет входов или выходов, но копирует данные из `SELECT` в `INSERT` одновременно.
`InterpreterSelectQuery` использует `ExpressionAnalyzer` и `ExpressionActions` механизмы для анализа запросов и преобразований. Именно здесь выполняется большинство оптимизаций запросов на основе правил. `ExpressionAnalyzer` написан довольно грязно и должен быть переписан: различные преобразования запросов и оптимизации должны быть извлечены в отдельные классы, чтобы позволить модульные преобразования или запросы.
Для решения текущих проблем, существующих в интерпретаторах, разрабатывается новый `InterpreterSelectQueryAnalyzer`. Это новая версия `InterpreterSelectQuery`, которая не использует `ExpressionAnalyzer` и вводит дополнительный уровень абстракции между `AST` и `QueryPipeline`, называемый `QueryTree`. Он еще не готов к использованию в продакшене, но его можно протестировать с помощью флага `allow_experimental_analyzer`.
## Функции {#functions}
Существуют обычные функции и агрегатные функции. Агрегатные функции смотрите в следующем разделе.

View File

@ -345,7 +345,7 @@ struct ExtractDomain
**7.** Для абстрактных классов (интерфейсов) можно добавить в начало имени букву `I`.
``` cpp
class IBlockInputStream
class IProcessor
```
**8.** Если переменная используется достаточно локально, то можно использовать короткое имя.

View File

@ -22,7 +22,7 @@ sidebar_label: Distributed
Смотрите также:
- настройка `insert_distributed_sync`
- настройка `distributed_foreground_insert`
- [MergeTree](../mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) для примера
Пример:
@ -131,7 +131,7 @@ logs - имя кластера в конфигурационном файле с
- используются запросы, требующие соединение данных (IN, JOIN) по определённому ключу - тогда если данные шардированы по этому ключу, то можно использовать локальные IN, JOIN вместо GLOBAL IN, GLOBAL JOIN, что кардинально более эффективно.
- используется большое количество серверов (сотни и больше) и большое количество маленьких запросов (запросы отдельных клиентов - сайтов, рекламодателей, партнёров) - тогда, для того, чтобы маленькие запросы не затрагивали весь кластер, имеет смысл располагать данные одного клиента на одном шарде, или сделать двухуровневое шардирование: разбить весь кластер на «слои», где слой может состоять из нескольких шардов; данные для одного клиента располагаются на одном слое, но в один слой можно по мере необходимости добавлять шарды, в рамках которых данные распределены произвольным образом; создаются распределённые таблицы на каждый слой и одна общая распределённая таблица для глобальных запросов.
Запись данных осуществляется полностью асинхронно. При вставке в таблицу, блок данных сначала записывается в файловую систему. Затем, в фоновом режиме отправляются на удалённые серверы при первой возможности. Период отправки регулируется настройками [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms) и [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms). Движок таблиц `Distributed` отправляет каждый файл со вставленными данными отдельно, но можно включить пакетную отправку данных настройкой [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts). Эта настройка улучшает производительность кластера за счет более оптимального использования ресурсов сервера-отправителя и сети. Необходимо проверять, что данные отправлены успешно, для этого проверьте список файлов (данных, ожидающих отправки) в каталоге таблицы `/var/lib/clickhouse/data/database/table/`. Количество потоков для выполнения фоновых задач можно задать с помощью настройки [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size).
Запись данных осуществляется полностью асинхронно. При вставке в таблицу, блок данных сначала записывается в файловую систему. Затем, в фоновом режиме отправляются на удалённые серверы при первой возможности. Период отправки регулируется настройками [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms) и [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms). Движок таблиц `Distributed` отправляет каждый файл со вставленными данными отдельно, но можно включить пакетную отправку данных настройкой [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch). Эта настройка улучшает производительность кластера за счет более оптимального использования ресурсов сервера-отправителя и сети. Необходимо проверять, что данные отправлены успешно, для этого проверьте список файлов (данных, ожидающих отправки) в каталоге таблицы `/var/lib/clickhouse/data/database/table/`. Количество потоков для выполнения фоновых задач можно задать с помощью настройки [background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size).
Если после INSERT-а в Distributed таблицу, сервер перестал существовать или был грубо перезапущен (например, в следствие аппаратного сбоя), то записанные данные могут быть потеряны. Если в директории таблицы обнаружен повреждённый кусок данных, то он переносится в поддиректорию broken и больше не используется.

View File

@ -4,17 +4,17 @@ sidebar_position: 12
sidebar_label: Tutorial
---
# ClickHouse Tutorial {#clickhouse-tutorial}
# Руководство {#clickhouse-tutorial}
## What to Expect from This Tutorial? {#what-to-expect-from-this-tutorial}
## Что вы получите, пройдя это руководство? {#what-to-expect-from-this-tutorial}
By going through this tutorial, youll learn how to set up a simple ClickHouse cluster. Itll be small, but fault-tolerant and scalable. Then we will use one of the example datasets to fill it with data and execute some demo queries.
Пройдя это руководство вы научитесь устанавливать простой кластер Clickhouse. Он будет небольшим, но отказоустойчивым и масштабируемым. Далее мы воспользуемся одним из готовых наборов данных для наполнения кластера данными и выполнения над ними нескольких демонстрационных запросов.
## Single Node Setup {#single-node-setup}
## Установка на одном узле {#single-node-setup}
To postpone the complexities of a distributed environment, well start with deploying ClickHouse on a single server or virtual machine. ClickHouse is usually installed from [deb](../getting-started/install.md#install-from-deb-packages) or [rpm](../getting-started/install.md#from-rpm-packages) packages, but there are [alternatives](../getting-started/install.md#from-docker-image) for the operating systems that do no support them.
Чтобы не погружаться сразу в сложности распределённого окружения мы начнём с развёртывания ClickHouse на одном сервере или одной виртуальной машине. ClickHouse обычно устанавливаается из [deb](../getting-started/install.md#install-from-deb-packages)- или [rpm](../getting-started/install.md#from-rpm-packages)-пакетов, но есть и [альтернативы](../getting-started/install.md#from-docker-image) для операционных систем без соответствующих пакетных менеджеров.
For example, you have chosen `deb` packages and executed:
Например, выбираем нужные `deb`-пакеты и выполняем:
``` bash
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
@ -30,49 +30,49 @@ sudo service clickhouse-server start
clickhouse-client # or "clickhouse-client --password" if you've set up a password.
```
What do we have in the packages that got installed:
Что мы получим по результатам установки этих пакетов:
- `clickhouse-client` package contains [clickhouse-client](../interfaces/cli.md) application, interactive ClickHouse console client.
- `clickhouse-common` package contains a ClickHouse executable file.
- `clickhouse-server` package contains configuration files to run ClickHouse as a server.
- с пакетом `clickhouse-client` будет установлена программа [clickhouse-client](../interfaces/cli.md) — интерактивный консольный клиент ClickHouse.
- пакет `clickhouse-common` включает исполняемый файл ClickHouse.
- пакет `clickhouse-server` содержит конфигурационные файлы для запуска ClickHouse в качестве сервера.
Server config files are located in `/etc/clickhouse-server/`. Before going further, please notice the `<path>` element in `config.xml`. Path determines the location for data storage, so it should be located on volume with large disk capacity; the default value is `/var/lib/clickhouse/`. If you want to adjust the configuration, its not handy to directly edit `config.xml` file, considering it might get rewritten on future package updates. The recommended way to override the config elements is to create [files in config.d directory](../operations/configuration-files.md) which serve as “patches” to config.xml.
Файлы конфигурации сервера располагаются в каталоге `/etc/clickhouse-server/`. Прежде чем идти дальше, обратите внимание на элемент `<path>` в файле `config.xml`. Путь, задаваемый этим элементом, определяет местоположение данных, таким образом, он должен быть расположен на томе большой ёмкости; значение по умолчанию — `/var/lib/clickhouse/`. Если вы хотите изменить конфигурацию, то лучше не редактировать вручную файл `config.xml`, поскольку он может быть переписан будущими пакетными обновлениями; рекомендуется создать файлы с необходимыми конфигурационными элементами [в каталоге config.d](../operations/configuration-files.md), которые рассматриваются как “патчи” к config.xml.
As you might have noticed, `clickhouse-server` is not launched automatically after package installation. It wont be automatically restarted after updates, either. The way you start the server depends on your init system, usually, it is:
Вы могли заметить, что `clickhouse-server` не запускается автоматически после установки пакетов. Также сервер не будет автоматически перезапускаться после обновлений. Способ запуска сервера зависит от используемой подсистемы инициализации, обычно это делается так:
``` bash
sudo service clickhouse-server start
```
or
или
``` bash
sudo /etc/init.d/clickhouse-server start
```
The default location for server logs is `/var/log/clickhouse-server/`. The server is ready to handle client connections once it logs the `Ready for connections` message.
Журналы сервера по умолчанию ведутся в `/var/log/clickhouse-server/`. Как только в журнале появится сообщение `Ready for connections` — сервер готов принимать клиентские соединения.
Once the `clickhouse-server` is up and running, we can use `clickhouse-client` to connect to the server and run some test queries like `SELECT "Hello, world!";`.
Теперь, когда `clickhouse-server` запущен, можно подключиться к нему с использованием `clickhouse-client` и выполнить тестовый запрос, например, `SELECT 'Hello, world!';`.
<details markdown="1">
<summary>Quick tips for clickhouse-client</summary>
<summary>Советы по использованию clickhouse-client</summary>
Interactive mode:
Интерактивный режим:
``` bash
clickhouse-client
clickhouse-client --host=... --port=... --user=... --password=...
```
Enable multiline queries:
Включить многострочный режим запросов:
``` bash
clickhouse-client -m
clickhouse-client --multiline
```
Run queries in batch-mode:
Включить пакетный режим запуска запросов:
``` bash
clickhouse-client --query='SELECT 1'
@ -80,7 +80,7 @@ echo 'SELECT 1' | clickhouse-client
clickhouse-client <<< 'SELECT 1'
```
Insert data from a file in specified format:
Вставить данные из файла заданного формата:
``` bash
clickhouse-client --query='INSERT INTO table VALUES' < data.txt
@ -89,39 +89,39 @@ clickhouse-client --query='INSERT INTO table FORMAT TabSeparated' < data.tsv
</details>
## Import Sample Dataset {#import-sample-dataset}
## Загрузка набора данных из примеров {#import-sample-dataset}
Now its time to fill our ClickHouse server with some sample data. In this tutorial, well use some anonymized metric data. There are [multiple ways to import the dataset](../getting-started/example-datasets/metrica.md), and for the sake of the tutorial, well go with the most realistic one.
Настало время загрузить в ClickHouse данные из примеров. В этом руководстве мы используем анонимизированные данные посещений сайтов (веб-метрики). Существует [множество способов импортировать набор данных](../getting-started/example-datasets/metrica.md), но для целей данного руководства мы используем наиболее практичный из них.
### Download and Extract Table Data {#download-and-extract-table-data}
### Загрузка и извлечение табличных данных {#download-and-extract-table-data}
``` bash
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
```
The extracted files are about 10GB in size.
Распакованные файлы занимают около 10 ГБ.
### Create Tables {#create-tables}
### Создание таблиц {#create-tables}
As in most databases management systems, ClickHouse logically groups tables into “databases”. Theres a `default` database, but well create a new one named `tutorial`:
ClickHouse, как и большинство СУБД, логически объединяет таблицы в «базы данных». Существует база данных по умолчанию — `default`, но мы созданим новую, дав ей наименование `tutorial`:
``` bash
clickhouse-client --query "CREATE DATABASE IF NOT EXISTS tutorial"
```
Syntax for creating tables is way more complicated compared to databases (see [reference](../sql-reference/statements/create/table.md). In general `CREATE TABLE` statement has to specify three key things:
Ситаксис для создания таблиц более сложен в сравнении с другими СУБД (см. [руководство по SQL](../sql-reference/statements/create/table.md). Оператор `CREATE TABLE` должен указывать на три ключевых момента:
1. Name of table to create.
2. Table schema, i.e. list of columns and their [data types](../sql-reference/data-types/index.md).
3. [Table engine](../engines/table-engines/index.md) and its settings, which determines all the details on how queries to this table will be physically executed.
1. Имя создаваемой таблицы.
2. Схему таблицы, то есть задавать список столбцов и их [типы данных](../sql-reference/data-types/index.md).
3. [Движок таблицы](../engines/table-engines/index.md) и его параметры, которые определяют все детали того, как запросы к данной таблице будут физически исполняться.
There are only two tables to create:
Мы создадим все лишь две таблицы:
- `hits` is a table with each action done by all users on all websites covered by the service.
- `visits` is a table that contains pre-built sessions instead of individual actions.
- таблицу `hits` с действиями, осуществлёнными всеми пользователями на всех сайтах, обслуживаемых сервисом;
- таблицу `visits`, содержащую посещения — преднастроенные сессии вместо каждого действия.
Lets see and execute the real create table queries for these tables:
Выполним операторы `CREATE TABLE` для создания этих таблиц:
``` sql
CREATE TABLE tutorial.hits_v1
@ -462,22 +462,22 @@ ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
SAMPLE BY intHash32(UserID)
```
You can execute those queries using the interactive mode of `clickhouse-client` (just launch it in a terminal without specifying a query in advance) or try some [alternative interface](../interfaces/index.md) if you want.
Эти операторы можно выполнить с использованием интерактивного режима в `clickhouse-client` (запустите его из командной строки не указывая заранее запросы) или, при желании, воспользоваться [альтернативным интерфейсом](../interfaces/index.md) .
As we can see, `hits_v1` uses the [basic MergeTree engine](../engines/table-engines/mergetree-family/mergetree.md), while the `visits_v1` uses the [Collapsing](../engines/table-engines/mergetree-family/collapsingmergetree.md) variant.
Как вы можете видеть, `hits_v1` использует [базовый вариант движка MergeTree](../engines/table-engines/mergetree-family/mergetree.md), тогда как `visits_v1` использует вариант [Collapsing](../engines/table-engines/mergetree-family/collapsingmergetree.md).
### Import Data {#import-data}
### Импорт данных {#import-data}
Data import to ClickHouse is done via [INSERT INTO](../sql-reference/statements/insert-into.md) query like in many other SQL databases. However, data is usually provided in one of the [supported serialization formats](../interfaces/formats.md) instead of `VALUES` clause (which is also supported).
Импорт данных в ClickHouse выполняется оператором [INSERT INTO](../sql-reference/statements/insert-into.md) как в большинстве SQL-систем. Однако данные для вставки в таблицы ClickHouse обычно предоставляются в одном из [поддерживаемых форматов](../interfaces/formats.md) вместо их непосредственного указания в предложении `VALUES` (хотя и этот способ поддерживается).
The files we downloaded earlier are in tab-separated format, so heres how to import them via console client:
В нашем случае файлы были загружены ранее в формате со значениями, разделёнными знаком табуляции; импортируем их, указав соответствующие запросы в аргументах командной строки:
``` bash
clickhouse-client --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
clickhouse-client --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
```
ClickHouse has a lot of [settings to tune](../operations/settings/index.md) and one way to specify them in console client is via arguments, as we can see with `--max_insert_block_size`. The easiest way to figure out what settings are available, what do they mean and what the defaults are is to query the `system.settings` table:
ClickHouse оснащён множеством [изменяемых настроек](../operations/settings/index.md) и один из способов их указать — передать при запуске консольного клиенте их в качестве аргументов, как вы видели в этом примере с `--max_insert_block_size`. Простейший способ узнать, какие настройки доступны, что они означают и какие у них значения по умолчанию — запросить содержимое таблицы `system.settings`:
``` sql
SELECT name, value, changed, description
@ -488,23 +488,23 @@ FORMAT TSV
max_insert_block_size 1048576 0 "The maximum block size for insertion, if we control the creation of blocks for insertion."
```
Optionally you can [OPTIMIZE](../sql-reference/statements/optimize.md) the tables after import. Tables that are configured with an engine from MergeTree-family always do merges of data parts in the background to optimize data storage (or at least check if it makes sense). These queries force the table engine to do storage optimization right now instead of some time later:
Можно также применить оператор [OPTIMIZE](../sql-reference/statements/optimize.md) к таблицам после импорта. Для таблиц, созданных с движками семейства MergeTree, слияние частей загруженных данных выполняется в фоновом режиме (по крайней мере проверяется, имеет ли смысл его осуществить); этот оператор принудительно запускает соответствующие процессы слияния вместо того, чтобы эти действия были выполнены в фоне когда-нибудь позже.
``` bash
clickhouse-client --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
clickhouse-client --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
```
These queries start an I/O and CPU intensive operation, so if the table consistently receives new data, its better to leave it alone and let merges run in the background.
Эти запросы запускают интеснивные по отношению к вводу-выводу и процессорным ресурсам операции, таким образом, если таблица всё ещё получает новые данные, лучше дать возможность слияниям запуститься в фоне.
Now we can check if the table import was successful:
Проверим, успешно ли загрузились данные:
``` bash
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.hits_v1"
clickhouse-client --query "SELECT COUNT(*) FROM tutorial.visits_v1"
```
## Example Queries {#example-queries}
## Примеры запросов {#example-queries}
``` sql
SELECT
@ -526,18 +526,18 @@ FROM tutorial.visits_v1
WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403)
```
## Cluster Deployment {#cluster-deployment}
## Кластерное развёртывание {#cluster-deployment}
ClickHouse cluster is a homogenous cluster. Steps to set up:
Кластер ClickHouse — гомогенный, то есть все узлы в нём равны, ведущие и ведомые не выделяются. Шаги по установке кластера:
1. Install ClickHouse server on all machines of the cluster
2. Set up cluster configs in configuration files
3. Create local tables on each instance
4. Create a [Distributed table](../engines/table-engines/special/distributed.md)
1. Установить сервер ClickHouse на всех узлах будущего кластера.
2. Прописать кластерные конфигурации в конфигурационных файлах.
3. Создать локальные таблицы на каждом экземпляре.
4. Создать [распределённую таблицу](../engines/table-engines/special/distributed.md).
[Distributed table](../engines/table-engines/special/distributed.md) is actually a kind of “view” to local tables of ClickHouse cluster. SELECT query from a distributed table executes using resources of all clusters shards. You may specify configs for multiple clusters and create multiple distributed tables providing views to different clusters.
[Распределённая таблица](../engines/table-engines/special/distributed.md) — в некотором смысле «представление» над локальными таблицами кластера ClickHouse. Запрос SELECT к распределённой таблице выполняется на всех узлах кластера. Вы можете указать конфигурации для нескольких кластеров и создать множество распределённых таблиц, «смотрящих» на разные кластеры.
Example config for a cluster with three shards, one replica each:
Пример конфигурации кластера с тремя сегментами и одной репликой для каждой:
``` xml
<remote_servers>
@ -564,38 +564,38 @@ Example config for a cluster with three shards, one replica each:
</remote_servers>
```
For further demonstration, lets create a new local table with the same `CREATE TABLE` query that we used for `hits_v1`, but different table name:
Далее создадим новую локальную таблицу с помощью того же запроса `CREATE TABLE`, что использовался для таблицы `hits_v1`, но с другим именем:
``` sql
CREATE TABLE tutorial.hits_local (...) ENGINE = MergeTree() ...
```
Creating a distributed table providing a view into local tables of the cluster:
Создадим распределённую таблицу, обеспечивающую представление над локальными таблицами кластера:
``` sql
CREATE TABLE tutorial.hits_all AS tutorial.hits_local
ENGINE = Distributed(perftest_3shards_1replicas, tutorial, hits_local, rand());
```
A common practice is to create similar Distributed tables on all machines of the cluster. It allows running distributed queries on any machine of the cluster. Also theres an alternative option to create temporary distributed table for a given SELECT query using [remote](../sql-reference/table-functions/remote.md) table function.
Стандартная практика — создание одинаковых распределённых таблиц на всех узлах кластера. Это позволит запускать распределённые запросы с любого узла. Альтернативой может быть создание временной распределённой таблицы для заданного отдельно взятого запроса с использованием табличной функции [remote](../sql-reference/table-functions/remote.md).
Lets run [INSERT SELECT](../sql-reference/statements/insert-into.md) into the Distributed table to spread the table to multiple servers.
Выполним [INSERT SELECT](../sql-reference/statements/insert-into.md) в распределённую таблицу, чтобы распределить данные по нескольким узлам.
``` sql
INSERT INTO tutorial.hits_all SELECT * FROM tutorial.hits_v1;
```
:::danger Notice
This approach is not suitable for the sharding of large tables. Theres a separate tool [clickhouse-copier](../operations/utilities/clickhouse-copier.md) that can re-shard arbitrary large tables.
:::danger Внимание!
Этот подход не годится для сегментирования больших таблиц. Есть инструмент [clickhouse-copier](../operations/utilities/clickhouse-copier.md), специально предназначенный для перераспределения любых больших таблиц.
:::
As you could expect, computationally heavy queries run N times faster if they utilize 3 servers instead of one.
Как и следовало ожидать, вычислительно сложные запросы работают втрое быстрее, если они выполняются на трёх серверах, а не на одном.
In this case, we have used a cluster with 3 shards, and each contains a single replica.
В данном случае мы использовали кластер из трёх сегментов с одной репликой для каждого.
To provide resilience in a production environment, we recommend that each shard should contain 2-3 replicas spread between multiple availability zones or datacenters (or at least racks). Note that ClickHouse supports an unlimited number of replicas.
В продуктивных окружениях для обеспечения надёжности мы рекомендуем чтобы каждый сегмент был защищён 2—3 репликами, разнесёнными на разные зоны отказоустойчивости или разные центры обработки данных (или хотя бы разные стойки). Особо отметим, что ClickHouse поддерживает неограниченное количество реплик.
Example config for a cluster of one shard containing three replicas:
Пример конфигурации кластера с одним сегментом и тремя репликами:
``` xml
<remote_servers>
@ -619,13 +619,13 @@ Example config for a cluster of one shard containing three replicas:
</remote_servers>
```
To enable native replication [ZooKeeper](http://zookeeper.apache.org/) is required. ClickHouse takes care of data consistency on all replicas and runs restore procedure after failure automatically. Its recommended to deploy the ZooKeeper cluster on separate servers (where no other processes including ClickHouse are running).
Для работы встроенной репликации необходимо использовать [ZooKeeper](http://zookeeper.apache.org/). ClickHouse заботится о согласованности данных на всех репликах и автоматически запускает процедуры восстановления в случае сбоев. Рекомендуется развёртывание кластера ZooKeeper на отдельных серверах (на которых не запущено других процессов, в том числе ClickHouse).
:::note Note
ZooKeeper is not a strict requirement: in some simple cases, you can duplicate the data by writing it into all the replicas from your application code. This approach is **not** recommended, in this case, ClickHouse wont be able to guarantee data consistency on all replicas. Thus it becomes the responsibility of your application.
:::note Примечание
Использование ZooKeeper — нестрогая рекомендация: можно продублировать данные, записывая их непосредственно из приложения на несколько реплик. Но этот поход **не рекомедуется** в общем случае, поскольку ClickHouse не сможет гарантировать согласованность данных на всех репликах; обеспечение согласованности станет заботой вашего приложения.
:::
ZooKeeper locations are specified in the configuration file:
Адреса узлов ZooKeeper указываются в файле конфиуграции:
``` xml
<zookeeper>
@ -644,7 +644,7 @@ ZooKeeper locations are specified in the configuration file:
</zookeeper>
```
Also, we need to set macros for identifying each shard and replica which are used on table creation:
Также необходимо в секции macros указать идентификаторы для сегментов и реплик, они нужны будут при создании таблиц:
``` xml
<macros>
@ -653,7 +653,7 @@ Also, we need to set macros for identifying each shard and replica which are use
</macros>
```
If there are no replicas at the moment on replicated table creation, a new first replica is instantiated. If there are already live replicas, the new replica clones data from existing ones. You have an option to create all replicated tables first, and then insert data to it. Another option is to create some replicas and add the others after or during data insertion.
Если в момент создания реплцированной таблицы ни одной реплики ещё нет, то будет создана первая из них. Если уже есть работающие реплики, то в новые реплики данные будут склонированы из существующих. Есть возможность вначале создать реплицируемые таблицы, а затем вставить в них данные. Но можно создать вначале создать только часть реплик и добавить ещё несколько после вставки или в процессе вставки данных.
``` sql
CREATE TABLE tutorial.hits_replica (...)
@ -664,10 +664,10 @@ ENGINE = ReplicatedMergeTree(
...
```
Here we use [ReplicatedMergeTree](../engines/table-engines/mergetree-family/replication.md) table engine. In parameters we specify ZooKeeper path containing shard and replica identifiers.
Здесь мы используем движок [ReplicatedMergeTree](../engines/table-engines/mergetree-family/replication.md). Указываем в параметрах путь в Zookeeper к идентификаторам сегмента и реплики.
``` sql
INSERT INTO tutorial.hits_replica SELECT * FROM tutorial.hits_local;
```
Replication operates in multi-master mode. Data can be loaded into any replica, and the system then syncs it with other instances automatically. Replication is asynchronous so at a given moment, not all replicas may contain recently inserted data. At least one replica should be up to allow data ingestion. Others will sync up data and repair consistency once they will become active again. Note that this approach allows for the low possibility of a loss of recently inserted data.
Репликация работает в режиме мультимастера. Это означает, что данные могут быть загружены на любую из реплик и система автоматически синхронизирует данные между остальными репликами. Репликация асинхронна, то есть в конкретный момент времнени не все реплики могут содержать недавно добавленные данные. Как минимум одна реплика должна быть в строю для приёма данных. Прочие реплики синхронизируются и восстановят согласованное состояния как только снова станут активными. Заметим, что при таком подходе есть вероятность утраты недавно добавленных данных.

View File

@ -108,6 +108,15 @@ sidebar_label: "Ограничения на сложность запроса"
Значение по умолчанию — 0.
## max_bytes_before_external_sort {#settings-max_bytes_before_external_sort}
Включает или отключает выполнение `ORDER BY` во внешней памяти. См. [Детали реализации ORDER BY](../../sql-reference/statements/select/order-by.md#implementation-details)
- Максимальный объем оперативной памяти (в байтах), который может использоваться одной операцией [ORDER BY](../../sql-reference/statements/select/order-by.md). Рекомендуемое значение — половина доступной системной памяти.
- 0 — `ORDER BY` во внешней памяти отключен.
Значение по умолчанию: 0.
## max_rows_to_sort {#max-rows-to-sort}
Максимальное количество строк до сортировки. Позволяет ограничить потребление оперативки при сортировке.

View File

@ -2136,7 +2136,7 @@ SELECT * FROM test_table
- [distributed_replica_error_cap](#settings-distributed_replica_error_cap)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
Основной интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Фактический интервал растёт экспоненциально при возникновении ошибок.
@ -2146,9 +2146,9 @@ SELECT * FROM test_table
Значение по умолчанию: 100 миллисекунд.
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
Максимальный интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Ограничивает экпоненциальный рост интервала, установленого настройкой [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms).
Максимальный интервал отправки данных движком таблиц [Distributed](../../engines/table-engines/special/distributed.md). Ограничивает экпоненциальный рост интервала, установленого настройкой [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms).
Возможные значения:
@ -2156,7 +2156,7 @@ SELECT * FROM test_table
Значение по умолчанию: 30000 миллисекунд (30 секунд).
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
Включает/выключает пакетную отправку вставленных данных.
@ -2323,11 +2323,11 @@ SELECT * FROM test_table
Значение по умолчанию: 0.
## insert_distributed_sync {#insert_distributed_sync}
## distributed_foreground_insert {#distributed_foreground_insert}
Включает или отключает режим синхронного добавления данных в распределенные таблицы (таблицы с движком [Distributed](../../engines/table-engines/special/distributed.md#distributed)).
По умолчанию ClickHouse вставляет данные в распределённую таблицу в асинхронном режиме. Если `insert_distributed_sync=1`, то данные вставляются сихронно, а запрос `INSERT` считается выполненным успешно, когда данные записаны на все шарды (по крайней мере на одну реплику для каждого шарда, если `internal_replication = true`).
По умолчанию ClickHouse вставляет данные в распределённую таблицу в асинхронном режиме. Если `distributed_foreground_insert=1`, то данные вставляются сихронно, а запрос `INSERT` считается выполненным успешно, когда данные записаны на все шарды (по крайней мере на одну реплику для каждого шарда, если `internal_replication = true`).
Возможные значения:

View File

@ -31,27 +31,25 @@ WITH arrayMap(x -> demangle(addressToSymbol(x)), trace) AS all SELECT thread_nam
``` text
Row 1:
──────
thread_name: clickhouse-serv
thread_id: 686
query_id: 1a11f70b-626d-47c1-b948-f9c7b206395d
res: sigqueue
DB::StorageSystemStackTrace::fillData(std::__1::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::Context const&, DB::SelectQueryInfo const&) const
DB::IStorageSystemOneBlock<DB::StorageSystemStackTrace>::read(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, DB::SelectQueryInfo const&, DB::Context const&, DB::QueryProcessingStage::Enum, unsigned long, unsigned int)
DB::InterpreterSelectQuery::executeFetchColumns(DB::QueryProcessingStage::Enum, DB::QueryPipeline&, std::__1::shared_ptr<DB::PrewhereInfo> const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&)
DB::InterpreterSelectQuery::executeImpl(DB::QueryPipeline&, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>)
DB::InterpreterSelectQuery::execute()
DB::InterpreterSelectWithUnionQuery::execute()
DB::executeQueryImpl(char const*, char const*, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool, DB::ReadBuffer*)
DB::executeQuery(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context&, bool, DB::QueryProcessingStage::Enum, bool)
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)
start_thread
__clone
thread_name: QueryPipelineEx
thread_id: 743490
query_id: dc55a564-febb-4e37-95bb-090ef182c6f1
res: memcpy
large_ralloc
arena_ralloc
do_rallocx
Allocator<true, true>::realloc(void*, unsigned long, unsigned long, unsigned long)
HashTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>::resize(unsigned long, unsigned long)
void DB::Aggregator::executeImplBatch<false, false, true, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>>(DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>&, DB::AggregationMethodOneNumber<unsigned long, HashMapTable<unsigned long, HashMapCell<unsigned long, char*, HashCRC32<unsigned long>, HashTableNoState, PairNoInit<unsigned long, char*>>, HashCRC32<unsigned long>, HashTableGrowerWithPrecalculation<8ul>, Allocator<true, true>>, true, false>::State&, DB::Arena*, unsigned long, unsigned long, DB::Aggregator::AggregateFunctionInstruction*, bool, char*) const
DB::Aggregator::executeImpl(DB::AggregatedDataVariants&, unsigned long, unsigned long, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, DB::Aggregator::AggregateFunctionInstruction*, bool, bool, char*) const
DB::Aggregator::executeOnBlock(std::__1::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>, unsigned long, unsigned long, DB::AggregatedDataVariants&, std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>&, std::__1::vector<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>, std::__1::allocator<std::__1::vector<DB::IColumn const*, std::__1::allocator<DB::IColumn const*>>>>&, bool&) const
DB::AggregatingTransform::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads()::$_0, void ()>>(std::__1::__function::__policy_storage const*)
ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::worker(std::__1::__list_iterator<ThreadFromGlobalPoolImpl<false>, void*>)
void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<void ThreadPoolImpl<ThreadFromGlobalPoolImpl<false>>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>(void&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*)
void* std::__1::__thread_proxy[abi:v15000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::'lambda0'()>>(void*)
```
Получение имен файлов и номеров строк в исходном коде ClickHouse:

View File

@ -111,7 +111,7 @@ $ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --bas
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.

View File

@ -128,7 +128,7 @@ SYSTEM RELOAD CONFIG [ON CLUSTER cluster_name]
## Управление распределёнными таблицами {#query-language-system-distributed}
ClickHouse может оперировать [распределёнными](../../sql-reference/statements/system.md) таблицами. Когда пользователь вставляет данные в эти таблицы, ClickHouse сначала формирует очередь из данных, которые должны быть отправлены на узлы кластера, а затем асинхронно отправляет подготовленные данные. Вы можете управлять очередью с помощью запросов [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) и [FLUSH DISTRIBUTED](#query_language-system-flush-distributed). Также есть возможность синхронно вставлять распределенные данные с помощью настройки [insert_distributed_sync](../../operations/settings/settings.md#insert_distributed_sync).
ClickHouse может оперировать [распределёнными](../../sql-reference/statements/system.md) таблицами. Когда пользователь вставляет данные в эти таблицы, ClickHouse сначала формирует очередь из данных, которые должны быть отправлены на узлы кластера, а затем асинхронно отправляет подготовленные данные. Вы можете управлять очередью с помощью запросов [STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends) и [FLUSH DISTRIBUTED](#query_language-system-flush-distributed). Также есть возможность синхронно вставлять распределенные данные с помощью настройки [distributed_foreground_insert](../../operations/settings/settings.md#distributed_foreground_insert).
### STOP DISTRIBUTED SENDS {#query_language-system-stop-distributed-sends}

View File

@ -43,7 +43,7 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
**详见**
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) 设置
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) 设置
- [MergeTree](../../../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-multiple-volumes) 查看示例
**分布式设置**
@ -58,24 +58,24 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] AS [db2.]name2
- `max_delay_to_insert` - 最大延迟多少秒插入数据到分布式表如果有很多挂起字节异步发送。默认值60。
- `monitor_batch_inserts` - 等同于 [distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)
- `background_insert_batch` - 等同于 [distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)
- `monitor_split_batch_on_failure` - 等同于[distributed_directory_monitor_split_batch_on_failure](../../../operations/settings/settings.md#distributed_directory_monitor_split_batch_on_failure)
- `background_insert_split_batch_on_failure` - 等同于[distributed_background_insert_split_batch_on_failure](../../../operations/settings/settings.md#distributed_background_insert_split_batch_on_failure)
- `monitor_sleep_time_ms` - 等同于 [distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)
- `background_insert_sleep_time_ms` - 等同于 [distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)
- `monitor_max_sleep_time_ms` - 等同于 [distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)
- `background_insert_max_sleep_time_ms` - 等同于 [distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)
::note
**稳定性设置** (`fsync_...`):
- 只影响异步插入(例如:`insert_distributed_sync=false`), 当数据首先存储在启动节点磁盘上然后再异步发送到shard。
- 只影响异步插入(例如:`distributed_foreground_insert=false`), 当数据首先存储在启动节点磁盘上然后再异步发送到shard。
— 可能会显著降低`insert`的性能
- 影响将存储在分布式表文件夹中的数据写入 **接受您插入的节点** 。如果你需要保证写入数据到底层的MergeTree表中请参阅 `system.merge_tree_settings` 中的持久性设置(`...fsync...`)
**插入限制设置** (`..._insert`) 请见:
- [insert_distributed_sync](../../../operations/settings/settings.md#insert_distributed_sync) 设置
- [distributed_foreground_insert](../../../operations/settings/settings.md#distributed_foreground_insert) 设置
- [prefer_localhost_replica](../../../operations/settings/settings.md#settings-prefer-localhost-replica) 设置
- `bytes_to_throw_insert``bytes_to_delay_insert` 之前处理,所以你不应该设置它的值小于 `bytes_to_delay_insert`
:::
@ -209,7 +209,7 @@ SELECT 查询会被发送到所有分片,并且无论数据在分片中如何
- 使用需要特定键连接数据( IN 或 JOIN )的查询。如果数据是用该键进行分片,则应使用本地 IN 或 JOIN 而不是 GLOBAL IN 或 GLOBAL JOIN这样效率更高。
- 使用大量服务器(上百或更多),但有大量小查询(个别客户的查询 - 网站,广告商或合作伙伴)。为了使小查询不影响整个集群,让单个客户的数据处于单个分片上是有意义的。或者 你可以配置两级分片:将整个集群划分为«层»,一个层可以包含多个分片。单个客户的数据位于单个层上,根据需要将分片添加到层中,层中的数据随机分布。然后给每层创建分布式表,再创建一个全局的分布式表用于全局的查询。
数据是异步写入的。对于分布式表的 INSERT数据块只写本地文件系统。之后会尽快地在后台发送到远程服务器。发送数据的周期性是由[distributed_directory_monitor_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_sleep_time_ms)和[distributed_directory_monitor_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_directory_monitor_max_sleep_time_ms)设置。分布式引擎会分别发送每个插入数据的文件,但是你可以使用[distributed_directory_monitor_batch_inserts](../../../operations/settings/settings.md#distributed_directory_monitor_batch_inserts)设置启用批量发送文件。该设置通过更好地利用本地服务器和网络资源来提高集群性能。你应该检查表目录`/var/lib/clickhouse/data/database/table/`中的文件列表(等待发送的数据)来检查数据是否发送成功。执行后台任务的线程数可以通过[background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size)设置。
数据是异步写入的。对于分布式表的 INSERT数据块只写本地文件系统。之后会尽快地在后台发送到远程服务器。发送数据的周期性是由[distributed_background_insert_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_sleep_time_ms)和[distributed_background_insert_max_sleep_time_ms](../../../operations/settings/settings.md#distributed_background_insert_max_sleep_time_ms)设置。分布式引擎会分别发送每个插入数据的文件,但是你可以使用[distributed_background_insert_batch](../../../operations/settings/settings.md#distributed_background_insert_batch)设置启用批量发送文件。该设置通过更好地利用本地服务器和网络资源来提高集群性能。你应该检查表目录`/var/lib/clickhouse/data/database/table/`中的文件列表(等待发送的数据)来检查数据是否发送成功。执行后台任务的线程数可以通过[background_distributed_schedule_pool_size](../../../operations/settings/settings.md#background_distributed_schedule_pool_size)设置。
如果在 INSERT 到分布式表时服务器节点丢失或重启设备故障则插入的数据可能会丢失。如果在表目录中检测到损坏的数据分片则会将其转移到«broken»子目录并不再使用。

View File

@ -1088,7 +1088,7 @@ ClickHouse生成异常
- [表引擎分布式](../../engines/table-engines/special/distributed.md)
- [distributed_replica_error_half_life](#settings-distributed_replica_error_half_life)
## distributed_directory_monitor_sleep_time_ms {#distributed_directory_monitor_sleep_time_ms}
## distributed_background_insert_sleep_time_ms {#distributed_background_insert_sleep_time_ms}
对于基本间隔 [分布](../../engines/table-engines/special/distributed.md) 表引擎发送数据。 在发生错误时,实际间隔呈指数级增长。
@ -1098,9 +1098,9 @@ ClickHouse生成异常
默认值100毫秒。
## distributed_directory_monitor_max_sleep_time_ms {#distributed_directory_monitor_max_sleep_time_ms}
## distributed_background_insert_max_sleep_time_ms {#distributed_background_insert_max_sleep_time_ms}
的最大间隔 [分布](../../engines/table-engines/special/distributed.md) 表引擎发送数据。 限制在设置的区间的指数增长 [distributed_directory_monitor_sleep_time_ms](#distributed_directory_monitor_sleep_time_ms) 设置。
的最大间隔 [分布](../../engines/table-engines/special/distributed.md) 表引擎发送数据。 限制在设置的区间的指数增长 [distributed_background_insert_sleep_time_ms](#distributed_background_insert_sleep_time_ms) 设置。
可能的值:
@ -1108,7 +1108,7 @@ ClickHouse生成异常
默认值30000毫秒30秒
## distributed_directory_monitor_batch_inserts {#distributed_directory_monitor_batch_inserts}
## distributed_background_insert_batch {#distributed_background_insert_batch}
启用/禁用批量发送插入的数据。

View File

@ -100,7 +100,7 @@ clickhouse-copier --daemon --config zookeeper.xml --task-path /task/path --base-
<settings>
<connect_timeout>3</connect_timeout>
<!-- Sync insert is set forcibly, leave it here just in case. -->
<insert_distributed_sync>1</insert_distributed_sync>
<distributed_foreground_insert>1</distributed_foreground_insert>
</settings>
<!-- Copying tasks description.

View File

@ -93,7 +93,7 @@ SYSTEM RELOAD CONFIG [ON CLUSTER cluster_name]
## Managing Distributed Tables {#query-language-system-distributed}
ClickHouse可以管理 [distribute](../../engines/table-engines/special/distributed.md)表。当用户向这类表插入数据时ClickHouse首先为需要发送到集群节点的数据创建一个队列然后异步的发送它们。你可以维护队列的处理过程通过[STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), 以及 [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends)。你也可以设置 `insert_distributed_sync`参数来以同步的方式插入分布式数据。
ClickHouse可以管理 [distribute](../../engines/table-engines/special/distributed.md)表。当用户向这类表插入数据时ClickHouse首先为需要发送到集群节点的数据创建一个队列然后异步的发送它们。你可以维护队列的处理过程通过[STOP DISTRIBUTED SENDS](#query_language-system-stop-distributed-sends), [FLUSH DISTRIBUTED](#query_language-system-flush-distributed), 以及 [START DISTRIBUTED SENDS](#query_language-system-start-distributed-sends)。你也可以设置 `distributed_foreground_insert`参数来以同步的方式插入分布式数据。
### STOP DISTRIBUTED SENDS {#query_language-system-stop-distributed-sends}

View File

@ -58,7 +58,7 @@ void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & c
/// Override important settings
settings_pull.readonly = 1;
settings_pull.prefer_localhost_replica = false;
settings_push.insert_distributed_sync = true;
settings_push.distributed_foreground_insert = true;
settings_push.prefer_localhost_replica = false;
set_default_value(settings_pull.load_balancing, LoadBalancing::NEAREST_HOSTNAME);
@ -66,7 +66,7 @@ void DB::TaskCluster::reloadSettings(const Poco::Util::AbstractConfiguration & c
set_default_value(settings_pull.max_block_size, 8192UL);
set_default_value(settings_pull.preferred_block_size_bytes, 0);
set_default_value(settings_push.insert_distributed_timeout, 0);
set_default_value(settings_push.distributed_background_insert_timeout, 0);
set_default_value(settings_push.alter_sync, 2);
}

View File

@ -1106,7 +1106,7 @@ public:
{
if (isInteger(data_type))
{
if (isUnsignedInteger(data_type))
if (isUInt(data_type))
return std::make_unique<UnsignedIntegerModel>(seed);
else
return std::make_unique<SignedIntegerModel>(seed);

View File

@ -536,6 +536,16 @@ static void sanityChecks(Server & server)
{
}
try
{
const char * filename = "/proc/sys/kernel/task_delayacct";
if (readNumber(filename) == 0)
server.context()->addWarningMessage("Delay accounting is not enabled, OSIOWaitMicroseconds will not be gathered. Check " + String(filename));
}
catch (...) // NOLINT(bugprone-empty-catch)
{
}
std::string dev_id = getBlockDeviceId(data_path);
if (getBlockDeviceType(dev_id) == BlockDeviceType::ROT && getBlockDeviceReadAheadBytes(dev_id) == 0)
server.context()->addWarningMessage("Rotational disk with disabled readahead is in use. Performance can be degraded. Used for data: " + String(data_path));

View File

@ -1472,6 +1472,15 @@
<!-- <disable_internal_dns_cache>1</disable_internal_dns_cache> -->
<!-- You can also configure rocksdb like this: -->
<!-- Full list of options:
- options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/options.h#L1452
- column_family_options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/options.h#L66
- block_based_table_options:
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/table/block_based/block_based_table_factory.cc#L228
- https://github.com/facebook/rocksdb/blob/4b013dcbed2df84fde3901d7655b9b91c557454d/include/rocksdb/table.h#L129
-->
<!--
<rocksdb>
<options>
@ -1480,6 +1489,9 @@
<column_family_options>
<num_levels>2</num_levels>
</column_family_options>
<block_based_table_options>
<block_size>1024</block_size>
</block_based_table_options>
<tables>
<table>
<name>TABLE</name>
@ -1489,6 +1501,9 @@
<column_family_options>
<num_levels>2</num_levels>
</column_family_options>
<block_based_table_options>
<block_size>1024</block_size>
</block_based_table_options>
</table>
</tables>
</rocksdb>

View File

@ -84,7 +84,7 @@ public:
}
}
if (!isUnsignedInteger(arguments[1]))
if (!isUInt(arguments[1]))
throw Exception(ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT, "Second argument of aggregate function {} must be unsigned integer.", getName());
if (default_value.isNull())

View File

@ -151,10 +151,12 @@ public:
}
else if (BitmapKind::Bitmap == kind)
{
auto size = roaring_bitmap->getSizeInBytes();
std::unique_ptr<RoaringBitmap> bitmap = std::make_unique<RoaringBitmap>(*roaring_bitmap);
bitmap->runOptimize();
auto size = bitmap->getSizeInBytes();
writeVarUInt(size, out);
std::unique_ptr<char[]> buf(new char[size]);
roaring_bitmap->write(buf.get());
bitmap->write(buf.get());
out.write(buf.get(), size);
}
}

View File

@ -95,7 +95,7 @@ public:
size_t size = set.size();
writeVarUInt(size, buf);
for (const auto & elem : set)
writeIntBinary(elem, buf);
writeBinaryLittleEndian(elem.key, buf);
}
void deserialize(AggregateDataPtr __restrict place, ReadBuffer & buf, std::optional<size_t> /* version */, Arena *) const override

View File

@ -72,14 +72,14 @@ public:
{
writeBinary(has(), buf);
if (has())
writeBinary(value, buf);
writeBinaryLittleEndian(value, buf);
}
void read(ReadBuffer & buf, const ISerialization & /*serialization*/, Arena *)
{
readBinary(has_value, buf);
if (has())
readBinary(value, buf);
readBinaryLittleEndian(value, buf);
}
@ -1275,13 +1275,13 @@ struct AggregateFunctionAnyHeavyData : Data
void write(WriteBuffer & buf, const ISerialization & serialization) const
{
Data::write(buf, serialization);
writeBinary(counter, buf);
writeBinaryLittleEndian(counter, buf);
}
void read(ReadBuffer & buf, const ISerialization & serialization, Arena * arena)
{
Data::read(buf, serialization, arena);
readBinary(counter, buf);
readBinaryLittleEndian(counter, buf);
}
static const char * name() { return "anyHeavy"; }

View File

@ -238,7 +238,7 @@ public:
if constexpr (has_second_arg)
{
assertBinary(Name::name, types);
if (!isUnsignedInteger(types[1]))
if (!isUInt(types[1]))
throw Exception(
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT,
"Second argument (weight) for function {} must be unsigned integer, but it has type {}",

View File

@ -34,7 +34,7 @@ struct TheilsUData : CrossTabData
for (const auto & [key, value] : count_ab)
{
Float64 value_ab = value;
Float64 value_b = count_b.at(key.items[1]);
Float64 value_b = count_b.at(key.items[UInt128::_impl::little(1)]);
dep += (value_ab / count) * log(value_ab / value_b);
}

View File

@ -35,8 +35,8 @@ struct AggregateFunctionMapCombinatorData
using SearchType = KeyType;
std::unordered_map<KeyType, AggregateDataPtr> merged_maps;
static void writeKey(KeyType key, WriteBuffer & buf) { writeBinary(key, buf); }
static void readKey(KeyType & key, ReadBuffer & buf) { readBinary(key, buf); }
static void writeKey(KeyType key, WriteBuffer & buf) { writeBinaryLittleEndian(key, buf); }
static void readKey(KeyType & key, ReadBuffer & buf) { readBinaryLittleEndian(key, buf); }
};
template <>

View File

@ -98,8 +98,8 @@ struct CrossTabData
Float64 chi_squared = 0;
for (const auto & [key, value_ab] : count_ab)
{
Float64 value_a = count_a.at(key.items[0]);
Float64 value_b = count_b.at(key.items[1]);
Float64 value_a = count_a.at(key.items[UInt128::_impl::little(0)]);
Float64 value_b = count_b.at(key.items[UInt128::_impl::little(1)]);
Float64 expected_value_ab = (value_a * value_b) / count;

View File

@ -44,7 +44,7 @@ using FunctionOverloadResolverPtr = std::shared_ptr<IFunctionOverloadResolver>;
class FunctionNode;
using FunctionNodePtr = std::shared_ptr<FunctionNode>;
enum class FunctionKind
enum class FunctionKind : UInt8
{
UNKNOWN,
ORDINARY,

View File

@ -15,6 +15,11 @@
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
/** Visitor that traverse query tree in depth.
* Derived class must implement `visitImpl` method.
* Additionally subclass can control if child need to be visited using `needChildVisit` method, by
@ -99,7 +104,7 @@ using ConstInDepthQueryTreeVisitor = InDepthQueryTreeVisitor<Derived, true /*con
* 2. enterImpl This method is called before children are processed.
* 3. leaveImpl This method is called after children are processed.
*/
template <typename Derived, bool const_visitor = false>
template <typename Derived>
class InDepthQueryTreeVisitorWithContext
{
public:
@ -169,31 +174,45 @@ private:
return *static_cast<Derived *>(this);
}
bool shouldSkipSubtree(
void visitChildIfNeeded(
VisitQueryTreeNodeType & parent,
VisitQueryTreeNodeType & child,
size_t subtree_index)
VisitQueryTreeNodeType & child)
{
bool need_visit_child = getDerived().needChildVisit(parent, child);
if (!need_visit_child)
return true;
return;
// We do not visit ListNode with arguments of TableFunctionNode directly, because
// we need to know which arguments are in the unresolved state.
// It must be safe because we do not modify table function arguments list in any visitor.
if (auto * table_function_node = parent->as<TableFunctionNode>())
{
if (child != table_function_node->getArgumentsNode())
throw Exception(ErrorCodes::LOGICAL_ERROR, "TableFunctioNode is expected to have only one child node");
const auto & unresolved_indexes = table_function_node->getUnresolvedArgumentIndexes();
return std::find(unresolved_indexes.begin(), unresolved_indexes.end(), subtree_index) != unresolved_indexes.end();
size_t index = 0;
for (auto & argument : table_function_node->getArguments())
{
if (std::find(unresolved_indexes.begin(),
unresolved_indexes.end(),
index) == unresolved_indexes.end())
visit(argument);
++index;
}
return;
}
return false;
else
visit(child);
}
void visitChildren(VisitQueryTreeNodeType & expression)
{
size_t index = 0;
for (auto & child : expression->getChildren())
{
if (child && !shouldSkipSubtree(expression, child, index))
visit(child);
++index;
if (child)
visitChildIfNeeded(expression, child);
}
}

View File

@ -5,6 +5,7 @@
#include <Analyzer/InDepthQueryTreeVisitor.h>
#include <Analyzer/ConstantNode.h>
#include <Analyzer/FunctionNode.h>
#include <Analyzer/Utils.h>
namespace DB
{
@ -12,10 +13,13 @@ namespace DB
namespace
{
class IfConstantConditionVisitor : public InDepthQueryTreeVisitor<IfConstantConditionVisitor>
class IfConstantConditionVisitor : public InDepthQueryTreeVisitorWithContext<IfConstantConditionVisitor>
{
public:
static void visitImpl(QueryTreeNodePtr & node)
using Base = InDepthQueryTreeVisitorWithContext<IfConstantConditionVisitor>;
using Base::Base;
void enterImpl(QueryTreeNodePtr & node)
{
auto * function_node = node->as<FunctionNode>();
if (!function_node || (function_node->getFunctionName() != "if" && function_node->getFunctionName() != "multiIf"))
@ -40,18 +44,22 @@ public:
else
return;
QueryTreeNodePtr argument_node;
if (condition_boolean_value)
node = function_node->getArguments().getNodes()[1];
argument_node = function_node->getArguments().getNodes()[1];
else
node = function_node->getArguments().getNodes()[2];
argument_node = function_node->getArguments().getNodes()[2];
if (node->getResultType()->equals(*argument_node->getResultType()))
node = argument_node;
}
};
}
void IfConstantConditionPass::run(QueryTreeNodePtr query_tree_node, ContextPtr)
void IfConstantConditionPass::run(QueryTreeNodePtr query_tree_node, ContextPtr context)
{
IfConstantConditionVisitor visitor;
IfConstantConditionVisitor visitor(std::move(context));
visitor.visit(query_tree_node);
}

View File

@ -10,6 +10,8 @@
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/FunctionFactory.h>
@ -513,6 +515,31 @@ private:
bool has_function = false;
};
inline AggregateFunctionPtr resolveAggregateFunction(FunctionNode * function_node)
{
Array parameters;
for (const auto & param : function_node->getParameters())
{
auto * constant = param->as<ConstantNode>();
parameters.push_back(constant->getValue());
}
const auto & function_node_argument_nodes = function_node->getArguments().getNodes();
DataTypes argument_types;
argument_types.reserve(function_node_argument_nodes.size());
for (const auto & function_node_argument : function_node_argument_nodes)
argument_types.emplace_back(function_node_argument->getResultType());
AggregateFunctionProperties properties;
return AggregateFunctionFactory::instance().get(
function_node->getFunctionName(),
argument_types,
parameters,
properties);
}
}
bool hasFunctionNode(const QueryTreeNodePtr & node, std::string_view function_name)
@ -573,6 +600,32 @@ void replaceColumns(QueryTreeNodePtr & node,
visitor.visit(node);
}
void rerunFunctionResolve(FunctionNode * function_node, ContextPtr context)
{
if (!function_node->isResolved())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Trying to rerun resolve of unresolved function '{}'", function_node->getFunctionName());
const auto & name = function_node->getFunctionName();
if (function_node->isOrdinaryFunction())
{
// Special case, don't need to be resolved. It must be processed by GroupingFunctionsResolvePass.
if (name == "grouping")
return;
auto function = FunctionFactory::instance().get(name, context);
function_node->resolveAsFunction(function->build(function_node->getArgumentColumns()));
}
else if (function_node->isAggregateFunction())
{
if (name == "nothing")
return;
function_node->resolveAsAggregateFunction(resolveAggregateFunction(function_node));
}
else if (function_node->isWindowFunction())
{
function_node->resolveAsWindowFunction(resolveAggregateFunction(function_node));
}
}
namespace
{
@ -605,4 +658,5 @@ NameSet collectIdentifiersFullNames(const QueryTreeNodePtr & node)
visitor.visit(node);
return out;
}
}

View File

@ -7,6 +7,8 @@
namespace DB
{
class FunctionNode;
/// Returns true if node part of root tree, false otherwise
bool isNodePartOfTree(const IQueryTreeNode * node, const IQueryTreeNode * root);
@ -83,6 +85,10 @@ void replaceColumns(QueryTreeNodePtr & node,
const QueryTreeNodePtr & table_expression_node,
const std::unordered_map<std::string, QueryTreeNodePtr> & column_name_to_node);
/** Resolve function node again using it's content.
* This function should be called when arguments or parameters are changed.
*/
void rerunFunctionResolve(FunctionNode * function_node, ContextPtr context);
/// Just collect all identifiers from query tree
NameSet collectIdentifiersFullNames(const QueryTreeNodePtr & node);

View File

@ -43,6 +43,7 @@ public:
std::shared_ptr<IBackupCoordination> getBackupCoordination() const { return backup_coordination; }
const ReadSettings & getReadSettings() const { return read_settings; }
ContextPtr getContext() const { return context; }
const ZooKeeperRetriesInfo & getZooKeeperRetriesInfo() const { return global_zookeeper_retries_info; }
/// Adds a backup entry which will be later returned by run().
/// These function can be called by implementations of IStorage::backupData() in inherited storage classes.

View File

@ -4,6 +4,7 @@
#include <Poco/Net/NetException.h>
#include <Poco/Util/HelpFormatter.h>
#include <Common/ErrorHandlers.h>
#include <Common/SensitiveDataMasker.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/logger_useful.h>
@ -209,6 +210,9 @@ int IBridge::main(const std::vector<std::string> & /*args*/)
if (is_help)
return Application::EXIT_OK;
static ServerErrorHandler error_handler;
Poco::ErrorHandler::set(&error_handler);
registerFormats();
LOG_INFO(log, "Starting up {} on host: {}, port: {}", bridgeName(), hostname, port);

View File

@ -330,10 +330,6 @@ if (TARGET ch_contrib::cpuid)
target_link_libraries(clickhouse_common_io PRIVATE ch_contrib::cpuid)
endif()
if (TARGET ch_contrib::crc32_s390x)
target_link_libraries(clickhouse_common_io PUBLIC ch_contrib::crc32_s390x)
endif()
if (TARGET ch_contrib::crc32-vpmsum)
target_link_libraries(clickhouse_common_io PUBLIC ch_contrib::crc32-vpmsum)
endif()

View File

@ -13,8 +13,9 @@ namespace DB
static constexpr auto PROXY_HTTP_ENVIRONMENT_VARIABLE = "http_proxy";
static constexpr auto PROXY_HTTPS_ENVIRONMENT_VARIABLE = "https_proxy";
EnvironmentProxyConfigurationResolver::EnvironmentProxyConfigurationResolver(Protocol protocol_)
: protocol(protocol_)
EnvironmentProxyConfigurationResolver::EnvironmentProxyConfigurationResolver(
Protocol request_protocol_, bool disable_tunneling_for_https_requests_over_http_proxy_)
: ProxyConfigurationResolver(request_protocol_, disable_tunneling_for_https_requests_over_http_proxy_)
{}
namespace
@ -37,7 +38,7 @@ namespace
ProxyConfiguration EnvironmentProxyConfigurationResolver::resolve()
{
const auto * proxy_host = getProxyHost(protocol);
const auto * proxy_host = getProxyHost(request_protocol);
if (!proxy_host)
{
@ -54,7 +55,9 @@ ProxyConfiguration EnvironmentProxyConfigurationResolver::resolve()
return ProxyConfiguration {
host,
ProxyConfiguration::protocolFromString(scheme),
port
port,
useTunneling(request_protocol, ProxyConfiguration::protocolFromString(scheme), disable_tunneling_for_https_requests_over_http_proxy),
request_protocol
};
}

View File

@ -11,13 +11,10 @@ namespace DB
class EnvironmentProxyConfigurationResolver : public ProxyConfigurationResolver
{
public:
explicit EnvironmentProxyConfigurationResolver(Protocol protocol_);
EnvironmentProxyConfigurationResolver(Protocol request_protocol, bool disable_tunneling_for_https_requests_over_http_proxy_ = false);
ProxyConfiguration resolve() override;
void errorReport(const ProxyConfiguration &) override {}
private:
Protocol protocol;
};
}

View File

@ -586,6 +586,7 @@
M(704, CANNOT_USE_QUERY_CACHE_WITH_NONDETERMINISTIC_FUNCTIONS) \
M(705, TABLE_NOT_EMPTY) \
M(706, LIBSSH_ERROR) \
M(707, GCP_ERROR) \
M(999, KEEPER_EXCEPTION) \
M(1000, POCO_EXCEPTION) \
M(1001, STD_EXCEPTION) \

View File

@ -51,7 +51,7 @@ void abortOnFailedAssertion(const String & description)
}
bool terminate_on_any_exception = false;
static int terminate_status_code = 128 + SIGABRT;
thread_local bool update_error_statistics = true;
/// - Aborts the process if error code is LOGICAL_ERROR.
@ -92,7 +92,7 @@ Exception::Exception(const MessageMasked & msg_masked, int code, bool remote_)
, remote(remote_)
{
if (terminate_on_any_exception)
std::terminate();
std::_Exit(terminate_status_code);
capture_thread_frame_pointers = thread_frame_pointers;
handle_error_code(msg_masked.msg, code, remote, getStackFramePointers());
}
@ -102,7 +102,7 @@ Exception::Exception(MessageMasked && msg_masked, int code, bool remote_)
, remote(remote_)
{
if (terminate_on_any_exception)
std::terminate();
std::_Exit(terminate_status_code);
capture_thread_frame_pointers = thread_frame_pointers;
handle_error_code(message(), code, remote, getStackFramePointers());
}
@ -111,7 +111,7 @@ Exception::Exception(CreateFromPocoTag, const Poco::Exception & exc)
: Poco::Exception(exc.displayText(), ErrorCodes::POCO_EXCEPTION)
{
if (terminate_on_any_exception)
std::terminate();
std::_Exit(terminate_status_code);
capture_thread_frame_pointers = thread_frame_pointers;
#ifdef STD_EXCEPTION_HAS_STACK_TRACE
auto * stack_trace_frames = exc.get_stack_trace_frames();
@ -125,7 +125,7 @@ Exception::Exception(CreateFromSTDTag, const std::exception & exc)
: Poco::Exception(demangle(typeid(exc).name()) + ": " + String(exc.what()), ErrorCodes::STD_EXCEPTION)
{
if (terminate_on_any_exception)
std::terminate();
std::_Exit(terminate_status_code);
capture_thread_frame_pointers = thread_frame_pointers;
#ifdef STD_EXCEPTION_HAS_STACK_TRACE
auto * stack_trace_frames = exc.get_stack_trace_frames();

View File

@ -37,7 +37,9 @@ static struct InitFiu
ONCE(replicated_merge_tree_insert_quorum_fail_0) \
REGULAR(use_delayed_remote_source) \
REGULAR(cluster_discovery_faults) \
REGULAR(check_table_query_delay_for_part) \
REGULAR(dummy_failpoint) \
REGULAR(prefetched_reader_pool_failpoint) \
PAUSEABLE_ONCE(dummy_pausable_failpoint_once) \
PAUSEABLE(dummy_pausable_failpoint)

View File

@ -54,30 +54,7 @@ inline DB::UInt64 intHash64(DB::UInt64 x)
#endif
#if defined(__s390x__) && __BYTE_ORDER__==__ORDER_BIG_ENDIAN__
#include <crc32-s390x.h>
inline uint32_t s390x_crc32_u8(uint32_t crc, uint8_t v)
{
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32_u16(uint32_t crc, uint16_t v)
{
v = std::byteswap(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint32_t s390x_crc32_u32(uint32_t crc, uint32_t v)
{
v = std::byteswap(v);
return crc32c_le_vx(crc, reinterpret_cast<unsigned char *>(&v), sizeof(v));
}
inline uint64_t s390x_crc32(uint64_t crc, uint64_t v)
{
v = std::byteswap(v);
return crc32c_le_vx(static_cast<uint32_t>(crc), reinterpret_cast<unsigned char *>(&v), sizeof(uint64_t));
}
#include <base/crc32c_s390x.h>
#endif
/// NOTE: Intel intrinsic can be confusing.
@ -92,7 +69,7 @@ inline DB::UInt64 intHashCRC32(DB::UInt64 x)
#elif (defined(__PPC64__) || defined(__powerpc64__)) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
return crc32_ppc(-1U, reinterpret_cast<const unsigned char *>(&x), sizeof(x));
#elif defined(__s390x__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
return s390x_crc32(-1U, x);
return s390x_crc32c(-1U, x);
#else
/// On other platforms we do not have CRC32. NOTE This can be confusing.
/// NOTE: consider using intHash32()
@ -108,7 +85,7 @@ inline DB::UInt64 intHashCRC32(DB::UInt64 x, DB::UInt64 updated_value)
#elif (defined(__PPC64__) || defined(__powerpc64__)) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
return crc32_ppc(updated_value, reinterpret_cast<const unsigned char *>(&x), sizeof(x));
#elif defined(__s390x__) && __BYTE_ORDER__==__ORDER_BIG_ENDIAN__
return s390x_crc32(updated_value, x);
return s390x_crc32c(updated_value, x);
#else
/// On other platforms we do not have CRC32. NOTE This can be confusing.
return intHash64(x) ^ updated_value;
@ -358,7 +335,7 @@ struct UInt128Hash
{
size_t operator()(UInt128 x) const
{
return CityHash_v1_0_2::Hash128to64({x.items[0], x.items[1]});
return CityHash_v1_0_2::Hash128to64({x.items[UInt128::_impl::little(0)], x.items[UInt128::_impl::little(1)]});
}
};
@ -403,8 +380,8 @@ struct UInt128HashCRC32
size_t operator()(UInt128 x) const
{
UInt64 crc = -1ULL;
crc = s390x_crc32(crc, x.items[UInt128::_impl::little(0)]);
crc = s390x_crc32(crc, x.items[UInt128::_impl::little(1)]);
crc = s390x_crc32c(crc, x.items[UInt128::_impl::little(0)]);
crc = s390x_crc32c(crc, x.items[UInt128::_impl::little(1)]);
return crc;
}
};
@ -472,10 +449,10 @@ struct UInt256HashCRC32
size_t operator()(UInt256 x) const
{
UInt64 crc = -1ULL;
crc = s390x_crc32(crc, x.items[UInt256::_impl::little(0)]);
crc = s390x_crc32(crc, x.items[UInt256::_impl::little(1)]);
crc = s390x_crc32(crc, x.items[UInt256::_impl::little(2)]);
crc = s390x_crc32(crc, x.items[UInt256::_impl::little(3)]);
crc = s390x_crc32c(crc, x.items[UInt256::_impl::little(0)]);
crc = s390x_crc32c(crc, x.items[UInt256::_impl::little(1)]);
crc = s390x_crc32c(crc, x.items[UInt256::_impl::little(2)]);
crc = s390x_crc32c(crc, x.items[UInt256::_impl::little(3)]);
return crc;
}
};
@ -557,7 +534,10 @@ struct IntHash32
else if constexpr (sizeof(T) <= sizeof(UInt64))
{
DB::UInt64 out {0};
std::memcpy(&out, &key, sizeof(T));
if constexpr (std::endian::native == std::endian::little)
std::memcpy(&out, &key, sizeof(T));
else
std::memcpy(reinterpret_cast<char*>(&out) + sizeof(DB::UInt64) - sizeof(T), &key, sizeof(T));
return intHash32<salt>(out);
}

View File

@ -75,22 +75,22 @@ struct StringHashTableHash
size_t ALWAYS_INLINE operator()(StringKey8 key) const
{
size_t res = -1ULL;
res = s390x_crc32(res, key);
res = s390x_crc32c(res, key);
return res;
}
size_t ALWAYS_INLINE operator()(StringKey16 key) const
{
size_t res = -1ULL;
res = s390x_crc32(res, key.items[UInt128::_impl::little(0)]);
res = s390x_crc32(res, key.items[UInt128::_impl::little(1)]);
res = s390x_crc32c(res, key.items[UInt128::_impl::little(0)]);
res = s390x_crc32c(res, key.items[UInt128::_impl::little(1)]);
return res;
}
size_t ALWAYS_INLINE operator()(StringKey24 key) const
{
size_t res = -1ULL;
res = s390x_crc32(res, key.a);
res = s390x_crc32(res, key.b);
res = s390x_crc32(res, key.c);
res = s390x_crc32c(res, key.a);
res = s390x_crc32c(res, key.b);
res = s390x_crc32c(res, key.c);
return res;
}
#else

Some files were not shown because too many files have changed in this diff Show More