Merge branch 'master' into auto/v23.8.3.48-lts

Mikhail f. Shiryaev 2023-09-29 11:34:57 +02:00 committed by GitHub
commit d8085c8d28
GPG Key ID: 4AEE18F83AFDEB23
211 changed files with 3769 additions and 1404 deletions

@ -17,6 +17,7 @@ on: # yamllint disable-line rule:truthy
- 'docker/docs/**'
- 'docs/**'
- 'utils/check-style/aspell-ignore/**'
- 'tests/ci/docs_check.py'
jobs:
CheckLabels:
runs-on: [self-hosted, style-checker]

@ -17,6 +17,7 @@ on: # yamllint disable-line rule:truthy
- 'docker/docs/**'
- 'docs/**'
- 'utils/check-style/aspell-ignore/**'
- 'tests/ci/docs_check.py'
##########################################################################################
##################################### SMALL CHECKS #######################################
##########################################################################################

@ -1,4 +1,5 @@
### Table of Contents
**[ClickHouse release v23.9, 2023-09-28](#239)**<br/>
**[ClickHouse release v23.8 LTS, 2023-08-31](#238)**<br/>
**[ClickHouse release v23.7, 2023-07-27](#237)**<br/>
**[ClickHouse release v23.6, 2023-06-30](#236)**<br/>
@ -11,6 +12,174 @@
# 2023 Changelog
### ClickHouse release 23.9, 2023-09-28
#### Backward Incompatible Change
* Remove the `status_info` configuration option and dictionaries status from the default Prometheus handler. [#54090](https://github.com/ClickHouse/ClickHouse/pull/54090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The experimental parts metadata cache is removed from the codebase. [#54215](https://github.com/ClickHouse/ClickHouse/pull/54215) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Disable setting `input_format_json_try_infer_numbers_from_strings` by default, so we don't try to infer numbers from strings in JSON formats by default to avoid possible parsing errors when sample data contains strings that look like a number. [#55099](https://github.com/ClickHouse/ClickHouse/pull/55099) ([Kruglov Pavel](https://github.com/Avogar)).
#### New Feature
* Improve schema inference from JSON formats: 1) Now it's possible to infer named Tuples from JSON objects without experimental JSON type under a setting `input_format_json_try_infer_named_tuples_from_objects` in JSON formats. Previously without experimental type JSON we could only infer JSON objects as Strings or Maps, now we can infer named Tuple. Resulting Tuple type will contain all keys of objects that were read in data sample during schema inference. It can be useful for reading structured JSON data without sparse objects. The setting is enabled by default. 2) Allow parsing JSON array into a column with type String under setting `input_format_json_read_arrays_as_strings`. It can help reading arrays with values with different types. 3) Allow to use type String for JSON keys with unknown types (`null`/`[]`/`{}`) in sample data under setting `input_format_json_infer_incomplete_types_as_strings`. Now in JSON formats we can read any value into String column and we can avoid getting error `Cannot determine type for column 'column_name' by first 25000 rows of data, most likely this column contains only Nulls or empty Arrays/Maps` during schema inference by using type String for unknown types, so the data will be read successfully. [#54427](https://github.com/ClickHouse/ClickHouse/pull/54427) ([Kruglov Pavel](https://github.com/Avogar)).
* Added IO scheduling support for remote disks. Storage configuration for disk types `s3`, `s3_plain`, `hdfs` and `azure_blob_storage` can now contain `read_resource` and `write_resource` elements holding resource names. Scheduling policies for these resources can be configured in a separate server configuration section `resources`. Queries can be marked using setting `workload` and classified using server configuration section `workload_classifiers` to achieve diverse resource scheduling goals. More details in [the docs](https://clickhouse.com/docs/en/operations/workload-scheduling). [#47009](https://github.com/ClickHouse/ClickHouse/pull/47009) ([Sergei Trifonov](https://github.com/serxa)). Added "bandwidth_limit" IO scheduling node type. It allows you to specify `max_speed` and `max_burst` constraints on traffic passing through this node. [#54618](https://github.com/ClickHouse/ClickHouse/pull/54618) ([Sergei Trifonov](https://github.com/serxa)).
* Added new type of authentication based on SSH keys. It works only for the native TCP protocol. [#41109](https://github.com/ClickHouse/ClickHouse/pull/41109) ([George Gamezardashvili](https://github.com/InfJoker)).
* Added a new column `_block_number` for MergeTree tables. [#44532](https://github.com/ClickHouse/ClickHouse/issues/44532). [#47532](https://github.com/ClickHouse/ClickHouse/pull/47532) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add `IF EMPTY` clause for `DROP TABLE` queries. [#48915](https://github.com/ClickHouse/ClickHouse/pull/48915) ([Pavel Novitskiy](https://github.com/pnovitskiy)).
* SQL functions `toString(datetime, timezone)` and `formatDateTime(datetime, format, timezone)` now support non-constant timezone arguments. [#53680](https://github.com/ClickHouse/ClickHouse/pull/53680) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Add support for `ALTER TABLE MODIFY COMMENT`. Note: something similar was added by an external contributor a long time ago, but the feature did not work at all and only confused users. This closes [#36377](https://github.com/ClickHouse/ClickHouse/issues/36377). [#51304](https://github.com/ClickHouse/ClickHouse/pull/51304) ([Alexey Milovidov](https://github.com/alexey-milovidov)). Note: this command does not propagate between replicas, so the replicas of a table could have different comments.
* Added `GCD` a.k.a. "greatest common divisor" as a new data compression codec. The codec computes the GCD of all column values, and then divides each value by the GCD. The GCD codec is a data preparation codec (similar to Delta and DoubleDelta) and cannot be used stand-alone. It works with data of integer, decimal and date/time types. A viable use case for the GCD codec is column values that change (increase/decrease) in multiples of the GCD, e.g. 24 - 28 - 16 - 24 - 8 - 24 (assuming GCD = 4). [#53149](https://github.com/ClickHouse/ClickHouse/pull/53149) ([Alexander Nam](https://github.com/seshWCS)).
* Two new type aliases `DECIMAL(P)` (as shortcut for `DECIMAL(P, 0)`) and `DECIMAL` (as shortcut for `DECIMAL(10, 0)`) were added. This makes ClickHouse more compatible with MySQL's SQL dialect. [#53328](https://github.com/ClickHouse/ClickHouse/pull/53328) ([Val Doroshchuk](https://github.com/valbok)).
* Added a new system log table `backup_log` to track all `BACKUP` and `RESTORE` operations. [#53638](https://github.com/ClickHouse/ClickHouse/pull/53638) ([Victor Krasnov](https://github.com/sirvickr)).
* Added a format setting `output_format_markdown_escape_special_characters` (default: false). The setting controls whether special characters like `!`, `#`, `$` etc. are escaped (i.e. prefixed by a backslash) in the `Markdown` output format. [#53860](https://github.com/ClickHouse/ClickHouse/pull/53860) ([irenjj](https://github.com/irenjj)).
* Add function `decodeHTMLComponent`. [#54097](https://github.com/ClickHouse/ClickHouse/pull/54097) ([Bharat Nallan](https://github.com/bharatnc)).
* Added `peak_threads_usage` to query_log table. [#54335](https://github.com/ClickHouse/ClickHouse/pull/54335) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Add `SHOW FUNCTIONS` support to clickhouse-client. [#54337](https://github.com/ClickHouse/ClickHouse/pull/54337) ([Julia Kartseva](https://github.com/wat-ze-hex)).
* Added function `toDaysSinceYearZero` with alias `TO_DAYS` (for compatibility with MySQL) which returns the number of days passed since `0001-01-01` (in Proleptic Gregorian Calendar). [#54479](https://github.com/ClickHouse/ClickHouse/pull/54479) ([Robert Schulze](https://github.com/rschu1ze)). Function `toDaysSinceYearZero()` now supports arguments of type `DateTime` and `DateTime64`. [#54856](https://github.com/ClickHouse/ClickHouse/pull/54856) ([Serge Klochkov](https://github.com/slvrtrn)).
* Added functions `YYYYMMDDToDate`, `YYYYMMDDToDate32`, `YYYYMMDDhhmmssToDateTime` and `YYYYMMDDhhmmssToDateTime64`. They convert a date or date with time encoded as integer (e.g. 20230911) into a native date or date with time. As such, they provide the opposite functionality of existing functions `toYYYYMMDD` and `toYYYYMMDDhhmmss`. [#54509](https://github.com/ClickHouse/ClickHouse/pull/54509) ([Quanfa Fu](https://github.com/dentiscalprum)) ([Robert Schulze](https://github.com/rschu1ze)).
* Add several string distance functions, including `byteHammingDistance`, `editDistance`. [#54935](https://github.com/ClickHouse/ClickHouse/pull/54935) ([flynn](https://github.com/ucasfl)).
* Allow specifying the expiration date and, optionally, the time for user credentials with `VALID UNTIL datetime` clause. [#51261](https://github.com/ClickHouse/ClickHouse/pull/51261) ([Nikolay Degterinsky](https://github.com/evillique)).
* Allow S3-style URLs for table functions `s3`, `gcs`, `oss`. URL is automatically converted to HTTP. Example: `'s3://clickhouse-public-datasets/hits.csv'` is converted to `'https://clickhouse-public-datasets.s3.amazonaws.com/hits.csv'`. [#54931](https://github.com/ClickHouse/ClickHouse/pull/54931) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Add new setting `print_pretty_type_names` to print pretty deep nested types like Tuple/Maps/Arrays. [#55095](https://github.com/ClickHouse/ClickHouse/pull/55095) ([Kruglov Pavel](https://github.com/Avogar)).
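
The arithmetic behind the `GCD` codec entry above can be sketched in a few lines of Python. This is only an illustration of the "divide every value by the common divisor" idea using the example values from the changelog entry; the helper names `gcd_encode` and `gcd_decode` are hypothetical and do not reflect ClickHouse's actual implementation or on-disk format.

```python
from functools import reduce
from math import gcd


def gcd_encode(values):
    """Return (divisor, quotients) for a list of integers.

    Hypothetical helper: the divisor is the GCD of all values and each
    value is stored as value // divisor.
    """
    divisor = reduce(gcd, values, 0)
    if divisor == 0:  # empty input or all zeros: nothing to divide by
        return 0, list(values)
    return divisor, [v // divisor for v in values]


def gcd_decode(divisor, quotients):
    """Invert gcd_encode by multiplying each quotient by the divisor."""
    if divisor == 0:
        return list(quotients)
    return [q * divisor for q in quotients]


values = [24, 28, 16, 24, 8, 24]
divisor, quotients = gcd_encode(values)
print(divisor, quotients)  # 4 [6, 7, 4, 6, 2, 6]
assert gcd_decode(divisor, quotients) == values
```

The quotients are smaller than the original values, which is why such a step is useful only as preparation for a general-purpose codec applied afterwards.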
#### Performance Improvement
* Speed up reading from S3 by enabling prefetches by default. [#53709](https://github.com/ClickHouse/ClickHouse/pull/53709) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Do not implicitly read PK and version columns in lonely parts if unnecessary for queries with FINAL. [#53919](https://github.com/ClickHouse/ClickHouse/pull/53919) ([Duc Canh Le](https://github.com/canhld94)).
* Optimize group by constant keys. Will optimize queries with group by `_file/_path` after https://github.com/ClickHouse/ClickHouse/pull/53529. [#53549](https://github.com/ClickHouse/ClickHouse/pull/53549) ([Kruglov Pavel](https://github.com/Avogar)).
* Improve performance of sorting for `Decimal` columns. Improve performance of insertion into `MergeTree` if ORDER BY contains a `Decimal` column. Improve performance of sorting when data is already sorted or almost sorted. [#35961](https://github.com/ClickHouse/ClickHouse/pull/35961) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance for huge query analysis. Fixes [#51224](https://github.com/ClickHouse/ClickHouse/issues/51224). [#51469](https://github.com/ClickHouse/ClickHouse/pull/51469) ([frinkr](https://github.com/frinkr)).
* An optimization to rewrite `COUNT(DISTINCT ...)` and various `uniq` variants to `count` if it is selected from a subquery with GROUP BY. [#52082](https://github.com/ClickHouse/ClickHouse/pull/52082) [#52645](https://github.com/ClickHouse/ClickHouse/pull/52645) ([JackyWoo](https://github.com/JackyWoo)).
* Remove manual calls to `mmap/mremap/munmap` and delegate all this work to `jemalloc` - and it slightly improves performance. [#52792](https://github.com/ClickHouse/ClickHouse/pull/52792) ([Nikita Taranov](https://github.com/nickitat)).
* Fixed high CPU consumption when working with NATS. [#54399](https://github.com/ClickHouse/ClickHouse/pull/54399) ([Vasilev Pyotr](https://github.com/vahpetr)).
* Since we use separate instructions for executing `toString()` with datetime argument, it is possible to improve performance a bit for non-datetime arguments and have some parts of the code cleaner. Follows up [#53680](https://github.com/ClickHouse/ClickHouse/issues/53680). [#54443](https://github.com/ClickHouse/ClickHouse/pull/54443) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Instead of serializing JSON elements into a `std::stringstream`, this PR tries to put the serialization result into `ColumnString` directly. [#54613](https://github.com/ClickHouse/ClickHouse/pull/54613) ([lgbo](https://github.com/lgbo-ustc)).
* Enable ORDER BY optimization for reading data in corresponding order from a MergeTree table in case that the table is behind a view. [#54628](https://github.com/ClickHouse/ClickHouse/pull/54628) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improve JSON SQL functions by reusing `GeneratorJSONPath` and removing several shared pointers. [#54735](https://github.com/ClickHouse/ClickHouse/pull/54735) ([lgbo](https://github.com/lgbo-ustc)).
* Keeper tries to batch flush requests for better performance. [#53049](https://github.com/ClickHouse/ClickHouse/pull/53049) ([Antonio Andelic](https://github.com/antonio2368)).
* Now `clickhouse-client` processes files in parallel in case of `INFILE 'glob_expression'`. Closes [#54218](https://github.com/ClickHouse/ClickHouse/issues/54218). [#54533](https://github.com/ClickHouse/ClickHouse/pull/54533) ([Max K.](https://github.com/mkaynov)).
* Allow to use primary key for IN function where primary key column types are different from `IN` function right side column types. Example: `SELECT id FROM test_table WHERE id IN (SELECT '5')`. Closes [#48936](https://github.com/ClickHouse/ClickHouse/issues/48936). [#54544](https://github.com/ClickHouse/ClickHouse/pull/54544) ([Maksim Kita](https://github.com/kitaisreal)).
* Hash JOIN tries to shrink internal buffers consuming half of maximal available memory (set by `max_bytes_in_join`). [#54584](https://github.com/ClickHouse/ClickHouse/pull/54584) ([vdimir](https://github.com/vdimir)).
* Respect `max_block_size` for array join to avoid possible OOM. Close [#54290](https://github.com/ClickHouse/ClickHouse/issues/54290). [#54664](https://github.com/ClickHouse/ClickHouse/pull/54664) ([李扬](https://github.com/taiyang-li)).
* Reuse HTTP connections in the `s3` table function. [#54812](https://github.com/ClickHouse/ClickHouse/pull/54812) ([Michael Kolupaev](https://github.com/al13n321)).
* Replace the linear search in `MergeTreeRangeReader::Stream::ceilRowsToCompleteGranules` with a binary search. [#54869](https://github.com/ClickHouse/ClickHouse/pull/54869) ([usurai](https://github.com/usurai)).
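
The last entry in the list above swaps a linear scan for a binary search. A generic sketch of that technique follows. It assumes a sorted list of cumulative row counts per granule, and the function name and data layout are hypothetical; this is not the code of `MergeTreeRangeReader`.

```python
from bisect import bisect_left


def ceil_rows_to_complete_granules(cumulative_rows, rows):
    """Round `rows` up to the nearest granule boundary.

    Assumption: cumulative_rows[i] is the total number of rows in
    granules 0..i, so the list is non-empty and strictly increasing.
    bisect_left finds the first boundary >= rows in O(log n) steps
    instead of scanning the list from the start.
    """
    i = bisect_left(cumulative_rows, rows)
    # Clamp to the last boundary if more rows are requested than exist.
    i = min(i, len(cumulative_rows) - 1)
    return cumulative_rows[i]


boundaries = [8192, 16384, 20000]
print(ceil_rows_to_complete_granules(boundaries, 9000))  # 16384
```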
#### Experimental Feature
* The creation of `Annoy` indexes can now be parallelized using setting `max_threads_for_annoy_index_creation`. [#54047](https://github.com/ClickHouse/ClickHouse/pull/54047) ([Robert Schulze](https://github.com/rschu1ze)).
* Parallel replicas over distributed don't read from all replicas [#54199](https://github.com/ClickHouse/ClickHouse/pull/54199) ([Igor Nikonov](https://github.com/devcrafter)).
#### Improvement
* Allow to replace long names of files of columns in `MergeTree` data parts to hashes of names. It helps to avoid `File name too long` error in some cases. [#50612](https://github.com/ClickHouse/ClickHouse/pull/50612) ([Anton Popov](https://github.com/CurtizJ)).
* Parse data in `JSON` format as `JSONEachRow` if failed to parse metadata. It will allow to read files with `.json` extension even if real format is JSONEachRow. Closes [#45740](https://github.com/ClickHouse/ClickHouse/issues/45740). [#54405](https://github.com/ClickHouse/ClickHouse/pull/54405) ([Kruglov Pavel](https://github.com/Avogar)).
* Output valid JSON/XML on exception during HTTP query execution. Add setting `http_write_exception_in_output_format` to enable/disable this behaviour (enabled by default). [#52853](https://github.com/ClickHouse/ClickHouse/pull/52853) ([Kruglov Pavel](https://github.com/Avogar)).
* View `information_schema.tables` now has a new field `data_length` which shows the approximate size of the data on disk. Required to run queries generated by Amazon QuickSight. [#55037](https://github.com/ClickHouse/ClickHouse/pull/55037) ([Robert Schulze](https://github.com/rschu1ze)).
* The MySQL interface gained a minimal implementation of prepared statements, just enough to allow a connection from Tableau Online to ClickHouse via the MySQL connector. [#54115](https://github.com/ClickHouse/ClickHouse/pull/54115) ([Serge Klochkov](https://github.com/slvrtrn)). Please note: the prepared statements implementation is pretty minimal, we do not support arguments binding yet, it is not required in this particular Tableau online use case. It will be implemented as a follow-up if necessary after extensive testing of Tableau Online in case we discover issues.
* Support case-insensitive and dot-all matching modes in `regexp_tree` dictionaries. [#50906](https://github.com/ClickHouse/ClickHouse/pull/50906) ([Johann Gan](https://github.com/johanngan)).
* Keeper improvement: Add a `createIfNotExists` Keeper command. [#48855](https://github.com/ClickHouse/ClickHouse/pull/48855) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* More precise integer type inference, fix [#51236](https://github.com/ClickHouse/ClickHouse/issues/51236). [#53003](https://github.com/ClickHouse/ClickHouse/pull/53003) ([Chen768959](https://github.com/Chen768959)).
* Introduced resolving of charsets in the string literals for MaterializedMySQL. [#53220](https://github.com/ClickHouse/ClickHouse/pull/53220) ([Val Doroshchuk](https://github.com/valbok)).
* Fix a subtle issue with a rarely used `EmbeddedRocksDB` table engine in an extremely rare scenario: sometimes the `EmbeddedRocksDB` table engine does not close files correctly in NFS after running `DROP TABLE`. [#53502](https://github.com/ClickHouse/ClickHouse/pull/53502) ([Mingliang Pan](https://github.com/liangliangpan)).
* `RESTORE TABLE ON CLUSTER` must create replicated tables with a matching UUID on hosts. Otherwise the macro `{uuid}` in ZooKeeper path can't work correctly after RESTORE. This PR implements that. [#53765](https://github.com/ClickHouse/ClickHouse/pull/53765) ([Vitaly Baranov](https://github.com/vitlibar)).
* Added restore setting `restore_broken_parts_as_detached`: if it's true the RESTORE process won't stop on broken parts while restoring, instead all the broken parts will be copied to the `detached` folder with the prefix `broken-from-backup`. If it's false the RESTORE process will stop on the first broken part (if any). The default value is false. [#53877](https://github.com/ClickHouse/ClickHouse/pull/53877) ([Vitaly Baranov](https://github.com/vitlibar)).
* Add `elapsed_ns` field to HTTP headers X-ClickHouse-Progress and X-ClickHouse-Summary. [#54179](https://github.com/ClickHouse/ClickHouse/pull/54179) ([joelynch](https://github.com/joelynch)).
* Implementation of `reconfig` (https://github.com/ClickHouse/ClickHouse/pull/49450), `sync`, and `exists` commands for keeper-client. [#54201](https://github.com/ClickHouse/ClickHouse/pull/54201) ([pufit](https://github.com/pufit)).
* `clickhouse-local` and `clickhouse-client` now allow to specify the `--query` parameter multiple times, e.g. `./clickhouse-client --query "SELECT 1" --query "SELECT 2"`. This syntax is slightly more intuitive than `./clickhouse-client --multiquery "SELECT 1; SELECT 2"`, a bit easier to script (e.g. `queries.push_back('--query "$q"')`) and more consistent with the behavior of existing parameter `--queries-file` (e.g. `./clickhouse client --queries-file queries1.sql --queries-file queries2.sql`). [#54249](https://github.com/ClickHouse/ClickHouse/pull/54249) ([Robert Schulze](https://github.com/rschu1ze)).
* Add sub-second precision to `formatReadableTimeDelta`. [#54250](https://github.com/ClickHouse/ClickHouse/pull/54250) ([Andrey Zvonov](https://github.com/zvonand)).
* Enable `allow_remove_stale_moving_parts` by default. [#54260](https://github.com/ClickHouse/ClickHouse/pull/54260) ([vdimir](https://github.com/vdimir)).
* Fix using count from cache and improve progress bar for reading from archives. [#54271](https://github.com/ClickHouse/ClickHouse/pull/54271) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for S3 credentials using SSO. To define a profile to be used with SSO, set `AWS_PROFILE` environment variable. [#54347](https://github.com/ClickHouse/ClickHouse/pull/54347) ([Antonio Andelic](https://github.com/antonio2368)).
* Support NULL as default for nested types Array/Tuple/Map for input formats. Closes [#51100](https://github.com/ClickHouse/ClickHouse/issues/51100). [#54351](https://github.com/ClickHouse/ClickHouse/pull/54351) ([Kruglov Pavel](https://github.com/Avogar)).
* Allow reading some unusual configuration of chunks from Arrow/Parquet formats. [#54370](https://github.com/ClickHouse/ClickHouse/pull/54370) ([Arthur Passos](https://github.com/arthurpassos)).
* Add `STD` alias to `stddevPop` function for MySQL compatibility. Closes [#54274](https://github.com/ClickHouse/ClickHouse/issues/54274). [#54382](https://github.com/ClickHouse/ClickHouse/pull/54382) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add `addDate` function for compatibility with MySQL and `subDate` for consistency. Reference [#54275](https://github.com/ClickHouse/ClickHouse/issues/54275). [#54400](https://github.com/ClickHouse/ClickHouse/pull/54400) ([Nikolay Degterinsky](https://github.com/evillique)).
* Support `SAMPLE BY` for views. [#54477](https://github.com/ClickHouse/ClickHouse/pull/54477) ([Azat Khuzhin](https://github.com/azat)).
* Add `modification_time` into `system.detached_parts`. [#54506](https://github.com/ClickHouse/ClickHouse/pull/54506) ([Azat Khuzhin](https://github.com/azat)).
* Added a setting `splitby_max_substrings_includes_remaining_string` which controls if functions "splitBy*()" with argument "max_substring" > 0 include the remaining string (if any) in the result array (Python/Spark semantics) or not. The default behavior does not change. [#54518](https://github.com/ClickHouse/ClickHouse/pull/54518) ([Robert Schulze](https://github.com/rschu1ze)).
* Better integer types inference for `Int64`/`UInt64` fields. Continuation of [#53003](https://github.com/ClickHouse/ClickHouse/pull/53003). Now it works also for nested types like Arrays of Arrays and for functions like `map/tuple`. Issue: [#51236](https://github.com/ClickHouse/ClickHouse/issues/51236). [#54553](https://github.com/ClickHouse/ClickHouse/pull/54553) ([Kruglov Pavel](https://github.com/Avogar)).
* Added array operations for multiplying, dividing and modulo on scalar. Works in each way, for example `5 * [5, 5]` and `[5, 5] * 5` - both cases are possible. [#54608](https://github.com/ClickHouse/ClickHouse/pull/54608) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Add optional `version` argument to `rm` command in `keeper-client` to support safer deletes. [#54708](https://github.com/ClickHouse/ClickHouse/pull/54708) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Disable killing the server by systemd (that may lead to data loss when using Buffer tables). [#54744](https://github.com/ClickHouse/ClickHouse/pull/54744) ([Azat Khuzhin](https://github.com/azat)).
* Added field `is_deterministic` to system table `system.functions` which indicates whether the result of a function is stable between two invocations (given exactly the same inputs) or not. [#54766](https://github.com/ClickHouse/ClickHouse/pull/54766) [#55035](https://github.com/ClickHouse/ClickHouse/pull/55035) ([Robert Schulze](https://github.com/rschu1ze)).
* Made the views in schema `information_schema` more compatible with the equivalent views in MySQL (i.e. modified and extended them) up to a point where Tableau Online is able to connect to ClickHouse. More specifically: 1. The type of field `information_schema.tables.table_type` changed from Enum8 to String. 2. Added fields `table_comment` and `table_collation` to view `information_schema.tables`. 3. Added views `information_schema.key_column_usage` and `referential_constraints`. 4. Replaced uppercase aliases in `information_schema` views with concrete uppercase columns. [#54773](https://github.com/ClickHouse/ClickHouse/pull/54773) ([Serge Klochkov](https://github.com/slvrtrn)).
* The query cache now returns an error if the user tries to cache the result of a query with a non-deterministic function such as `now`, `randomString` and `dictGet`. Compared to the previous behavior (silently don't cache the result), this reduces confusion and surprise for users. [#54801](https://github.com/ClickHouse/ClickHouse/pull/54801) ([Robert Schulze](https://github.com/rschu1ze)).
* Forbid special columns like materialized/ephemeral/alias for `file`/`s3`/`url`/... storages, fix insert into ephemeral columns from files. Closes [#53477](https://github.com/ClickHouse/ClickHouse/issues/53477). [#54803](https://github.com/ClickHouse/ClickHouse/pull/54803) ([Kruglov Pavel](https://github.com/Avogar)).
* More configurable collecting metadata for backup. [#54804](https://github.com/ClickHouse/ClickHouse/pull/54804) ([Vitaly Baranov](https://github.com/vitlibar)).
* `clickhouse-local`'s log file (if enabled with --server_logs_file flag) will now prefix each line with timestamp, thread id, etc, just like `clickhouse-server`. [#54807](https://github.com/ClickHouse/ClickHouse/pull/54807) ([Michael Kolupaev](https://github.com/al13n321)).
* Field `is_obsolete` in the `system.merge_tree_settings` table - it is now 1 for obsolete merge tree settings. Previously, only the description indicated that the setting is obsolete. [#54837](https://github.com/ClickHouse/ClickHouse/pull/54837) ([Robert Schulze](https://github.com/rschu1ze)).
* Make it possible to use plural when using interval literals. `INTERVAL 2 HOURS` should be equivalent to `INTERVAL 2 HOUR`. [#54860](https://github.com/ClickHouse/ClickHouse/pull/54860) ([Jordi Villar](https://github.com/jrdi)).
* Always allow the creation of a projection with `Nullable` PK. This fixes [#54814](https://github.com/ClickHouse/ClickHouse/issues/54814). [#54895](https://github.com/ClickHouse/ClickHouse/pull/54895) ([Amos Bird](https://github.com/amosbird)).
* Retry backup's S3 operations after connection reset failure. [#54900](https://github.com/ClickHouse/ClickHouse/pull/54900) ([Vitaly Baranov](https://github.com/vitlibar)).
* Make the exception message exact in case the maximum value of a setting is less than the minimum value. [#54925](https://github.com/ClickHouse/ClickHouse/pull/54925) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* `LIKE`, `match`, and other regular expressions matching functions now allow matching with patterns containing non-UTF-8 substrings by falling back to binary matching. Example: you can use `string LIKE '\xFE\xFF%'` to detect BOM. This closes [#54486](https://github.com/ClickHouse/ClickHouse/issues/54486). [#54942](https://github.com/ClickHouse/ClickHouse/pull/54942) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added `ContextLockWaitMicroseconds` profile event. [#55029](https://github.com/ClickHouse/ClickHouse/pull/55029) ([Maksim Kita](https://github.com/kitaisreal)).
* The Keeper dynamically adjusts log levels. [#50372](https://github.com/ClickHouse/ClickHouse/pull/50372) ([helifu](https://github.com/helifu)).
* Added function `timestamp` for compatibility with MySQL. Closes [#54275](https://github.com/ClickHouse/ClickHouse/issues/54275). [#54639](https://github.com/ClickHouse/ClickHouse/pull/54639) ([Nikolay Degterinsky](https://github.com/evillique)).
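
The two behaviours controlled by `splitby_max_substrings_includes_remaining_string` (described in the list above) can be compared with a small Python sketch. The function `split_by_char` is a hypothetical stand-in for the `splitBy*()` family, written on the assumption that the default mode drops everything past the limit while the new mode keeps the unsplit remainder as the last element, as Python's own `str.split` does.

```python
def split_by_char(sep, s, max_substrings=0, includes_remaining=False):
    """Split `s` on `sep`, returning at most `max_substrings` parts.

    Hypothetical model of the setting:
    - includes_remaining=False: extra parts are discarded.
    - includes_remaining=True: the last part holds the unsplit
      remainder (Python/Spark semantics).
    max_substrings <= 0 means "no limit".
    """
    parts = s.split(sep)
    if max_substrings <= 0 or len(parts) <= max_substrings:
        return parts
    if includes_remaining:
        # str.split takes the number of splits, i.e. parts minus one.
        return s.split(sep, max_substrings - 1)
    return parts[:max_substrings]


print(split_by_char(",", "a,b,c,d", 2))        # ['a', 'b']
print(split_by_char(",", "a,b,c,d", 2, True))  # ['a', 'b,c,d']
```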
#### Build/Testing/Packaging Improvement
* Bumped the compiler of official and continuous integration builds of ClickHouse from Clang 16 to 17. [#53831](https://github.com/ClickHouse/ClickHouse/pull/53831) ([Robert Schulze](https://github.com/rschu1ze)).
* Regenerated tld data for lookups (`tldLookup.generated.cpp`). [#54269](https://github.com/ClickHouse/ClickHouse/pull/54269) ([Bharat Nallan](https://github.com/bharatnc)).
* Remove the redundant `clickhouse-keeper-client` symlink. [#54587](https://github.com/ClickHouse/ClickHouse/pull/54587) ([Tomas Barton](https://github.com/deric)).
* Use `/usr/bin/env` to resolve bash - now it supports Nix OS. [#54603](https://github.com/ClickHouse/ClickHouse/pull/54603) ([Fionera](https://github.com/fionera)).
* CMake added `PROFILE_CPU` option needed to perform `perf record` without using a DWARF call graph. [#54917](https://github.com/ClickHouse/ClickHouse/pull/54917) ([Maksim Kita](https://github.com/kitaisreal)).
* If the linker is different than LLD, stop with a fatal error. [#55036](https://github.com/ClickHouse/ClickHouse/pull/55036) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Replaced the library to handle (encode/decode) base64 values from Turbo-Base64 to aklomp-base64. Both are SIMD-accelerated on x86 and ARM but 1. the license of the latter (BSD-2) is more favorable for ClickHouse, Turbo64 switched in the meantime to GPL-3, 2. with more GitHub stars, aklomp-base64 seems more future-proof, 3. aklomp-base64 has a slightly nicer API (which is arguably subjective), and 4. aklomp-base64 does not require us to hack around bugs (like non-threadsafe initialization). Note: aklomp-base64 rejects unpadded base64 values whereas Turbo-Base64 decodes them on a best-effort basis. RFC-4648 leaves it open whether padding is mandatory or not, but depending on the context this may be a behavioral change to be aware of. [#54119](https://github.com/ClickHouse/ClickHouse/pull/54119) ([Mikhail Koviazin](https://github.com/mkmkme)).
#### Bug Fix (user-visible misbehavior in an official stable release)
* Fix REPLACE/MOVE PARTITION with zero-copy replication (note: "zero-copy replication" is an experimental feature) [#54193](https://github.com/ClickHouse/ClickHouse/pull/54193) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix zero copy locks with hardlinks (note: "zero-copy replication" is an experimental feature) [#54859](https://github.com/ClickHouse/ClickHouse/pull/54859) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix zero copy garbage (note: "zero-copy replication" is an experimental feature) [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Pass HTTP retry timeout as milliseconds (it was incorrect before). [#54438](https://github.com/ClickHouse/ClickHouse/pull/54438) ([Duc Canh Le](https://github.com/canhld94)).
* Fix misleading error message in OUTFILE with `CapnProto`/`Protobuf` [#52870](https://github.com/ClickHouse/ClickHouse/pull/52870) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix summary reporting with parallel replicas with LIMIT [#53050](https://github.com/ClickHouse/ClickHouse/pull/53050) ([Raúl Marín](https://github.com/Algunenano)).
* Fix throttling of BACKUPs from/to S3 (in case native copy was not used) and in some other places as well [#53336](https://github.com/ClickHouse/ClickHouse/pull/53336) ([Azat Khuzhin](https://github.com/azat)).
* Fix IO throttling during copying whole directories [#53338](https://github.com/ClickHouse/ClickHouse/pull/53338) ([Azat Khuzhin](https://github.com/azat)).
* Fix: moved to prewhere condition actions can lose column [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fixed internal error when replacing with byte-equal parts [#53735](https://github.com/ClickHouse/ClickHouse/pull/53735) ([Pedro Riera](https://github.com/priera)).
* Fix: require columns participating in interpolate expression [#53754](https://github.com/ClickHouse/ClickHouse/pull/53754) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix cluster discovery initialization + setting up fail points in config [#54113](https://github.com/ClickHouse/ClickHouse/pull/54113) ([vdimir](https://github.com/vdimir)).
* Fix issues in `accurateCastOrNull` [#54136](https://github.com/ClickHouse/ClickHouse/pull/54136) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix nullable primary key with the FINAL modifier [#54164](https://github.com/ClickHouse/ClickHouse/pull/54164) ([Amos Bird](https://github.com/amosbird)).
* Fixed error that prevented insertion in replicated materialized view of new data in presence of duplicated data. [#54184](https://github.com/ClickHouse/ClickHouse/pull/54184) ([Pedro Riera](https://github.com/priera)).
* Fix: allow `IPv6` for bloom filter [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix possible type mismatch with `IPv4` [#54212](https://github.com/ClickHouse/ClickHouse/pull/54212) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix `system.data_skipping_indices` for recreated indices [#54225](https://github.com/ClickHouse/ClickHouse/pull/54225) ([Artur Malchanau](https://github.com/Hexta)).
* Fix name clash for multiple join rewriter v2 [#54240](https://github.com/ClickHouse/ClickHouse/pull/54240) ([Tao Wang](https://github.com/wangtZJU)).
* Fix unexpected errors in `system.errors` after join [#54306](https://github.com/ClickHouse/ClickHouse/pull/54306) ([vdimir](https://github.com/vdimir)).
* Fix `isZeroOrNull(NULL)` [#54316](https://github.com/ClickHouse/ClickHouse/pull/54316) ([flynn](https://github.com/ucasfl)).
* Fix: parallel replicas over distributed with `prefer_localhost_replica` = 1 [#54334](https://github.com/ClickHouse/ClickHouse/pull/54334) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix logical error in vertical merge + replacing merge tree + optimize cleanup [#54368](https://github.com/ClickHouse/ClickHouse/pull/54368) ([alesapin](https://github.com/alesapin)).
* Fix possible error `URI contains invalid characters` in the `s3` table function [#54373](https://github.com/ClickHouse/ClickHouse/pull/54373) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix segfault in AST optimization of `arrayExists` function [#54379](https://github.com/ClickHouse/ClickHouse/pull/54379) ([Nikolay Degterinsky](https://github.com/evillique)).
* Check for overflow before addition in `analysisOfVariance` function [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)).
* Reproduce and fix the bug in removeSharedRecursive [#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)).
* Fix possible incorrect result with SimpleAggregateFunction in PREWHERE and FINAL [#54436](https://github.com/ClickHouse/ClickHouse/pull/54436) ([Azat Khuzhin](https://github.com/azat)).
* Fix filtering parts with indexHint for non analyzer [#54449](https://github.com/ClickHouse/ClickHouse/pull/54449) ([Azat Khuzhin](https://github.com/azat)).
* Fix aggregate projections with normalized states [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)).
* `clickhouse-local`: something for multiquery parameter [#54498](https://github.com/ClickHouse/ClickHouse/pull/54498) ([CuiShuoGuo](https://github.com/bakam412)).
* `clickhouse-local` supports `--database` command line argument [#54503](https://github.com/ClickHouse/ClickHouse/pull/54503) ([vdimir](https://github.com/vdimir)).
* Fix possible parsing error in `-WithNames` formats with disabled `input_format_with_names_use_header` [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare case of CHECKSUM_DOESNT_MATCH error [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)).
* Fix sorting of UNION ALL of already sorted results [#54564](https://github.com/ClickHouse/ClickHouse/pull/54564) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix snapshot install in Keeper [#54572](https://github.com/ClickHouse/ClickHouse/pull/54572) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix race in `ColumnUnique` [#54575](https://github.com/ClickHouse/ClickHouse/pull/54575) ([Nikita Taranov](https://github.com/nickitat)).
* Annoy/Usearch index: Fix LOGICAL_ERROR during build-up with default values [#54600](https://github.com/ClickHouse/ClickHouse/pull/54600) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix serialization of `ColumnDecimal` [#54601](https://github.com/ClickHouse/ClickHouse/pull/54601) ([Nikita Taranov](https://github.com/nickitat)).
* Fix schema inference for *Cluster functions for column names with spaces [#54635](https://github.com/ClickHouse/ClickHouse/pull/54635) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix using structure from insertion tables in case of defaults and explicit insert columns [#54655](https://github.com/ClickHouse/ClickHouse/pull/54655) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix: avoid using regex match, possibly containing alternation, as a key condition. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix ReplacingMergeTree with vertical merge and cleanup [#54706](https://github.com/ClickHouse/ClickHouse/pull/54706) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix virtual columns having incorrect values after ORDER BY [#54811](https://github.com/ClickHouse/ClickHouse/pull/54811) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix filtering parts with indexHint for non analyzer [#54825](https://github.com/ClickHouse/ClickHouse/pull/54825) [#54449](https://github.com/ClickHouse/ClickHouse/pull/54449) ([Azat Khuzhin](https://github.com/azat)).
* Fix Keeper segfault during shutdown [#54841](https://github.com/ClickHouse/ClickHouse/pull/54841) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix `Invalid number of rows in Chunk` in MaterializedPostgreSQL [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Move obsolete format settings to separate section [#54855](https://github.com/ClickHouse/ClickHouse/pull/54855) ([Kruglov Pavel](https://github.com/Avogar)).
* Rebuild `minmax_count_projection` when partition key gets modified [#54943](https://github.com/ClickHouse/ClickHouse/pull/54943) ([Amos Bird](https://github.com/amosbird)).
* Fix bad cast to `ColumnVector<Int128>` in function `if` [#55019](https://github.com/ClickHouse/ClickHouse/pull/55019) ([Kruglov Pavel](https://github.com/Avogar)).
* Prevent attaching parts from tables with different projections or indices [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Store NULL in scalar result map for empty subquery result [#52240](https://github.com/ClickHouse/ClickHouse/pull/52240) ([vdimir](https://github.com/vdimir)).
* Fix `FINAL` produces invalid read ranges in a rare case [#54934](https://github.com/ClickHouse/ClickHouse/pull/54934) ([Nikita Taranov](https://github.com/nickitat)).
* Fix: insert quorum w/o keeper retries [#55026](https://github.com/ClickHouse/ClickHouse/pull/55026) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix simple state with nullable [#55030](https://github.com/ClickHouse/ClickHouse/pull/55030) ([Pedro Riera](https://github.com/priera)).
### <a id="238"></a> ClickHouse release 23.8 LTS, 2023-08-31
#### Backward Incompatible Change

@ -13,9 +13,10 @@ The following versions of ClickHouse server are currently being supported with s
| Version | Supported |
|:-|:-|
| 23.9 | ✔️ |
| 23.8 | ✔️ |
| 23.7 | ✔️ |
| 23.6 | ✔️ |
| 23.6 | ❌ |
| 23.5 | ❌ |
| 23.4 | ❌ |
| 23.3 | ✔️ |

@ -2,11 +2,11 @@
# NOTE: has nothing common with DBMS_TCP_PROTOCOL_VERSION,
# only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
SET(VERSION_REVISION 54478)
SET(VERSION_REVISION 54479)
SET(VERSION_MAJOR 23)
SET(VERSION_MINOR 9)
SET(VERSION_MINOR 10)
SET(VERSION_PATCH 1)
SET(VERSION_GITHASH ebc7d9a9f3b40be89e0b3e738b35d394aabeea3e)
SET(VERSION_DESCRIBE v23.9.1.1-testing)
SET(VERSION_STRING 23.9.1.1)
SET(VERSION_GITHASH 8f9a227de1f530cdbda52c145d41a6b0f1d29961)
SET(VERSION_DESCRIBE v23.10.1.1-testing)
SET(VERSION_STRING 23.10.1.1)
# end of autochange

@ -12,7 +12,7 @@ endif()
set(COMPILER_CACHE "auto" CACHE STRING "Speedup re-compilations using the caching tools; valid options are 'auto' (ccache, then sccache), 'ccache', 'sccache', or 'disabled'")
if(COMPILER_CACHE STREQUAL "auto")
find_program (CCACHE_EXECUTABLE ccache sccache)
find_program (CCACHE_EXECUTABLE NAMES ccache sccache)
elseif (COMPILER_CACHE STREQUAL "ccache")
find_program (CCACHE_EXECUTABLE ccache)
elseif(COMPILER_CACHE STREQUAL "sccache")

2
contrib/googletest vendored

@ -1 +1 @@
Subproject commit 71140c3ca7a87bb1b5b9c9f1500fea8858cce344
Subproject commit e47544ad31cb3ceecd04cc13e8fe556f8df9fe0b

@ -32,7 +32,7 @@ RUN arch=${TARGETARCH:-amd64} \
esac
ARG REPOSITORY="https://s3.amazonaws.com/clickhouse-builds/22.4/31c367d3cd3aefd316778601ff6565119fe36682/package_release"
ARG VERSION="23.8.3.48"
ARG VERSION="23.9.1.1854"
ARG PACKAGES="clickhouse-keeper"
# user/group precreated explicitly with fixed uid/gid on purpose.

@ -26,9 +26,6 @@ fi
mkdir -p /build/build_docker
cd /build/build_docker
rm -f CMakeCache.txt
# Read cmake arguments into array (possibly empty)
read -ra CMAKE_FLAGS <<< "${CMAKE_FLAGS:-}"
env
if [ -n "$MAKE_DEB" ]; then
rm -rf /build/packages/root
@ -72,11 +69,17 @@ else
echo "There are no subcommands to execute :)"
fi
# Read cmake arguments into array (possibly empty)
# The name of local variable has to be different from the name of environment variable
# not to override it. And make it usable for other processes.
read -ra CMAKE_FLAGS_ARRAY <<< "${CMAKE_FLAGS:-}"
env
if [ "$BUILD_MUSL_KEEPER" == "1" ]
then
# build keeper with musl separately
# and without rust bindings
cmake --debug-trycompile -DENABLE_RUST=OFF -DBUILD_STANDALONE_KEEPER=1 -DENABLE_CLICKHOUSE_KEEPER=1 -DCMAKE_VERBOSE_MAKEFILE=1 -DUSE_MUSL=1 -LA -DCMAKE_TOOLCHAIN_FILE=/build/cmake/linux/toolchain-x86_64-musl.cmake "-DCMAKE_BUILD_TYPE=$BUILD_TYPE" "-DSANITIZE=$SANITIZER" -DENABLE_CHECK_HEAVY_BUILDS=1 "${CMAKE_FLAGS[@]}" ..
cmake --debug-trycompile -DENABLE_RUST=OFF -DBUILD_STANDALONE_KEEPER=1 -DENABLE_CLICKHOUSE_KEEPER=1 -DCMAKE_VERBOSE_MAKEFILE=1 -DUSE_MUSL=1 -LA -DCMAKE_TOOLCHAIN_FILE=/build/cmake/linux/toolchain-x86_64-musl.cmake "-DCMAKE_BUILD_TYPE=$BUILD_TYPE" "-DSANITIZE=$SANITIZER" -DENABLE_CHECK_HEAVY_BUILDS=1 "${CMAKE_FLAGS_ARRAY[@]}" ..
# shellcheck disable=SC2086 # No quotes because I want it to expand to nothing if empty.
ninja $NINJA_FLAGS clickhouse-keeper
@ -91,11 +94,11 @@ then
rm -f CMakeCache.txt
# Modify CMake flags, so we won't overwrite standalone keeper with symlinks
CMAKE_FLAGS+=(-DBUILD_STANDALONE_KEEPER=0 -DCREATE_KEEPER_SYMLINK=0)
CMAKE_FLAGS_ARRAY+=(-DBUILD_STANDALONE_KEEPER=0 -DCREATE_KEEPER_SYMLINK=0)
fi
# Build everything
cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA "-DCMAKE_BUILD_TYPE=$BUILD_TYPE" "-DSANITIZE=$SANITIZER" -DENABLE_CHECK_HEAVY_BUILDS=1 "${CMAKE_FLAGS[@]}" ..
cmake --debug-trycompile -DCMAKE_VERBOSE_MAKEFILE=1 -LA "-DCMAKE_BUILD_TYPE=$BUILD_TYPE" "-DSANITIZE=$SANITIZER" -DENABLE_CHECK_HEAVY_BUILDS=1 "${CMAKE_FLAGS_ARRAY[@]}" ..
# No quotes because I want it to expand to nothing if empty.
# shellcheck disable=SC2086 # No quotes because I want it to expand to nothing if empty.

@ -105,7 +105,7 @@ def run_docker_image_with_env(
ccache_mount = ""
cmd = (
f"docker run --network=host --user={user} --rm {ccache_mount}"
f"docker run --network=host --user={user} --rm {ccache_mount} "
f"--volume={output_dir}:/output --volume={ch_root}:/build {env_part} "
f"--volume={cargo_cache_dir}:/rust/cargo/registry {interactive} {image_name}"
)

@ -33,7 +33,7 @@ RUN arch=${TARGETARCH:-amd64} \
# lts / testing / prestable / etc
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="https://packages.clickhouse.com/tgz/${REPO_CHANNEL}"
ARG VERSION="23.8.3.48"
ARG VERSION="23.9.1.1854"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# user/group precreated explicitly with fixed uid/gid on purpose.

@ -23,7 +23,7 @@ RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list
ARG REPO_CHANNEL="stable"
ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
ARG VERSION="23.8.3.48"
ARG VERSION="23.9.1.1854"
ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
# set non-empty deb_location_url url to create a docker image

@ -69,6 +69,16 @@ else
fi
if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]; then
sudo cat /etc/clickhouse-server1/config.d/filesystem_caches_path.xml \
| sed "s|<filesystem_caches_path>/var/lib/clickhouse/filesystem_caches/</filesystem_caches_path>|<filesystem_caches_path>/var/lib/clickhouse/filesystem_caches_1/</filesystem_caches_path>|" \
> /etc/clickhouse-server1/config.d/filesystem_caches_path.xml.tmp
mv /etc/clickhouse-server1/config.d/filesystem_caches_path.xml.tmp /etc/clickhouse-server1/config.d/filesystem_caches_path.xml
sudo cat /etc/clickhouse-server2/config.d/filesystem_caches_path.xml \
| sed "s|<filesystem_caches_path>/var/lib/clickhouse/filesystem_caches/</filesystem_caches_path>|<filesystem_caches_path>/var/lib/clickhouse/filesystem_caches_2/</filesystem_caches_path>|" \
> /etc/clickhouse-server2/config.d/filesystem_caches_path.xml.tmp
mv /etc/clickhouse-server2/config.d/filesystem_caches_path.xml.tmp /etc/clickhouse-server2/config.d/filesystem_caches_path.xml
mkdir -p /var/run/clickhouse-server1
sudo chown clickhouse:clickhouse /var/run/clickhouse-server1
sudo -E -u clickhouse /usr/bin/clickhouse server --config /etc/clickhouse-server1/config.xml --daemon \

@ -6,5 +6,4 @@ FROM clickhouse/stateless-test:$FROM_TAG
RUN apt-get install gdb
COPY run.sh /
COPY process_unit_tests_result.py /
CMD ["/bin/bash", "/run.sh"]

@ -1,102 +0,0 @@
#!/usr/bin/env python3

import os
import logging
import argparse
import csv

OK_SIGN = "OK ]"
FAILED_SIGN = "FAILED ]"
SEGFAULT = "Segmentation fault"
SIGNAL = "received signal SIG"
PASSED = "PASSED"


def get_test_name(line):
    elements = reversed(line.split(" "))
    for element in elements:
        if "(" not in element and ")" not in element:
            return element
    raise Exception("No test name in line '{}'".format(line))


def process_result(result_folder):
    summary = []
    total_counter = 0
    failed_counter = 0
    result_log_path = "{}/test_result.txt".format(result_folder)
    if not os.path.exists(result_log_path):
        logging.info("No output log on path %s", result_log_path)
        return "exception", "No output log", []

    status = "success"
    description = ""
    passed = False
    with open(result_log_path, "r") as test_result:
        for line in test_result:
            if OK_SIGN in line:
                logging.info("Found ok line: '%s'", line)
                test_name = get_test_name(line.strip())
                logging.info("Test name: '%s'", test_name)
                summary.append((test_name, "OK"))
                total_counter += 1
            elif FAILED_SIGN in line and "listed below" not in line and "ms)" in line:
                logging.info("Found fail line: '%s'", line)
                test_name = get_test_name(line.strip())
                logging.info("Test name: '%s'", test_name)
                summary.append((test_name, "FAIL"))
                total_counter += 1
                failed_counter += 1
            elif SEGFAULT in line:
                logging.info("Found segfault line: '%s'", line)
                status = "failure"
                description += "Segmentation fault. "
                break
            elif SIGNAL in line:
                logging.info("Received signal line: '%s'", line)
                status = "failure"
                description += "Exit on signal. "
                break
            elif PASSED in line:
                logging.info("PASSED record found: '%s'", line)
                passed = True

    if not passed:
        status = "failure"
        description += "PASSED record not found. "

    if failed_counter != 0:
        status = "failure"

    if not description:
        description += "fail: {}, passed: {}".format(
            failed_counter, total_counter - failed_counter
        )

    return status, description, summary


def write_results(results_file, status_file, results, status):
    with open(results_file, "w") as f:
        out = csv.writer(f, delimiter="\t")
        out.writerows(results)
    with open(status_file, "w") as f:
        out = csv.writer(f, delimiter="\t")
        out.writerow(status)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    parser = argparse.ArgumentParser(
        description="ClickHouse script for parsing results of unit tests"
    )
    parser.add_argument("--in-results-dir", default="/test_output/")
    parser.add_argument("--out-results-file", default="/test_output/test_results.tsv")
    parser.add_argument("--out-status-file", default="/test_output/check_status.tsv")
    args = parser.parse_args()

    state, description, test_results = process_result(args.in_results_dir)
    logging.info("Result parsed")
    status = (state, description)
    write_results(args.out_results_file, args.out_status_file, test_results, status)
    logging.info("Result written")

@ -3,5 +3,4 @@
set -x
service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test '';
timeout 40m gdb -q -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms | tee test_output/test_result.txt
./process_unit_tests_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
timeout 40m gdb -q -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms --gtest_output='json:test_output/test_result.json' | tee test_output/test_result.txt

@ -0,0 +1,381 @@
---
sidebar_position: 1
sidebar_label: 2023
---
# 2023 Changelog
### ClickHouse release v23.9.1.1854-stable (8f9a227de1f) FIXME as compared to v23.8.1.2992-lts (ebc7d9a9f3b)
#### Backward Incompatible Change
* Remove the `status_info` configuration option and dictionaries status from the default Prometheus handler. [#54090](https://github.com/ClickHouse/ClickHouse/pull/54090) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* The experimental parts metadata cache is removed from the codebase. [#54215](https://github.com/ClickHouse/ClickHouse/pull/54215) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Disable setting `input_format_json_try_infer_numbers_from_strings` by default, so numbers are no longer inferred from strings in JSON formats unless the setting is enabled. This avoids possible parsing errors when sample data contains strings that look like numbers. [#55099](https://github.com/ClickHouse/ClickHouse/pull/55099) ([Kruglov Pavel](https://github.com/Avogar)).
#### New Feature
* Added new type of authentication based on SSH keys. It works only for Native TCP protocol. [#41109](https://github.com/ClickHouse/ClickHouse/pull/41109) ([George Gamezardashvili](https://github.com/InfJoker)).
* Added IO Scheduling support for remote disks. Storage configuration for disk types `s3`, `s3_plain`, `hdfs` and `azure_blob_storage` can now contain `read_resource` and `write_resource` elements holding resource names. Scheduling policies for these resources can be configured in a separate server configuration section `resources`. Queries can be marked using setting `workload` and classified using server configuration section `workload_classifiers` to achieve diverse resource scheduling goals. More details in docs/en/operations/workload-scheduling.md. [#47009](https://github.com/ClickHouse/ClickHouse/pull/47009) ([Sergei Trifonov](https://github.com/serxa)).
* Added a new column `_block_number`. Resolves [#44532](https://github.com/ClickHouse/ClickHouse/issues/44532). [#47532](https://github.com/ClickHouse/ClickHouse/pull/47532) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Add options `partial_result_update_duration_ms` and `max_rows_in_partial_result` to show updates of a partial result of output table in real-time during query execution. [#48607](https://github.com/ClickHouse/ClickHouse/pull/48607) ([Alexey Perevyshin](https://github.com/alexX512)).
* Support case-insensitive and dot-all matching modes in RegExpTree dictionaries. [#50906](https://github.com/ClickHouse/ClickHouse/pull/50906) ([Johann Gan](https://github.com/johanngan)).
* Add support for `ALTER TABLE MODIFY COMMENT`. Note: something similar was added by an external contributor a long time ago, but the feature did not work at all and only confused users. This closes [#36377](https://github.com/ClickHouse/ClickHouse/issues/36377). [#51304](https://github.com/ClickHouse/ClickHouse/pull/51304) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added "GCD", i.e. "greatest common divisor", as a new data compression codec. The codec computes the GCD of all column values, and then divides each value by the GCD. The GCD codec is a data preparation codec (similar to Delta and DoubleDelta) and cannot be used stand-alone. It works with data of integer, decimal and date/time types. A viable use case for the GCD codec is column values that change (increase/decrease) in multiples of the GCD, e.g. 24 - 28 - 16 - 24 - 8 - 24 (assuming GCD = 4). [#53149](https://github.com/ClickHouse/ClickHouse/pull/53149) ([Alexander Nam](https://github.com/seshWCS)).
* Two new type aliases "DECIMAL(P)" (as shortcut for "DECIMAL(P, 0)") and "DECIMAL" (as shortcut for "DECIMAL(10, 0)") were added. This makes ClickHouse more compatible with MySQL's SQL dialect. [#53328](https://github.com/ClickHouse/ClickHouse/pull/53328) ([Val Doroshchuk](https://github.com/valbok)).
* Added a new system log table `backup_log` to track all `BACKUP` and `RESTORE` operations. [#53638](https://github.com/ClickHouse/ClickHouse/pull/53638) ([Victor Krasnov](https://github.com/sirvickr)).
* Added a format setting "output_format_markdown_escape_special_characters" (default: false). The setting controls whether special characters like "!", "#", "$" etc. are escaped (i.e. prefixed by a backslash) in the "Markdown" output format. [#53860](https://github.com/ClickHouse/ClickHouse/pull/53860) ([irenjj](https://github.com/irenjj)).
* Add function `decodeHTMLComponent`. [#54097](https://github.com/ClickHouse/ClickHouse/pull/54097) ([Bharat Nallan](https://github.com/bharatnc)).
* Added peak_threads_usage to query_log table. [#54335](https://github.com/ClickHouse/ClickHouse/pull/54335) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Add SHOW FUNCTIONS support to clickhouse-client. [#54337](https://github.com/ClickHouse/ClickHouse/pull/54337) ([Julia Kartseva](https://github.com/wat-ze-hex)).
* This PR improves schema inference from JSON formats: 1) It is now possible to infer named Tuples from JSON objects without the experimental JSON type under the setting `input_format_json_try_infer_named_tuples_from_objects` in JSON formats. Previously, without the experimental JSON type, we could only infer JSON objects as Strings or Maps; now we can infer a named Tuple. The resulting Tuple type will contain all keys of the objects that were read in the data sample during schema inference. It can be useful for reading structured JSON data without sparse objects. The setting is enabled by default. 2) Allow parsing a JSON array into a column with type String under the setting `input_format_json_read_arrays_as_strings`. It can help with reading arrays whose values have different types. 3) Allow using type String for JSON keys with unknown types (`null`/`[]`/`{}`) in sample data under the setting `input_format_json_infer_incomplete_types_as_strings`. Now in JSON formats we can read any value into a String column and avoid the error `Cannot determine type for column 'column_name' by first 25000 rows of data, most likely this column contains only Nulls or empty Arrays/Maps` during schema inference by using type String for unknown types, so the data will be read successfully. [#54427](https://github.com/ClickHouse/ClickHouse/pull/54427) ([Kruglov Pavel](https://github.com/Avogar)).
* Added function "toDaysSinceYearZero" with alias "TO_DAYS()" (for compatibility with MySQL) which returns the number of days passed since 0001-01-01. [#54479](https://github.com/ClickHouse/ClickHouse/pull/54479) ([Robert Schulze](https://github.com/rschu1ze)).
* Added functions YYYYMMDDtoDate(), YYYYMMDDtoDate32(), YYYYMMDDhhmmssToDateTime() and YYYYMMDDhhmmssToDateTime64(). They convert a date or date with time encoded as integer (e.g. 20230911) into a native date or date with time. As such, they provide the opposite functionality of existing functions toYYYYMMDD() and toYYYYMMDDhhmmss(). [#54509](https://github.com/ClickHouse/ClickHouse/pull/54509) ([Robert Schulze](https://github.com/rschu1ze)).
* Added "bandwidth_limit" IO scheduling node type. It allows you to specify `max_speed` and `max_burst` constraints on traffic passing through this node. More details in docs/en/operations/workload-scheduling.md. [#54618](https://github.com/ClickHouse/ClickHouse/pull/54618) ([Sergei Trifonov](https://github.com/serxa)).
* Function `toDaysSinceYearZero()` now supports arguments of type `DateTime` and `DateTime64`. [#54856](https://github.com/ClickHouse/ClickHouse/pull/54856) ([Serge Klochkov](https://github.com/slvrtrn)).
* Allow S3-style URLs for table functions `s3`, `gcs`, `oss`. URL is automatically converted to HTTP. Example: `'s3://clickhouse-public-datasets/hits.csv'` is converted to `'https://clickhouse-public-datasets.s3.amazonaws.com/hits.csv'`. [#54931](https://github.com/ClickHouse/ClickHouse/pull/54931) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Add several string distance functions, including `byteHammingDistance`, `byteJaccardIndex`, `byteEditDistance`. [#54935](https://github.com/ClickHouse/ClickHouse/pull/54935) ([flynn](https://github.com/ucasfl)).
* Add new setting `print_pretty_type_names` to print pretty deep nested types like Tuple/Maps/Arrays. [#55095](https://github.com/ClickHouse/ClickHouse/pull/55095) ([Kruglov Pavel](https://github.com/Avogar)).
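
The GCD codec entry in the list above describes computing the greatest common divisor of all column values and then dividing each value by it. The following is a minimal Python sketch of that idea only, not ClickHouse's implementation; the names `gcd_encode` and `gcd_decode` are hypothetical, and treating an empty or all-zero column as having divisor 1 is an assumption made here to keep the sketch total.

```python
from functools import reduce
from math import gcd


def gcd_encode(values):
    """Return (divisor, quotients) for a list of integers.

    Hypothetical helper: illustrates the data-preparation step only.
    """
    divisor = reduce(gcd, values, 0)
    if divisor == 0:
        # Empty or all-zero input: assume divisor 1 so decoding stays trivial.
        divisor = 1
    return divisor, [v // divisor for v in values]


def gcd_decode(divisor, quotients):
    """Invert gcd_encode by multiplying each quotient by the divisor."""
    return [q * divisor for q in quotients]


if __name__ == "__main__":
    # The sample sequence from the changelog entry.
    values = [24, 28, 16, 24, 8, 24]
    divisor, quotients = gcd_encode(values)
    print(divisor, quotients)  # 4 [6, 7, 4, 6, 2, 6]
    assert gcd_decode(divisor, quotients) == values
```

The quotients are smaller than the original values, which is presumably what makes the codec useful only as a preparation step in front of another codec, as the entry states.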
#### Performance Improvement
* Improve performance of sorting for decimal columns. Improve performance of insertion into MergeTree if ORDER BY contains Decimal column. Improve performance of sorting when data is already sorted or almost sorted. [#35961](https://github.com/ClickHouse/ClickHouse/pull/35961) ([Maksim Kita](https://github.com/kitaisreal)).
* Improve performance for huge query analysis. Fixes [#51224](https://github.com/ClickHouse/ClickHouse/issues/51224). [#51469](https://github.com/ClickHouse/ClickHouse/pull/51469) ([frinkr](https://github.com/frinkr)).
* 1. Add rewriter for new analyzer. [#52082](https://github.com/ClickHouse/ClickHouse/pull/52082) ([JackyWoo](https://github.com/JackyWoo)).
* Add a rewriter for both the old and the new analyzer, and add the setting `optimize_uniq_to_count`. [#52645](https://github.com/ClickHouse/ClickHouse/pull/52645) ([JackyWoo](https://github.com/JackyWoo)).
* Remove manual calls to `mmap/mremap/munmap` and delegate all this work to `jemalloc`. [#52792](https://github.com/ClickHouse/ClickHouse/pull/52792) ([Nikita Taranov](https://github.com/nickitat)).
* RoaringBitmaps are now optimized before serialization. [#52842](https://github.com/ClickHouse/ClickHouse/pull/52842) ([UnamedRus](https://github.com/UnamedRus)).
* Optimize group by constant keys. Will optimize queries with group by `_file/_path` after https://github.com/ClickHouse/ClickHouse/pull/53529. [#53549](https://github.com/ClickHouse/ClickHouse/pull/53549) ([Kruglov Pavel](https://github.com/Avogar)).
* Speed up reading from S3 by enabling prefetches by default. [#53709](https://github.com/ClickHouse/ClickHouse/pull/53709) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Do not implicitly read pk and version columns in lonely parts if unnecessary. [#53919](https://github.com/ClickHouse/ClickHouse/pull/53919) ([Duc Canh Le](https://github.com/canhld94)).
* Fixed high CPU consumption when working with NATS. [#54399](https://github.com/ClickHouse/ClickHouse/pull/54399) ([Vasilev Pyotr](https://github.com/vahpetr)).
* Since we use separate instructions for executing `toString()` with datetime argument, it is possible to improve performance a bit for non-datetime arguments and have some parts of the code cleaner. Follows up [#53680](https://github.com/ClickHouse/ClickHouse/issues/53680). [#54443](https://github.com/ClickHouse/ClickHouse/pull/54443) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Instead of serializing JSON elements into a `std::stringstream`, this PR tries to put the serialization result into `ColumnString` directly. [#54613](https://github.com/ClickHouse/ClickHouse/pull/54613) ([lgbo](https://github.com/lgbo-ustc)).
* Enable ORDER BY optimization for reading data in corresponding order from a MergeTree table in case that the table is behind a view. [#54628](https://github.com/ClickHouse/ClickHouse/pull/54628) ([Vitaly Baranov](https://github.com/vitlibar)).
* Improve JSON SQL functions by reusing `GeneratorJSONPath`. Constructing it repeatedly performs badly because the `GeneratorJSONPath` constructor makes several `make_shared` calls. [#54735](https://github.com/ClickHouse/ClickHouse/pull/54735) ([lgbo](https://github.com/lgbo-ustc)).
#### Improvement
* Keeper improvement: Add a `createIfNotExists` Keeper command. [#48855](https://github.com/ClickHouse/ClickHouse/pull/48855) ([Konstantin Bogdanov](https://github.com/thevar1able)).
* Add IF EMPTY clause for DROP TABLE queries. [#48915](https://github.com/ClickHouse/ClickHouse/pull/48915) ([Pavel Novitskiy](https://github.com/pnovitskiy)).
* The Keeper dynamically adjusts log levels. [#50372](https://github.com/ClickHouse/ClickHouse/pull/50372) ([helifu](https://github.com/helifu)).
* Allow replacing long file names of columns in `MergeTree` data parts with hashes of the names. This helps avoid the `File name too long` error in some cases. [#50612](https://github.com/ClickHouse/ClickHouse/pull/50612) ([Anton Popov](https://github.com/CurtizJ)).
* Allow specifying the expiration date and, optionally, the time for user credentials with `VALID UNTIL datetime` clause. [#51261](https://github.com/ClickHouse/ClickHouse/pull/51261) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add setting `ignore_access_denied_multidirectory_globs`. [#52839](https://github.com/ClickHouse/ClickHouse/pull/52839) ([Andrey Zvonov](https://github.com/zvonand)).
* Output valid JSON/XML on exception during HTTP query execution. Add setting `http_write_exception_in_output_format` to enable/disable this behaviour (enabled by default). [#52853](https://github.com/ClickHouse/ClickHouse/pull/52853) ([Kruglov Pavel](https://github.com/Avogar)).
* More precise Integer type inference, fix [#51236](https://github.com/ClickHouse/ClickHouse/issues/51236). [#53003](https://github.com/ClickHouse/ClickHouse/pull/53003) ([Chen768959](https://github.com/Chen768959)).
* Keeper tries to batch flush requests for better performance. [#53049](https://github.com/ClickHouse/ClickHouse/pull/53049) ([Antonio Andelic](https://github.com/antonio2368)).
* Introduced resolving of charsets in the string literals for MaterializedMySQL. [#53220](https://github.com/ClickHouse/ClickHouse/pull/53220) ([Val Doroshchuk](https://github.com/valbok)).
* Fix a subtle issue with a rarely used `EmbeddedRocksDB` table engine in an extremely rare scenario: sometimes the `EmbeddedRocksDB` table engine does not close files correctly in NFS after running `DROP TABLE`. [#53502](https://github.com/ClickHouse/ClickHouse/pull/53502) ([Mingliang Pan](https://github.com/liangliangpan)).
* SQL functions "toString(datetime)" and "formatDateTime()" now support non-constant timezone arguments. [#53680](https://github.com/ClickHouse/ClickHouse/pull/53680) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* `RESTORE TABLE ON CLUSTER` must create replicated tables with a matching UUID on hosts. Otherwise the macro `{uuid}` in ZooKeeper path can't work correctly after RESTORE. This PR implements that. [#53765](https://github.com/ClickHouse/ClickHouse/pull/53765) ([Vitaly Baranov](https://github.com/vitlibar)).
* Added restore setting `restore_broken_parts_as_detached`: if it's true, the RESTORE process won't stop on broken parts while restoring; instead, all the broken parts will be copied to the `detached` folder with the prefix `broken-from-backup`. If it's false, the RESTORE process will stop on the first broken part (if any). The default value is false. [#53877](https://github.com/ClickHouse/ClickHouse/pull/53877) ([Vitaly Baranov](https://github.com/vitlibar)).
* The creation of Annoy indexes can now be parallelized using setting `max_threads_for_annoy_index_creation`. [#54047](https://github.com/ClickHouse/ClickHouse/pull/54047) ([Robert Schulze](https://github.com/rschu1ze)).
* The MySQL interface gained a minimal implementation of prepared statements, just enough to allow a connection from Tableau Online to ClickHouse via the MySQL connector. [#54115](https://github.com/ClickHouse/ClickHouse/pull/54115) ([Serge Klochkov](https://github.com/slvrtrn)).
* Replaced the library to handle (encode/decode) base64 values from Turbo-Base64 to aklomp-base64. Both are SIMD-accelerated on x86 and ARM but 1. the license of the latter (BSD-2) is more favorable for ClickHouse, Turbo-Base64 switched in the meantime to GPL-3, 2. with more GitHub stars, aklomp-base64 seems more future-proof, 3. aklomp-base64 has a slightly nicer API (which is arguably subjective), and 4. aklomp-base64 does not require us to hack around bugs (like non-threadsafe initialization). Note: aklomp-base64 rejects unpadded base64 values whereas Turbo-Base64 decodes them on a best-effort basis. RFC-4648 leaves it open whether padding is mandatory or not, but depending on the context this may be a behavioral change to be aware of. [#54119](https://github.com/ClickHouse/ClickHouse/pull/54119) ([Mikhail Koviazin](https://github.com/mkmkme)).
* Add elapsed_ns to HTTP headers X-ClickHouse-Progress and X-ClickHouse-Summary. [#54179](https://github.com/ClickHouse/ClickHouse/pull/54179) ([joelynch](https://github.com/joelynch)).
* Implementation of `reconfig` (https://github.com/ClickHouse/ClickHouse/pull/49450), `sync`, and `exists` commands for keeper-client. [#54201](https://github.com/ClickHouse/ClickHouse/pull/54201) ([pufit](https://github.com/pufit)).
* "clickhouse-local" and "clickhouse-client" now allow specifying the "--query" parameter multiple times, e.g. './clickhouse-client --query "SELECT 1" --query "SELECT 2"'. This syntax is slightly more intuitive than `./clickhouse-client --multiquery "SELECT 1;SELECT 2"`, a bit easier to script (e.g. "queries.push_back('--query "$q"')") and more consistent with the behavior of existing parameter "--queries-file" (e.g. "./clickhouse client --queries-file queries1.sql --queries-file queries2.sql"). [#54249](https://github.com/ClickHouse/ClickHouse/pull/54249) ([Robert Schulze](https://github.com/rschu1ze)).
* Add sub-second precision to `formatReadableTimeDelta`. [#54250](https://github.com/ClickHouse/ClickHouse/pull/54250) ([Andrey Zvonov](https://github.com/zvonand)).
* Fix wrong reallocation in HashedArrayDictionary. [#54254](https://github.com/ClickHouse/ClickHouse/pull/54254) ([Vitaly Baranov](https://github.com/vitlibar)).
* Enable allow_remove_stale_moving_parts by default. [#54260](https://github.com/ClickHouse/ClickHouse/pull/54260) ([vdimir](https://github.com/vdimir)).
* Fix using count from cache and improve progress bar for reading from archives. [#54271](https://github.com/ClickHouse/ClickHouse/pull/54271) ([Kruglov Pavel](https://github.com/Avogar)).
* Add support for S3 credentials using SSO. To define a profile to be used with SSO, set `AWS_PROFILE` environment variable. [#54347](https://github.com/ClickHouse/ClickHouse/pull/54347) ([Antonio Andelic](https://github.com/antonio2368)).
* Support NULL as default for nested types Array/Tuple/Map for input formats. Closes [#51100](https://github.com/ClickHouse/ClickHouse/issues/51100). [#54351](https://github.com/ClickHouse/ClickHouse/pull/54351) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix a bug introduced in https://github.com/ClickHouse/ClickHouse/pull/45878, which is when ClickHouse started reading Arrow data in batches. It is listed as an improvement because adding a test for the case may not be feasible. [#54370](https://github.com/ClickHouse/ClickHouse/pull/54370) ([Arthur Passos](https://github.com/arthurpassos)).
* Add STD alias to stddevPop function for MySQL compatibility. Closes [#54274](https://github.com/ClickHouse/ClickHouse/issues/54274). [#54382](https://github.com/ClickHouse/ClickHouse/pull/54382) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add `addDate` function for compatibility with MySQL and `subDate` for consistency. Reference [#54275](https://github.com/ClickHouse/ClickHouse/issues/54275). [#54400](https://github.com/ClickHouse/ClickHouse/pull/54400) ([Nikolay Degterinsky](https://github.com/evillique)).
* Parse data in JSON format as JSONEachRow if parsing the metadata fails. This allows reading files with the `.json` extension even if the real format is JSONEachRow. Closes [#45740](https://github.com/ClickHouse/ClickHouse/issues/45740). [#54405](https://github.com/ClickHouse/ClickHouse/pull/54405) ([Kruglov Pavel](https://github.com/Avogar)).
* Pass http retry timeout as milliseconds. [#54438](https://github.com/ClickHouse/ClickHouse/pull/54438) ([Duc Canh Le](https://github.com/canhld94)).
* Support SAMPLE BY for VIEW. [#54477](https://github.com/ClickHouse/ClickHouse/pull/54477) ([Azat Khuzhin](https://github.com/azat)).
* Add modification_time into system.detached_parts. [#54506](https://github.com/ClickHouse/ClickHouse/pull/54506) ([Azat Khuzhin](https://github.com/azat)).
* Added a setting "splitby_max_substrings_includes_remaining_string" which controls if functions "splitBy*()" with argument "max_substring" > 0 include the remaining string (if any) in the result array (Python/Spark semantics) or not. The default behavior does not change. [#54518](https://github.com/ClickHouse/ClickHouse/pull/54518) ([Robert Schulze](https://github.com/rschu1ze)).
* Now clickhouse-client processes files in parallel in case of `INFILE 'glob_expression'`. Closes [#54218](https://github.com/ClickHouse/ClickHouse/issues/54218). [#54533](https://github.com/ClickHouse/ClickHouse/pull/54533) ([Max K.](https://github.com/mkaynov)).
* Allow to use primary key for IN function where primary key column types are different from `IN` function right side column types. Example: `SELECT id FROM test_table WHERE id IN (SELECT '5')`. Closes [#48936](https://github.com/ClickHouse/ClickHouse/issues/48936). [#54544](https://github.com/ClickHouse/ClickHouse/pull/54544) ([Maksim Kita](https://github.com/kitaisreal)).
* Better integer types inference for Int64/UInt64 fields. Continuation of https://github.com/ClickHouse/ClickHouse/pull/53003. Now it also works for nested types like Arrays of Arrays and for functions like `map/tuple`. Issue: [#51236](https://github.com/ClickHouse/ClickHouse/issues/51236). [#54553](https://github.com/ClickHouse/ClickHouse/pull/54553) ([Kruglov Pavel](https://github.com/Avogar)).
* HashJoin tries to shrink internal buffers consuming half of maximal available memory (set by `max_bytes_in_join`). [#54584](https://github.com/ClickHouse/ClickHouse/pull/54584) ([vdimir](https://github.com/vdimir)).
* Added array operations for multiplication, division, and modulo by a scalar. They work in both directions: for example, both `5 * [5, 5]` and `[5, 5] * 5` are possible. [#54608](https://github.com/ClickHouse/ClickHouse/pull/54608) ([Yarik Briukhovetskyi](https://github.com/yariks5s)).
* Added function `timestamp` for compatibility with MySQL. Closes [#54275](https://github.com/ClickHouse/ClickHouse/issues/54275). [#54639](https://github.com/ClickHouse/ClickHouse/pull/54639) ([Nikolay Degterinsky](https://github.com/evillique)).
* Respect max_block_size for array join to avoid possible OOM. Closes [#54290](https://github.com/ClickHouse/ClickHouse/issues/54290). [#54664](https://github.com/ClickHouse/ClickHouse/pull/54664) ([李扬](https://github.com/taiyang-li)).
* Add optional `version` argument to `rm` command in `keeper-client` to support safer deletes. [#54708](https://github.com/ClickHouse/ClickHouse/pull/54708) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Disable killing the server by systemd (that may lead to data loss when using Buffer tables). [#54744](https://github.com/ClickHouse/ClickHouse/pull/54744) ([Azat Khuzhin](https://github.com/azat)).
* Added field "is_deterministic" to system table "system.functions" which indicates whether the result of a function is stable between two invocations (given exactly the same inputs) or not. [#54766](https://github.com/ClickHouse/ClickHouse/pull/54766) ([Robert Schulze](https://github.com/rschu1ze)).
* Made the views in schema "information_schema" more compatible with the equivalent views in MySQL (i.e. modified and extended them) up to a point where Tableau Online is able to connect to ClickHouse. More specifically: 1. The type of field "information_schema.tables.table_type" changed from Enum8 to String. 2. Added fields "table_comment" and "table_collation" to view "information_schema.tables". 3. Added views "information_schema.key_column_usage" and "referential_constraints". 4. Replaced uppercase aliases in "information_schema" views with concrete uppercase columns. [#54773](https://github.com/ClickHouse/ClickHouse/pull/54773) ([Serge Klochkov](https://github.com/slvrtrn)).
* The query cache now returns an error if the user tries to cache the result of a query with a non-deterministic function such as "now()", "randomString()" and "dictGet()". Compared to the previous behavior (silently not caching the result), this reduces confusion and surprise for users. [#54801](https://github.com/ClickHouse/ClickHouse/pull/54801) ([Robert Schulze](https://github.com/rschu1ze)).
* Forbid special columns for file/s3/url/... storages, fix insert into ephemeral columns from files. Closes [#53477](https://github.com/ClickHouse/ClickHouse/issues/53477). [#54803](https://github.com/ClickHouse/ClickHouse/pull/54803) ([Kruglov Pavel](https://github.com/Avogar)).
* More configurable collecting metadata for backup. [#54804](https://github.com/ClickHouse/ClickHouse/pull/54804) ([Vitaly Baranov](https://github.com/vitlibar)).
* `clickhouse-local`'s log file (if enabled with --server_logs_file flag) will now prefix each line with timestamp, thread id, etc, just like `clickhouse-server`. [#54807](https://github.com/ClickHouse/ClickHouse/pull/54807) ([Michael Kolupaev](https://github.com/al13n321)).
* Reuse HTTP connections in s3 table function. [#54812](https://github.com/ClickHouse/ClickHouse/pull/54812) ([Michael Kolupaev](https://github.com/al13n321)).
* Avoid excessive calls to getifaddrs in isLocalAddress. [#54819](https://github.com/ClickHouse/ClickHouse/pull/54819) ([Duc Canh Le](https://github.com/canhld94)).
* Field "is_obsolete" in system.merge_tree_settings is now 1 for obsolete merge tree settings. Previously, only the description indicated that the setting is obsolete. [#54837](https://github.com/ClickHouse/ClickHouse/pull/54837) ([Robert Schulze](https://github.com/rschu1ze)).
* Make it possible to use plural when using interval literals. `INTERVAL 2 HOURS` should be equivalent to `INTERVAL 2 HOUR`. [#54860](https://github.com/ClickHouse/ClickHouse/pull/54860) ([Jordi Villar](https://github.com/jrdi)).
* Replace the linear method in `MergeTreeRangeReader::Stream::ceilRowsToCompleteGranules` with a binary search. [#54869](https://github.com/ClickHouse/ClickHouse/pull/54869) ([usurai](https://github.com/usurai)).
* Always allow the creation of a projection with `Nullable` PK. This fixes [#54814](https://github.com/ClickHouse/ClickHouse/issues/54814). [#54895](https://github.com/ClickHouse/ClickHouse/pull/54895) ([Amos Bird](https://github.com/amosbird)).
* Retry backup S3 operations after connection reset failure. [#54900](https://github.com/ClickHouse/ClickHouse/pull/54900) ([Vitaly Baranov](https://github.com/vitlibar)).
* Make the exception message exact when the maximum value of a setting is less than the minimum value. [#54925](https://github.com/ClickHouse/ClickHouse/pull/54925) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* LIKE, match, and other regular expressions matching functions now allow matching with patterns containing non-UTF-8 substrings by falling back to binary matching. Example: you can use `string LIKE '\xFE\xFF%'` to detect BOM. This closes [#54486](https://github.com/ClickHouse/ClickHouse/issues/54486). [#54942](https://github.com/ClickHouse/ClickHouse/pull/54942) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* ProfileEvents added ContextLockWaitMicroseconds event. [#55029](https://github.com/ClickHouse/ClickHouse/pull/55029) ([Maksim Kita](https://github.com/kitaisreal)).
* Added field "is_deterministic" to system table "system.functions" which indicates whether the result of a function is stable between two invocations (given exactly the same inputs) or not. [#55035](https://github.com/ClickHouse/ClickHouse/pull/55035) ([Robert Schulze](https://github.com/rschu1ze)).
* View information_schema.tables now has a new field `data_length` which shows the approximate size of the data on disk. Required to run queries generated by Amazon QuickSight. [#55037](https://github.com/ClickHouse/ClickHouse/pull/55037) ([Robert Schulze](https://github.com/rschu1ze)).
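
The base64 entry in this list notes that aklomp-base64 rejects unpadded values, while Turbo-Base64 decoded them on a best-effort basis. The difference can be illustrated outside ClickHouse. This is a minimal sketch that uses the Python standard library as a stand-in for a strict decoder; the helper names `decode_strict`, `decode_lenient`, and `is_accepted_strictly` are hypothetical, and none of this is ClickHouse's implementation:

```python
import base64
import binascii


def decode_strict(value: str) -> bytes:
    """Decode base64 and reject input whose padding is missing (strict behaviour)."""
    return base64.b64decode(value, validate=True)


def decode_lenient(value: str) -> bytes:
    """Restore missing '=' padding first, then decode (best-effort behaviour)."""
    padded = value + "=" * (-len(value) % 4)
    return base64.b64decode(padded, validate=True)


def is_accepted_strictly(value: str) -> bool:
    """Return True if the strict decoder accepts the value as-is."""
    try:
        decode_strict(value)
        return True
    except binascii.Error:
        return False
```

Clients that may send unpadded values could restore the padding themselves, as `decode_lenient` does, before passing the data to a strict decoder.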
#### Build/Testing/Packaging Improvement
* ClickHouse is built with Musl instead of GLibc by default. [#52550](https://github.com/ClickHouse/ClickHouse/pull/52550) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* ClickHouse is built with Musl instead of GLibc. [#52721](https://github.com/ClickHouse/ClickHouse/pull/52721) ([Azat Khuzhin](https://github.com/azat)).
* Bumped the compiler of official and continuous integration builds of ClickHouse from Clang 16 to 17. [#53831](https://github.com/ClickHouse/ClickHouse/pull/53831) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix flaky test. `wait_resolver` function was asserting the response to be == proxy1, but it might actually return proxy2. Account for that as well. [#54191](https://github.com/ClickHouse/ClickHouse/pull/54191) ([Arthur Passos](https://github.com/arthurpassos)).
* Regenerated tld data for lookups (`tldLookup.generated.cpp`). [#54269](https://github.com/ClickHouse/ClickHouse/pull/54269) ([Bharat Nallan](https://github.com/bharatnc)).
* Properly report the timeout of the check itself in `fast_test_check`/`stress_check`. [#54278](https://github.com/ClickHouse/ClickHouse/pull/54278) ([Igor Nikonov](https://github.com/devcrafter)).
* Suddenly, `test_host_regexp_multiple_ptr_records_concurrent` became flaky. [#54307](https://github.com/ClickHouse/ClickHouse/pull/54307) ([Arthur Passos](https://github.com/arthurpassos)).
* Fixed precise float parsing issue on s390x. [#54330](https://github.com/ClickHouse/ClickHouse/pull/54330) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Enrich `changed_images.json` with the latest tag from master for images that are not changed in the pull request. [#54369](https://github.com/ClickHouse/ClickHouse/pull/54369) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fixed endian issue in jemalloc_bins system table for s390x. [#54517](https://github.com/ClickHouse/ClickHouse/pull/54517) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Fixed random generation issue for UInt256 and IPv4 on s390x. [#54576](https://github.com/ClickHouse/ClickHouse/pull/54576) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Remove redundant `clickhouse-keeper-client` symlink. [#54587](https://github.com/ClickHouse/ClickHouse/pull/54587) ([Tomas Barton](https://github.com/deric)).
* Use `/usr/bin/env` to resolve bash. [#54603](https://github.com/ClickHouse/ClickHouse/pull/54603) ([Fionera](https://github.com/fionera)).
* Move all `tests/ci/*.lib` files to `stateless-tests` image. Closes [#54540](https://github.com/ClickHouse/ClickHouse/issues/54540). [#54668](https://github.com/ClickHouse/ClickHouse/pull/54668) ([Kruglov Pavel](https://github.com/Avogar)).
* We build and upload them for every push, which isn't worth it. [#54675](https://github.com/ClickHouse/ClickHouse/pull/54675) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fixed SimHash function endian issue for s390x. [#54793](https://github.com/ClickHouse/ClickHouse/pull/54793) ([Harry Lee](https://github.com/HarryLeeIBM)).
* Do not clone the fast tests repo twice; parallelize submodules checkout; use the current user in the fast-tests container. [#54849](https://github.com/ClickHouse/ClickHouse/pull/54849) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Avoid running pull request ci workflow for fixes touching .md files only. [#54914](https://github.com/ClickHouse/ClickHouse/pull/54914) ([Max K.](https://github.com/mkaynov)).
* CMake added `PROFILE_CPU` option needed to perform `perf record` without using DWARF call graph. [#54917](https://github.com/ClickHouse/ClickHouse/pull/54917) ([Maksim Kita](https://github.com/kitaisreal)).
* Use `--gtest_output='json:'` to parse unit test results. [#54922](https://github.com/ClickHouse/ClickHouse/pull/54922) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Added support for additional scripts (you need to mount a volume) to extend the build process. [#55000](https://github.com/ClickHouse/ClickHouse/pull/55000) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* If the linker is different than LLD, stop with a fatal error. [#55036](https://github.com/ClickHouse/ClickHouse/pull/55036) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
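
As a rough illustration of the `--gtest_output='json:'` entry in this list, the following Python sketch summarizes a GoogleTest JSON report. The sample report is invented for the example and trimmed to the fields used, and the function name `summarize_gtest_report` is hypothetical; this is not the code used in ClickHouse CI:

```python
import json

# Invented sample shaped like a GoogleTest JSON report (only the fields used below).
SAMPLE_REPORT = """
{
  "tests": 2,
  "failures": 1,
  "testsuites": [
    {
      "name": "MathTest",
      "testsuite": [
        {"name": "Addition", "status": "RUN",
         "failures": [{"failure": "expected 2", "type": ""}]},
        {"name": "Subtraction", "status": "RUN"}
      ]
    }
  ]
}
"""


def summarize_gtest_report(text: str) -> dict:
    """Map 'Suite.Test' to 'OK' or 'FAIL' based on a non-empty 'failures' list."""
    report = json.loads(text)
    results = {}
    for suite in report.get("testsuites", []):
        for case in suite.get("testsuite", []):
            status = "FAIL" if case.get("failures") else "OK"
            results[f"{suite['name']}.{case['name']}"] = status
    return results
```

Reading the structured report avoids scraping the human-readable console output, which is presumably the motivation for the change.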
#### Bug Fix (user-visible misbehavior in an official stable release)
* Store NULL in scalar result map for empty subquery result [#52240](https://github.com/ClickHouse/ClickHouse/pull/52240) ([vdimir](https://github.com/vdimir)).
* Fix misleading error message in OUTFILE with CapnProto/Protobuf [#52870](https://github.com/ClickHouse/ClickHouse/pull/52870) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix summary reporting with parallel replicas with LIMIT [#53050](https://github.com/ClickHouse/ClickHouse/pull/53050) ([Raúl Marín](https://github.com/Algunenano)).
* Fix throttling of BACKUPs from/to S3 (in case native copy was not used) and in some other places as well [#53336](https://github.com/ClickHouse/ClickHouse/pull/53336) ([Azat Khuzhin](https://github.com/azat)).
* Fix IO throttling during copying whole directories [#53338](https://github.com/ClickHouse/ClickHouse/pull/53338) ([Azat Khuzhin](https://github.com/azat)).
* Fix: moved to prewhere condition actions can lose column [#53492](https://github.com/ClickHouse/ClickHouse/pull/53492) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* 37737 fixed internal error when replacing with byte-equal parts [#53735](https://github.com/ClickHouse/ClickHouse/pull/53735) ([Pedro Riera](https://github.com/priera)).
* Fix: require columns participating in interpolate expression [#53754](https://github.com/ClickHouse/ClickHouse/pull/53754) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix cluster discovery initialization + setting up fail points in config [#54113](https://github.com/ClickHouse/ClickHouse/pull/54113) ([vdimir](https://github.com/vdimir)).
* Fix issues in accurateCastOrNull [#54136](https://github.com/ClickHouse/ClickHouse/pull/54136) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Fix nullable primary key in final [#54164](https://github.com/ClickHouse/ClickHouse/pull/54164) ([Amos Bird](https://github.com/amosbird)).
* Inserting only non-duplicate chunks in MV [#54184](https://github.com/ClickHouse/ClickHouse/pull/54184) ([Pedro Riera](https://github.com/priera)).
* Fix REPLACE/MOVE PARTITION with zero-copy replication [#54193](https://github.com/ClickHouse/ClickHouse/pull/54193) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix: parallel replicas over distributed don't read from all replicas [#54199](https://github.com/ClickHouse/ClickHouse/pull/54199) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix: allow IPv6 for bloom filter [#54200](https://github.com/ClickHouse/ClickHouse/pull/54200) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix possible type mismatch with IPv4 [#54212](https://github.com/ClickHouse/ClickHouse/pull/54212) ([Bharat Nallan](https://github.com/bharatnc)).
* Fix system.data_skipping_indices for recreated indices [#54225](https://github.com/ClickHouse/ClickHouse/pull/54225) ([Artur Malchanau](https://github.com/Hexta)).
* Fix name clash for multiple join rewriter v2 [#54240](https://github.com/ClickHouse/ClickHouse/pull/54240) ([Tao Wang](https://github.com/wangtZJU)).
* Fix unexpected errors in system.errors after join [#54306](https://github.com/ClickHouse/ClickHouse/pull/54306) ([vdimir](https://github.com/vdimir)).
* Fix isZeroOrNull(NULL) [#54316](https://github.com/ClickHouse/ClickHouse/pull/54316) ([flynn](https://github.com/ucasfl)).
* Fix: parallel replicas over distributed with prefer_localhost_replica=1 [#54334](https://github.com/ClickHouse/ClickHouse/pull/54334) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix logical error in vertical merge + replacing merge tree + optimize cleanup [#54368](https://github.com/ClickHouse/ClickHouse/pull/54368) ([alesapin](https://github.com/alesapin)).
* Fix possible error 'URI contains invalid characters' in s3 table function [#54373](https://github.com/ClickHouse/ClickHouse/pull/54373) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix segfault in AST optimization of `arrayExists` function [#54379](https://github.com/ClickHouse/ClickHouse/pull/54379) ([Nikolay Degterinsky](https://github.com/evillique)).
* Check for overflow before addition in `analysisOfVariance` function [#54385](https://github.com/ClickHouse/ClickHouse/pull/54385) ([Antonio Andelic](https://github.com/antonio2368)).
* Reproduce and fix the bug in removeSharedRecursive [#54430](https://github.com/ClickHouse/ClickHouse/pull/54430) ([Sema Checherinda](https://github.com/CheSema)).
* Fix possible incorrect result with SimpleAggregateFunction in PREWHERE and FINAL [#54436](https://github.com/ClickHouse/ClickHouse/pull/54436) ([Azat Khuzhin](https://github.com/azat)).
* Fix filtering parts with indexHint for non analyzer [#54449](https://github.com/ClickHouse/ClickHouse/pull/54449) ([Azat Khuzhin](https://github.com/azat)).
* Fix aggregate projections with normalized states [#54480](https://github.com/ClickHouse/ClickHouse/pull/54480) ([Amos Bird](https://github.com/amosbird)).
* Bugfix/local multiquery parameter [#54498](https://github.com/ClickHouse/ClickHouse/pull/54498) ([CuiShuoGuo](https://github.com/bakam412)).
* clickhouse-local supports the --database command line argument [#54503](https://github.com/ClickHouse/ClickHouse/pull/54503) ([vdimir](https://github.com/vdimir)).
* Fix possible parsing error in WithNames formats with disabled input_format_with_names_use_header [#54513](https://github.com/ClickHouse/ClickHouse/pull/54513) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix rare case of CHECKSUM_DOESNT_MATCH error [#54549](https://github.com/ClickHouse/ClickHouse/pull/54549) ([alesapin](https://github.com/alesapin)).
* Fix zero copy garbage [#54550](https://github.com/ClickHouse/ClickHouse/pull/54550) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix sorting of UNION ALL of already sorted results [#54564](https://github.com/ClickHouse/ClickHouse/pull/54564) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix snapshot install in Keeper [#54572](https://github.com/ClickHouse/ClickHouse/pull/54572) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix race in `ColumnUnique` [#54575](https://github.com/ClickHouse/ClickHouse/pull/54575) ([Nikita Taranov](https://github.com/nickitat)).
* Annoy/Usearch index: Fix LOGICAL_ERROR during build-up with default values [#54600](https://github.com/ClickHouse/ClickHouse/pull/54600) ([Robert Schulze](https://github.com/rschu1ze)).
* Fix serialization of `ColumnDecimal` [#54601](https://github.com/ClickHouse/ClickHouse/pull/54601) ([Nikita Taranov](https://github.com/nickitat)).
* Fix schema inference for *Cluster functions for column names with spaces [#54635](https://github.com/ClickHouse/ClickHouse/pull/54635) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix using structure from insertion tables in case of defaults and explicit insert columns [#54655](https://github.com/ClickHouse/ClickHouse/pull/54655) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix: avoid using regex match, possibly containing alternation, as a key condition. [#54696](https://github.com/ClickHouse/ClickHouse/pull/54696) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Fix ReplacingMergeTree with vertical merge and cleanup [#54706](https://github.com/ClickHouse/ClickHouse/pull/54706) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix virtual columns having incorrect values after ORDER BY [#54811](https://github.com/ClickHouse/ClickHouse/pull/54811) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix filtering parts with indexHint for non analyzer (resubmit) [#54825](https://github.com/ClickHouse/ClickHouse/pull/54825) ([Azat Khuzhin](https://github.com/azat)).
* Fix Keeper segfault during shutdown [#54841](https://github.com/ClickHouse/ClickHouse/pull/54841) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix "Invalid number of rows in Chunk" in MaterializedPostgreSQL [#54844](https://github.com/ClickHouse/ClickHouse/pull/54844) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Move obsolete format settings to separate section [#54855](https://github.com/ClickHouse/ClickHouse/pull/54855) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix zero copy locks with hardlinks [#54859](https://github.com/ClickHouse/ClickHouse/pull/54859) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix `FINAL` producing invalid read ranges in a rare case [#54934](https://github.com/ClickHouse/ClickHouse/pull/54934) ([Nikita Taranov](https://github.com/nickitat)).
* Rebuild minmax_count_projection when partition key gets modified [#54943](https://github.com/ClickHouse/ClickHouse/pull/54943) ([Amos Bird](https://github.com/amosbird)).
* Fix bad cast to ColumnVector<Int128> in function if [#55019](https://github.com/ClickHouse/ClickHouse/pull/55019) ([Kruglov Pavel](https://github.com/Avogar)).
* Fix: insert quorum w/o keeper retries [#55026](https://github.com/ClickHouse/ClickHouse/pull/55026) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix simple state with nullable [#55030](https://github.com/ClickHouse/ClickHouse/pull/55030) ([Pedro Riera](https://github.com/priera)).
* Prevent attaching parts from tables with different projections or indices [#55062](https://github.com/ClickHouse/ClickHouse/pull/55062) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
#### NO CL ENTRY
* NO CL ENTRY: 'Revert "Revert "Fixed wrong python test name pattern""'. [#54043](https://github.com/ClickHouse/ClickHouse/pull/54043) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* NO CL ENTRY: 'Revert "Fix: respect skip_unavailable_shards with parallel replicas"'. [#54189](https://github.com/ClickHouse/ClickHouse/pull/54189) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Add settings for real-time updates during query execution"'. [#54470](https://github.com/ClickHouse/ClickHouse/pull/54470) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Fix issues in accurateCastOrNull"'. [#54472](https://github.com/ClickHouse/ClickHouse/pull/54472) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Revert "Add settings for real-time updates during query execution""'. [#54476](https://github.com/ClickHouse/ClickHouse/pull/54476) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* NO CL ENTRY: 'Revert "add runOptimize call in bitmap write method"'. [#54528](https://github.com/ClickHouse/ClickHouse/pull/54528) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Optimize uniq to count"'. [#54566](https://github.com/ClickHouse/ClickHouse/pull/54566) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Add stateless test for clickhouse keeper-client --no-confirmation"'. [#54616](https://github.com/ClickHouse/ClickHouse/pull/54616) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* NO CL ENTRY: 'Revert "Remove flaky tests for the experimental `UNDROP` feature"'. [#54671](https://github.com/ClickHouse/ClickHouse/pull/54671) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Fix filtering parts with indexHint for non analyzer"'. [#54806](https://github.com/ClickHouse/ClickHouse/pull/54806) ([Azat Khuzhin](https://github.com/azat)).
* NO CL ENTRY: 'Revert "refine error code of duplicated index in create query"'. [#54840](https://github.com/ClickHouse/ClickHouse/pull/54840) ([Alexander Tokmakov](https://github.com/tavplubix)).
* NO CL ENTRY: 'Revert "Avoid excessive calls to getifaddrs in isLocalAddress"'. [#54893](https://github.com/ClickHouse/ClickHouse/pull/54893) ([Igor Nikonov](https://github.com/devcrafter)).
* NO CL ENTRY: 'Revert "Fix NATS high cpu usage"'. [#55005](https://github.com/ClickHouse/ClickHouse/pull/55005) ([Nikolay Degterinsky](https://github.com/evillique)).
#### NOT FOR CHANGELOG / INSIGNIFICANT
* libFuzzer: add CI fuzzers build, add tcp protocol fuzzer, fix other fuzzers. [#42599](https://github.com/ClickHouse/ClickHouse/pull/42599) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Add new exceptions to 4xx error [#50722](https://github.com/ClickHouse/ClickHouse/pull/50722) ([Boris Kuschel](https://github.com/bkuschel)).
* Test libunwind changes. [#51436](https://github.com/ClickHouse/ClickHouse/pull/51436) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix data race in copyFromIStreamWithProgressCallback [#51449](https://github.com/ClickHouse/ClickHouse/pull/51449) ([Michael Kolupaev](https://github.com/al13n321)).
* Abort on `std::logic_error` in CI [#51907](https://github.com/ClickHouse/ClickHouse/pull/51907) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Unify setting http keep-alive timeout, increase default to 30s [#53068](https://github.com/ClickHouse/ClickHouse/pull/53068) ([Nikita Taranov](https://github.com/nickitat)).
* Add a regression test for broken Vertical merge after ADD+DROP COLUMN [#53214](https://github.com/ClickHouse/ClickHouse/pull/53214) ([Azat Khuzhin](https://github.com/azat)).
* Revert "Revert "dateDiff: add support for plural units."" [#53803](https://github.com/ClickHouse/ClickHouse/pull/53803) ([Han Fei](https://github.com/hanfei1991)).
* Fix some tests [#53892](https://github.com/ClickHouse/ClickHouse/pull/53892) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Refactoring of reading from `MergeTree` tables [#53931](https://github.com/ClickHouse/ClickHouse/pull/53931) ([Anton Popov](https://github.com/CurtizJ)).
* Use pathlib.Path in S3Helper, rewrite build reports, improve small things [#54010](https://github.com/ClickHouse/ClickHouse/pull/54010) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Correct UniquesHashSet to be endianness-independent. [#54045](https://github.com/ClickHouse/ClickHouse/pull/54045) ([Austin Kothig](https://github.com/kothiga)).
* Increase retries for test_merge_tree_azure_blob_storage [#54069](https://github.com/ClickHouse/ClickHouse/pull/54069) ([SmitaRKulkarni](https://github.com/SmitaRKulkarni)).
* Fix SipHash128 reference for big-endian platforms [#54095](https://github.com/ClickHouse/ClickHouse/pull/54095) ([ltrk2](https://github.com/ltrk2)).
* Small usearch index improvements: metrics and configurable internal data type [#54103](https://github.com/ClickHouse/ClickHouse/pull/54103) ([Michael Kolupaev](https://github.com/al13n321)).
* Small refactoring for read from object storage [#54134](https://github.com/ClickHouse/ClickHouse/pull/54134) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Minor changes [#54171](https://github.com/ClickHouse/ClickHouse/pull/54171) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix hostname and co result constness in new analyzer [#54174](https://github.com/ClickHouse/ClickHouse/pull/54174) ([vdimir](https://github.com/vdimir)).
* Amend a confusing line of code in Loggers.cpp [#54183](https://github.com/ClickHouse/ClickHouse/pull/54183) ([Victor Krasnov](https://github.com/sirvickr)).
* Fix partition id pruning for analyzer. [#54185](https://github.com/ClickHouse/ClickHouse/pull/54185) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Update version after release [#54186](https://github.com/ClickHouse/ClickHouse/pull/54186) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Update version_date.tsv and changelogs after v23.8.1.2992-lts [#54188](https://github.com/ClickHouse/ClickHouse/pull/54188) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix pager in client/local interactive mode when not all data had been read [#54190](https://github.com/ClickHouse/ClickHouse/pull/54190) ([Azat Khuzhin](https://github.com/azat)).
* Fix flaky test `01099_operators_date_and_timestamp` [#54195](https://github.com/ClickHouse/ClickHouse/pull/54195) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Save system tables from s3_disk in the report [#54198](https://github.com/ClickHouse/ClickHouse/pull/54198) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix timezones in the CI Logs database [#54210](https://github.com/ClickHouse/ClickHouse/pull/54210) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* 2R: Fix: respect skip_unavailable_shards with parallel replicas [#54213](https://github.com/ClickHouse/ClickHouse/pull/54213) ([Igor Nikonov](https://github.com/devcrafter)).
* S3Queue is experimental [#54214](https://github.com/ClickHouse/ClickHouse/pull/54214) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improve vars with reference in Settings cpps [#54220](https://github.com/ClickHouse/ClickHouse/pull/54220) ([xuzifu666](https://github.com/xuzifu666)).
* Add ProfileEvents::Timer class [#54221](https://github.com/ClickHouse/ClickHouse/pull/54221) ([Stig Bakken](https://github.com/stigsb)).
* Test: extend cluster_all_replicas integration test with skip_unavailable_shards [#54223](https://github.com/ClickHouse/ClickHouse/pull/54223) ([Igor Nikonov](https://github.com/devcrafter)).
* remove semicolon [#54236](https://github.com/ClickHouse/ClickHouse/pull/54236) ([YinZheng-Sun](https://github.com/YinZheng-Sun)).
* Fix bad code in the `system.filesystem_cache`: catching exceptions [#54237](https://github.com/ClickHouse/ClickHouse/pull/54237) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Merge [#54236](https://github.com/ClickHouse/ClickHouse/issues/54236) [#54238](https://github.com/ClickHouse/ClickHouse/pull/54238) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Minor improvement, load from config [#54244](https://github.com/ClickHouse/ClickHouse/pull/54244) ([zhanglistar](https://github.com/zhanglistar)).
* Follow-up to [#54198](https://github.com/ClickHouse/ClickHouse/issues/54198) [#54246](https://github.com/ClickHouse/ClickHouse/pull/54246) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Properly re-initialize ZooKeeper fault injection [#54251](https://github.com/ClickHouse/ClickHouse/pull/54251) ([Alexander Gololobov](https://github.com/davenger)).
* Update ci-slack-bot.py [#54253](https://github.com/ClickHouse/ClickHouse/pull/54253) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix clickhouse-test --no-drop-if-fail on reference mismatch [#54256](https://github.com/ClickHouse/ClickHouse/pull/54256) ([Azat Khuzhin](https://github.com/azat)).
* Improve slack-bot-ci lambda [#54258](https://github.com/ClickHouse/ClickHouse/pull/54258) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Update version_date.tsv and changelogs after v23.3.12.11-lts [#54259](https://github.com/ClickHouse/ClickHouse/pull/54259) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Minor change [#54261](https://github.com/ClickHouse/ClickHouse/pull/54261) ([flynn](https://github.com/ucasfl)).
* Add a note of where the lambda is deployed [#54268](https://github.com/ClickHouse/ClickHouse/pull/54268) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Query cache: Log caching of entries [#54270](https://github.com/ClickHouse/ClickHouse/pull/54270) ([Robert Schulze](https://github.com/rschu1ze)).
* Update version_date.tsv and changelogs after v23.8.2.7-lts [#54273](https://github.com/ClickHouse/ClickHouse/pull/54273) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Fix test `02783_parsedatetimebesteffort_syslog` [#54279](https://github.com/ClickHouse/ClickHouse/pull/54279) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix `test_keeper_disks` [#54291](https://github.com/ClickHouse/ClickHouse/pull/54291) ([Antonio Andelic](https://github.com/antonio2368)).
* Code improvement for reading from archives [#54293](https://github.com/ClickHouse/ClickHouse/pull/54293) ([Antonio Andelic](https://github.com/antonio2368)).
* Rollback testing part from [#42599](https://github.com/ClickHouse/ClickHouse/issues/42599) [#54301](https://github.com/ClickHouse/ClickHouse/pull/54301) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* CI: libFuzzer integration [#54310](https://github.com/ClickHouse/ClickHouse/pull/54310) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Update version_date.tsv and changelogs after v23.3.13.6-lts [#54313](https://github.com/ClickHouse/ClickHouse/pull/54313) ([robot-clickhouse](https://github.com/robot-clickhouse)).
* Add logs for parallel replica over distributed [#54315](https://github.com/ClickHouse/ClickHouse/pull/54315) ([Igor Nikonov](https://github.com/devcrafter)).
* Increase timeout for system.stack_trace in 01051_system_stack_trace [#54321](https://github.com/ClickHouse/ClickHouse/pull/54321) ([Azat Khuzhin](https://github.com/azat)).
* Fix replace_partition test [#54322](https://github.com/ClickHouse/ClickHouse/pull/54322) ([Pedro Riera](https://github.com/priera)).
* Fix segfault in system.zookeeper [#54326](https://github.com/ClickHouse/ClickHouse/pull/54326) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fixed flaky test `02841_parallel_replicas_summary` [#54331](https://github.com/ClickHouse/ClickHouse/pull/54331) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Consolidate GCD codec tests (Follow up to [#53149](https://github.com/ClickHouse/ClickHouse/issues/53149)) [#54332](https://github.com/ClickHouse/ClickHouse/pull/54332) ([Robert Schulze](https://github.com/rschu1ze)).
* Fixed wrong dereference problem in Context::setTemporaryStorageInCache [#54333](https://github.com/ClickHouse/ClickHouse/pull/54333) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Used assert_cast instead of dynamic_cast in ExternalDataSourceCache [#54336](https://github.com/ClickHouse/ClickHouse/pull/54336) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Fix bad punctuation in Keeper's logs [#54338](https://github.com/ClickHouse/ClickHouse/pull/54338) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Improved protection from dereferencing of nullptr [#54339](https://github.com/ClickHouse/ClickHouse/pull/54339) ([Alexey Gerasimchuck](https://github.com/Demilivor)).
* Fix filesystem cache test [#54343](https://github.com/ClickHouse/ClickHouse/pull/54343) ([Salvatore Mesoraca](https://github.com/aiven-sal)).
* Parallel replicas: remove unused code [#54354](https://github.com/ClickHouse/ClickHouse/pull/54354) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix flaky test test_storage_azure_blob_storage/test.py::test_schema_iference_cache [#54367](https://github.com/ClickHouse/ClickHouse/pull/54367) ([Kruglov Pavel](https://github.com/Avogar)).
* Enable hedged requests integration tests with tsan, use max_distributed_connections=1 to fix possible flakiness [#54371](https://github.com/ClickHouse/ClickHouse/pull/54371) ([Kruglov Pavel](https://github.com/Avogar)).
* Use abiv2 when generating OpenSSL .s files for powerpc64le [#54375](https://github.com/ClickHouse/ClickHouse/pull/54375) ([Boris Kuschel](https://github.com/bkuschel)).
* Disable prefer_localhost_replica in test for parallel replicas [#54377](https://github.com/ClickHouse/ClickHouse/pull/54377) ([Igor Nikonov](https://github.com/devcrafter)).
* Fix incorrect formatting of CREATE query with PRIMARY KEY [#54403](https://github.com/ClickHouse/ClickHouse/pull/54403) ([Nikolay Degterinsky](https://github.com/evillique)).
* Fix failed assert in attach thread during startup retries [#54408](https://github.com/ClickHouse/ClickHouse/pull/54408) ([Antonio Andelic](https://github.com/antonio2368)).
* Hashtable order fix on big endian platform [#54409](https://github.com/ClickHouse/ClickHouse/pull/54409) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Small fine-tune for using ColumnNullable pointer [#54435](https://github.com/ClickHouse/ClickHouse/pull/54435) ([Alex Cheng](https://github.com/Alex-Cheng)).
* Update automated commit status comment [#54441](https://github.com/ClickHouse/ClickHouse/pull/54441) ([vdimir](https://github.com/vdimir)).
* Remove useless line [#54466](https://github.com/ClickHouse/ClickHouse/pull/54466) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a log message on replicated table drop [#54467](https://github.com/ClickHouse/ClickHouse/pull/54467) ([Dmitry Novik](https://github.com/novikd)).
* Cleanup: unnecessary SelectQueryInfo usage around distributed [#54468](https://github.com/ClickHouse/ClickHouse/pull/54468) ([Igor Nikonov](https://github.com/devcrafter)).
* Add `instance_type` column to CI Logs and the `checks` table [#54469](https://github.com/ClickHouse/ClickHouse/pull/54469) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Refactor IHints [#54481](https://github.com/ClickHouse/ClickHouse/pull/54481) ([flynn](https://github.com/ucasfl)).
* Fix strange message [#54489](https://github.com/ClickHouse/ClickHouse/pull/54489) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Bump re2 to latest main [#54492](https://github.com/ClickHouse/ClickHouse/pull/54492) ([Robert Schulze](https://github.com/rschu1ze)).
* S3 artifacts [#54504](https://github.com/ClickHouse/ClickHouse/pull/54504) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Flush logs for system.backup_log test. [#54507](https://github.com/ClickHouse/ClickHouse/pull/54507) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix use-after-free in `MergeTreePrefetchedReadPool` [#54512](https://github.com/ClickHouse/ClickHouse/pull/54512) ([Anton Popov](https://github.com/CurtizJ)).
* Disable parallel replicas on shards with not enough nodes [#54519](https://github.com/ClickHouse/ClickHouse/pull/54519) ([Igor Nikonov](https://github.com/devcrafter)).
* Parallel replicas: cleanup unused params [#54520](https://github.com/ClickHouse/ClickHouse/pull/54520) ([Igor Nikonov](https://github.com/devcrafter)).
* FunctionHelpers remove areTypesEqual function [#54546](https://github.com/ClickHouse/ClickHouse/pull/54546) ([Maksim Kita](https://github.com/kitaisreal)).
* Add stateless test for clickhouse keeper-client --no-confirmation [#54547](https://github.com/ClickHouse/ClickHouse/pull/54547) ([Azat Khuzhin](https://github.com/azat)).
* Increase default timeout in tests for keeper-client [#54551](https://github.com/ClickHouse/ClickHouse/pull/54551) ([pufit](https://github.com/pufit)).
* clang-format: Disable namespace indentation and omit {} in if/for/while [#54554](https://github.com/ClickHouse/ClickHouse/pull/54554) ([Robert Schulze](https://github.com/rschu1ze)).
* ngramDistance* queries fix for big endian platform [#54555](https://github.com/ClickHouse/ClickHouse/pull/54555) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Fix AST fuzzer crash in MergeTreeIndex{FullText|Inverted} [#54563](https://github.com/ClickHouse/ClickHouse/pull/54563) ([Robert Schulze](https://github.com/rschu1ze)).
* Remove output_format_markdown_escape_special_characters from settings changes history [#54585](https://github.com/ClickHouse/ClickHouse/pull/54585) ([Kruglov Pavel](https://github.com/Avogar)).
* Add basic logic to find releasable commits [#54604](https://github.com/ClickHouse/ClickHouse/pull/54604) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix reading of virtual columns in reverse order [#54610](https://github.com/ClickHouse/ClickHouse/pull/54610) ([Anton Popov](https://github.com/CurtizJ)).
* Fix possible CANNOT_READ_ALL_DATA during ZooKeeper client finalization and add some tests [#54632](https://github.com/ClickHouse/ClickHouse/pull/54632) ([Azat Khuzhin](https://github.com/azat)).
* Fix a bug in addData and subData functions [#54636](https://github.com/ClickHouse/ClickHouse/pull/54636) ([Nikolay Degterinsky](https://github.com/evillique)).
* Follow-up to [#54550](https://github.com/ClickHouse/ClickHouse/issues/54550) [#54641](https://github.com/ClickHouse/ClickHouse/pull/54641) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Remove broken lockless variant of re2 [#54642](https://github.com/ClickHouse/ClickHouse/pull/54642) ([Robert Schulze](https://github.com/rschu1ze)).
* Bump abseil [#54646](https://github.com/ClickHouse/ClickHouse/pull/54646) ([Robert Schulze](https://github.com/rschu1ze)).
* limit the delay before next try in S3 [#54651](https://github.com/ClickHouse/ClickHouse/pull/54651) ([Sema Checherinda](https://github.com/CheSema)).
* Fix parser unit tests [#54670](https://github.com/ClickHouse/ClickHouse/pull/54670) ([János Benjamin Antal](https://github.com/antaljanosbenjamin)).
* Fix: Log engine Mark file to read and write in little Endian for s390x [#54677](https://github.com/ClickHouse/ClickHouse/pull/54677) ([bhavnajindal](https://github.com/bhavnajindal)).
* Update WebObjectStorage.cpp [#54695](https://github.com/ClickHouse/ClickHouse/pull/54695) ([Kseniia Sumarokova](https://github.com/kssenii)).
* add cancelation point to s3 retries [#54697](https://github.com/ClickHouse/ClickHouse/pull/54697) ([Sema Checherinda](https://github.com/CheSema)).
* Revert default batch size for Keeper [#54745](https://github.com/ClickHouse/ClickHouse/pull/54745) ([Antonio Andelic](https://github.com/antonio2368)).
* Enable `allow_experimental_undrop_table_query` [#54754](https://github.com/ClickHouse/ClickHouse/pull/54754) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Fix 02882_clickhouse_keeper_client_no_confirmation test [#54761](https://github.com/ClickHouse/ClickHouse/pull/54761) ([Azat Khuzhin](https://github.com/azat)).
* Better exception message in checkDataPart [#54768](https://github.com/ClickHouse/ClickHouse/pull/54768) ([alesapin](https://github.com/alesapin)).
* Don't use default move assignment in TimerDescriptor [#54769](https://github.com/ClickHouse/ClickHouse/pull/54769) ([Kruglov Pavel](https://github.com/Avogar)).
* Add retries to tests test_async_query_sending/test_async_connect [#54772](https://github.com/ClickHouse/ClickHouse/pull/54772) ([Kruglov Pavel](https://github.com/Avogar)).
* update comment [#54780](https://github.com/ClickHouse/ClickHouse/pull/54780) ([flynn](https://github.com/ucasfl)).
* Fix broken tests for clickhouse-diagnostics [#54790](https://github.com/ClickHouse/ClickHouse/pull/54790) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* refine error code of duplicated index in create query [#54791](https://github.com/ClickHouse/ClickHouse/pull/54791) ([Han Fei](https://github.com/hanfei1991)).
* Do not set PR status label [#54799](https://github.com/ClickHouse/ClickHouse/pull/54799) ([vdimir](https://github.com/vdimir)).
* Prevent parquet schema inference reading the first 1 MB of the file unnecessarily [#54808](https://github.com/ClickHouse/ClickHouse/pull/54808) ([Michael Kolupaev](https://github.com/al13n321)).
* Prevent ParquetMetadata reading 40 MB from each file unnecessarily [#54809](https://github.com/ClickHouse/ClickHouse/pull/54809) ([Michael Kolupaev](https://github.com/al13n321)).
* Use appropriate error code instead of LOGICAL_ERROR [#54810](https://github.com/ClickHouse/ClickHouse/pull/54810) ([Yakov Olkhovskiy](https://github.com/yakov-olkhovskiy)).
* Adjusting `num_streams` by expected work in StorageS3 [#54815](https://github.com/ClickHouse/ClickHouse/pull/54815) ([pufit](https://github.com/pufit)).
* Fix test_backup_restore_on_cluster/test.py::test_stop_other_host_during_backup flakiness [#54816](https://github.com/ClickHouse/ClickHouse/pull/54816) ([Azat Khuzhin](https://github.com/azat)).
* Remove config files sizes check [#54824](https://github.com/ClickHouse/ClickHouse/pull/54824) ([Igor Nikonov](https://github.com/devcrafter)).
* Set correct size for signal pipe buffer [#54836](https://github.com/ClickHouse/ClickHouse/pull/54836) ([Antonio Andelic](https://github.com/antonio2368)).
* Refactor and split up vector search tests [#54839](https://github.com/ClickHouse/ClickHouse/pull/54839) ([Robert Schulze](https://github.com/rschu1ze)).
* Add some logging to StorageRabbitMQ [#54842](https://github.com/ClickHouse/ClickHouse/pull/54842) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Update CHANGELOG.md [#54843](https://github.com/ClickHouse/ClickHouse/pull/54843) ([Ilya Yatsishin](https://github.com/qoega)).
* Refactor and simplify multi-directory globs [#54863](https://github.com/ClickHouse/ClickHouse/pull/54863) ([Andrey Zvonov](https://github.com/zvonand)).
* KeeperTCPHandler.cpp: Fix clang-17 build [#54874](https://github.com/ClickHouse/ClickHouse/pull/54874) ([Robert Schulze](https://github.com/rschu1ze)).
* Decrease timeout for fast tests with a commit [#54878](https://github.com/ClickHouse/ClickHouse/pull/54878) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* More stable `02703_keeper_map_concurrent_create_drop` [#54879](https://github.com/ClickHouse/ClickHouse/pull/54879) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix division by zero in StorageS3 [#54904](https://github.com/ClickHouse/ClickHouse/pull/54904) ([pufit](https://github.com/pufit)).
* Set exception for promise in `CreatingSetsTransform` [#54920](https://github.com/ClickHouse/ClickHouse/pull/54920) ([Antonio Andelic](https://github.com/antonio2368)).
* Fix an exception message in Pipe::addTransform [#54926](https://github.com/ClickHouse/ClickHouse/pull/54926) ([Alex Cheng](https://github.com/Alex-Cheng)).
* Fix data race during BackupsWorker::backup_log initialization [#54928](https://github.com/ClickHouse/ClickHouse/pull/54928) ([Victor Krasnov](https://github.com/sirvickr)).
* Provide support for BSON on BE [#54933](https://github.com/ClickHouse/ClickHouse/pull/54933) ([Austin Kothig](https://github.com/kothiga)).
* Set a minimum limit of `num_streams` in StorageS3 [#54936](https://github.com/ClickHouse/ClickHouse/pull/54936) ([pufit](https://github.com/pufit)).
* Ipv4 read big endian [#54938](https://github.com/ClickHouse/ClickHouse/pull/54938) ([Suzy Wang](https://github.com/SuzyWangIBMer)).
* Fix data race in SYSTEM STOP LISTEN [#54939](https://github.com/ClickHouse/ClickHouse/pull/54939) ([Nikolay Degterinsky](https://github.com/evillique)).
* Add desperate instrumentation for debugging deadlock in MultiplexedConnections [#54940](https://github.com/ClickHouse/ClickHouse/pull/54940) ([Michael Kolupaev](https://github.com/al13n321)).
* Respect max_block_size while generating rows for system.stack_trace (will fix flakiness of the test) [#54946](https://github.com/ClickHouse/ClickHouse/pull/54946) ([Azat Khuzhin](https://github.com/azat)).
* Remove test `01051_system_stack_trace` [#54951](https://github.com/ClickHouse/ClickHouse/pull/54951) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Add a test for compatibility [#54960](https://github.com/ClickHouse/ClickHouse/pull/54960) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove test `02151_hash_table_sizes_stats` [#54961](https://github.com/ClickHouse/ClickHouse/pull/54961) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Remove 02151_hash_table_sizes_stats_distributed (fixes broken CI) [#54969](https://github.com/ClickHouse/ClickHouse/pull/54969) ([Azat Khuzhin](https://github.com/azat)).
* Use pregenerated gRPC protocol pb2 files to fix test flakiness. [#54976](https://github.com/ClickHouse/ClickHouse/pull/54976) ([Vitaly Baranov](https://github.com/vitlibar)).
* Delete a test [#54984](https://github.com/ClickHouse/ClickHouse/pull/54984) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Add assertion [#54985](https://github.com/ClickHouse/ClickHouse/pull/54985) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix test parallel replicas over distributed [#54987](https://github.com/ClickHouse/ClickHouse/pull/54987) ([Igor Nikonov](https://github.com/devcrafter)).
* Update README.md [#54990](https://github.com/ClickHouse/ClickHouse/pull/54990) ([Tyler Hannan](https://github.com/tylerhannan)).
* Re-enable clang-tidy checks disabled in the Clang 17 update [#54999](https://github.com/ClickHouse/ClickHouse/pull/54999) ([Robert Schulze](https://github.com/rschu1ze)).
* Print more information about one logical error in MergeTreeDataWriter [#55001](https://github.com/ClickHouse/ClickHouse/pull/55001) ([Michael Kolupaev](https://github.com/al13n321)).
* Add a test [#55003](https://github.com/ClickHouse/ClickHouse/pull/55003) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Lower log levels for `SSOCredentialsProvider` [#55012](https://github.com/ClickHouse/ClickHouse/pull/55012) ([Antonio Andelic](https://github.com/antonio2368)).
* Set exception for promise in `CreatingSetsTransform` in more cases [#55013](https://github.com/ClickHouse/ClickHouse/pull/55013) ([Antonio Andelic](https://github.com/antonio2368)).
* Setting compile_aggregate_expressions comment fix [#55020](https://github.com/ClickHouse/ClickHouse/pull/55020) ([Maksim Kita](https://github.com/kitaisreal)).
* Revert "Added field "is_deterministic" to system.functions" [#55022](https://github.com/ClickHouse/ClickHouse/pull/55022) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Get rid of the most of `os.path` stuff [#55028](https://github.com/ClickHouse/ClickHouse/pull/55028) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
* Fix pre-build scripts for old branches [#55032](https://github.com/ClickHouse/ClickHouse/pull/55032) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Review fix for [#54935](https://github.com/ClickHouse/ClickHouse/issues/54935) [#55042](https://github.com/ClickHouse/ClickHouse/pull/55042) ([flynn](https://github.com/ucasfl)).
* Update gtest_lru_file_cache.cpp [#55053](https://github.com/ClickHouse/ClickHouse/pull/55053) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix prebuild scripts one more time [#55059](https://github.com/ClickHouse/ClickHouse/pull/55059) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Use different names for variables inside build.sh [#55067](https://github.com/ClickHouse/ClickHouse/pull/55067) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Remove String Jaccard Index [#55080](https://github.com/ClickHouse/ClickHouse/pull/55080) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* I don't understand why backup log is not enabled by default [#55081](https://github.com/ClickHouse/ClickHouse/pull/55081) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Fix typo in packager when ccache is used [#55104](https://github.com/ClickHouse/ClickHouse/pull/55104) ([Ilya Yatsishin](https://github.com/qoega)).
* Reduce flakiness of 01455_opentelemetry_distributed [#55111](https://github.com/ClickHouse/ClickHouse/pull/55111) ([Michael Kolupaev](https://github.com/al13n321)).
* Fix build [#55113](https://github.com/ClickHouse/ClickHouse/pull/55113) ([Alexey Milovidov](https://github.com/alexey-milovidov)).

View File

@ -1259,9 +1259,13 @@ SELECT * FROM json_each_row_nested
- [input_format_import_nested_json](/docs/en/operations/settings/settings-formats.md/#input_format_import_nested_json) - map nested JSON data to nested tables (it works for JSONEachRow format). Default value - `false`.
- [input_format_json_read_bools_as_numbers](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_bools_as_numbers) - allow to parse bools as numbers in JSON input formats. Default value - `true`.
- [input_format_json_read_numbers_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_numbers_as_strings) - allow to parse numbers as strings in JSON input formats. Default value - `false`.
- [input_format_json_read_objects_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_objects_as_strings) - allow to parse JSON objects as strings in JSON input formats. Default value - `false`.
- [input_format_json_read_numbers_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_numbers_as_strings) - allow to parse numbers as strings in JSON input formats. Default value - `true`.
- [input_format_json_read_arrays_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_arrays_as_strings) - allow to parse JSON arrays as strings in JSON input formats. Default value - `true`.
- [input_format_json_read_objects_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_read_objects_as_strings) - allow to parse JSON objects as strings in JSON input formats. Default value - `true`.
- [input_format_json_named_tuples_as_objects](/docs/en/operations/settings/settings-formats.md/#input_format_json_named_tuples_as_objects) - parse named tuple columns as JSON objects. Default value - `true`.
- [input_format_json_try_infer_numbers_from_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_try_infer_numbers_from_strings) - try to infer numbers from string fields during schema inference. Default value - `false`.
- [input_format_json_try_infer_named_tuples_from_objects](/docs/en/operations/settings/settings-formats.md/#input_format_json_try_infer_named_tuples_from_objects) - try to infer named tuple from JSON objects during schema inference. Default value - `true`.
- [input_format_json_infer_incomplete_types_as_strings](/docs/en/operations/settings/settings-formats.md/#input_format_json_infer_incomplete_types_as_strings) - use type String for keys that contain only Nulls or empty objects/arrays during schema inference in JSON input formats. Default value - `true`.
- [input_format_json_defaults_for_missing_elements_in_named_tuple](/docs/en/operations/settings/settings-formats.md/#input_format_json_defaults_for_missing_elements_in_named_tuple) - insert default values for missing elements in JSON object while parsing named tuple. Default value - `true`.
- [input_format_json_ignore_unknown_keys_in_named_tuple](/docs/en/operations/settings/settings-formats.md/#input_format_json_ignore_unknown_keys_in_named_tuple) - ignore unknown keys in JSON objects for named tuples. Default value - `false`.
- [input_format_json_compact_allow_variable_number_of_columns](/docs/en/operations/settings/settings-formats.md/#input_format_json_compact_allow_variable_number_of_columns) - allow variable number of columns in JSONCompact/JSONCompactEachRow format, ignore extra columns and use default values on missing columns. Default value - `false`.

View File

@ -389,9 +389,25 @@ DESC format(JSONEachRow, '{"arr" : [null, 42, null]}')
└──────┴────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
Tuples:
Named tuples:
In JSON formats we treat Arrays with elements of different types as Tuples.
When the setting `input_format_json_try_infer_named_tuples_from_objects` is enabled, ClickHouse tries to infer named Tuples from JSON objects during schema inference.
The resulting named Tuple will contain all elements from all corresponding JSON objects from sample data.
```sql
SET input_format_json_try_infer_named_tuples_from_objects = 1;
DESC format(JSONEachRow, '{"obj" : {"a" : 42, "b" : "Hello"}}, {"obj" : {"a" : 43, "c" : [1, 2, 3]}}, {"obj" : {"d" : {"e" : 42}}}')
```
```response
┌─name─┬─type───────────────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ obj │ Tuple(a Nullable(Int64), b Nullable(String), c Array(Nullable(Int64)), d Tuple(e Nullable(Int64))) │ │ │ │ │ │
└──────┴────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
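The key-merging rule described above (the named Tuple contains every element seen in any of the sampled objects) can be modelled with a short Python sketch. This is only an illustration of the rule, not ClickHouse's implementation; the helper name `merge_object_keys` is hypothetical, and the first-seen key order is an assumption of this sketch.

```python
def merge_object_keys(rows, column):
    """Collect the union of keys seen under `column`, in first-seen order.

    Toy model only: it maps each key to the list of sample values seen
    for it and does not infer ClickHouse types.
    """
    merged = {}
    for row in rows:
        obj = row.get(column)
        if not isinstance(obj, dict):
            continue  # this sketch ignores rows where the column is not an object
        for key, value in obj.items():
            merged.setdefault(key, []).append(value)
    return merged


# The same three sample rows as in the DESC example above.
sample = [
    {"obj": {"a": 42, "b": "Hello"}},
    {"obj": {"a": 43, "c": [1, 2, 3]}},
    {"obj": {"d": {"e": 42}}},
]

fields = merge_object_keys(sample, "obj")
print(list(fields))  # ['a', 'b', 'c', 'd']
```

A key that is missing from some rows (such as `b` here) still becomes an element of the result, which matches the `Nullable` element types in the response above.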
Unnamed Tuples:
In JSON formats we treat Arrays with elements of different types as Unnamed Tuples.
```sql
DESC format(JSONEachRow, '{"tuple" : [1, "Hello, World!", [1, 2, 3]]}')
```
@ -418,7 +434,10 @@ DESC format(JSONEachRow, $$
Maps:
In JSON we can read objects with values of the same type as Map type.
Note: this works only when the settings `input_format_json_read_objects_as_strings` and `input_format_json_try_infer_named_tuples_from_objects` are disabled.
```sql
SET input_format_json_read_objects_as_strings = 0, input_format_json_try_infer_named_tuples_from_objects = 0;
DESC format(JSONEachRow, '{"map" : {"key1" : 42, "key2" : 24, "key3" : 4}}')
```
```response
@ -448,14 +467,22 @@ Nested complex types:
DESC format(JSONEachRow, '{"value" : [[[42, 24], []], {"key1" : 42, "key2" : 24}]}')
```
```response
┌─name──┬─type───────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ value │ Tuple(Array(Array(Nullable(Int64))), Map(String, Nullable(Int64))) │ │ │ │ │ │
└───────┴────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
┌─name──┬─type─────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ value │ Tuple(Array(Array(Nullable(String))), Tuple(key1 Nullable(Int64), key2 Nullable(Int64))) │ │ │ │ │ │
└───────┴──────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
If ClickHouse cannot determine the type, because the data contains only nulls, an exception will be thrown:
If ClickHouse cannot determine the type for some key because the data contains only nulls, empty objects, or empty arrays, the type `String` is used when the setting `input_format_json_infer_incomplete_types_as_strings` is enabled; otherwise an exception is thrown:
```sql
DESC format(JSONEachRow, '{"arr" : [null, null]}')
DESC format(JSONEachRow, '{"arr" : [null, null]}') SETTINGS input_format_json_infer_incomplete_types_as_strings = 1;
```
```response
┌─name─┬─type────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ arr │ Array(Nullable(String)) │ │ │ │ │ │
└──────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
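The fallback rule can be summarised with a small Python sketch. It is a toy model under stated assumptions, not ClickHouse code: the function name `infer_element_type` and its `incomplete_as_strings` parameter are hypothetical stand-ins for the setting, and only integers and strings are modelled.

```python
def infer_element_type(values, incomplete_as_strings=True):
    """Toy model of type inference for the elements of a JSON array.

    `incomplete_as_strings` stands in for the setting
    input_format_json_infer_incomplete_types_as_strings.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Only nulls in the sample: fall back to String or fail.
        if incomplete_as_strings:
            return "Nullable(String)"
        raise ValueError("cannot determine the type: the sample contains only nulls")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "Nullable(Int64)"
    if all(isinstance(v, str) for v in non_null):
        return "Nullable(String)"
    raise NotImplementedError("this sketch only models integers and strings")


print(infer_element_type([None, None]))      # Nullable(String)
print(infer_element_type([None, 42, None]))  # Nullable(Int64)
```

With `incomplete_as_strings=False`, the all-null input raises an error instead of returning a type, which corresponds to the exception case described above.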
```sql
DESC format(JSONEachRow, '{"arr" : [null, null]}') SETTINGS input_format_json_infer_incomplete_types_as_strings = 0;
```
```response
Code: 652. DB::Exception: Received from localhost:9000. DB::Exception:
@ -466,31 +493,11 @@ most likely this column contains only Nulls or empty Arrays/Maps.
#### JSON settings {#json-settings}
##### input_format_json_read_objects_as_strings
Enabling this setting allows reading nested JSON objects as strings.
This setting can be used to read nested JSON objects without using JSON object type.
This setting is enabled by default.
```sql
SET input_format_json_read_objects_as_strings = 1;
DESC format(JSONEachRow, $$
{"obj" : {"key1" : 42, "key2" : [1,2,3,4]}}
{"obj" : {"key3" : {"nested_key" : 1}}}
$$)
```
```response
┌─name─┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ obj │ Nullable(String) │ │ │ │ │ │
└──────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### input_format_json_try_infer_numbers_from_strings
Enabling this setting allows inferring numbers from string values.
This setting is enabled by default.
This setting is disabled by default.
**Example:**
@ -507,11 +514,69 @@ DESC format(JSONEachRow, $$
└───────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### input_format_json_try_infer_named_tuples_from_objects
Enabling this setting allows inferring named Tuples from JSON objects. The resulting named Tuple will contain all elements from all corresponding JSON objects from sample data.
It can be useful when JSON data is not sparse so the sample of data will contain all possible object keys.
This setting is enabled by default.
**Example**
```sql
SET input_format_json_try_infer_named_tuples_from_objects = 1;
DESC format(JSONEachRow, '{"obj" : {"a" : 42, "b" : "Hello"}}, {"obj" : {"a" : 43, "c" : [1, 2, 3]}}, {"obj" : {"d" : {"e" : 42}}}')
```
Result:
```
┌─name─┬─type───────────────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ obj │ Tuple(a Nullable(Int64), b Nullable(String), c Array(Nullable(Int64)), d Tuple(e Nullable(Int64))) │ │ │ │ │ │
└──────┴────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
```sql
SET input_format_json_try_infer_named_tuples_from_objects = 1;
DESC format(JSONEachRow, '{"array" : [{"a" : 42, "b" : "Hello"}, {}, {"c" : [1,2,3]}, {"d" : "2020-01-01"}]}')
```
Result:
```
┌─name──┬─type────────────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ array │ Array(Tuple(a Nullable(Int64), b Nullable(String), c Array(Nullable(Int64)), d Nullable(Date))) │ │ │ │ │ │
└───────┴─────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### input_format_json_read_objects_as_strings
Enabling this setting allows reading nested JSON objects as strings.
This setting can be used to read nested JSON objects without using JSON object type.
This setting is enabled by default.
Note: enabling this setting will take effect only if setting `input_format_json_try_infer_named_tuples_from_objects` is disabled.
```sql
SET input_format_json_read_objects_as_strings = 1, input_format_json_try_infer_named_tuples_from_objects = 0;
DESC format(JSONEachRow, $$
{"obj" : {"key1" : 42, "key2" : [1,2,3,4]}}
{"obj" : {"key3" : {"nested_key" : 1}}}
$$)
```
```response
┌─name─┬─type─────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ obj │ Nullable(String) │ │ │ │ │ │
└──────┴──────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### input_format_json_read_numbers_as_strings
Enabling this setting allows reading numeric values as strings.
This setting is disabled by default.
This setting is enabled by default.
**Example**
@ -549,6 +614,49 @@ DESC format(JSONEachRow, $$
└───────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
##### input_format_json_read_arrays_as_strings
Enabling this setting allows reading JSON array values as strings.
This setting is enabled by default.
**Example**
```sql
SET input_format_json_read_arrays_as_strings = 1;
SELECT arr, toTypeName(arr), JSONExtractArrayRaw(arr)[3] from format(JSONEachRow, 'arr String', '{"arr" : [1, "Hello", [1,2,3]]}');
```
```response
┌─arr───────────────────┬─toTypeName(arr)─┬─arrayElement(JSONExtractArrayRaw(arr), 3)─┐
│ [1, "Hello", [1,2,3]] │ String │ [1,2,3] │
└───────────────────────┴─────────────────┴───────────────────────────────────────────┘
```
##### input_format_json_infer_incomplete_types_as_strings
Enabling this setting allows using the String type for JSON keys that contain only `Null`/`{}`/`[]` in the data sample during schema inference.
In JSON formats, any value can be read as String if all corresponding settings are enabled (they are all enabled by default), so using the String type for keys with unknown types avoids errors like `Cannot determine type for column 'column_name' by first 25000 rows of data, most likely this column contains only Nulls or empty Arrays/Maps` during schema inference.
Example:
```sql
SET input_format_json_infer_incomplete_types_as_strings = 1, input_format_json_try_infer_named_tuples_from_objects = 1;
DESCRIBE format(JSONEachRow, '{"obj" : {"a" : [1,2,3], "b" : "hello", "c" : null, "d" : {}, "e" : []}}');
SELECT * FROM format(JSONEachRow, '{"obj" : {"a" : [1,2,3], "b" : "hello", "c" : null, "d" : {}, "e" : []}}');
```
Result:
```
┌─name─┬─type───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ obj │ Tuple(a Array(Nullable(Int64)), b Nullable(String), c Nullable(String), d Nullable(String), e Array(Nullable(String))) │ │ │ │ │ │
└──────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
┌─obj────────────────────────────┐
│ ([1,2,3],'hello',NULL,'{}',[]) │
└────────────────────────────────┘
```
### CSV {#csv}
In CSV format ClickHouse extracts column values from the row according to delimiters. ClickHouse expects all types except numbers and strings to be enclosed in double quotes. If the value is in double quotes, ClickHouse tries to parse

View File

@ -381,6 +381,13 @@ Enabled by default.
Allow parsing numbers as strings in JSON input formats.
Enabled by default.
## input_format_json_try_infer_numbers_from_strings {#input_format_json_try_infer_numbers_from_strings}
If enabled, during schema inference ClickHouse will try to infer numbers from string fields.
It can be useful if JSON data contains quoted UInt64 numbers.
Disabled by default.
## input_format_json_read_objects_as_strings {#input_format_json_read_objects_as_strings}
@ -404,7 +411,76 @@ Result:
└────┴──────────────────────────┴────────────┘
```
Disabled by default.
Enabled by default.
## input_format_json_try_infer_named_tuples_from_objects {#input_format_json_try_infer_named_tuples_from_objects}
If enabled, during schema inference ClickHouse will try to infer named Tuple from JSON objects.
The resulting named Tuple will contain all elements from all corresponding JSON objects from sample data.
Example:
```sql
SET input_format_json_try_infer_named_tuples_from_objects = 1;
DESC format(JSONEachRow, '{"obj" : {"a" : 42, "b" : "Hello"}}, {"obj" : {"a" : 43, "c" : [1, 2, 3]}}, {"obj" : {"d" : {"e" : 42}}}')
```
Result:
```
┌─name─┬─type───────────────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ obj │ Tuple(a Nullable(Int64), b Nullable(String), c Array(Nullable(Int64)), d Tuple(e Nullable(Int64))) │ │ │ │ │ │
└──────┴────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
```
Enabled by default.
## input_format_json_read_arrays_as_strings {#input_format_json_read_arrays_as_strings}
Allow parsing JSON arrays as strings in JSON input formats.
Example:
```sql
SET input_format_json_read_arrays_as_strings = 1;
SELECT arr, toTypeName(arr), JSONExtractArrayRaw(arr)[3] from format(JSONEachRow, 'arr String', '{"arr" : [1, "Hello", [1,2,3]]}');
```
Result:
```
┌─arr───────────────────┬─toTypeName(arr)─┬─arrayElement(JSONExtractArrayRaw(arr), 3)─┐
│ [1, "Hello", [1,2,3]] │ String │ [1,2,3] │
└───────────────────────┴─────────────────┴───────────────────────────────────────────┘
```
Enabled by default.
## input_format_json_infer_incomplete_types_as_strings {#input_format_json_infer_incomplete_types_as_strings}
Allows using the String type for JSON keys that contain only `Null`/`{}`/`[]` in the data sample during schema inference.
In JSON formats, any value can be read as String, so using the String type for keys with unknown types avoids errors like `Cannot determine type for column 'column_name' by first 25000 rows of data, most likely this column contains only Nulls or empty Arrays/Maps` during schema inference.
Example:
```sql
SET input_format_json_infer_incomplete_types_as_strings = 1, input_format_json_try_infer_named_tuples_from_objects = 1;
DESCRIBE format(JSONEachRow, '{"obj" : {"a" : [1,2,3], "b" : "hello", "c" : null, "d" : {}, "e" : []}}');
SELECT * FROM format(JSONEachRow, '{"obj" : {"a" : [1,2,3], "b" : "hello", "c" : null, "d" : {}, "e" : []}}');
```
Result:
```
┌─name─┬─type───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ obj │ Tuple(a Array(Nullable(Int64)), b Nullable(String), c Nullable(String), d Nullable(String), e Array(Nullable(String))) │ │ │ │ │ │
└──────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
┌─obj────────────────────────────┐
│ ([1,2,3],'hello',NULL,'{}',[]) │
└────────────────────────────────┘
```
Enabled by default.
## input_format_json_validate_types_from_metadata {#input_format_json_validate_types_from_metadata}

View File

@ -4659,6 +4659,10 @@ SELECT toFloat64('1.7091'), toFloat64('1.5008753E7') SETTINGS precise_float_pars
Interval (in milliseconds) for sending updates with partial data about the result table to the client (in interactive mode) during query execution. Setting to 0 disables partial results. Only supported for single-threaded GROUP BY without key, ORDER BY, LIMIT and OFFSET.
:::note
It's an experimental feature. Enable `allow_experimental_partial_result` setting first to use it.
:::
## max_rows_in_partial_result
Maximum rows to show in the partial result after every real-time update while the query runs (use partial result limit + OFFSET as a value in case of OFFSET in the query).
@ -4678,3 +4682,36 @@ The default value is `false`.
``` xml
<validate_tcp_client_information>true</validate_tcp_client_information>
```
## print_pretty_type_names {#print_pretty_type_names}
Allows printing deeply nested type names in a readable way, with indents, in `DESCRIBE` queries and in the `toTypeName()` function.
Example:
```sql
CREATE TABLE test (a Tuple(b String, c Tuple(d Nullable(UInt64), e Array(UInt32), f Array(Tuple(g String, h Map(String, Array(Tuple(i String, j UInt64))))), k Date), l Nullable(String))) ENGINE=Memory;
DESCRIBE TABLE test FORMAT TSVRaw SETTINGS print_pretty_type_names=1;
```
```
a Tuple(
b String,
c Tuple(
d Nullable(UInt64),
e Array(UInt32),
f Array(Tuple(
g String,
h Map(
String,
Array(Tuple(
i String,
j UInt64
))
)
)),
k Date
),
l Nullable(String)
)
```

View File

@ -137,6 +137,54 @@ Like [makeDateTime](#makedatetime) but produces a [DateTime64](../../sql-referen
makeDateTime32(year, month, day, hour, minute, second[, fraction[, precision[, timezone]]])
```
## timestamp
Converts the first argument 'expr' to type [DateTime64(6)](../../sql-reference/data-types/datetime64.md).
If a second argument 'expr_time' is provided, it adds the specified time to the converted value.
**Syntax**
``` sql
timestamp(expr[, expr_time])
```
Alias: `TIMESTAMP`
**Arguments**
- `expr` - Date or date with time. Type: [String](../../sql-reference/data-types/string.md).
- `expr_time` - Optional parameter. Time to add. Type: [String](../../sql-reference/data-types/string.md).
**Examples**
``` sql
SELECT timestamp('2023-12-31') as ts;
```
Result:
``` text
┌─────────────────────────ts─┐
│ 2023-12-31 00:00:00.000000 │
└────────────────────────────┘
```
``` sql
SELECT timestamp('2023-12-31 12:00:00', '12:00:00.11') as ts;
```
Result:
``` text
┌─────────────────────────ts─┐
│ 2024-01-01 00:00:00.110000 │
└────────────────────────────┘
```
**Returned value**
- [DateTime64](../../sql-reference/data-types/datetime64.md)(6)
## timeZone
Returns the timezone of the current session, i.e. the value of setting [session_timezone](../../operations/settings/settings.md#session_timezone).

View File

@ -113,6 +113,7 @@ For the query to run successfully, the following conditions must be met:
- Both tables must have the same structure.
- Both tables must have the same partition key, the same order by key and the same primary key.
- Both tables must have the same indices and projections.
- Both tables must have the same storage policy.
## REPLACE PARTITION
@ -132,6 +133,7 @@ For the query to run successfully, the following conditions must be met:
- Both tables must have the same structure.
- Both tables must have the same partition key, the same order by key and the same primary key.
- Both tables must have the same indices and projections.
- Both tables must have the same storage policy.
## MOVE PARTITION TO TABLE
@ -146,6 +148,7 @@ For the query to run successfully, the following conditions must be met:
- Both tables must have the same structure.
- Both tables must have the same partition key, the same order by key and the same primary key.
- Both tables must have the same indices and projections.
- Both tables must have the same storage policy.
- Both tables must be the same engine family (replicated or non-replicated).

View File

@ -162,6 +162,28 @@ The below get data from all `test-data.csv.gz` files from any folder inside `my-
SELECT * FROM s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/**/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip');
```
Note: custom URL mappers can be specified in the server configuration file. Example:
``` sql
SELECT * FROM s3('s3://clickhouse-public-datasets/my-test-bucket-768/**/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip');
```
The URL `'s3://clickhouse-public-datasets/my-test-bucket-768/**/test-data.csv.gz'` would be replaced with `'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/**/test-data.csv.gz'`.
A custom mapper can be added to `config.xml`:
``` xml
<url_scheme_mappers>
<s3>
<to>https://{bucket}.s3.amazonaws.com</to>
</s3>
<gs>
<to>https://{bucket}.storage.googleapis.com</to>
</gs>
<oss>
<to>https://{bucket}.oss.aliyuncs.com</to>
</oss>
</url_scheme_mappers>
```
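The mapping described above can be pictured as a plain string substitution. The following C++ sketch is only an illustration under that assumption, not the server's actual code: `mapUrlScheme` and its `mappers` argument are hypothetical names, and the sketch assumes that the first URL component after the scheme is the bucket and that the rest of the path is appended unchanged.

```cpp
#include <map>
#include <optional>
#include <string>

/// Hypothetical helper (not part of ClickHouse): rewrites "<scheme>://<bucket>/<path>"
/// using a per-scheme template that contains the literal placeholder "{bucket}".
/// Returns std::nullopt when the URL has no scheme or no mapper is configured for it.
std::optional<std::string> mapUrlScheme(
    const std::string & url,
    const std::map<std::string, std::string> & mappers)
{
    const std::string separator = "://";
    const std::size_t scheme_end = url.find(separator);
    if (scheme_end == std::string::npos)
        return std::nullopt;

    const auto mapper = mappers.find(url.substr(0, scheme_end));
    if (mapper == mappers.end())
        return std::nullopt;

    /// Everything between "://" and the next '/' is treated as the bucket name.
    const std::size_t bucket_begin = scheme_end + separator.size();
    const std::size_t bucket_end = url.find('/', bucket_begin);
    const std::string bucket = url.substr(
        bucket_begin,
        bucket_end == std::string::npos ? std::string::npos : bucket_end - bucket_begin);
    const std::string path = bucket_end == std::string::npos ? std::string() : url.substr(bucket_end);

    /// Substitute the bucket into the template, then append the remaining path as-is.
    const std::string placeholder = "{bucket}";
    std::string result = mapper->second;
    const std::size_t pos = result.find(placeholder);
    if (pos != std::string::npos)
        result.replace(pos, placeholder.size(), bucket);
    return result + path;
}
```

With a mapper `s3 -> https://{bucket}.s3.amazonaws.com`, the `s3://` URL from the example resolves to the `clickhouse-public-datasets.s3.amazonaws.com` host with the original path and glob kept intact.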
## Partitioned Write
If you specify `PARTITION BY` expression when inserting data into `S3` table, a separate file is created for each partition value. Splitting the data into separate files helps to improve reading operations efficiency.

View File

@ -485,6 +485,8 @@ try
unused_event,
[&](ConfigurationPtr config, bool /* initial_loading */)
{
updateLevels(*config, logger());
if (config->has("keeper_server"))
global_context->updateKeeperConfiguration(*config);

View File

@ -91,6 +91,18 @@
</formatting> -->
</logger>
<url_scheme_mappers>
<s3>
<to>https://{bucket}.s3.amazonaws.com</to>
</s3>
<gs>
<to>https://{bucket}.storage.googleapis.com</to>
</gs>
<oss>
<to>https://{bucket}.oss.aliyuncs.com</to>
</oss>
</url_scheme_mappers>
<!-- Add headers to response in options request. OPTIONS method is used in CORS preflight requests. -->
<!-- It is off by default. Next headers are obligate for CORS.-->
<!-- http_options_response>
@ -1217,14 +1229,13 @@
</asynchronous_insert_log>
<!-- Backup/restore log.
Uncomment to write backup/restore log records into a system table.
-->
<backup_log>
<database>system</database>
<table>backup_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>0</flush_interval_milliseconds>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</backup_log>
-->
<!-- <top_level_domains_path>/var/lib/clickhouse/top_level_domains/</top_level_domains_path> -->
<!-- Custom TLD lists.

View File

@ -1,10 +1,9 @@
#include <DataTypes/DataTypeNullable.h>
#include <AggregateFunctions/AggregateFunctionNull.h>
#include <AggregateFunctions/AggregateFunctionNothing.h>
#include <AggregateFunctions/AggregateFunctionCount.h>
#include <AggregateFunctions/AggregateFunctionState.h>
#include <AggregateFunctions/AggregateFunctionCombinatorFactory.h>
#include <AggregateFunctions/AggregateFunctionSimpleState.h>
namespace DB
{
@ -39,6 +38,34 @@ public:
return res;
}
template <typename T>
std::optional<AggregateFunctionPtr> tryTransformStateFunctionImpl(const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties & properties,
const DataTypes & arguments,
const Array & params) const
{
if (const T * function_state = typeid_cast<const T *>(nested_function.get()))
{
auto transformed_nested_function = transformAggregateFunction(function_state->getNestedFunction(), properties, arguments, params);
return std::make_shared<T>(
transformed_nested_function,
transformed_nested_function->getArgumentTypes(),
transformed_nested_function->getParameters());
}
return {};
}
AggregateFunctionPtr tryTransformStateFunction(const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties & properties,
const DataTypes & arguments,
const Array & params) const
{
return tryTransformStateFunctionImpl<AggregateFunctionState>(nested_function, properties, arguments, params)
.or_else([&]() { return tryTransformStateFunctionImpl<AggregateFunctionSimpleState>(nested_function, properties, arguments, params); })
.value_or(nullptr);
}
AggregateFunctionPtr transformAggregateFunction(
const AggregateFunctionPtr & nested_function,
const AggregateFunctionProperties & properties,
@ -82,17 +109,11 @@ public:
if (auto adapter = nested_function->getOwnNullAdapter(nested_function, arguments, params, properties))
return adapter;
/// If applied to aggregate function with -State combinator, we apply -Null combinator to it's nested_function instead of itself.
/// If applied to an aggregate function with either the -State or the -SimpleState combinator, we apply the -Null combinator to its nested_function instead of itself.
/// Because Nullable AggregateFunctionState does not make sense and ruins the logic of managing aggregate function states.
if (const AggregateFunctionState * function_state = typeid_cast<const AggregateFunctionState *>(nested_function.get()))
if (const AggregateFunctionPtr new_function = tryTransformStateFunction(nested_function, properties, arguments, params))
{
auto transformed_nested_function = transformAggregateFunction(function_state->getNestedFunction(), properties, arguments, params);
return std::make_shared<AggregateFunctionState>(
transformed_nested_function,
transformed_nested_function->getArgumentTypes(),
transformed_nested_function->getParameters());
return new_function;
}
bool return_type_is_nullable = !properties.returns_default_when_only_null && nested_function->getResultType()->canBeInsideNullable();

View File

@ -5,9 +5,9 @@
namespace DB
{
String BackupCoordinationStage::formatGatheringMetadata(size_t pass)
String BackupCoordinationStage::formatGatheringMetadata(int attempt_no)
{
return fmt::format("{} ({})", GATHERING_METADATA, pass);
return fmt::format("{} ({})", GATHERING_METADATA, attempt_no);
}
}

View File

@ -15,7 +15,7 @@ namespace BackupCoordinationStage
/// Finding all tables and databases which we're going to put to the backup and collecting their metadata.
constexpr const char * GATHERING_METADATA = "gathering metadata";
String formatGatheringMetadata(size_t pass);
String formatGatheringMetadata(int attempt_no);
/// Making temporary hard links and prepare backup entries.
constexpr const char * EXTRACTING_DATA_FROM_TABLES = "extracting data from tables";

View File

@ -58,17 +58,21 @@ namespace
return str;
}
/// How long we should sleep after finding an inconsistency error.
std::chrono::milliseconds getSleepTimeAfterInconsistencyError(size_t pass)
String tableNameWithTypeToString(const QualifiedTableName & table_name, bool first_upper)
{
size_t ms;
if (pass == 1) /* pass is 1-based */
ms = 0;
else if ((pass % 10) != 1)
ms = 0;
else
ms = 1000;
return std::chrono::milliseconds{ms};
return tableNameWithTypeToString(table_name.database, table_name.table, first_upper);
}
/// How long we should sleep after finding an inconsistency error.
std::chrono::milliseconds getSleepTimeAfterInconsistencyError(int attempt_no, unsigned int attempts_before_sleep, std::chrono::milliseconds min_sleep, std::chrono::milliseconds max_sleep)
{
attempts_before_sleep = std::max(attempts_before_sleep, 1U);
if (((attempt_no + 1) % attempts_before_sleep) != 0)
return std::chrono::milliseconds::zero(); /// no sleep
int sleep_counter = attempt_no / attempts_before_sleep;
std::chrono::milliseconds sleep_time = intExp2(std::min(sleep_counter, 10)) * min_sleep;
return std::min(sleep_time, max_sleep);
}
}
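For reference, the schedule produced by `getSleepTimeAfterInconsistencyError` can be reproduced outside the codebase. The block below is a minimal standalone sketch of the same arithmetic, under two assumptions: `sleepTimeForAttempt` is a hypothetical name, and a bit shift stands in for the `intExp2` helper used in the diff.

```cpp
#include <algorithm>
#include <chrono>

/// Standalone sketch (hypothetical name): sleep only after every
/// `attempts_before_sleep`-th attempt, doubling the delay each time
/// and capping it at `max_sleep`.
std::chrono::milliseconds sleepTimeForAttempt(
    int attempt_no,
    unsigned attempts_before_sleep,
    std::chrono::milliseconds min_sleep,
    std::chrono::milliseconds max_sleep)
{
    /// Guard against a zero divisor coming from configuration.
    attempts_before_sleep = std::max(attempts_before_sleep, 1U);
    if ((static_cast<unsigned>(attempt_no + 1) % attempts_before_sleep) != 0)
        return std::chrono::milliseconds::zero(); /// no sleep on this attempt

    const int sleep_counter = attempt_no / static_cast<int>(attempts_before_sleep);

    /// 2^sleep_counter, with the exponent limited to 10 to avoid overflow.
    const std::chrono::milliseconds::rep multiplier
        = static_cast<std::chrono::milliseconds::rep>(1) << std::min(sleep_counter, 10);

    const std::chrono::milliseconds sleep_time = min_sleep * multiplier;
    return std::min(sleep_time, max_sleep);
}
```

With the defaults read in the constructor (2 attempts before sleeping, 100 ms minimum, 5000 ms maximum), attempts 1, 3 and 5 yield 100, 200 and 400 ms, even-numbered attempts yield no sleep, and the delay is capped at 5000 ms from attempt 13 onward.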
@ -85,7 +89,11 @@ BackupEntriesCollector::BackupEntriesCollector(
, read_settings(read_settings_)
, context(context_)
, on_cluster_first_sync_timeout(context->getConfigRef().getUInt64("backups.on_cluster_first_sync_timeout", 180000))
, consistent_metadata_snapshot_timeout(context->getConfigRef().getUInt64("backups.consistent_metadata_snapshot_timeout", 600000))
, collect_metadata_timeout(context->getConfigRef().getUInt64("backups.collect_metadata_timeout", context->getConfigRef().getUInt64("backups.consistent_metadata_snapshot_timeout", 600000)))
, attempts_to_collect_metadata_before_sleep(context->getConfigRef().getUInt("backups.attempts_to_collect_metadata_before_sleep", 2))
, min_sleep_before_next_attempt_to_collect_metadata(context->getConfigRef().getUInt64("backups.min_sleep_before_next_attempt_to_collect_metadata", 100))
, max_sleep_before_next_attempt_to_collect_metadata(context->getConfigRef().getUInt64("backups.max_sleep_before_next_attempt_to_collect_metadata", 5000))
, compare_collected_metadata(context->getConfigRef().getBool("backups.compare_collected_metadata", true))
, log(&Poco::Logger::get("BackupEntriesCollector"))
, global_zookeeper_retries_info(
"BackupEntriesCollector",
@ -143,14 +151,14 @@ Strings BackupEntriesCollector::setStage(const String & new_stage, const String
backup_coordination->setStage(new_stage, message);
if (new_stage == Stage::formatGatheringMetadata(1))
if (new_stage == Stage::formatGatheringMetadata(0))
{
return backup_coordination->waitForStage(new_stage, on_cluster_first_sync_timeout);
}
else if (new_stage.starts_with(Stage::GATHERING_METADATA))
{
auto current_time = std::chrono::steady_clock::now();
auto end_of_timeout = std::max(current_time, consistent_metadata_snapshot_end_time);
auto end_of_timeout = std::max(current_time, collect_metadata_end_time);
return backup_coordination->waitForStage(
new_stage, std::chrono::duration_cast<std::chrono::milliseconds>(end_of_timeout - current_time));
}
@ -177,64 +185,55 @@ void BackupEntriesCollector::calculateRootPathInBackup()
/// Finds databases and tables which we will put to the backup.
void BackupEntriesCollector::gatherMetadataAndCheckConsistency()
{
setStage(Stage::formatGatheringMetadata(1));
/// With the default values the metadata collecting works in the following way:
/// 1) it tries to collect the metadata for the first time; then
/// 2) it tries to collect it again, and compares the results from the first and the second collecting; if they match, it's done; otherwise
/// 3) it sleeps 100 milliseconds and tries to collect again, then compares the results from the second and the third collecting; if they match, it's done; otherwise
/// 4) it tries to collect again, then compares the results from the third and the fourth collecting; if they match, it's done; otherwise
/// 5) it sleeps 200 milliseconds and tries to collect again, then compares the results from the fourth and the fifth collecting; if they match, it's done;
/// ...
/// and so on, the sleep time is doubled each time until it reaches 5000 milliseconds.
/// And such attempts will be continued until 600000 milliseconds pass.
consistent_metadata_snapshot_end_time = std::chrono::steady_clock::now() + consistent_metadata_snapshot_timeout;
setStage(Stage::formatGatheringMetadata(0));
for (size_t pass = 1;; ++pass)
collect_metadata_end_time = std::chrono::steady_clock::now() + collect_metadata_timeout;
for (int attempt_no = 0;; ++attempt_no)
{
String next_stage = Stage::formatGatheringMetadata(pass + 1);
std::optional<Exception> inconsistency_error;
if (tryGatherMetadataAndCompareWithPrevious(inconsistency_error))
{
/// Gathered metadata and checked consistency, cool! But we have to check that other hosts cope with that too.
auto all_hosts_results = setStage(next_stage, "consistent");
bool need_another_attempt = false;
tryGatherMetadataAndCompareWithPrevious(attempt_no, inconsistency_error, need_another_attempt);
syncMetadataGatheringWithOtherHosts(attempt_no, inconsistency_error, need_another_attempt);
std::optional<String> host_with_inconsistency;
std::optional<String> inconsistency_error_on_other_host;
for (size_t i = 0; i != all_hosts.size(); ++i)
if (inconsistency_error && (std::chrono::steady_clock::now() > collect_metadata_end_time))
inconsistency_error->rethrow();
/// Rethrow or just log the inconsistency error.
if (!need_another_attempt)
break;
/// It's going to be another attempt, we need to sleep a bit.
if (inconsistency_error)
{
auto sleep_time = getSleepTimeAfterInconsistencyError(
attempt_no,
attempts_to_collect_metadata_before_sleep,
min_sleep_before_next_attempt_to_collect_metadata,
max_sleep_before_next_attempt_to_collect_metadata);
if (sleep_time.count())
{
if ((i < all_hosts_results.size()) && (all_hosts_results[i] != "consistent"))
{
host_with_inconsistency = all_hosts[i];
inconsistency_error_on_other_host = all_hosts_results[i];
break;
}
LOG_TRACE(log, "Sleeping {} before next attempt to collect metadata", to_string(sleep_time));
sleepForMilliseconds(std::chrono::duration_cast<std::chrono::milliseconds>(sleep_time).count());
}
if (!host_with_inconsistency)
break; /// All hosts managed to gather metadata and everything is consistent, so we can go further to writing the backup.
inconsistency_error = Exception{
ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP,
"Found inconsistency on host {}: {}",
*host_with_inconsistency,
*inconsistency_error_on_other_host};
}
else
{
/// Failed to gather metadata or something wasn't consistent. We'll let other hosts know that and try again.
setStage(next_stage, inconsistency_error->displayText());
}
/// Two passes is minimum (we need to compare with table names with previous ones to be sure we don't miss anything).
if (pass >= 2)
{
if (std::chrono::steady_clock::now() > consistent_metadata_snapshot_end_time)
inconsistency_error->rethrow();
else
LOG_WARNING(log, getExceptionMessageAndPattern(*inconsistency_error, /* with_stacktrace */ false));
}
auto sleep_time = getSleepTimeAfterInconsistencyError(pass);
if (sleep_time.count() > 0)
sleepForNanoseconds(std::chrono::duration_cast<std::chrono::nanoseconds>(sleep_time).count());
}
/// All hosts managed to gather metadata and everything is consistent, so we can go further to writing the backup.
LOG_INFO(log, "Will backup {} databases and {} tables", database_infos.size(), table_infos.size());
}
bool BackupEntriesCollector::tryGatherMetadataAndCompareWithPrevious(std::optional<Exception> & inconsistency_error)
void BackupEntriesCollector::tryGatherMetadataAndCompareWithPrevious(int attempt_no, std::optional<Exception> & inconsistency_error, bool & need_another_attempt)
{
try
{
@ -250,13 +249,75 @@ bool BackupEntriesCollector::tryGatherMetadataAndCompareWithPrevious(std::option
if (e.code() != ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP)
throw;
/// If gatherDatabasesMetadata() or gatherTablesMetadata() threw an INCONSISTENT_METADATA_FOR_BACKUP error then the metadata is not consistent by itself,
/// for example a CREATE QUERY could contain a wrong table name if the table has been just renamed. We need another attempt in that case.
inconsistency_error = e;
return false;
need_another_attempt = true;
return;
}
if (!compare_collected_metadata)
return;
/// We have to check consistency of collected information to protect from the case when some table or database is
/// renamed during this collecting making the collected information invalid.
return compareWithPrevious(inconsistency_error);
String mismatch_description;
if (!compareWithPrevious(mismatch_description))
{
/// Usually two passes is minimum.
/// (Because we need to compare with table names from the previous pass to be sure we are not going to miss anything).
if (attempt_no >= 1)
inconsistency_error = Exception{ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP, "{}", mismatch_description};
need_another_attempt = true;
}
}
void BackupEntriesCollector::syncMetadataGatheringWithOtherHosts(int attempt_no, std::optional<Exception> & inconsistency_error, bool & need_another_attempt)
{
String next_stage = Stage::formatGatheringMetadata(attempt_no + 1);
if (inconsistency_error)
{
/// Failed to gather metadata or something wasn't consistent. We'll let other hosts know that and try again.
LOG_WARNING(log, "Found inconsistency: {}", inconsistency_error->displayText());
setStage(next_stage, inconsistency_error->displayText());
return;
}
/// We've gathered metadata, cool! But we have to check that other hosts coped with that too.
auto all_hosts_results = setStage(next_stage, need_another_attempt ? "need_another_attempt" : "consistent");
size_t count = std::min(all_hosts.size(), all_hosts_results.size());
for (size_t i = 0; i != count; ++i)
{
const auto & other_host = all_hosts[i];
const auto & other_host_result = all_hosts_results[i];
if ((other_host_result != "consistent") && (other_host_result != "need_another_attempt"))
{
LOG_WARNING(log, "Found inconsistency on host {}: {}", other_host, other_host_result);
inconsistency_error = Exception{ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP, "Found inconsistency on host {}: {}", other_host, other_host_result};
need_another_attempt = true;
return;
}
}
if (need_another_attempt)
{
LOG_TRACE(log, "Needs another attempt of collecting metadata");
return;
}
for (size_t i = 0; i != count; ++i)
{
const auto & other_host = all_hosts[i];
const auto & other_host_result = all_hosts_results[i];
if (other_host_result == "need_another_attempt")
{
LOG_TRACE(log, "Host {} needs another attempt of collecting metadata", other_host);
need_another_attempt = true;
return;
}
}
}
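The decision made in `syncMetadataGatheringWithOtherHosts` when the local host has no error of its own comes down to a priority order over the per-host result strings. The sketch below restates that order as a pure function so it can be checked in isolation. `GatheringOutcome` and `combineHostResults` are hypothetical names introduced only for this illustration, and logging, stage synchronization and exception construction are left out.

```cpp
#include <string>
#include <vector>

/// Hypothetical summary of one synchronization round.
enum class GatheringOutcome
{
    Consistent,         /// every host is done, the backup can proceed
    NeedAnotherAttempt, /// no error, but at least one host wants to retry
    Inconsistent        /// some host reported an error message
};

/// Hypothetical helper: combines the strings reported by all hosts with the
/// local host's own "need another attempt" flag.
GatheringOutcome combineHostResults(
    const std::vector<std::string> & host_results,
    bool local_needs_another_attempt)
{
    /// Anything other than the two known markers is an error text from a host;
    /// it takes priority over everything else.
    for (const auto & result : host_results)
        if (result != "consistent" && result != "need_another_attempt")
            return GatheringOutcome::Inconsistent;

    /// The local host already knows it has to retry.
    if (local_needs_another_attempt)
        return GatheringOutcome::NeedAnotherAttempt;

    /// Otherwise a retry is needed as soon as any other host asks for one.
    for (const auto & result : host_results)
        if (result == "need_another_attempt")
            return GatheringOutcome::NeedAnotherAttempt;

    return GatheringOutcome::Consistent;
}
```

Checking for error texts before retry requests mirrors the two separate loops in the function above: an inconsistency on any host is reported even when another host only asked for a retry.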
void BackupEntriesCollector::gatherDatabasesMetadata()
@ -542,27 +603,29 @@ void BackupEntriesCollector::lockTablesForReading()
{
auto storage = table_info.storage;
if (storage)
{
table_info.table_lock = storage->tryLockForShare(context->getInitialQueryId(), context->getSettingsRef().lock_acquire_timeout);
if (table_info.table_lock == nullptr)
{
// Table was dropped while acquiring the lock
throw Exception(ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP, "{} was dropped during scanning",
tableNameWithTypeToString(table_name.database, table_name.table, true));
}
}
}
std::erase_if(
table_infos,
[](const std::pair<QualifiedTableName, TableInfo> & key_value)
{
const auto & table_info = key_value.second;
return table_info.storage && !table_info.table_lock; /// Table was dropped while acquiring the lock.
});
}
/// Check consistency of collected information about databases and tables.
bool BackupEntriesCollector::compareWithPrevious(std::optional<Exception> & inconsistency_error)
bool BackupEntriesCollector::compareWithPrevious(String & mismatch_description)
{
/// We need to scan tables at least twice to be sure that we haven't missed any table which could be renamed
/// while we were scanning.
std::vector<std::pair<String, String>> databases_metadata;
std::vector<std::pair<QualifiedTableName, String>> tables_metadata;
databases_metadata.reserve(database_infos.size());
tables_metadata.reserve(table_infos.size());
for (const auto & [database_name, database_info] : database_infos)
databases_metadata.emplace_back(database_name, database_info.create_database_query ? serializeAST(*database_info.create_database_query) : "");
for (const auto & [table_name, table_info] : table_infos)
@ -577,68 +640,59 @@ bool BackupEntriesCollector::compareWithPrevious(std::optional<Exception> & inco
previous_tables_metadata = std::move(tables_metadata);
});
/// Databases must be the same as during the previous scan.
if (databases_metadata != previous_databases_metadata)
enum class MismatchType { ADDED, REMOVED, CHANGED, NONE };
/// Helper function - used to compare the metadata of databases and tables.
auto find_mismatch = [](const auto & metadata, const auto & previous_metadata)
-> std::pair<MismatchType, typename std::remove_cvref_t<decltype(metadata)>::value_type::first_type>
{
std::vector<std::pair<String, String>> difference;
difference.reserve(databases_metadata.size());
std::set_difference(databases_metadata.begin(), databases_metadata.end(), previous_databases_metadata.begin(),
previous_databases_metadata.end(), std::back_inserter(difference));
auto [mismatch, p_mismatch] = std::mismatch(metadata.begin(), metadata.end(), previous_metadata.begin(), previous_metadata.end());
if (!difference.empty())
if ((mismatch != metadata.end()) && (p_mismatch != previous_metadata.end()))
{
inconsistency_error = Exception{
ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP,
"Database {} were created or changed its definition during scanning",
backQuoteIfNeed(difference[0].first)};
return false;
if (mismatch->first == p_mismatch->first)
return {MismatchType::CHANGED, mismatch->first};
else if (mismatch->first < p_mismatch->first)
return {MismatchType::ADDED, mismatch->first};
else
return {MismatchType::REMOVED, mismatch->first};
}
difference.clear();
difference.reserve(previous_databases_metadata.size());
std::set_difference(previous_databases_metadata.begin(), previous_databases_metadata.end(), databases_metadata.begin(),
databases_metadata.end(), std::back_inserter(difference));
if (!difference.empty())
else if (mismatch != metadata.end())
{
inconsistency_error = Exception{
ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP,
"Database {} were removed or changed its definition during scanning",
backQuoteIfNeed(difference[0].first)};
return false;
return {MismatchType::ADDED, mismatch->first};
}
else if (p_mismatch != previous_metadata.end())
{
return {MismatchType::REMOVED, p_mismatch->first};
}
else
{
return {MismatchType::NONE, {}};
}
};
/// Databases must be the same as during the previous scan.
if (auto mismatch = find_mismatch(databases_metadata, previous_databases_metadata); mismatch.first != MismatchType::NONE)
{
if (mismatch.first == MismatchType::ADDED)
mismatch_description = fmt::format("Database {} was added during scanning", backQuoteIfNeed(mismatch.second));
else if (mismatch.first == MismatchType::REMOVED)
mismatch_description = fmt::format("Database {} was removed during scanning", backQuoteIfNeed(mismatch.second));
else
mismatch_description = fmt::format("Database {} changed its definition during scanning", backQuoteIfNeed(mismatch.second));
return false;
}
/// Tables must be the same as during the previous scan.
if (tables_metadata != previous_tables_metadata)
if (auto mismatch = find_mismatch(tables_metadata, previous_tables_metadata); mismatch.first != MismatchType::NONE)
{
std::vector<std::pair<QualifiedTableName, String>> difference;
difference.reserve(tables_metadata.size());
std::set_difference(tables_metadata.begin(), tables_metadata.end(), previous_tables_metadata.begin(),
previous_tables_metadata.end(), std::back_inserter(difference));
if (!difference.empty())
{
inconsistency_error = Exception{
ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP,
"{} were created or changed its definition during scanning",
tableNameWithTypeToString(difference[0].first.database, difference[0].first.table, true)};
return false;
}
difference.clear();
difference.reserve(previous_tables_metadata.size());
std::set_difference(previous_tables_metadata.begin(), previous_tables_metadata.end(), tables_metadata.begin(),
tables_metadata.end(), std::back_inserter(difference));
if (!difference.empty())
{
inconsistency_error = Exception{
ErrorCodes::INCONSISTENT_METADATA_FOR_BACKUP,
"{} were removed or changed its definition during scanning",
tableNameWithTypeToString(difference[0].first.database, difference[0].first.table, true)};
return false;
}
if (mismatch.first == MismatchType::ADDED)
mismatch_description = fmt::format("{} was added during scanning", tableNameWithTypeToString(mismatch.second, true));
else if (mismatch.first == MismatchType::REMOVED)
mismatch_description = fmt::format("{} was removed during scanning", tableNameWithTypeToString(mismatch.second, true));
else
mismatch_description = fmt::format("{} changed its definition during scanning", tableNameWithTypeToString(mismatch.second, true));
return false;
}
return true;
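
The `find_mismatch` helper in the hunk above can be illustrated in isolation. This is a minimal sketch, not the ClickHouse implementation: `Metadata` and `findMismatch` are hypothetical names, plain `std::string` stands in for `String` and `QualifiedTableName`, and both inputs are assumed to be sorted by name, since `std::mismatch` only locates the first position where the two sequences differ. One deliberate deviation: the `REMOVED` branch of this sketch reports the name taken from the previous scan.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class MismatchType { ADDED, REMOVED, CHANGED, NONE };

/// (name, serialized definition) pairs; assumed to be sorted by name.
using Metadata = std::vector<std::pair<std::string, std::string>>;

/// Returns the kind of the first difference between two scans and the name it concerns.
std::pair<MismatchType, std::string> findMismatch(const Metadata & current, const Metadata & previous)
{
    assert(std::is_sorted(current.begin(), current.end()));
    assert(std::is_sorted(previous.begin(), previous.end()));

    auto [cur, prev] = std::mismatch(current.begin(), current.end(), previous.begin(), previous.end());

    if (cur != current.end() && prev != previous.end())
    {
        if (cur->first == prev->first)
            return {MismatchType::CHANGED, cur->first};   /// Same name, different definition.
        if (cur->first < prev->first)
            return {MismatchType::ADDED, cur->first};     /// A new name appears before the old one.
        return {MismatchType::REMOVED, prev->first};      /// The old name is missing from the new scan.
    }
    if (cur != current.end())
        return {MismatchType::ADDED, cur->first};         /// Extra entries at the end of the new scan.
    if (prev != previous.end())
        return {MismatchType::REMOVED, prev->first};      /// Entries at the end of the old scan are gone.
    return {MismatchType::NONE, {}};
}
```

Compared with the two `std::set_difference` passes it replaces, a single `std::mismatch` pass allocates no temporary vectors and can tell "added", "removed" and "changed" apart, which is what lets the new messages be specific.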

View File

@ -64,7 +64,8 @@ private:
void gatherMetadataAndCheckConsistency();
bool tryGatherMetadataAndCompareWithPrevious(std::optional<Exception> & inconsistency_error);
void tryGatherMetadataAndCompareWithPrevious(int attempt_no, std::optional<Exception> & inconsistency_error, bool & need_another_attempt);
void syncMetadataGatheringWithOtherHosts(int attempt_no, std::optional<Exception> & inconsistency_error, bool & need_another_attempt);
void gatherDatabasesMetadata();
@ -81,7 +82,7 @@ private:
void gatherTablesMetadata();
std::vector<std::pair<ASTPtr, StoragePtr>> findTablesInDatabase(const String & database_name) const;
void lockTablesForReading();
bool compareWithPrevious(std::optional<Exception> & inconsistency_error);
bool compareWithPrevious(String & mismatch_description);
void makeBackupEntriesForDatabasesDefs();
void makeBackupEntriesForTablesDefs();
@ -97,8 +98,26 @@ private:
std::shared_ptr<IBackupCoordination> backup_coordination;
const ReadSettings read_settings;
ContextPtr context;
std::chrono::milliseconds on_cluster_first_sync_timeout;
std::chrono::milliseconds consistent_metadata_snapshot_timeout;
/// The time a BACKUP ON CLUSTER or RESTORE ON CLUSTER command will wait until all the nodes receive the BACKUP (or RESTORE) query and start working.
/// This setting is similar to `distributed_ddl_task_timeout`.
const std::chrono::milliseconds on_cluster_first_sync_timeout;
/// The time a BACKUP command will try to collect the metadata of tables & databases.
const std::chrono::milliseconds collect_metadata_timeout;
/// The number of attempts to collect the metadata before sleeping.
const unsigned int attempts_to_collect_metadata_before_sleep;
/// The minimum time ClickHouse will wait after an unsuccessful attempt before trying to collect the metadata again.
const std::chrono::milliseconds min_sleep_before_next_attempt_to_collect_metadata;
/// The maximum time ClickHouse will wait after an unsuccessful attempt before trying to collect the metadata again.
const std::chrono::milliseconds max_sleep_before_next_attempt_to_collect_metadata;
/// Whether we should collect the metadata after a successful attempt one more time and check that nothing has changed.
const bool compare_collected_metadata;
Poco::Logger * log;
/// Unfortunately we can use ZooKeeper for collecting information for backup
/// and we need to retry...
@ -139,7 +158,9 @@ private:
};
String current_stage;
std::chrono::steady_clock::time_point consistent_metadata_snapshot_end_time;
std::chrono::steady_clock::time_point collect_metadata_end_time;
std::unordered_map<String, DatabaseInfo> database_infos;
std::unordered_map<QualifiedTableName, TableInfo> table_infos;
std::vector<std::pair<String, String>> previous_databases_metadata;

View File

@ -34,6 +34,7 @@ static struct InitFiu
#define APPLY_FOR_FAILPOINTS(ONCE, REGULAR, PAUSEABLE_ONCE, PAUSEABLE) \
ONCE(replicated_merge_tree_commit_zk_fail_after_op) \
ONCE(replicated_merge_tree_insert_quorum_fail_0) \
REGULAR(use_delayed_remote_source) \
REGULAR(cluster_discovery_faults) \
REGULAR(dummy_failpoint) \

View File

@ -1,3 +1,5 @@
#include <algorithm>
#include <unordered_map>
#include <Poco/Util/AbstractConfiguration.h>
#include <Common/Macros.h>
#include <Common/Exception.h>
@ -36,6 +38,11 @@ Macros::Macros(const Poco::Util::AbstractConfiguration & config, const String &
}
}
Macros::Macros(std::map<String, String> map)
{
macros = std::move(map);
}
String Macros::expand(const String & s,
MacroExpansionInfo & info) const
{

View File

@ -27,6 +27,7 @@ class Macros
public:
Macros() = default;
Macros(const Poco::Util::AbstractConfiguration & config, const String & key, Poco::Logger * log = nullptr);
explicit Macros(std::map<String, String> map);
struct MacroExpansionInfo
{

View File

@ -190,6 +190,15 @@ void ThreadStatus::flushUntrackedMemory()
untracked_memory = 0;
}
bool ThreadStatus::isQueryCanceled() const
{
if (!thread_group)
return false;
chassert(local_data.query_is_canceled_predicate);
return local_data.query_is_canceled_predicate();
}
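
The cancellation check added above boils down to a `std::function<bool()>` that is consulted only when the thread is attached to a group. A minimal sketch of that pattern follows; `ThreadGroup` and `ThreadState` are hypothetical stand-ins rather than the real ClickHouse classes, and a plain `assert` is assumed in place of `chassert`.

```cpp
#include <cassert>
#include <functional>
#include <memory>

using QueryIsCanceledPredicate = std::function<bool()>;

/// Hypothetical stand-in for the real thread group.
struct ThreadGroup {};

/// Hypothetical stand-in for ThreadStatus, reduced to the fields used by the check.
struct ThreadState
{
    std::shared_ptr<ThreadGroup> thread_group;
    QueryIsCanceledPredicate query_is_canceled_predicate;

    bool isQueryCanceled() const
    {
        /// A thread that is not attached to any group has no query to cancel.
        if (!thread_group)
            return false;

        /// Once attached, the predicate is expected to be set.
        assert(query_is_canceled_predicate);
        return query_is_canceled_predicate();
    }
};
```

Storing a predicate instead of a flag keeps the thread-side code independent of how the owner of the query tracks cancellation.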
ThreadStatus::~ThreadStatus()
{
flushUntrackedMemory();

View File

@ -48,6 +48,8 @@ using InternalProfileEventsQueuePtr = std::shared_ptr<InternalProfileEventsQueue
using InternalProfileEventsQueueWeakPtr = std::weak_ptr<InternalProfileEventsQueue>;
using ThreadStatusPtr = ThreadStatus *;
using QueryIsCanceledPredicate = std::function<bool()>;
/** Thread group is a collection of threads dedicated to single task
* (query or other process like background merge).
*
@ -87,6 +89,8 @@ public:
String query_for_logs;
UInt64 normalized_query_hash = 0;
QueryIsCanceledPredicate query_is_canceled_predicate = {};
};
SharedData getSharedData()
@ -284,6 +288,8 @@ public:
void attachQueryForLog(const String & query_);
const String & getQueryForLog() const;
bool isQueryCanceled() const;
/// Proper call for fatal_error_callback
void onFatalError();

View File

@ -311,6 +311,7 @@ class IColumn;
\
M(Bool, partial_result_on_first_cancel, false, "Allows query to return a partial result after cancel.", 0) \
\
M(Bool, allow_experimental_partial_result, 0, "Enable experimental feature: partial results for running queries.", 0) \
M(Milliseconds, partial_result_update_duration_ms, 0, "Interval (in milliseconds) for sending updates with partial data about the result table to the client (in interactive mode) during query execution. Setting to 0 disables partial results. Only supported for single-threaded GROUP BY without key, ORDER BY, LIMIT and OFFSET.", 0) \
M(UInt64, max_rows_in_partial_result, 10, "Maximum rows to show in the partial result after every real-time update while the query runs (use partial result limit + OFFSET as a value in case of OFFSET in the query).", 0) \
\
@ -806,6 +807,7 @@ class IColumn;
M(Timezone, session_timezone, "", "This setting can be removed in the future due to potential caveats. It is experimental and is not suitable for production usage. The default timezone for current session or query. The server default timezone if empty.", 0) \
M(Bool, allow_create_index_without_type, false, "Allow CREATE INDEX query without TYPE. Query will be ignored. Made for SQL compatibility tests.", 0) \
M(Bool, create_index_ignore_unique, false, "Ignore UNIQUE keyword in CREATE UNIQUE INDEX. Made for SQL compatibility tests.", 0) \
M(Bool, print_pretty_type_names, false, "Print pretty type names in DESCRIBE query and toTypeName() function", 0) \
// End of COMMON_SETTINGS
// Please add settings related to formats into the FORMAT_FACTORY_SETTINGS, move obsolete settings to OBSOLETE_SETTINGS and obsolete format settings to OBSOLETE_FORMAT_SETTINGS.
@ -923,10 +925,13 @@ class IColumn;
M(String, schema_inference_hints, "", "The list of column names and types to use in schema inference for formats without column names. The format: 'column_name1 column_type1, column_name2 column_type2, ...'", 0) \
M(Bool, schema_inference_make_columns_nullable, true, "If set to true, all inferred types will be Nullable in schema inference for formats without information about nullability.", 0) \
M(Bool, input_format_json_read_bools_as_numbers, true, "Allow to parse bools as numbers in JSON input formats", 0) \
M(Bool, input_format_json_try_infer_numbers_from_strings, true, "Try to infer numbers from string fields while schema inference", 0) \
M(Bool, input_format_json_try_infer_numbers_from_strings, false, "Try to infer numbers from string fields while schema inference", 0) \
M(Bool, input_format_json_validate_types_from_metadata, true, "For JSON/JSONCompact/JSONColumnsWithMetadata input formats this controls whether format parser should check if data types from input metadata match data types of the corresponding columns from the table", 0) \
M(Bool, input_format_json_read_numbers_as_strings, false, "Allow to parse numbers as strings in JSON input formats", 0) \
M(Bool, input_format_json_read_numbers_as_strings, true, "Allow to parse numbers as strings in JSON input formats", 0) \
M(Bool, input_format_json_read_objects_as_strings, true, "Allow to parse JSON objects as strings in JSON input formats", 0) \
M(Bool, input_format_json_read_arrays_as_strings, true, "Allow to parse JSON arrays as strings in JSON input formats", 0) \
M(Bool, input_format_json_try_infer_named_tuples_from_objects, true, "Try to infer named tuples from JSON objects in JSON input formats", 0) \
M(Bool, input_format_json_infer_incomplete_types_as_strings, true, "Use type String for keys that contains only Nulls or empty objects/arrays during schema inference in JSON input formats", 0) \
M(Bool, input_format_json_named_tuples_as_objects, true, "Deserialize named tuple columns as JSON objects", 0) \
M(Bool, input_format_json_ignore_unknown_keys_in_named_tuple, true, "Ignore unknown keys in json object for named tuples", 0) \
M(Bool, input_format_json_defaults_for_missing_elements_in_named_tuple, true, "Insert default value in named tuple element if it's missing in json object", 0) \

View File

@ -81,6 +81,11 @@ namespace SettingsChangesHistory
static std::map<ClickHouseVersion, SettingsChangesHistory::SettingsChanges> settings_changes_history =
{
{"23.9", {{"optimize_group_by_constant_keys", false, true, "Optimize group by constant keys by default"},
{"input_format_json_try_infer_named_tuples_from_objects", false, true, "Try to infer named Tuples from JSON objects by default"},
{"input_format_json_read_numbers_as_strings", false, true, "Allow to read numbers as strings in JSON formats by default"},
{"input_format_json_read_arrays_as_strings", false, true, "Allow to read arrays as strings in JSON formats by default"},
{"input_format_json_infer_incomplete_types_as_strings", false, true, "Allow to infer incomplete types as Strings in JSON formats by default"},
{"input_format_json_try_infer_numbers_from_strings", true, false, "Don't infer numbers from strings in JSON formats by default to prevent possible parsing errors"},
{"http_write_exception_in_output_format", false, true, "Output valid JSON/XML on exception in HTTP streaming."}}},
{"23.8", {{"rewrite_count_distinct_if_with_count_distinct_implementation", false, true, "Rewrite countDistinctIf with count_distinct_implementation configuration"}}},
{"23.7", {{"function_sleep_max_microseconds_per_block", 0, 3000000, "In previous versions, the maximum sleep time of 3 seconds was applied only for `sleep`, but not for `sleepEachRow` function. In the new version, we introduce this setting. If you set compatibility with the previous versions, we will disable the limit altogether."}}},

View File

@ -48,6 +48,7 @@ enum class TypeIndex
Object,
IPv4,
IPv6,
JSONPaths,
};
/**

View File

@ -13,6 +13,9 @@
#include <Core/NamesAndTypes.h>
#include <Columns/ColumnConst.h>
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>
namespace DB
{
@ -59,6 +62,13 @@ size_t DataTypeArray::getNumberOfDimensions() const
return 1 + nested_array->getNumberOfDimensions(); /// Every modern C++ compiler optimizes tail recursion.
}
String DataTypeArray::doGetPrettyName(size_t indent) const
{
WriteBufferFromOwnString s;
s << "Array(" << nested->getPrettyName(indent) << ')';
return s.str();
}
static DataTypePtr create(const ASTPtr & arguments)
{

View File

@ -29,6 +29,8 @@ public:
return "Array(" + nested->getName() + ")";
}
std::string doGetPrettyName(size_t indent) const override;
const char * getFamilyName() const override
{
return "Array";

View File

@ -82,6 +82,16 @@ std::string DataTypeMap::doGetName() const
return s.str();
}
std::string DataTypeMap::doGetPrettyName(size_t indent) const
{
WriteBufferFromOwnString s;
s << "Map(\n"
<< fourSpaceIndent(indent + 1) << key_type->getPrettyName(indent + 1) << ",\n"
<< fourSpaceIndent(indent + 1) << value_type->getPrettyName(indent + 1) << '\n'
<< fourSpaceIndent(indent) << ')';
return s.str();
}
MutableColumnPtr DataTypeMap::createColumn() const
{
return ColumnMap::create(nested->createColumn());

View File

@ -29,6 +29,7 @@ public:
TypeIndex getTypeId() const override { return TypeIndex::Map; }
std::string doGetName() const override;
std::string doGetPrettyName(size_t indent) const override;
const char * getFamilyName() const override { return "Map"; }
String getSQLCompatibleName() const override { return "JSON"; }

View File

@ -94,6 +94,28 @@ std::string DataTypeTuple::doGetName() const
return s.str();
}
std::string DataTypeTuple::doGetPrettyName(size_t indent) const
{
size_t size = elems.size();
WriteBufferFromOwnString s;
s << "Tuple(\n";
for (size_t i = 0; i != size; ++i)
{
if (i != 0)
s << ",\n";
s << fourSpaceIndent(indent + 1);
if (have_explicit_names)
s << backQuoteIfNeed(names[i]) << ' ';
s << elems[i]->getPrettyName(indent + 1);
}
s << '\n' << fourSpaceIndent(indent) << ')';
return s.str();
}
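
The indentation scheme used by the `doGetPrettyName` overrides for Array, Map and Tuple can be tried out on a tiny type tree. This is a sketch under stated assumptions: `Type`, `Scalar`, `Array` and `Tuple` are hypothetical classes, `fourSpaceIndent` is a local stand-in for the helper of the same name, all tuple elements are assumed to be named, and names are not back-quoted.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

/// Local stand-in: one indentation level is four spaces.
std::string fourSpaceIndent(std::size_t indent) { return std::string(indent * 4, ' '); }

struct Type
{
    virtual ~Type() = default;
    virtual std::string prettyName(std::size_t indent) const = 0;
};
using TypePtr = std::shared_ptr<const Type>;

struct Scalar : Type
{
    std::string name;
    explicit Scalar(std::string name_) : name(std::move(name_)) {}
    std::string prettyName(std::size_t) const override { return name; }
};

struct Array : Type
{
    TypePtr nested;
    explicit Array(TypePtr nested_) : nested(std::move(nested_)) {}
    /// Array does not open a new line, so the nested type keeps the current indent.
    std::string prettyName(std::size_t indent) const override { return "Array(" + nested->prettyName(indent) + ")"; }
};

struct Tuple : Type
{
    using Element = std::pair<std::string, TypePtr>;
    std::vector<Element> elems;
    explicit Tuple(std::vector<Element> elems_) : elems(std::move(elems_)) {}

    std::string prettyName(std::size_t indent) const override
    {
        std::string s = "Tuple(\n";
        for (std::size_t i = 0; i != elems.size(); ++i)
        {
            if (i != 0)
                s += ",\n";
            /// Each element goes on its own line, one level deeper than the tuple itself.
            s += fourSpaceIndent(indent + 1) + elems[i].first + " " + elems[i].second->prettyName(indent + 1);
        }
        /// The closing parenthesis is aligned with the line that opened the tuple.
        s += "\n" + fourSpaceIndent(indent) + ")";
        return s;
    }
};
```

Passing `indent` unchanged through `Array` while `Tuple` increases it by one is what keeps a nested `Array(Tuple(...))` aligned with its enclosing element.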
static inline IColumn & extractElementColumn(IColumn & column, size_t idx)
{

View File

@ -32,6 +32,7 @@ public:
TypeIndex getTypeId() const override { return TypeIndex::Tuple; }
std::string doGetName() const override;
std::string doGetPrettyName(size_t indent) const override;
const char * getFamilyName() const override { return "Tuple"; }
String getSQLCompatibleName() const override { return "JSON"; }

View File

@ -5,6 +5,7 @@
#include <Common/Exception.h>
#include <Common/SipHash.h>
#include <Common/quoteString.h>
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>

View File

@ -65,10 +65,18 @@ public:
/// Name of data type (examples: UInt64, Array(String)).
String getName() const
{
if (custom_name)
return custom_name->getName();
else
return doGetName();
if (custom_name)
return custom_name->getName();
else
return doGetName();
}
String getPrettyName(size_t indent = 0) const
{
if (custom_name)
return custom_name->getName();
else
return doGetPrettyName(indent);
}
DataTypePtr getPtr() const { return shared_from_this(); }
@ -131,6 +139,8 @@ protected:
virtual String doGetName() const { return getFamilyName(); }
virtual SerializationPtr doGetDefaultSerialization() const = 0;
virtual String doGetPrettyName(size_t /*indent*/) const { return doGetName(); }
public:
/** Create empty column for corresponding type and default serialization.
*/

View File

@ -323,10 +323,11 @@ void SerializationString::deserializeTextJSON(IColumn & column, ReadBuffer & ist
{
if (settings.json.read_objects_as_strings && !istr.eof() && *istr.position() == '{')
{
String field;
readJSONObjectPossiblyInvalid(field, istr);
ReadBufferFromString buf(field);
read(column, [&](ColumnString::Chars & data) { data.insert(field.begin(), field.end()); });
read(column, [&](ColumnString::Chars & data) { readJSONObjectPossiblyInvalid(data, istr); });
}
else if (settings.json.read_arrays_as_strings && !istr.eof() && *istr.position() == '[')
{
read(column, [&](ColumnString::Chars & data) { readJSONArrayInto(data, istr); });
}
else if (settings.json.read_numbers_as_strings && !istr.eof() && *istr.position() != '"')
{

View File

@ -222,6 +222,7 @@ bool canBeSafelyCasted(const DataTypePtr & from_type, const DataTypePtr & to_typ
case TypeIndex::Function:
case TypeIndex::AggregateFunction:
case TypeIndex::Nothing:
case TypeIndex::JSONPaths:
return false;
}

View File

@ -91,11 +91,15 @@ void transformTypesRecursively(DataTypes & types, std::function<void(DataTypes &
std::vector<DataTypes> nested_types;
const DataTypeTuple * type_tuple = typeid_cast<const DataTypeTuple *>(types[0].get());
size_t tuple_size = type_tuple->getElements().size();
bool have_explicit_names = type_tuple->haveExplicitNames();
nested_types.resize(tuple_size);
for (size_t elem_idx = 0; elem_idx < tuple_size; ++elem_idx)
nested_types[elem_idx].reserve(types.size());
/// Apply transform to elements only if all tuples are the same.
bool sizes_are_equal = true;
Names element_names = type_tuple->getElementNames();
bool all_element_names_are_equal = true;
for (const auto & type : types)
{
type_tuple = typeid_cast<const DataTypeTuple *>(type.get());
@ -105,11 +109,17 @@ void transformTypesRecursively(DataTypes & types, std::function<void(DataTypes &
break;
}
if (type_tuple->getElementNames() != element_names)
{
all_element_names_are_equal = false;
break;
}
for (size_t elem_idx = 0; elem_idx < tuple_size; ++elem_idx)
nested_types[elem_idx].emplace_back(type_tuple->getElements()[elem_idx]);
}
if (sizes_are_equal)
if (sizes_are_equal && all_element_names_are_equal)
{
std::vector<DataTypes> transposed_nested_types(types.size());
for (size_t elem_idx = 0; elem_idx < tuple_size; ++elem_idx)
@ -120,7 +130,12 @@ void transformTypesRecursively(DataTypes & types, std::function<void(DataTypes &
}
for (size_t i = 0; i != types.size(); ++i)
types[i] = std::make_shared<DataTypeTuple>(transposed_nested_types[i]);
{
if (have_explicit_names)
types[i] = std::make_shared<DataTypeTuple>(transposed_nested_types[i], element_names);
else
types[i] = std::make_shared<DataTypeTuple>(transposed_nested_types[i]);
}
}
}

View File

@ -450,11 +450,15 @@ String getAdditionalFormatInfoByEscapingRule(const FormatSettings & settings, Fo
break;
case FormatSettings::EscapingRule::JSON:
result += fmt::format(
", try_infer_numbers_from_strings={}, read_bools_as_numbers={}, read_objects_as_strings={}, read_numbers_as_strings={}, try_infer_objects={}",
", try_infer_numbers_from_strings={}, read_bools_as_numbers={}, read_objects_as_strings={}, read_numbers_as_strings={}, "
"read_arrays_as_strings={}, try_infer_objects_as_tuples={}, infer_incomplete_types_as_strings={}, try_infer_objects={}",
settings.json.try_infer_numbers_from_strings,
settings.json.read_bools_as_numbers,
settings.json.read_objects_as_strings,
settings.json.read_numbers_as_strings,
settings.json.read_arrays_as_strings,
settings.json.try_infer_objects_as_tuples,
settings.json.infer_incomplete_types_as_strings,
settings.json.allow_object_type);
break;
default:

View File

@ -110,12 +110,15 @@ FormatSettings getFormatSettings(ContextPtr context, const Settings & settings)
format_settings.json.read_bools_as_numbers = settings.input_format_json_read_bools_as_numbers;
format_settings.json.read_numbers_as_strings = settings.input_format_json_read_numbers_as_strings;
format_settings.json.read_objects_as_strings = settings.input_format_json_read_objects_as_strings;
format_settings.json.read_arrays_as_strings = settings.input_format_json_read_arrays_as_strings;
format_settings.json.try_infer_numbers_from_strings = settings.input_format_json_try_infer_numbers_from_strings;
format_settings.json.infer_incomplete_types_as_strings = settings.input_format_json_infer_incomplete_types_as_strings;
format_settings.json.validate_types_from_metadata = settings.input_format_json_validate_types_from_metadata;
format_settings.json.validate_utf8 = settings.output_format_json_validate_utf8;
format_settings.json_object_each_row.column_for_object_name = settings.format_json_object_each_row_column_for_object_name;
format_settings.json.allow_object_type = context->getSettingsRef().allow_experimental_object_type;
format_settings.json.compact_allow_variable_number_of_columns = settings.input_format_json_compact_allow_variable_number_of_columns;
format_settings.json.try_infer_objects_as_tuples = settings.input_format_json_try_infer_named_tuples_from_objects;
format_settings.null_as_default = settings.input_format_null_as_default;
format_settings.decimal_trailing_zeros = settings.output_format_decimal_trailing_zeros;
format_settings.parquet.row_group_rows = settings.output_format_parquet_row_group_size;

View File

@ -194,12 +194,16 @@ struct FormatSettings
bool read_bools_as_numbers = true;
bool read_numbers_as_strings = true;
bool read_objects_as_strings = true;
bool read_arrays_as_strings = true;
bool try_infer_numbers_from_strings = false;
bool validate_types_from_metadata = true;
bool validate_utf8 = false;
bool allow_object_type = false;
bool valid_output_on_exception = false;
bool compact_allow_variable_number_of_columns = false;
bool try_infer_objects_as_tuples = false;
bool infer_incomplete_types_as_strings = true;
} json;
struct

View File

@ -20,6 +20,7 @@
#include <Core/Block.h>
#include <Common/assert_cast.h>
#include <Common/SipHash.h>
namespace DB
{
@ -27,10 +28,195 @@ namespace DB
namespace ErrorCodes
{
extern const int TOO_DEEP_RECURSION;
extern const int NOT_IMPLEMENTED;
extern const int INCORRECT_DATA;
extern const int ONLY_NULLS_WHILE_READING_SCHEMA;
}
namespace
{
/// Special data type that represents JSON object as a set of paths and their types.
/// It supports merging two JSON objects and creating Named Tuple from itself.
/// It's used only for schema inference of Named Tuples from JSON objects.
/// Example:
/// JSON objects:
/// "obj1" : {"a" : {"b" : 1, "c" : {"d" : 'Hello'}}, "e" : "World"}
/// "obj2" : {"a" : {"b" : 2, "f" : [1,2,3]}, "g" : {"h" : 42}}
/// JSONPaths type for each object:
/// obj1 : {'a.b' : Int64, 'a.c.d' : String, 'e' : String}
/// obj2 : {'a.b' : Int64, 'a.f' : Array(Int64), 'g.h' : Int64}
/// Merged JSONPaths type for obj1 and obj2:
/// obj1 obj2 : {'a.b' : Int64, 'a.c.d' : String, 'a.f' : Array(Int64), 'e' : String, 'g.h' : Int64}
/// Result Named Tuple:
/// Tuple(a Tuple(b Int64, c Tuple(d String), f Array(Int64)), e String, g Tuple(h Int64))
class DataTypeJSONPaths : public IDataTypeDummy
{
public:
/// A DataTypeJSONPaths is created for every row of the input data, so
/// comparing and merging these types has to be fast. For that reason the
/// mapping path -> data_type is stored in a hash map. A path is a vector
/// of path components, so the hash map needs a hash for
/// std::vector<String>. We cannot simply join the components with '.'
/// and use the resulting string as the key, because a component
/// can itself contain '.'.
struct PathHash
{
size_t operator()(const std::vector<String> & path) const
{
SipHash hash;
hash.update(path.size());
for (const auto & part : path)
hash.update(part);
return hash.get64();
}
};
using Paths = std::unordered_map<std::vector<String>, DataTypePtr, PathHash>;
explicit DataTypeJSONPaths(Paths paths_) : paths(std::move(paths_))
{
}
DataTypeJSONPaths() = default;
const char * getFamilyName() const override { return "JSONPaths"; }
String doGetName() const override { return finalize()->getName(); }
TypeIndex getTypeId() const override { return TypeIndex::JSONPaths; }
String getSQLCompatibleName() const override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method getSQLCompatibleName is not implemented for JSONObjectForInference type");
}
bool isParametric() const override
{
throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Method isParametric is not implemented for JSONObjectForInference type");
}
bool equals(const IDataType & rhs) const override
{
if (this == &rhs)
return true;
if (rhs.getTypeId() != getTypeId())
return false;
const auto & rhs_paths = assert_cast<const DataTypeJSONPaths &>(rhs).paths;
if (paths.size() != rhs_paths.size())
return false;
for (const auto & [path, type] : paths)
{
auto it = rhs_paths.find(path);
if (it == rhs_paths.end() || !it->second->equals(*type))
return false;
}
return true;
}
bool merge(const DataTypeJSONPaths & rhs, std::function<void(DataTypePtr & type1, DataTypePtr & type2)> transform_types)
{
for (const auto & [rhs_path, rhs_type] : rhs.paths)
{
auto [it, inserted] = paths.insert({rhs_path, rhs_type});
if (!inserted)
{
auto & type = it->second;
/// If types are different, try to apply provided transform function.
if (!type->equals(*rhs_type))
{
auto rhs_type_copy = rhs_type;
transform_types(type, rhs_type_copy);
/// If types for the same path are different even after transform, we cannot merge these objects.
if (!type->equals(*rhs_type_copy))
return false;
}
}
}
return true;
}
bool empty() const { return paths.empty(); }
DataTypePtr finalize() const
{
if (paths.empty())
throw Exception(ErrorCodes::ONLY_NULLS_WHILE_READING_SCHEMA, "Cannot infer named Tuple from JSON object because object is empty");
/// Construct a path tree from list of paths and their types and convert it to named Tuple.
/// Example:
/// Paths : {'a.b' : Int64, 'a.c.d' : String, 'e' : String, 'f.g' : Array(Int64), 'f.h' : String}
/// Tree:
/// ┌─ 'c' ─ 'd' (String)
/// ┌─ 'a' ┴─ 'b' (Int64)
/// root ─┼─ 'e' (String)
/// └─ 'f' ┬─ 'g' (Array(Int64))
/// └─ 'h' (String)
/// Result Named Tuple:
/// Tuple('a' Tuple('b' Int64, 'c' Tuple('d' String)), 'e' String, 'f' Tuple('g' Array(Int64), 'h' String))
PathNode root_node;
for (const auto & [path, type] : paths)
{
PathNode * current_node = &root_node;
String current_path;
for (const auto & name : path)
{
current_path += (current_path.empty() ? "" : ".") + name;
current_node = &current_node->nodes[name];
current_node->path = current_path;
}
current_node->leaf_type = type;
}
return root_node.getType();
}
private:
struct PathNode
{
/// Use std::map so that the resulting tuple has its names in lexicographic order.
/// There is no strong reason for it, it is done for consistency.
std::map<String, PathNode> nodes;
DataTypePtr leaf_type;
/// Store path to this node for better exception message in case of ambiguous paths.
String path;
DataTypePtr getType() const
{
/// Check if we have ambiguous paths.
/// For example:
/// 'a.b.c' : Int32 and 'a.b' : String
/// Also check if leaf type is Nothing, because the next situation is possible:
/// {"a" : {"b" : null}} -> 'a.b' : Nullable(Nothing)
/// {"a" : {"b" : {"c" : 42}}} -> 'a.b.c' : Int32
/// And after merge we will have ambiguous paths 'a.b.c' : Int32 and 'a.b' : Nullable(Nothing),
/// but it's a valid case and we should ignore path 'a.b'.
if (leaf_type && !isNothing(removeNullable(leaf_type)) && !nodes.empty())
throw Exception(ErrorCodes::INCORRECT_DATA, "JSON objects have ambiguous paths: '{}' with type {} and '{}'", path, leaf_type->getName(), nodes.begin()->second.path);
if (nodes.empty())
return leaf_type;
Names node_names;
node_names.reserve(nodes.size());
DataTypes node_types;
node_types.reserve(nodes.size());
for (const auto & [name, node] : nodes)
{
node_names.push_back(name);
node_types.push_back(node.getType());
}
return std::make_shared<DataTypeTuple>(std::move(node_types), std::move(node_names));
}
};
Paths paths;
};
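
The path-tree construction in `finalize()` and `PathNode::getType()` can be sketched without any ClickHouse types. In this simplified illustration type names are plain strings, the paths live in a `std::map` rather than a hash map, and `pathsToTupleName` is a hypothetical function. The special case that ignores a `Nullable(Nothing)` leaf is left out, so any path that is both a leaf and a prefix of another path is reported as ambiguous.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Path = std::vector<std::string>;

struct PathNode
{
    /// std::map keeps the element names in lexicographic order.
    std::map<std::string, PathNode> nodes;
    /// Empty string means that no path ends at this node.
    std::string leaf_type;

    std::string typeName() const
    {
        /// A node cannot be a leaf and have children at the same time.
        if (!leaf_type.empty() && !nodes.empty())
            throw std::runtime_error("JSON objects have ambiguous paths");
        if (nodes.empty())
            return leaf_type;

        std::string s = "Tuple(";
        bool first = true;
        for (const auto & [name, node] : nodes)
        {
            if (!first)
                s += ", ";
            first = false;
            s += name + " " + node.typeName();
        }
        return s + ")";
    }
};

/// Builds a tree from (path, type) pairs and renders it as a nested named Tuple.
std::string pathsToTupleName(const std::map<Path, std::string> & paths)
{
    if (paths.empty())
        throw std::runtime_error("Cannot infer named Tuple from an empty JSON object");

    PathNode root;
    for (const auto & [path, type] : paths)
    {
        PathNode * current = &root;
        for (const auto & name : path)
            current = &current->nodes[name];   /// Creates the child on first visit.
        current->leaf_type = type;
    }
    return root.typeName();
}
```

Keeping each path as a vector of components, instead of a dot-joined string, means that a key such as `"a.b"` inside a JSON object stays a single tuple element.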
bool checkIfTypesAreEqual(const DataTypes & types)
{
if (types.empty())
@ -239,7 +425,7 @@ namespace
updateTypeIndexes(data_types, type_indexes);
}
/// If we have Tuple with the same nested types like Tuple(Int64, Int64),
/// If we have unnamed Tuple with the same nested types like Tuple(Int64, Int64),
/// convert it to Array(Int64). It's used for JSON values.
/// For example when we had type Tuple(Int64, Nullable(Nothing)) and we
/// transformed it to Tuple(Nullable(Int64), Nullable(Int64)) we will
@ -255,6 +441,9 @@ namespace
if (isTuple(type))
{
const auto * tuple_type = assert_cast<const DataTypeTuple *>(type.get());
if (tuple_type->haveExplicitNames())
return;
if (checkIfTypesAreEqual(tuple_type->getElements()))
type = std::make_shared<DataTypeArray>(tuple_type->getElements().back());
else
@ -269,7 +458,7 @@ namespace
template <bool is_json>
void transformInferredTypesIfNeededImpl(DataTypes & types, const FormatSettings & settings, JSONInferenceInfo * json_info = nullptr);
/// If we have Tuple and Array types, try to convert them all to Array
/// If we have unnamed Tuple and Array types, try to convert them all to Array
/// if there is a common type for all nested types.
/// For example, if we have [Tuple(Nullable(Nothing), String), Array(Date), Tuple(Date, String)]
/// it will convert them all to Array(String)
@ -286,7 +475,11 @@ namespace
{
if (isTuple(type))
{
const auto & current_tuple_size = assert_cast<const DataTypeTuple &>(*type).getElements().size();
const auto & tuple_type = assert_cast<const DataTypeTuple &>(*type);
if (tuple_type.haveExplicitNames())
return;
const auto & current_tuple_size = tuple_type.getElements().size();
if (!tuple_size)
tuple_size = current_tuple_size;
else
@ -324,57 +517,39 @@ namespace
}
}
/// If we have Map and Object(JSON) types, convert all Map types to Object(JSON).
/// If we have Map types with different value types, convert all Map types to Object(JSON)
void transformMapsAndObjectsToObjects(DataTypes & data_types, TypeIndexesSet & type_indexes)
void transformMapsAndStringsToStrings(DataTypes & data_types, TypeIndexesSet & type_indexes)
{
if (!type_indexes.contains(TypeIndex::Map))
return;
bool have_objects = type_indexes.contains(TypeIndex::Object);
bool maps_are_equal = true;
DataTypePtr first_map_type = nullptr;
for (const auto & type : data_types)
{
if (isMap(type))
{
if (!first_map_type)
first_map_type = type;
else
maps_are_equal &= type->equals(*first_map_type);
}
}
if (!have_objects && maps_are_equal)
/// Check if we have both String and Map
if (!type_indexes.contains(TypeIndex::Map) || !type_indexes.contains(TypeIndex::String))
return;
for (auto & type : data_types)
{
if (isMap(type))
type = std::make_shared<DataTypeObject>("json", true);
type = std::make_shared<DataTypeString>();
}
type_indexes.erase(TypeIndex::Map);
}
void transformMapsObjectsAndStringsToStrings(DataTypes & data_types, TypeIndexesSet & type_indexes)
void mergeJSONPaths(DataTypes & data_types, TypeIndexesSet & type_indexes, const FormatSettings & settings, JSONInferenceInfo * json_info)
{
bool have_maps = type_indexes.contains(TypeIndex::Map);
bool have_objects = type_indexes.contains(TypeIndex::Object);
bool have_strings = type_indexes.contains(TypeIndex::String);
/// Check if we have both String and Map/Object
if (!have_strings || (!have_maps && !have_objects))
if (!type_indexes.contains(TypeIndex::JSONPaths))
return;
std::shared_ptr<DataTypeJSONPaths> merged_type = std::make_shared<DataTypeJSONPaths>();
auto transform_func = [&](DataTypePtr & type1, DataTypePtr & type2){ transformInferredJSONTypesIfNeeded(type1, type2, settings, json_info); };
for (auto & type : data_types)
{
if (const auto * json_type = typeid_cast<const DataTypeJSONPaths *>(type.get()))
merged_type->merge(*json_type, transform_func);
}
for (auto & type : data_types)
{
if (isMap(type) || isObject(type))
type = std::make_shared<DataTypeString>();
if (type->getTypeId() == TypeIndex::JSONPaths)
type = merged_type;
}
type_indexes.erase(TypeIndex::Map);
type_indexes.erase(TypeIndex::Object);
}
template <bool is_json>
@ -409,6 +584,9 @@ namespace
/// Convert Bool to number (Int64/Float64) if needed.
if (settings.json.read_bools_as_numbers)
transformBoolsAndNumbersToNumbers(data_types, type_indexes);
if (settings.json.try_infer_objects_as_tuples)
mergeJSONPaths(data_types, type_indexes, settings, json_info);
};
auto transform_complex_types = [&](DataTypes & data_types, TypeIndexesSet & type_indexes)
@ -429,12 +607,8 @@ namespace
/// Convert JSON tuples and arrays to arrays if possible.
transformJSONTuplesAndArraysToArrays(data_types, settings, type_indexes, json_info);
/// Convert Maps to Objects if needed.
if (settings.json.allow_object_type)
transformMapsAndObjectsToObjects(data_types, type_indexes);
if (settings.json.read_objects_as_strings)
transformMapsObjectsAndStringsToStrings(data_types, type_indexes);
transformMapsAndStringsToStrings(data_types, type_indexes);
};
transformTypesRecursively(types, transform_simple_types, transform_complex_types);
@ -758,6 +932,79 @@ namespace
return std::make_shared<DataTypeString>();
}
bool tryReadJSONObject(ReadBuffer & buf, const FormatSettings & settings, DataTypeJSONPaths::Paths & paths, const std::vector<String> & path, JSONInferenceInfo * json_info, size_t depth)
{
if (depth > settings.max_parser_depth)
throw Exception(ErrorCodes::TOO_DEEP_RECURSION,
"Maximum parse depth ({}) exceeded. Consider raising max_parser_depth setting.", settings.max_parser_depth);
assertChar('{', buf);
skipWhitespaceIfAny(buf);
bool first = true;
while (!buf.eof() && *buf.position() != '}')
{
if (!first)
{
if (!checkChar(',', buf))
return false;
skipWhitespaceIfAny(buf);
}
else
first = false;
String key;
if (!tryReadJSONStringInto(key, buf))
return false;
skipWhitespaceIfAny(buf);
if (!checkChar(':', buf))
return false;
std::vector<String> current_path = path;
current_path.push_back(std::move(key));
skipWhitespaceIfAny(buf);
if (!buf.eof() && *buf.position() == '{')
{
if (!tryReadJSONObject(buf, settings, paths, current_path, json_info, depth + 1))
return false;
}
else
{
auto value_type = tryInferDataTypeForSingleFieldImpl<true>(buf, settings, json_info, depth + 1);
if (!value_type)
return false;
paths[std::move(current_path)] = value_type;
}
skipWhitespaceIfAny(buf);
}
/// No '}' at the end.
if (buf.eof())
return false;
assertChar('}', buf);
skipWhitespaceIfAny(buf);
/// If it was an empty object and it's not the root object, treat it as null, so we won't
/// lose this path if this key contains an empty object in all sample data.
/// This case will be processed in the JSONPaths type during finalize.
if (first && !path.empty())
paths[path] = std::make_shared<DataTypeNothing>();
return true;
}
DataTypePtr tryInferJSONPaths(ReadBuffer & buf, const FormatSettings & settings, JSONInferenceInfo * json_info, size_t depth)
{
DataTypeJSONPaths::Paths paths;
if (!tryReadJSONObject(buf, settings, paths, {}, json_info, depth))
return nullptr;
return std::make_shared<DataTypeJSONPaths>(std::move(paths));
}
template <bool is_json>
DataTypePtr tryInferMapOrObject(ReadBuffer & buf, const FormatSettings & settings, JSONInferenceInfo * json_info, size_t depth)
{
@ -828,28 +1075,22 @@ namespace
if (settings.json.allow_object_type)
return std::make_shared<DataTypeObject>("json", true);
}
/// Empty Map is Map(Nothing, Nothing)
return std::make_shared<DataTypeMap>(std::make_shared<DataTypeNothing>(), std::make_shared<DataTypeNothing>());
}
if constexpr (is_json)
{
/// If it's JSON field and one of value types is JSON Object, return also JSON Object.
for (const auto & value_type : value_types)
{
if (isObject(value_type))
return std::make_shared<DataTypeObject>("json", true);
}
if (settings.json.allow_object_type)
return std::make_shared<DataTypeObject>("json", true);
if (settings.json.read_objects_as_strings)
return std::make_shared<DataTypeString>();
transformInferredTypesIfNeededImpl<is_json>(value_types, settings, json_info);
if (!checkIfTypesAreEqual(value_types))
{
if (settings.json.allow_object_type)
return std::make_shared<DataTypeObject>("json", true);
if (settings.json.read_objects_as_strings)
return std::make_shared<DataTypeString>();
return nullptr;
}
return std::make_shared<DataTypeMap>(key_types.back(), value_types.back());
}
@ -874,7 +1115,7 @@ namespace
{
if (depth > settings.max_parser_depth)
throw Exception(ErrorCodes::TOO_DEEP_RECURSION,
"Maximum parse depth ({}) exceeded. Consider rising max_parser_depth setting.", settings.max_parser_depth);
"Maximum parse depth ({}) exceeded. Consider raising max_parser_depth setting.", settings.max_parser_depth);
skipWhitespaceIfAny(buf);
@ -894,7 +1135,15 @@ namespace
/// Map/Object for JSON { key1 : value1, key2 : value2, ...}
if (*buf.position() == '{')
{
if constexpr (is_json)
{
if (!settings.json.allow_object_type && settings.json.try_infer_objects_as_tuples)
return tryInferJSONPaths(buf, settings, json_info, depth);
}
return tryInferMapOrObject<is_json>(buf, settings, json_info, depth);
}
/// String
char quote = is_json ? '"' : '\'';
@ -936,44 +1185,104 @@ void transformInferredJSONTypesIfNeeded(
second = std::move(types[1]);
}
void transformJSONTupleToArrayIfPossible(DataTypePtr & data_type, const FormatSettings & settings, JSONInferenceInfo * json_info)
void transformFinalInferredJSONTypeIfNeededImpl(DataTypePtr & data_type, const FormatSettings & settings, JSONInferenceInfo * json_info, bool remain_nothing_types = false)
{
if (!data_type)
return;
if (!remain_nothing_types && isNothing(data_type) && settings.json.infer_incomplete_types_as_strings)
{
data_type = std::make_shared<DataTypeString>();
return;
}
if (const auto * nullable_type = typeid_cast<const DataTypeNullable *>(data_type.get()))
{
auto nested_type = nullable_type->getNestedType();
transformFinalInferredJSONTypeIfNeededImpl(nested_type, settings, json_info, remain_nothing_types);
data_type = std::make_shared<DataTypeNullable>(std::move(nested_type));
return;
}
if (const auto * json_paths = typeid_cast<const DataTypeJSONPaths *>(data_type.get()))
{
/// If all objects were empty, use type String, so these JSON objects will be read as Strings.
if (json_paths->empty() && settings.json.infer_incomplete_types_as_strings)
{
data_type = std::make_shared<DataTypeString>();
return;
}
data_type = json_paths->finalize();
transformFinalInferredJSONTypeIfNeededImpl(data_type, settings, json_info, remain_nothing_types);
return;
}
if (const auto * array_type = typeid_cast<const DataTypeArray *>(data_type.get()))
{
auto nested_type = array_type->getNestedType();
transformJSONTupleToArrayIfPossible(nested_type, settings, json_info);
transformFinalInferredJSONTypeIfNeededImpl(nested_type, settings, json_info, remain_nothing_types);
data_type = std::make_shared<DataTypeArray>(nested_type);
return;
}
if (const auto * map_type = typeid_cast<const DataTypeMap *>(data_type.get()))
{
auto key_type = map_type->getKeyType();
/// If all inferred Maps are empty, use type String, so these JSON objects will be read as Strings.
if (isNothing(key_type) && settings.json.infer_incomplete_types_as_strings)
key_type = std::make_shared<DataTypeString>();
auto value_type = map_type->getValueType();
transformJSONTupleToArrayIfPossible(value_type, settings, json_info);
data_type = std::make_shared<DataTypeMap>(map_type->getKeyType(), value_type);
transformFinalInferredJSONTypeIfNeededImpl(value_type, settings, json_info, remain_nothing_types);
data_type = std::make_shared<DataTypeMap>(key_type, value_type);
return;
}
if (const auto * tuple_type = typeid_cast<const DataTypeTuple *>(data_type.get()))
{
auto nested_types = tuple_type->getElements();
if (tuple_type->haveExplicitNames())
{
for (auto & nested_type : nested_types)
transformFinalInferredJSONTypeIfNeededImpl(nested_type, settings, json_info, remain_nothing_types);
data_type = std::make_shared<DataTypeTuple>(nested_types, tuple_type->getElementNames());
return;
}
for (auto & nested_type : nested_types)
transformJSONTupleToArrayIfPossible(nested_type, settings, json_info);
/// Don't change Nothing to String in nested types here, because we are not sure yet if it's Array or actual Tuple
transformFinalInferredJSONTypeIfNeededImpl(nested_type, settings, json_info, /*remain_nothing_types=*/ true);
auto nested_types_copy = nested_types;
transformInferredTypesIfNeededImpl<true>(nested_types_copy, settings, json_info);
if (checkIfTypesAreEqual(nested_types_copy))
{
data_type = std::make_shared<DataTypeArray>(nested_types_copy.back());
}
else
{
/// Now we should run transform one more time to convert Nothing to String if needed.
if (!remain_nothing_types)
{
for (auto & nested_type : nested_types)
transformFinalInferredJSONTypeIfNeededImpl(nested_type, settings, json_info);
}
data_type = std::make_shared<DataTypeTuple>(nested_types);
}
return;
}
}
void transformFinalInferredJSONTypeIfNeeded(DataTypePtr & data_type, const FormatSettings & settings, JSONInferenceInfo * json_info)
{
transformFinalInferredJSONTypeIfNeededImpl(data_type, settings, json_info);
}
DataTypePtr tryInferNumberFromString(std::string_view field, const FormatSettings & settings)
{
ReadBufferFromString buf(field);
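
Note on the JSON paths logic in this file: the ambiguous-path check and the conversion of child nodes into a named Tuple can be illustrated with a small standalone sketch. Everything below is an assumption made for illustration: `PathNode` and `pathsToTupleName` are hypothetical names, and types are represented as plain strings rather than `DataTypePtr`. It is not the ClickHouse implementation, only the same idea in miniature.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the path tree; type names are plain strings here.
struct PathNode
{
    std::string leaf_type; // empty when no value was seen at this exact path
    std::map<std::string, std::unique_ptr<PathNode>> children;

    void insert(const std::vector<std::string> & path, size_t pos, const std::string & type)
    {
        if (pos == path.size())
        {
            leaf_type = type;
            return;
        }
        auto & child = children[path[pos]];
        if (!child)
            child = std::make_unique<PathNode>();
        child->insert(path, pos + 1, type);
    }

    std::string finalize(const std::string & full_path) const
    {
        // A leaf with a real type that also has children is ambiguous.
        // A "Nothing" leaf (null or empty object) is ignored in favour of the children.
        if (!leaf_type.empty() && leaf_type != "Nothing" && !children.empty())
            throw std::runtime_error("ambiguous path: " + full_path);
        if (children.empty())
            return leaf_type;

        std::string result = "Tuple(";
        bool first = true;
        for (const auto & [name, child] : children)
        {
            if (!first)
                result += ", ";
            first = false;
            result += name + " " + child->finalize(full_path.empty() ? name : full_path + "." + name);
        }
        return result + ")";
    }
};

// Builds the tree from flat paths and renders the nested named tuple.
std::string pathsToTupleName(const std::map<std::vector<std::string>, std::string> & paths)
{
    PathNode root;
    for (const auto & [path, type] : paths)
        root.insert(path, 0, type);
    return root.finalize("");
}
```

In this sketch, paths `a.b.c : Int32` and `a.d : String` render as a nested named tuple, while `a.b : String` together with `a.b.c : Int32` throws, mirroring the ambiguity described in the comments above.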

View File

@ -69,11 +69,14 @@ void transformInferredTypesIfNeeded(DataTypePtr & first, DataTypePtr & second, c
/// we will convert both types to Object('JSON').
void transformInferredJSONTypesIfNeeded(DataTypePtr & first, DataTypePtr & second, const FormatSettings & settings, JSONInferenceInfo * json_info);
/// Check if type is Tuple(...), try to transform nested types to find a common type for them and if all nested types
/// are the same after transform, we convert this tuple to an Array with common nested type.
/// For example, if we have Tuple(String, Nullable(Nothing)) we will convert it to Array(String).
/// It's used when all rows were read and we have Tuple in the result type that can be actually an Array.
void transformJSONTupleToArrayIfPossible(DataTypePtr & data_type, const FormatSettings & settings, JSONInferenceInfo * json_info);
/// Make final transform for types inferred in JSON format. It does 3 types of transformation:
/// 1) Checks if type is unnamed Tuple(...), tries to transform nested types to find a common type for them and if all nested types
/// are the same after transform, it converts this tuple to an Array with common nested type.
/// For example, if we have Tuple(String, Nullable(Nothing)) we will convert it to Array(String).
/// It's used when all rows were read and we have Tuple in the result type that can be actually an Array.
/// 2) Finalizes all DataTypeJSONPaths to named Tuple.
/// 3) Converts all Nothing types to String types if input_format_json_infer_incomplete_types_as_strings is enabled.
void transformFinalInferredJSONTypeIfNeeded(DataTypePtr & data_type, const FormatSettings & settings, JSONInferenceInfo * json_info);
/// Make type Nullable recursively:
/// - Type -> Nullable(type)

View File

@ -108,27 +108,6 @@ struct ByteHammingDistanceImpl
}
};
struct ByteJaccardIndexImpl
{
using ResultType = Float64;
static ResultType inline process(
const char * __restrict haystack, size_t haystack_size, const char * __restrict needle, size_t needle_size)
{
if (haystack_size == 0 || needle_size == 0)
return 0;
std::unordered_set<char> haystack_set(haystack, haystack + haystack_size);
std::unordered_set<char> needle_set(needle, needle + needle_size);
size_t intersect = 0;
for (auto elem : haystack_set)
{
intersect += needle_set.contains(elem);
}
return static_cast<Float64>(intersect) / (haystack_set.size() + needle_set.size() - intersect);
}
};
struct ByteEditDistanceImpl
{
using ResultType = UInt64;
@ -185,33 +164,24 @@ struct NameByteHammingDistance
static constexpr auto name = "byteHammingDistance";
};
struct NameByteJaccardIndex
struct NameEditDistance
{
static constexpr auto name = "byteJaccardIndex";
};
struct NameByteEditDistance
{
static constexpr auto name = "byteEditDistance";
static constexpr auto name = "editDistance";
};
using FunctionByteHammingDistance = FunctionsStringSimilarity<FunctionStringDistanceImpl<ByteHammingDistanceImpl>, NameByteHammingDistance>;
using FunctionByteJaccardIndex = FunctionsStringSimilarity<FunctionStringDistanceImpl<ByteJaccardIndexImpl>, NameByteJaccardIndex>;
using FunctionByteEditDistance = FunctionsStringSimilarity<FunctionStringDistanceImpl<ByteEditDistanceImpl>, NameEditDistance>;
using FunctionByteEditDistance = FunctionsStringSimilarity<FunctionStringDistanceImpl<ByteEditDistanceImpl>, NameByteEditDistance>;
REGISTER_FUNCTION(StringHammingDistance)
REGISTER_FUNCTION(StringDistance)
{
factory.registerFunction<FunctionByteHammingDistance>(
FunctionDocumentation{.description = R"(Calculates the hamming distance between two bytes strings.)"});
FunctionDocumentation{.description = R"(Calculates Hamming distance between two byte-strings.)"});
factory.registerAlias("mismatches", NameByteHammingDistance::name);
factory.registerFunction<FunctionByteJaccardIndex>(
FunctionDocumentation{.description = R"(Calculates the jaccard similarity index between two bytes strings.)"});
factory.registerFunction<FunctionByteEditDistance>(
FunctionDocumentation{.description = R"(Calculates the edit distance between two bytes strings.)"});
factory.registerAlias("byteLevenshteinDistance", NameByteEditDistance::name);
FunctionDocumentation{.description = R"(Calculates the edit distance between two byte-strings.)"});
factory.registerAlias("levenshteinDistance", NameEditDistance::name);
}
}

View File

@ -1,5 +1,4 @@
#include <Columns/ColumnArray.h>
#include <Columns/ColumnsNumber.h>
#include <Columns/IColumn.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypesNumber.h>
@ -7,11 +6,9 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <DataTypes/DataTypeNothing.h>
#include <DataTypes/getMostSubtype.h>
#include <Core/ColumnsWithTypeAndName.h>
#include <Core/ColumnWithTypeAndName.h>
#include <Interpreters/Context_fwd.h>
#include <base/types.h>
namespace DB
{

src/Functions/timestamp.cpp (new file, 193 lines)
View File

@ -0,0 +1,193 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <DataTypes/DataTypeDateTime64.h>
#include <Columns/ColumnString.h>
#include <Columns/ColumnFixedString.h>
#include <Columns/ColumnsDateTime.h>
#include <IO/ReadBufferFromMemory.h>
#include <IO/ReadHelpers.h>
namespace DB
{
namespace ErrorCodes
{
extern const int ILLEGAL_COLUMN;
}
namespace
{
/** timestamp(expr[, expr_time])
*
* Emulates MySQL's TIMESTAMP() but supports only input format 'yyyy-mm-dd[ hh:mm:ss[.mmmmmm]]' instead of
* MySQL's possible input formats (https://dev.mysql.com/doc/refman/8.0/en/date-and-time-literals.html).
*/
class FunctionTimestamp : public IFunction
{
public:
static constexpr UInt32 DATETIME_SCALE = 6;
static constexpr auto name = "timestamp";
static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionTimestamp>(); }
String getName() const override { return name; }
bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; }
bool isVariadic() const override { return true; }
size_t getNumberOfArguments() const override { return 0; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
FunctionArgumentDescriptors mandatory_args{
{"timestamp", &isStringOrFixedString<IDataType>, nullptr, "String or FixedString"}
};
FunctionArgumentDescriptors optional_args{
{"time", &isString<IDataType>, nullptr, "String"}
};
validateFunctionArgumentTypes(*this, arguments, mandatory_args, optional_args);
return std::make_shared<DataTypeDateTime64>(DATETIME_SCALE);
}
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
const DateLUTImpl * local_time_zone = &DateLUT::instance();
auto col_result = ColumnDateTime64::create(input_rows_count, DATETIME_SCALE);
ColumnDateTime64::Container & vec_result = col_result->getData();
const IColumn * col_timestamp = arguments[0].column.get();
if (const ColumnString * col_timestamp_string = checkAndGetColumn<ColumnString>(col_timestamp))
{
const ColumnString::Chars * chars = &col_timestamp_string->getChars();
const IColumn::Offsets * offsets = &col_timestamp_string->getOffsets();
size_t current_offset = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
const size_t next_offset = (*offsets)[i];
const size_t string_size = next_offset - current_offset - 1;
ReadBufferFromMemory read_buffer(&(*chars)[current_offset], string_size);
DateTime64 value = 0;
readDateTime64Text(value, col_result->getScale(), read_buffer, *local_time_zone);
vec_result[i] = value;
current_offset = next_offset;
}
}
else if (const ColumnFixedString * col_timestamp_fixed_string = checkAndGetColumn<ColumnFixedString>(col_timestamp))
{
const ColumnString::Chars * chars = &col_timestamp_fixed_string->getChars();
const size_t fixed_string_size = col_timestamp_fixed_string->getN();
size_t current_offset = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
const size_t next_offset = current_offset + fixed_string_size;
ReadBufferFromMemory read_buffer(&(*chars)[current_offset], fixed_string_size);
DateTime64 value = 0;
readDateTime64Text(value, col_result->getScale(), read_buffer, *local_time_zone);
vec_result[i] = value;
current_offset = next_offset;
}
}
else
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column {} of 1st argument of function {}. Must be String or FixedString",
col_timestamp->getName(),
getName());
}
if (arguments.size() == 1)
return col_result;
const IColumn * col_time = arguments[1].column.get();
if (const ColumnString * col_time_string = checkAndGetColumn<ColumnString>(col_time))
{
const ColumnString::Chars * chars = &col_time_string->getChars();
const IColumn::Offsets * offsets = &col_time_string->getOffsets();
size_t current_offset = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
const size_t next_offset = (*offsets)[i];
const size_t string_size = next_offset - current_offset - 1;
ReadBufferFromMemory read_buffer(&(*chars)[current_offset], string_size);
Decimal64 value = 0;
readTime64Text(value, col_result->getScale(), read_buffer);
vec_result[i].addOverflow(value);
current_offset = next_offset;
}
}
else if (const ColumnFixedString * col_time_fixed_string = checkAndGetColumn<ColumnFixedString>(col_time))
{
const ColumnString::Chars * chars = &col_time_fixed_string->getChars();
const size_t fixed_string_size = col_time_fixed_string->getN();
size_t current_offset = 0;
for (size_t i = 0; i < input_rows_count; ++i)
{
const size_t next_offset = current_offset + fixed_string_size;
ReadBufferFromMemory read_buffer(&(*chars)[current_offset], fixed_string_size);
Decimal64 value = 0;
readTime64Text(value, col_result->getScale(), read_buffer);
vec_result[i].addOverflow(value);
current_offset = next_offset;
}
}
else
{
throw Exception(ErrorCodes::ILLEGAL_COLUMN,
"Illegal column {} of 2nd argument of function {}. Must be String or FixedString",
col_time->getName(),
getName());
}
return col_result;
}
};
}
REGISTER_FUNCTION(Timestamp)
{
factory.registerFunction<FunctionTimestamp>(FunctionDocumentation{
.description = R"(
Converts the first argument 'expr' to type DateTime64(6).
If the second argument 'expr_time' is provided, it adds the specified time to the converted value.
)",
.syntax = "timestamp(expr[, expr_time])",
.arguments = {
{"expr", "Date or date with time. Type: String."},
{"expr_time", "Time to add. Type: String."}
},
.returned_value = "The result of conversion and, optionally, addition. Type: DateTime64(6).",
.examples = {
{"timestamp", "SELECT timestamp('2013-12-31')", "2013-12-31 00:00:00.000000"},
{"timestamp", "SELECT timestamp('2013-12-31 12:00:00')", "2013-12-31 12:00:00.000000"},
{"timestamp", "SELECT timestamp('2013-12-31 12:00:00', '12:00:00.11')", "2014-01-01 00:00:00.110000"},
},
.categories{"DateTime"}}, FunctionFactory::CaseInsensitive);
}
}

View File

@ -2,6 +2,7 @@
#include <Functions/IFunction.h>
#include <Core/Field.h>
#include <DataTypes/DataTypeString.h>
#include <Interpreters/Context.h>
namespace DB
@ -15,12 +16,15 @@ namespace
class FunctionToTypeName : public IFunction
{
public:
explicit FunctionToTypeName(bool print_pretty_type_names_) : print_pretty_type_names(print_pretty_type_names_)
{
}
static constexpr auto name = "toTypeName";
static FunctionPtr create(ContextPtr)
static FunctionPtr create(ContextPtr context)
{
return std::make_shared<FunctionToTypeName>();
return std::make_shared<FunctionToTypeName>(context->getSettingsRef().print_pretty_type_names);
}
String getName() const override
@ -49,15 +53,18 @@ public:
ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr &, size_t input_rows_count) const override
{
return DataTypeString().createColumnConst(input_rows_count, arguments[0].type->getName());
return DataTypeString().createColumnConst(input_rows_count, print_pretty_type_names ? arguments[0].type->getPrettyName() : arguments[0].type->getName());
}
ColumnPtr getConstantResultForNonConstArguments(const ColumnsWithTypeAndName & arguments, const DataTypePtr &) const override
{
return DataTypeString().createColumnConst(1, arguments[0].type->getName());
return DataTypeString().createColumnConst(1, print_pretty_type_names ? arguments[0].type->getPrettyName() : arguments[0].type->getName());
}
ColumnNumbers getArgumentsThatDontImplyNullableReturnType(size_t /*number_of_arguments*/) const override { return {0}; }
private:
bool print_pretty_type_names;
};
}

View File

@ -989,8 +989,8 @@ template void readJSONStringInto<NullOutput>(NullOutput & s, ReadBuffer & buf);
template void readJSONStringInto<String>(String & s, ReadBuffer & buf);
template bool readJSONStringInto<String, bool>(String & s, ReadBuffer & buf);
template <typename Vector, typename ReturnType>
ReturnType readJSONObjectPossiblyInvalid(Vector & s, ReadBuffer & buf)
template <typename Vector, typename ReturnType, char opening_bracket, char closing_bracket>
ReturnType readJSONObjectOrArrayPossiblyInvalid(Vector & s, ReadBuffer & buf)
{
static constexpr bool throw_exception = std::is_same_v<ReturnType, void>;
@ -1001,8 +1001,8 @@ ReturnType readJSONObjectPossiblyInvalid(Vector & s, ReadBuffer & buf)
return ReturnType(false);
};
if (buf.eof() || *buf.position() != '{')
return error("JSON should start from opening curly bracket", ErrorCodes::INCORRECT_DATA);
if (buf.eof() || *buf.position() != opening_bracket)
return error("JSON object/array should start with corresponding opening bracket", ErrorCodes::INCORRECT_DATA);
s.push_back(*buf.position());
++buf.position();
@ -1012,7 +1012,7 @@ ReturnType readJSONObjectPossiblyInvalid(Vector & s, ReadBuffer & buf)
while (!buf.eof())
{
char * next_pos = find_first_symbols<'\\', '{', '}', '"'>(buf.position(), buf.buffer().end());
char * next_pos = find_first_symbols<'\\', opening_bracket, closing_bracket, '"'>(buf.position(), buf.buffer().end());
appendToStringOrVector(s, buf, next_pos);
buf.position() = next_pos;
@ -1035,22 +1035,37 @@ ReturnType readJSONObjectPossiblyInvalid(Vector & s, ReadBuffer & buf)
if (*buf.position() == '"')
quotes = !quotes;
else if (!quotes) // can be only '{' or '}'
balance += *buf.position() == '{' ? 1 : -1;
else if (!quotes) // can be only opening_bracket or closing_bracket
balance += *buf.position() == opening_bracket ? 1 : -1;
++buf.position();
if (balance == 0)
return ReturnType(true);
if (balance < 0)
if (balance < 0)
break;
}
return error("JSON should have equal number of opening and closing brackets", ErrorCodes::INCORRECT_DATA);
return error("JSON object/array should have equal number of opening and closing brackets", ErrorCodes::INCORRECT_DATA);
}
template <typename Vector, typename ReturnType>
ReturnType readJSONObjectPossiblyInvalid(Vector & s, ReadBuffer & buf)
{
return readJSONObjectOrArrayPossiblyInvalid<Vector, ReturnType, '{', '}'>(s, buf);
}
template void readJSONObjectPossiblyInvalid<String>(String & s, ReadBuffer & buf);
template void readJSONObjectPossiblyInvalid<PaddedPODArray<UInt8>>(PaddedPODArray<UInt8> & s, ReadBuffer & buf);
template <typename Vector>
void readJSONArrayInto(Vector & s, ReadBuffer & buf)
{
readJSONObjectOrArrayPossiblyInvalid<Vector, void, '[', ']'>(s, buf);
}
template void readJSONArrayInto<PaddedPODArray<UInt8>>(PaddedPODArray<UInt8> & s, ReadBuffer & buf);
template <typename ReturnType>
ReturnType readDateTextFallback(LocalDate & date, ReadBuffer & buf)
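
Note on the bracket-parameterised reader above: the same balancing idea (count opening and closing brackets, ignore anything inside double quotes, skip escaped characters) can be shown on a plain `std::string_view`. This is a simplified sketch under stated assumptions, not the ReadBuffer-based code: `extractBalanced`, `extractObject` and `extractArray` are hypothetical names, and the function returns `std::nullopt` instead of throwing.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Returns the shortest prefix of `input` whose brackets are balanced,
// or std::nullopt if the input does not start with the opening bracket
// or the brackets never balance.
template <char opening_bracket, char closing_bracket>
std::optional<std::string> extractBalanced(std::string_view input)
{
    if (input.empty() || input.front() != opening_bracket)
        return std::nullopt;

    int balance = 0;
    bool in_quotes = false;
    for (size_t i = 0; i < input.size(); ++i)
    {
        const char c = input[i];
        if (c == '\\')
        {
            ++i; // skip the escaped character
            continue;
        }
        if (c == '"')
            in_quotes = !in_quotes;
        else if (!in_quotes && c == opening_bracket)
            ++balance;
        else if (!in_quotes && c == closing_bracket && --balance == 0)
            return std::string(input.substr(0, i + 1));
    }
    return std::nullopt;
}

// Thin wrappers, analogous to having one entry point per bracket pair.
std::optional<std::string> extractObject(std::string_view input)
{
    return extractBalanced<'{', '}'>(input);
}

std::optional<std::string> extractArray(std::string_view input)
{
    return extractBalanced<'[', ']'>(input);
}
```

As in the diff, only the bracket pair given as template arguments is counted, so the sketch does not validate the JSON inside; it only finds where the object or array ends.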

View File

@ -637,6 +637,9 @@ bool tryReadQuotedStringInto(Vector & s, ReadBuffer & buf);
template <typename Vector, typename ReturnType = void>
ReturnType readJSONObjectPossiblyInvalid(Vector & s, ReadBuffer & buf);
template <typename Vector>
void readJSONArrayInto(Vector & s, ReadBuffer & buf);
template <typename Vector>
void readStringUntilWhitespaceInto(Vector & s, ReadBuffer & buf);
@ -1092,6 +1095,97 @@ inline void readDateTimeText(LocalDateTime & datetime, ReadBuffer & buf)
datetime.second((s[6] - '0') * 10 + (s[7] - '0'));
}
/// In h*:mm:ss format.
template <typename ReturnType = void>
inline ReturnType readTimeTextImpl(time_t & time, ReadBuffer & buf)
{
static constexpr bool throw_exception = std::is_same_v<ReturnType, void>;
int16_t hours;
int16_t minutes;
int16_t seconds;
readIntText(hours, buf);
int negative_multiplier = hours < 0 ? -1 : 1;
// :mm:ss
const size_t remaining_time_size = 6;
char s[remaining_time_size];
size_t size = buf.read(s, remaining_time_size);
if (size != remaining_time_size)
{
s[size] = 0;
if constexpr (throw_exception)
throw ParsingException(ErrorCodes::CANNOT_PARSE_DATETIME, "Cannot parse DateTime {}", s);
else
return false;
}
minutes = (s[1] - '0') * 10 + (s[2] - '0');
seconds = (s[4] - '0') * 10 + (s[5] - '0');
time = hours * 3600 + (minutes * 60 + seconds) * negative_multiplier;
return ReturnType(true);
}
template <typename ReturnType>
inline ReturnType readTimeTextImpl(Decimal64 & time64, UInt32 scale, ReadBuffer & buf)
{
time_t whole;
if (!readTimeTextImpl<bool>(whole, buf))
{
return ReturnType(false);
}
DB::DecimalUtils::DecimalComponents<Decimal64> components{static_cast<Decimal64::NativeType>(whole), 0};
if (!buf.eof() && *buf.position() == '.')
{
++buf.position();
/// Read digits, up to 'scale' positions.
for (size_t i = 0; i < scale; ++i)
{
if (!buf.eof() && isNumericASCII(*buf.position()))
{
components.fractional *= 10;
components.fractional += *buf.position() - '0';
++buf.position();
}
else
{
/// Adjust to scale.
components.fractional *= 10;
}
}
/// Ignore digits that are out of precision.
while (!buf.eof() && isNumericASCII(*buf.position()))
++buf.position();
}
bool is_ok = true;
if constexpr (std::is_same_v<ReturnType, void>)
{
time64 = DecimalUtils::decimalFromComponents<Decimal64>(components, scale);
}
else
{
is_ok = DecimalUtils::tryGetDecimalFromComponents<Decimal64>(components, scale, time64);
}
return ReturnType(is_ok);
}
inline void readTime64Text(Decimal64 & time64, UInt32 scale, ReadBuffer & buf)
{
readTimeTextImpl<void>(time64, scale, buf);
}
/// Generic methods to read value in native binary format.
template <typename T>
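
Note on the `h*:mm:ss[.fraction]` reader added above: the arithmetic (whole seconds scaled by 10^scale, plus a fraction padded or truncated to `scale` digits) can be sketched without ReadBuffer or Decimal64. The function name `parseTimeToTicks` is hypothetical, and the sketch is stricter than the code in the diff: it validates the colons and digits and applies the sign to the whole result, which is an assumption of this example rather than documented behaviour. Overflow for very large hour values is not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

namespace
{
bool isDigit(char c) { return c >= '0' && c <= '9'; }
}

// Parses "[-]h*:mm:ss[.fraction]" into an integer number of 10^-scale second ticks.
std::optional<int64_t> parseTimeToTicks(std::string_view s, uint32_t scale)
{
    size_t pos = 0;
    bool negative = false;
    if (pos < s.size() && (s[pos] == '-' || s[pos] == '+'))
        negative = (s[pos++] == '-');

    // Hours: one or more digits.
    int64_t hours = 0;
    size_t hour_digits = 0;
    while (pos < s.size() && isDigit(s[pos]))
    {
        hours = hours * 10 + (s[pos++] - '0');
        ++hour_digits;
    }
    if (hour_digits == 0)
        return std::nullopt;

    // Exactly ":mm:ss" must follow.
    if (s.size() - pos < 6 || s[pos] != ':' || !isDigit(s[pos + 1]) || !isDigit(s[pos + 2])
        || s[pos + 3] != ':' || !isDigit(s[pos + 4]) || !isDigit(s[pos + 5]))
        return std::nullopt;
    const int64_t minutes = (s[pos + 1] - '0') * 10 + (s[pos + 2] - '0');
    const int64_t seconds = (s[pos + 4] - '0') * 10 + (s[pos + 5] - '0');
    pos += 6;

    // Fraction: read up to `scale` digits, padding with zeros; extra digits are ignored.
    int64_t fractional = 0;
    if (pos < s.size() && s[pos] == '.')
    {
        ++pos;
        for (uint32_t i = 0; i < scale; ++i)
        {
            fractional *= 10;
            if (pos < s.size() && isDigit(s[pos]))
                fractional += s[pos++] - '0';
        }
    }

    int64_t multiplier = 1;
    for (uint32_t i = 0; i < scale; ++i)
        multiplier *= 10;

    const int64_t ticks = (hours * 3600 + minutes * 60 + seconds) * multiplier + fractional;
    return negative ? -ticks : ticks;
}
```

With scale 6, the input `12:00:00.11` corresponds to 43200 whole seconds plus a fraction of 110000 microseconds, which matches the `timestamp('2013-12-31 12:00:00', '12:00:00.11')` example in the function documentation earlier in this diff.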

View File

@ -68,6 +68,9 @@ bool Client::RetryStrategy::ShouldRetry(const Aws::Client::AWSError<Aws::Client:
if (attemptedRetries >= maxRetries)
return false;
if (CurrentThread::isInitialized() && CurrentThread::get().isQueryCanceled())
return false;
return error.ShouldRetry();
}
@ -601,9 +604,15 @@ Client::doRequestWithRetryNetworkErrors(const RequestType & request, RequestFn r
last_exception = std::current_exception();
auto error = Aws::Client::AWSError<Aws::Client::CoreErrors>(Aws::Client::CoreErrors::NETWORK_CONNECTION, /*retry*/ true);
/// Check if query is canceled
if (!client_configuration.retryStrategy->ShouldRetry(error, attempt_no))
break;
auto sleep_ms = client_configuration.retryStrategy->CalculateDelayBeforeNextRetry(error, attempt_no);
LOG_WARNING(log, "Request failed, now waiting {} ms before attempting again", sleep_ms);
sleepForMilliseconds(sleep_ms);
continue;
}
}

View File

@ -34,6 +34,7 @@ class Context;
namespace DB::S3
{
class ClientFactory;
class PocoHTTPClient;
struct PocoHTTPClientConfiguration : public Aws::Client::ClientConfiguration
{
@ -55,10 +56,10 @@ struct PocoHTTPClientConfiguration : public Aws::Client::ClientConfiguration
/// See PoolBase::BehaviourOnLimit
bool wait_on_pool_size_limit = true;
void updateSchemeAndRegion();
std::function<void(const DB::ProxyConfiguration &)> error_report;
void updateSchemeAndRegion();
private:
PocoHTTPClientConfiguration(
std::function<DB::ProxyConfiguration()> per_request_configuration_,

View File

@ -1,5 +1,8 @@
#include <IO/S3/URI.h>
#include <Poco/URI.h>
#include "Common/Macros.h"
#include <Interpreters/Context.h>
#include <Storages/NamedCollectionsHelpers.h>
#if USE_AWS_S3
#include <Common/Exception.h>
#include <Common/quoteString.h>
@ -18,6 +21,15 @@
namespace DB
{
struct URIConverter
{
static void modifyURI(Poco::URI & uri, std::unordered_map<std::string, std::string> mapper)
{
Macros macros({{"bucket", uri.getHost()}});
uri = macros.expand(mapper[uri.getScheme()]).empty()? uri : Poco::URI(macros.expand(mapper[uri.getScheme()]) + "/" + uri.getPathAndQuery());
}
};
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
@ -46,6 +58,29 @@ URI::URI(const std::string & uri_)
uri = Poco::URI(uri_);
std::unordered_map<std::string, std::string> mapper;
auto context = Context::getGlobalContextInstance();
if (context)
{
const auto *config = &context->getConfigRef();
if (config->has("url_scheme_mappers"))
{
std::vector<String> config_keys;
config->keys("url_scheme_mappers", config_keys);
for (const std::string & config_key : config_keys)
mapper[config_key] = config->getString("url_scheme_mappers." + config_key + ".to");
}
else
{
mapper["s3"] = "https://{bucket}.s3.amazonaws.com";
mapper["gs"] = "https://{bucket}.storage.googleapis.com";
mapper["oss"] = "https://{bucket}.oss.aliyuncs.com";
}
if (!mapper.empty())
URIConverter::modifyURI(uri, mapper);
}
storage_name = S3;
if (uri.getHost().empty())
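
The scheme mapping introduced in this hunk can be sketched without Poco or `DB::Macros`. The code below is an illustration only: `expandBucket` and `mapSchemeUrl` are hypothetical helpers, URL parsing is reduced to plain string search, and the way the path is joined may differ from what `Poco::URI` produces in the actual change.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical helper: replace every "{bucket}" placeholder in a template.
std::string expandBucket(std::string tmpl, const std::string & bucket)
{
    const std::string placeholder = "{bucket}";
    for (std::size_t pos = tmpl.find(placeholder); pos != std::string::npos;
         pos = tmpl.find(placeholder, pos + bucket.size()))
        tmpl.replace(pos, placeholder.size(), bucket);
    return tmpl;
}

// Returns the rewritten URL, or std::nullopt when the scheme has no (non-empty) mapping.
std::optional<std::string> mapSchemeUrl(
    const std::string & url,
    const std::unordered_map<std::string, std::string> & mapper)
{
    const std::size_t scheme_end = url.find("://");
    if (scheme_end == std::string::npos)
        return std::nullopt;

    const auto it = mapper.find(url.substr(0, scheme_end));
    if (it == mapper.end() || it->second.empty())
        return std::nullopt;

    // The host part of the short form is treated as the bucket name.
    const std::size_t host_begin = scheme_end + 3;
    const std::size_t host_end = url.find('/', host_begin);
    const std::string bucket = url.substr(
        host_begin, host_end == std::string::npos ? std::string::npos : host_end - host_begin);
    // Keep the leading '/' of the path, if there is a path at all.
    const std::string path = host_end == std::string::npos ? "" : url.substr(host_end);

    return expandBucket(it->second, bucket) + path;
}
```

With a mapper entry `s3` to `https://{bucket}.s3.amazonaws.com`, a URL such as `s3://my-bucket/data/file.csv` is rewritten to the virtual-hosted form, while URLs whose scheme has no entry are left for the caller to handle unchanged.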

View File

@ -122,4 +122,8 @@ void writePointerHex(const void * ptr, WriteBuffer & buf)
buf.write(hex_str, 2 * sizeof(ptr));
}
String fourSpaceIndent(size_t indent)
{
return std::string(indent * 4, ' ');
}
}

View File

@ -1405,6 +1405,8 @@ struct PcgSerializer
void writePointerHex(const void * ptr, WriteBuffer & buf);
String fourSpaceIndent(size_t indent);
}
template<>

View File

@ -123,6 +123,8 @@ void FileCache::initialize()
{
fs::create_directories(getBasePath());
}
status_file = make_unique<StatusFile>(fs::path(getBasePath()) / "status", StatusFile::write_full_info);
}
catch (...)
{

View File

@ -13,6 +13,7 @@
#include <IO/ReadSettings.h>
#include <Common/ThreadPool.h>
#include <Common/StatusFile.h>
#include <Interpreters/Cache/LRUFileCachePriority.h>
#include <Interpreters/Cache/FileCache_fwd.h>
#include <Interpreters/Cache/FileSegment.h>
@ -166,6 +167,7 @@ private:
std::exception_ptr init_exception;
std::atomic<bool> is_initialized = false;
mutable std::mutex init_mutex;
std::unique_ptr<StatusFile> status_file;
CacheMetadata metadata;

View File

@ -1967,6 +1967,21 @@ void Context::killCurrentQuery() const
elem->cancelQuery(true);
}
bool Context::isCurrentQueryKilled() const
{
/// getProcessListElementSafe is used here instead of getProcessListElement,
/// because getProcessListElement requires that the process list exists.
/// In most cases it does, because the process list exists during query execution.
/// That holds for all operations with parts, such as read and write operations.
/// However, Context::isCurrentQueryKilled can also be called at the edges,
/// when the query is starting or finishing: the context still exists, but the process list has already expired.
if (auto elem = getProcessListElementSafe())
return elem->isKilled();
return false;
}
String Context::getDefaultFormat() const
{
return default_format.empty() ? "TabSeparated" : default_format;
@ -2270,6 +2285,14 @@ QueryStatusPtr Context::getProcessListElement() const
throw Exception(ErrorCodes::LOGICAL_ERROR, "Weak pointer to process_list_elem expired during query execution, it's a bug");
}
QueryStatusPtr Context::getProcessListElementSafe() const
{
if (!has_process_list_elem)
return {};
if (auto res = process_list_elem.lock())
return res;
return {};
}
void Context::setUncompressedCache(const String & cache_policy, size_t max_size_in_bytes, double size_ratio)
{
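
The difference between the strict and the "safe" accessor added in this file comes down to how an expired `std::weak_ptr` is treated. The following is a minimal sketch, assuming simplified stand-in types: `QueryStatusSketch`, `ContextSketch` and `getProcessListElementStrict` are hypothetical names, and `std::logic_error` replaces the `LOGICAL_ERROR` exception of the real code.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the per-query status object.
struct QueryStatusSketch
{
    bool killed = false;
};

class ContextSketch
{
public:
    void setProcessListElement(const std::shared_ptr<QueryStatusSketch> & elem)
    {
        process_list_elem = elem;
        has_process_list_elem = elem != nullptr;
    }

    // Strict accessor: an expired element is reported as a bug.
    std::shared_ptr<QueryStatusSketch> getProcessListElementStrict() const
    {
        if (!has_process_list_elem)
            return {};
        if (auto res = process_list_elem.lock())
            return res;
        throw std::logic_error("process list element expired");
    }

    // Safe accessor: an expired element simply yields an empty pointer.
    std::shared_ptr<QueryStatusSketch> getProcessListElementSafe() const
    {
        if (!has_process_list_elem)
            return {};
        return process_list_elem.lock();
    }

    // Usable at query start or finish, when the element may already be gone.
    bool isCurrentQueryKilled() const
    {
        if (auto elem = getProcessListElementSafe())
            return elem->killed;
        return false;
    }

private:
    std::weak_ptr<QueryStatusSketch> process_list_elem;
    bool has_process_list_elem = false;
};
```

Treating "no element" and "expired element" as "not killed" lets a cancellation predicate be polled from any thread without having to know which phase the query is in.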

View File

@ -718,6 +718,7 @@ public:
void setCurrentQueryId(const String & query_id);
void killCurrentQuery() const;
bool isCurrentQueryKilled() const;
bool hasInsertionTable() const { return !insertion_table_info.table.empty(); }
bool hasInsertionTableColumnNames() const { return insertion_table_info.column_names.has_value(); }
@ -867,6 +868,7 @@ public:
void setProcessListElement(QueryStatusPtr elem);
/// Can return nullptr if the query was not inserted into the ProcessList.
QueryStatusPtr getProcessListElement() const;
QueryStatusPtr getProcessListElementSafe() const;
/// List all queries.
ProcessList & getProcessList();

View File

@ -101,6 +101,11 @@ static auto getQueryInterpreter(const ASTSubquery & subquery, ExecuteScalarSubqu
void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr & ast, Data & data)
{
/// subquery and ast can be the same object and ast will be moved.
/// Save these fields to avoid use after move.
String subquery_alias = subquery.alias;
bool prefer_alias_to_column_name = subquery.prefer_alias_to_column_name;
auto hash = subquery.getTreeHash();
const auto scalar_query_hash_str = toString(hash);
@ -210,7 +215,13 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr
ast_new->setAlias(ast->tryGetAlias());
ast = std::move(ast_new);
return;
/// Empty subquery result is equivalent to NULL
block = interpreter->getSampleBlock().cloneEmpty();
String column_name = block.columns() > 0 ? block.safeGetByPosition(0).name : "dummy";
block = Block({
ColumnWithTypeAndName(type->createColumnConstWithDefaultValue(1)->convertToFullColumnIfConst(), type, column_name)
});
}
if (block.rows() != 1)
@ -258,13 +269,8 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr
|| worthConvertingScalarToLiteral(scalar, data.max_literal_size)
|| !data.getContext()->hasQueryContext())
{
/// subquery and ast can be the same object and ast will be moved.
/// Save these fields to avoid use after move.
auto alias = subquery.alias;
auto prefer_alias_to_column_name = subquery.prefer_alias_to_column_name;
auto lit = std::make_unique<ASTLiteral>((*scalar.safeGetByPosition(0).column)[0]);
lit->alias = alias;
lit->alias = subquery_alias;
lit->prefer_alias_to_column_name = prefer_alias_to_column_name;
ast = addTypeConversionToAST(std::move(lit), scalar.safeGetByPosition(0).type->getName());
@ -273,7 +279,7 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr
{
ast->as<ASTFunction>()->alias.clear();
auto func = makeASTFunction("identity", std::move(ast));
func->alias = alias;
func->alias = subquery_alias;
func->prefer_alias_to_column_name = prefer_alias_to_column_name;
ast = std::move(func);
}
@ -281,8 +287,8 @@ void ExecuteScalarSubqueriesMatcher::visit(const ASTSubquery & subquery, ASTPtr
else if (!data.replace_only_to_literals)
{
auto func = makeASTFunction("__getScalar", std::make_shared<ASTLiteral>(scalar_query_hash_str));
func->alias = subquery.alias;
func->prefer_alias_to_column_name = subquery.prefer_alias_to_column_name;
func->alias = subquery_alias;
func->prefer_alias_to_column_name = prefer_alias_to_column_name;
ast = std::move(func);
}
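
The reason for copying `alias` and `prefer_alias_to_column_name` at the top of `visit` is that the reference argument and the owning pointer can refer to the same object. A minimal sketch of that aliasing hazard, assuming a simplified hypothetical `NodeSketch` type and a hypothetical `replaceNode` function rather than the real AST classes:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an AST node that carries an alias.
struct NodeSketch
{
    std::string alias;
};

// `node` may be the very object owned by `ast`.
void replaceNode(const NodeSketch & node, std::shared_ptr<NodeSketch> & ast)
{
    // Copy everything that is needed from `node` before `ast` is reassigned.
    const std::string saved_alias = node.alias;

    auto replacement = std::make_shared<NodeSketch>();
    // If `ast` was the last owner, the old object is destroyed here
    // and `node` becomes a dangling reference.
    ast = std::move(replacement);

    // Reading `node.alias` at this point would be a use-after-free; the copy is safe.
    ast->alias = saved_alias;
}
```

Hoisting the copies to the start of the function, as the hunk does, means every later branch can use them without having to reason about whether `ast` has already been replaced.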

View File

@ -124,10 +124,16 @@ BlockIO InterpreterDescribeQuery::execute()
{
res_columns[0]->insert(column.name);
DataTypePtr type;
if (extend_object_types)
res_columns[1]->insert(storage_snapshot->getConcreteType(column.name)->getName());
type = storage_snapshot->getConcreteType(column.name);
else
res_columns[1]->insert(column.type->getName());
type = column.type;
if (getContext()->getSettingsRef().print_pretty_type_names)
res_columns[1]->insert(type->getPrettyName());
else
res_columns[1]->insert(type->getName());
if (column.default_desc.expression)
{

View File

@ -47,7 +47,15 @@ ThreadGroup::ThreadGroup(ContextPtr query_context_, FatalErrorCallback fatal_err
, query_context(query_context_)
, global_context(query_context_->getGlobalContext())
, fatal_error_callback(fatal_error_callback_)
{}
{
shared_data.query_is_canceled_predicate = [this] () -> bool {
if (auto context_locked = query_context.lock())
{
return context_locked->isCurrentQueryKilled();
}
return false;
};
}
std::vector<UInt64> ThreadGroup::getInvolvedThreadIds() const
{

View File

@ -721,8 +721,10 @@ TEST_F(FileCacheTest, writeBuffer)
auto holder2 = write_to_cache("key2", {"1", "22", "333", "4444", "55555"}, true);
file_segment_paths.emplace_back(holder2->front().getPathInLocalCache());
std::cerr << "\nFile segments: " << holder2->toString() << "\n";
ASSERT_EQ(fs::file_size(file_segment_paths.back()), 15);
ASSERT_TRUE(holder2->front().range() == FileSegment::Range(0, 15));
ASSERT_EQ(holder2->front().range(), FileSegment::Range(0, 15));
ASSERT_EQ(cache.getUsedCacheSize(), 22);
}
ASSERT_FALSE(fs::exists(file_segment_paths.back()));

View File

@ -294,7 +294,7 @@ NamesAndTypesList JSONColumnsSchemaReaderBase::readSchema()
/// Don't check/change types from hints.
if (!hints.contains(name))
{
transformJSONTupleToArrayIfPossible(type, format_settings, &inference_info);
transformFinalInferredJSONTypeIfNeeded(type, format_settings, &inference_info);
/// Check that we could determine the type of this column.
checkFinalInferredType(type, name, format_settings, nullptr, format_settings.max_rows_to_read_for_schema_inference, hints_parsing_error);
}

View File

@ -230,7 +230,7 @@ void JSONCompactEachRowRowSchemaReader::transformTypesIfNeeded(DataTypePtr & typ
void JSONCompactEachRowRowSchemaReader::transformFinalTypeIfNeeded(DataTypePtr & type)
{
transformJSONTupleToArrayIfPossible(type, format_settings, &inference_info);
transformFinalInferredJSONTypeIfNeeded(type, format_settings, &inference_info);
}
void registerInputFormatJSONCompactEachRow(FormatFactory & factory)

View File

@ -367,7 +367,7 @@ void JSONEachRowSchemaReader::transformTypesIfNeeded(DataTypePtr & type, DataTyp
void JSONEachRowSchemaReader::transformFinalTypeIfNeeded(DataTypePtr & type)
{
transformJSONTupleToArrayIfPossible(type, format_settings, &inference_info);
transformFinalInferredJSONTypeIfNeeded(type, format_settings, &inference_info);
}
void registerInputFormatJSONEachRow(FormatFactory & factory)

View File

@ -109,7 +109,7 @@ void JSONObjectEachRowSchemaReader::transformTypesIfNeeded(DataTypePtr & type, D
void JSONObjectEachRowSchemaReader::transformFinalTypeIfNeeded(DataTypePtr & type)
{
transformJSONTupleToArrayIfPossible(type, format_settings, &inference_info);
transformFinalInferredJSONTypeIfNeeded(type, format_settings, &inference_info);
}
void registerInputFormatJSONObjectEachRow(FormatFactory & factory)

View File

@ -6,6 +6,11 @@
namespace DB
{
namespace ErrorCodes
{
extern const int FUNCTION_NOT_ALLOWED;
}
BuildQueryPipelineSettings BuildQueryPipelineSettings::fromContext(ContextPtr from)
{
BuildQueryPipelineSettings settings;
@ -13,6 +18,10 @@ BuildQueryPipelineSettings BuildQueryPipelineSettings::fromContext(ContextPtr fr
const auto & context_settings = from->getSettingsRef();
settings.partial_result_limit = context_settings.max_rows_in_partial_result;
settings.partial_result_duration_ms = context_settings.partial_result_update_duration_ms.totalMilliseconds();
if (settings.partial_result_duration_ms && !context_settings.allow_experimental_partial_result)
throw Exception(ErrorCodes::FUNCTION_NOT_ALLOWED,
"Partial results are not allowed by default, it's an experimental feature. "
"Setting 'allow_experimental_partial_result' must be enabled to use 'partial_result_update_duration_ms'");
settings.actions_settings = ExpressionActionsSettings::fromSettings(context_settings, CompileExpressions::yes);
settings.process_list_element = from->getProcessListElement();

View File

@ -88,8 +88,7 @@ std::pair<std::vector<Values>, std::vector<RangesInDataParts>> split(RangesInDat
[[maybe_unused]] bool operator<(const PartsRangesIterator & other) const
{
// Accurate comparison of std::tie(value, event) > std::tie(other.value, other.event);
// Accurate comparison of `value > other.value`
for (size_t i = 0; i < value.size(); ++i)
{
if (applyVisitor(FieldVisitorAccurateLess(), value[i], other.value[i]))
@ -99,10 +98,12 @@ std::pair<std::vector<Values>, std::vector<RangesInDataParts>> split(RangesInDat
return true;
}
if (event > other.event)
return true;
return false;
/// Within the same part we should process events in order of mark numbers,
/// because they are already ordered by value and range ends have greater mark numbers than the beginnings.
/// Otherwise we could get invalid ranges whose right bound is less than the left bound.
const auto ev_mark = event == EventType::RangeStart ? range.begin : range.end;
const auto other_ev_mark = other.event == EventType::RangeStart ? other.range.begin : other.range.end;
return ev_mark > other_ev_mark;
}
Values value;

View File

@ -28,7 +28,9 @@ void ExpressionTransform::transform(Chunk & chunk)
ProcessorPtr ExpressionTransform::getPartialResultProcessor(const ProcessorPtr & /*current_processor*/, UInt64 /*partial_result_limit*/, UInt64 /*partial_result_duration_ms*/)
{
const auto & header = getInputPort().getHeader();
return std::make_shared<ExpressionTransform>(header, expression);
auto result = std::make_shared<ExpressionTransform>(header, expression);
result->setDescription("(Partial result)");
return result;
}
ConvertingTransform::ConvertingTransform(const Block & header_, ExpressionActionsPtr expression_)

View File

@ -24,6 +24,11 @@ PartialResultTransform::ShaphotResult MergeSortingPartialResultTransform::getRea
/// Add a copy of the first `partial_result_limit` rows to a generated_chunk
/// to send it later as a partial result in the next prepare stage of the current processor
auto generated_columns = merge_sorting_transform->chunks[0].cloneEmptyColumns();
/// It's possible that we had only empty chunks before remerge
if (merge_sorting_transform->chunks.empty())
return {{}, SnaphotStatus::NotReady};
size_t total_rows = 0;
for (const auto & merged_chunk : merge_sorting_transform->chunks)
{

View File

@ -14,6 +14,8 @@
#include <Columns/ColumnConst.h>
#include <Common/logger_useful.h>
#include <QueryPipeline/printPipeline.h>
namespace DB
{
@ -311,22 +313,24 @@ Pipe Pipe::unitePipes(Pipes pipes, Processors * collected_processors, bool allow
for (auto & pipe : pipes)
{
if (res.isPartialResultActive() && pipe.isPartialResultActive())
{
res.partial_result_ports.insert(res.partial_result_ports.end(), pipe.partial_result_ports.begin(), pipe.partial_result_ports.end());
}
else
{
if (pipe.isPartialResultActive())
pipe.dropPartialResult();
if (res.isPartialResultActive())
res.dropPartialResult();
}
if (!allow_empty_header || pipe.header)
assertCompatibleHeader(pipe.header, res.header, "Pipe::unitePipes");
res.processors->insert(res.processors->end(), pipe.processors->begin(), pipe.processors->end());
res.output_ports.insert(res.output_ports.end(), pipe.output_ports.begin(), pipe.output_ports.end());
if (res.isPartialResultActive() && pipe.isPartialResultActive())
{
res.partial_result_ports.insert(
res.partial_result_ports.end(),
pipe.partial_result_ports.begin(),
pipe.partial_result_ports.end());
}
else
res.dropPartialResult();
res.max_parallel_streams += pipe.max_parallel_streams;
if (pipe.totals_port)
@ -652,7 +656,15 @@ void Pipe::addPartialResultTransform(const ProcessorPtr & transform)
{
if (isPartialResultActive())
{
size_t new_outputs_size = transform->getOutputs().size();
size_t new_outputs_size = 0;
for (const auto & output : transform->getOutputs())
{
/// We do not use totals_port and extremes_port in partial result
if ((totals_port && totals_port == &output) || (extremes_port && extremes_port == &output))
continue;
++new_outputs_size;
}
auto partial_result_status = transform->getPartialResultProcessorSupportStatus();
if (partial_result_status == IProcessor::PartialResultStatus::SkipSupported && new_outputs_size != partial_result_ports.size())
@ -669,6 +681,7 @@ void Pipe::addPartialResultTransform(const ProcessorPtr & transform)
dropPort(partial_result_port, *processors, collected_processors);
partial_result_ports.assign(new_outputs_size, nullptr);
return;
}
if (partial_result_status != IProcessor::PartialResultStatus::FullSupported)
@ -678,12 +691,18 @@ void Pipe::addPartialResultTransform(const ProcessorPtr & transform)
auto & inputs = partial_result_transform->getInputs();
if (inputs.size() != partial_result_ports.size())
{
WriteBufferFromOwnString out;
if (processors && !processors->empty())
printPipeline(*processors, out);
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Cannot add partial result transform {} to Pipe because it has {} input ports, but {} expected",
"Cannot add partial result transform {} to Pipe because it has {} input ports, but {} expected\n{}",
partial_result_transform->getName(),
inputs.size(),
partial_result_ports.size());
partial_result_ports.size(), out.str());
}
size_t next_port = 0;
for (auto & input : inputs)
@ -995,4 +1014,10 @@ void Pipe::transform(const Transformer & transformer, bool check_ports)
max_parallel_streams = std::max<size_t>(max_parallel_streams, output_ports.size());
}
OutputPort * Pipe::getPartialResultPort(size_t pos) const
{
return partial_result_ports.empty() ? nullptr : partial_result_ports[pos];
}
}

View File

@ -48,7 +48,7 @@ public:
OutputPort * getOutputPort(size_t pos) const { return output_ports[pos]; }
OutputPort * getTotalsPort() const { return totals_port; }
OutputPort * getExtremesPort() const { return extremes_port; }
OutputPort * getPartialResultPort(size_t pos) const { return partial_result_ports.empty() ? nullptr : partial_result_ports[pos]; }
OutputPort * getPartialResultPort(size_t pos) const;
bool isPartialResultActive() { return is_partial_result_active; }

View File

@ -9,6 +9,7 @@
#include <Interpreters/ExpressionActions.h>
#include <QueryPipeline/ReadProgressCallback.h>
#include <QueryPipeline/Pipe.h>
#include <QueryPipeline/printPipeline.h>
#include <Processors/Sinks/EmptySink.h>
#include <Processors/Sinks/NullSink.h>
#include <Processors/Sinks/SinkToStorage.h>
@ -53,13 +54,19 @@ static void checkInput(const InputPort & input, const ProcessorPtr & processor)
processor->getName());
}
static void checkOutput(const OutputPort & output, const ProcessorPtr & processor)
static void checkOutput(const OutputPort & output, const ProcessorPtr & processor, const Processors & processors = {})
{
if (!output.isConnected())
{
WriteBufferFromOwnString out;
if (!processors.empty())
printPipeline(processors, out);
throw Exception(
ErrorCodes::LOGICAL_ERROR,
"Cannot create QueryPipeline because {} has disconnected output",
processor->getName());
"Cannot create QueryPipeline because {} {} has disconnected output: {}",
processor->getName(), processor->getDescription(), out.str());
}
}
static void checkPulling(
@ -109,7 +116,7 @@ static void checkPulling(
else if (partial_result && &out == partial_result)
found_partial_result = true;
else
checkOutput(out, processor);
checkOutput(out, processor, processors);
}
}

View File

@ -116,8 +116,7 @@ void QueryPipelineBuilder::init(QueryPipeline & pipeline)
pipe.activatePartialResult(pipeline.partial_result_limit, pipeline.partial_result_duration_ms);
pipe.partial_result_ports = {pipeline.partial_result};
}
if (!pipeline.partial_result)
else
pipe.dropPartialResult();
pipe.totals_port = pipeline.totals;

View File

@ -86,7 +86,10 @@ public:
void setSinks(const Pipe::ProcessorGetterWithStreamKind & getter);
/// Activate building separate pipeline for sending partial result.
void activatePartialResult(UInt64 partial_result_limit, UInt64 partial_result_duration_ms) { pipe.activatePartialResult(partial_result_limit, partial_result_duration_ms); }
void activatePartialResult(UInt64 partial_result_limit, UInt64 partial_result_duration_ms)
{
pipe.activatePartialResult(partial_result_limit, partial_result_duration_ms);
}
/// Check if building of a pipeline for sending partial result is active.
bool isPartialResultActive() { return pipe.isPartialResultActive(); }

View File

@ -101,6 +101,7 @@ namespace DB::ErrorCodes
extern const int CLIENT_INFO_DOES_NOT_MATCH;
extern const int SUPPORT_IS_DISABLED;
extern const int UNSUPPORTED_METHOD;
extern const int FUNCTION_NOT_ALLOWED;
}
namespace
@ -888,7 +889,13 @@ void TCPHandler::processOrdinaryQueryWithProcessors()
std::unique_lock progress_lock(task_callback_mutex, std::defer_lock);
{
bool has_partial_result_setting = query_context->getSettingsRef().partial_result_update_duration_ms.totalMilliseconds() > 0;
const auto & settings = query_context->getSettingsRef();
bool has_partial_result_setting = settings.partial_result_update_duration_ms.totalMilliseconds() > 0;
if (has_partial_result_setting && !settings.allow_experimental_partial_result)
throw Exception(ErrorCodes::FUNCTION_NOT_ALLOWED,
"Partial results are not allowed by default, it's an experimental feature. "
"Setting 'allow_experimental_partial_result' must be enabled to use 'partial_result_update_duration_ms'");
PullingAsyncPipelineExecutor executor(pipeline, has_partial_result_setting);
CurrentMetrics::Increment query_thread_metric_increment{CurrentMetrics::QueryThread};

View File

@ -71,7 +71,7 @@ void GinIndexPostingsBuilder::add(UInt32 row_id)
}
}
UInt64 GinIndexPostingsBuilder::serialize(WriteBuffer & buffer) const
UInt64 GinIndexPostingsBuilder::serialize(WriteBuffer & buffer)
{
UInt64 written_bytes = 0;
buffer.write(rowid_lst_length);
@ -79,6 +79,7 @@ UInt64 GinIndexPostingsBuilder::serialize(WriteBuffer & buffer) const
if (useRoaring())
{
rowid_bitmap.runOptimize();
auto size = rowid_bitmap.getSizeInBytes();
writeVarUInt(size, buffer);

View File

@ -52,7 +52,7 @@ public:
void add(UInt32 row_id);
/// Serialize the content of builder to given WriteBuffer, returns the bytes of serialized data
UInt64 serialize(WriteBuffer & buffer) const;
UInt64 serialize(WriteBuffer & buffer);
/// Deserialize the postings list data from given ReadBuffer, return a pointer to the GinIndexPostingsList created by deserialization
static GinIndexPostingsListPtr deserialize(ReadBuffer & buffer);

View File

@ -7543,6 +7543,28 @@ MergeTreeData & MergeTreeData::checkStructureAndGetMergeTreeData(IStorage & sour
if (query_to_string(my_snapshot->getPrimaryKeyAST()) != query_to_string(src_snapshot->getPrimaryKeyAST()))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Tables have different primary key");
const auto check_definitions = [](const auto & my_descriptions, const auto & src_descriptions)
{
if (my_descriptions.size() != src_descriptions.size())
return false;
std::unordered_set<std::string> my_query_strings;
for (const auto & description : my_descriptions)
my_query_strings.insert(queryToString(description.definition_ast));
for (const auto & src_description : src_descriptions)
if (!my_query_strings.contains(queryToString(src_description.definition_ast)))
return false;
return true;
};
if (!check_definitions(my_snapshot->getSecondaryIndices(), src_snapshot->getSecondaryIndices()))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Tables have different secondary indices");
if (!check_definitions(my_snapshot->getProjections(), src_snapshot->getProjections()))
throw Exception(ErrorCodes::BAD_ARGUMENTS, "Tables have different projections");
return *src_data;
}
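
The `check_definitions` lambda compares two lists of definitions regardless of order. A minimal sketch of the same idea, assuming the definitions have already been rendered to strings (the hypothetical `sameDefinitions` below takes plain `std::string` values instead of calling `queryToString` on AST nodes):

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: true if both lists have the same size and every entry
// of `theirs` also appears in `mine`, regardless of order.
bool sameDefinitions(const std::vector<std::string> & mine, const std::vector<std::string> & theirs)
{
    if (mine.size() != theirs.size())
        return false;

    const std::unordered_set<std::string> known(mine.begin(), mine.end());
    for (const auto & definition : theirs)
        if (known.count(definition) == 0)
            return false;

    return true;
}
```

Because the lookup structure is a set, this kind of check does not distinguish a list with a repeated entry from one with two distinct entries of the same total count; that is acceptable when definitions are known to be unique, which is assumed here.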

View File

@ -28,6 +28,7 @@ namespace DB
namespace FailPoints
{
extern const char replicated_merge_tree_commit_zk_fail_after_op[];
extern const char replicated_merge_tree_insert_quorum_fail_0[];
}
namespace ErrorCodes
@ -761,13 +762,7 @@ std::pair<std::vector<String>, bool> ReplicatedMergeTreeSinkImpl<async_insert>::
else
quorum_path = storage.zookeeper_path + "/quorum/status";
if (!retries_ctl.callAndCatchAll(
[&]()
{
waitForQuorum(
zookeeper, existing_part_name, quorum_path, quorum_info.is_active_node_version, replicas_num);
}))
return;
waitForQuorum(zookeeper, existing_part_name, quorum_path, quorum_info.is_active_node_version, replicas_num);
}
else
{
@ -1043,38 +1038,19 @@ std::pair<std::vector<String>, bool> ReplicatedMergeTreeSinkImpl<async_insert>::
if (isQuorumEnabled())
{
ZooKeeperRetriesControl quorum_retries_ctl("waitForQuorum", zookeeper_retries_info, context->getProcessListElement());
quorum_retries_ctl.retryLoop([&]()
if (is_already_existing_part)
{
if (storage.is_readonly)
{
/// stop retries if in shutdown
if (storage.shutdown_called)
throw Exception(
ErrorCodes::TABLE_IS_READ_ONLY, "Table is in readonly mode due to shutdown: replica_path={}", storage.replica_path);
/// We get duplicate part without fetch
/// Check if this quorum insert is parallel or not
if (zookeeper->exists(storage.zookeeper_path + "/quorum/parallel/" + part->name))
storage.updateQuorum(part->name, true);
else if (zookeeper->exists(storage.zookeeper_path + "/quorum/status"))
storage.updateQuorum(part->name, false);
}
quorum_retries_ctl.setUserError(ErrorCodes::TABLE_IS_READ_ONLY, "Table is in readonly mode: replica_path={}", storage.replica_path);
return;
}
zookeeper->setKeeper(storage.getZooKeeper());
if (is_already_existing_part)
{
/// We get duplicate part without fetch
/// Check if this quorum insert is parallel or not
if (zookeeper->exists(storage.zookeeper_path + "/quorum/parallel/" + part->name))
storage.updateQuorum(part->name, true);
else if (zookeeper->exists(storage.zookeeper_path + "/quorum/status"))
storage.updateQuorum(part->name, false);
}
if (!quorum_retries_ctl.callAndCatchAll(
[&]()
{ waitForQuorum(zookeeper, part->name, quorum_info.status_path, quorum_info.is_active_node_version, replicas_num); }))
return;
});
waitForQuorum(zookeeper, part->name, quorum_info.status_path, quorum_info.is_active_node_version, replicas_num);
}
return {conflict_block_ids, part_was_deduplicated};
}
@ -1099,6 +1075,16 @@ void ReplicatedMergeTreeSinkImpl<async_insert>::waitForQuorum(
try
{
fiu_do_on(FailPoints::replicated_merge_tree_insert_quorum_fail_0,
{
if (!zookeeper->fault_policy)
{
zookeeper->logger = log;
zookeeper->fault_policy = std::make_unique<RandomFaultInjection>(0, 0);
}
zookeeper->fault_policy->must_fail_before_op = true;
});
while (true)
{
zkutil::EventPtr event = std::make_shared<Poco::Event>();

View File

@ -75,6 +75,7 @@ const char * auto_contributors[] {
"Alexander Tretiakov",
"Alexander Yakovlev",
"Alexander Zaitsev",
"Alexander van Olst",
"Alexandr Kondratev",
"Alexandr Krasheninnikov",
"Alexandr Orlov",
@ -246,6 +247,7 @@ const char * auto_contributors[] {
"Cheng Pan",
"Chienlung Cheung",
"Christian",
"Christian Clauss",
"Christoph Wurm",
"Chun-Sheng, Li",
"Ciprian Hacman",
@ -259,6 +261,7 @@ const char * auto_contributors[] {
"Constantine Peresypkin",
"CoolT2",
"Cory Levy",
"CuiShuoGuo",
"CurtizJ",
"DF5HSE",
"DIAOZHAFENG",
@ -370,6 +373,7 @@ const char * auto_contributors[] {
"Federico Rodriguez",
"FeehanG",
"Feng Kaiyu",
"Fern",
"FgoDt",
"Filatenkov Artur",
"Filipe Caixeta",
@ -395,6 +399,7 @@ const char * auto_contributors[] {
"Geoff Genz",
"George",
"George G",
"George Gamezardashvili",
"George3d6",
"Georgy Ginzburg",
"Gervasio Varela",
@ -516,6 +521,7 @@ const char * auto_contributors[] {
"Jose",
"Josh Taylor",
"João Figueiredo",
"Julia Kartseva",
"Julian Gilyadov",
"Julian Maicher",
"Julian Zhou",
@ -573,8 +579,10 @@ const char * auto_contributors[] {
"Lars Eidnes",
"Latysheva Alexandra",
"Laurie Li",
"LaurieLY",
"Lemore",
"Leonardo Cecchi",
"Leonardo Maciel",
"Leonid Krylov",
"Leopold Schabel",
"Lev Borodin",
@ -634,6 +642,7 @@ const char * auto_contributors[] {
"Max",
"Max Akhmedov",
"Max Bruce",
"Max Kainov",
"Max Vetrov",
"MaxTheHuman",
"MaxWk",
@ -798,6 +807,7 @@ const char * auto_contributors[] {
"Persiyanov Dmitriy Andreevich",
"Pervakov Grigorii",
"Pervakov Grigory",
"Petr Vasilev",
"Philippe Ombredanne",
"PigInCloud",
"Potya",
@ -835,6 +845,7 @@ const char * auto_contributors[] {
"Roman",
"Roman Bug",
"Roman Chyrva",
"Roman G",
"Roman Heinrich",
"Roman Lipovsky",
"Roman Nikolaev",
@ -962,9 +973,11 @@ const char * auto_contributors[] {
"Thomas Berdy",
"Thomas Casteleyn",
"Thomas Panetti",
"Tiakon",
"Tian Xinhui",
"Tiaonmmn",
"Tigran Khudaverdyan",
"Tim Windelschmidt",
"Timur Magomedov",
"Timur Solodovnikov",
"TiunovNN",
@ -972,6 +985,7 @@ const char * auto_contributors[] {
"Tobias Lins",
"Tom Bombadil",
"Tom Risse",
"Tomas Barton",
"Tomáš Hromada",
"Tsarkova Anastasia",
"TszkitLo40",
@ -1079,6 +1093,7 @@ const char * auto_contributors[] {
"Yegor Levankov",
"Yingchun Lai",
"Yingfan Chen",
"Yinzheng-Sun",
"Yiğit Konur",
"Yohann Jardin",
"Yong Wang",
@ -1161,6 +1176,7 @@ const char * auto_contributors[] {
"avsharapov",
"awakeljw",
"awesomeleo",
"bakam412",
"bbkas",
"benamazing",
"benbiti",
@ -1524,6 +1540,7 @@ const char * auto_contributors[] {
"pkubaj",
"potya",
"presto53",
"priera",
"proller",
"pufit",
"pyos",
@ -1568,6 +1585,7 @@ const char * auto_contributors[] {
"selfuppen",
"serebrserg",
"serxa",
"seshWCS",
"sev7e0",
"sevirov",
"sfod",
@ -1578,6 +1596,7 @@ const char * auto_contributors[] {
"sichenzhao",
"simon-says",
"simpleton",
"slvrtrn",
"snyk-bot",
"songenjie",
"sperlingxx",
@ -1645,6 +1664,7 @@ const char * auto_contributors[] {
"vzakaznikov",
"wangchao",
"wangdh15",
"wangtao.2077",
"wangxiaobo",
"weeds085490",
"whysage",
@ -1662,6 +1682,7 @@ const char * auto_contributors[] {
"xlwh",
"xmy",
"xuelei",
"xuzifu666",
"yakkomajuri",
"yakov-olkhovskiy",
"yandd",
@ -1686,6 +1707,7 @@ const char * auto_contributors[] {
"yuefoo",
"yulu86",
"yuluxu",
"yur3k",
"yuuch",
"ywill3",
"zamulla",
@ -1719,6 +1741,7 @@ const char * auto_contributors[] {
"zzsmdfj",
"Šimon Podlipský",
"Александр",
"Александр Нам",
"Артем Стрельцов",
"Владислав Тихонов",
"Георгий Кондратьев",

View File

@ -2,7 +2,6 @@
import logging
import subprocess
import os
import sys
from pathlib import Path
@ -39,7 +38,7 @@ IMAGE_NAME = "clickhouse/fuzzer"
def get_run_command(
pr_info: PRInfo,
build_url: str,
workspace_path: str,
workspace_path: Path,
ci_logs_args: str,
image: DockerImage,
) -> str:
@ -69,14 +68,12 @@ def main():
stopwatch = Stopwatch()
temp_path = TEMP_PATH
reports_path = REPORTS_PATH
temp_path = Path(TEMP_PATH)
temp_path.mkdir(parents=True, exist_ok=True)
reports_path = Path(REPORTS_PATH)
check_name = sys.argv[1]
if not os.path.exists(temp_path):
os.makedirs(temp_path)
pr_info = PRInfo()
gh = Github(get_best_robot_token(), per_page=100)
@ -90,7 +87,6 @@ def main():
docker_image = get_image_with_version(reports_path, IMAGE_NAME)
build_name = get_build_name_for_check(check_name)
print(build_name)
urls = read_build_urls(build_name, reports_path)
if not urls:
raise Exception("No build URLs found")
@ -104,10 +100,10 @@ def main():
logging.info("Got build url %s", build_url)
workspace_path = os.path.join(temp_path, "workspace")
if not os.path.exists(workspace_path):
os.makedirs(workspace_path)
ci_logs_credentials = CiLogsCredentials(Path(temp_path) / "export-logs-config.sh")
workspace_path = temp_path / "workspace"
workspace_path.mkdir(parents=True, exist_ok=True)
ci_logs_credentials = CiLogsCredentials(temp_path / "export-logs-config.sh")
ci_logs_args = ci_logs_credentials.get_docker_arguments(
pr_info, stopwatch.start_time_str, check_name
)
@ -121,8 +117,8 @@ def main():
)
logging.info("Going to run %s", run_command)
run_log_path = os.path.join(temp_path, "run.log")
main_log_path = os.path.join(workspace_path, "main.log")
run_log_path = temp_path / "run.log"
main_log_path = workspace_path / "main.log"
with TeePopen(run_command, run_log_path) as process:
retcode = process.wait()
@ -132,7 +128,7 @@ def main():
logging.info("Run failed")
subprocess.check_call(f"sudo chown -R ubuntu:ubuntu {temp_path}", shell=True)
ci_logs_credentials.clean_ci_logs_from_credentials(Path(run_log_path))
ci_logs_credentials.clean_ci_logs_from_credentials(run_log_path)
check_name_lower = (
check_name.lower().replace("(", "").replace(")", "").replace(" ", "")
@ -141,40 +137,39 @@ def main():
paths = {
"run.log": run_log_path,
"main.log": main_log_path,
"fuzzer.log": os.path.join(workspace_path, "fuzzer.log"),
"report.html": os.path.join(workspace_path, "report.html"),
"core.zst": os.path.join(workspace_path, "core.zst"),
"dmesg.log": os.path.join(workspace_path, "dmesg.log"),
"fuzzer.log": workspace_path / "fuzzer.log",
"report.html": workspace_path / "report.html",
"core.zst": workspace_path / "core.zst",
"dmesg.log": workspace_path / "dmesg.log",
}
compressed_server_log_path = os.path.join(workspace_path, "server.log.zst")
if os.path.exists(compressed_server_log_path):
compressed_server_log_path = workspace_path / "server.log.zst"
if compressed_server_log_path.exists():
paths["server.log.zst"] = compressed_server_log_path
# The script can fail before the invocation of `zstd`, but we are still interested in its log:
not_compressed_server_log_path = os.path.join(workspace_path, "server.log")
if os.path.exists(not_compressed_server_log_path):
not_compressed_server_log_path = workspace_path / "server.log"
if not_compressed_server_log_path.exists():
paths["server.log"] = not_compressed_server_log_path
s3_helper = S3Helper()
for f in paths:
urls = []
report_url = ""
for file, path in paths.items():
try:
paths[f] = s3_helper.upload_test_report_to_s3(Path(paths[f]), s3_prefix + f)
url = s3_helper.upload_test_report_to_s3(path, s3_prefix + file)
report_url = url if file == "report.html" else report_url
urls.append(url)
except Exception as ex:
logging.info("Exception uploading file %s text %s", f, ex)
paths[f] = ""
logging.info("Exception uploading file %s text %s", file, ex)
# Try to get status message saved by the fuzzer
try:
with open(
os.path.join(workspace_path, "status.txt"), "r", encoding="utf-8"
) as status_f:
with open(workspace_path / "status.txt", "r", encoding="utf-8") as status_f:
status = status_f.readline().rstrip("\n")
with open(
os.path.join(workspace_path, "description.txt"), "r", encoding="utf-8"
) as desc_f:
with open(workspace_path / "description.txt", "r", encoding="utf-8") as desc_f:
description = desc_f.readline().rstrip("\n")
except:
status = "failure"
@ -186,9 +181,7 @@ def main():
if "fail" in status:
test_result.status = "FAIL"
if paths["report.html"]:
report_url = paths["report.html"]
else:
if not report_url:
report_url = upload_results(
s3_helper,
pr_info.number,
@ -196,7 +189,7 @@ def main():
[test_result],
[],
check_name,
[url for url in paths.values() if url],
urls,
)
ch_helper = ClickHouseHelper()
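
The reworked upload loop above no longer overwrites entries of `paths` with URLs. It collects every successful URL in a list and remembers the `report.html` URL separately. A minimal sketch of that pattern follows, assuming a hypothetical `upload` callable in place of `S3Helper.upload_test_report_to_s3` (whose behaviour is not shown in this diff) and made-up file names and URLs:

```python
import logging
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def upload_all(
    paths: Dict[str, Path],
    s3_prefix: str,
    upload: Callable[[Path, str], str],  # hypothetical stand-in for the S3 helper
) -> Tuple[List[str], str]:
    """Upload every file; return (all uploaded URLs, URL of report.html or "")."""
    urls = []  # type: List[str]
    report_url = ""
    for name, path in paths.items():
        try:
            url = upload(path, s3_prefix + name)
        except Exception as ex:  # one failed upload must not abort the others
            logging.info("Exception uploading file %s text %s", name, ex)
            continue
        if name == "report.html":
            report_url = url
        urls.append(url)
    return urls, report_url


def fake_upload(path: Path, key: str) -> str:
    # Illustrative only: fails for one file, returns a made-up URL otherwise
    if path.name == "core.zst":
        raise FileNotFoundError(path)
    return f"https://example.invalid/{key}"


workspace = Path("workspace")
urls, report_url = upload_all(
    {
        "main.log": workspace / "main.log",
        "report.html": workspace / "report.html",
        "core.zst": workspace / "core.zst",
    },
    "prefix/",
    fake_upload,
)
```

Keeping `urls` separate means a failed upload simply leaves no entry, so the caller does not need to filter out empty strings afterwards.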

View File

@ -1,10 +1,10 @@
#!/usr/bin/env python3
from pathlib import Path
from typing import List, Tuple
import argparse
import csv
import logging
import os
from github import Github
@ -16,13 +16,13 @@ from s3_helper import S3Helper
from upload_result_helper import upload_results
def parse_args():
def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser()
parser.add_argument("status", nargs="+", help="Path to status file")
parser.add_argument("files", nargs="+", type=Path, help="Path to status files")
return parser.parse_args()
def post_commit_status_from_file(file_path: str) -> List[str]:
def post_commit_status_from_file(file_path: Path) -> List[str]:
with open(file_path, "r", encoding="utf-8") as f:
res = list(csv.reader(f, delimiter="\t"))
if len(res) < 1:
@ -32,22 +32,24 @@ def post_commit_status_from_file(file_path: str) -> List[str]:
return res[0]
def process_result(file_path: str) -> Tuple[bool, TestResults]:
def process_result(file_path: Path) -> Tuple[bool, TestResults]:
test_results = [] # type: TestResults
state, report_url, description = post_commit_status_from_file(file_path)
prefix = os.path.basename(os.path.dirname(file_path))
prefix = file_path.parent.name
is_ok = state == "success"
if is_ok and report_url == "null":
return is_ok, test_results
status = f'OK: Bug reproduced (<a href="{report_url}">Report</a>)'
if not is_ok:
status = f'Bug is not reproduced (<a href="{report_url}">Report</a>)'
status = (
f'OK: Bug reproduced (<a href="{report_url}">Report</a>)'
if is_ok
else f'Bug is not reproduced (<a href="{report_url}">Report</a>)'
)
test_results.append(TestResult(f"{prefix}: {description}", status))
return is_ok, test_results
def process_all_results(file_paths: str) -> Tuple[bool, TestResults]:
def process_all_results(file_paths: List[Path]) -> Tuple[bool, TestResults]:
any_ok = False
all_results = []
for status_path in file_paths:
@ -59,12 +61,14 @@ def process_all_results(file_paths: str) -> Tuple[bool, TestResults]:
return any_ok, all_results
def main(args):
def main():
logging.basicConfig(level=logging.INFO)
args = parse_args()
status_files = args.files # type: List[Path]
check_name_with_group = "Bugfix validate check"
is_ok, test_results = process_all_results(args.status)
is_ok, test_results = process_all_results(status_files)
if not test_results:
logging.info("No results to upload")
@ -76,7 +80,7 @@ def main(args):
pr_info.number,
pr_info.sha,
test_results,
args.status,
status_files,
check_name_with_group,
)
@ -93,4 +97,4 @@ def main(args):
if __name__ == "__main__":
main(parse_args())
main()
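
The change above lets `argparse` build `Path` objects directly via `type=Path`, so `file_path.parent.name` can replace `os.path.basename(os.path.dirname(file_path))`. A small sketch of that behaviour; the `argv` parameter and the example file paths are assumptions added for illustration and are not part of the script:

```python
import argparse
import os
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # type=Path converts each positional argument before it reaches the caller
    parser.add_argument("files", nargs="+", type=Path, help="Path to status files")
    return parser.parse_args(argv)


# Hypothetical status file locations, used only to show the conversion
example_argv = ["reports/stateless/status.tsv", "reports/integration/status.tsv"]
args = parse_args(example_argv)

# The directory name that the script uses as a result prefix
prefixes = [file_path.parent.name for file_path in args.files]

# The same value computed the old way, for comparison
old_style = [os.path.basename(os.path.dirname(p)) for p in example_argv]
```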

View File

@ -4,7 +4,6 @@ from pathlib import Path
from typing import Tuple
import subprocess
import logging
import os
import sys
import time
@ -54,7 +53,7 @@ def _can_export_binaries(build_config: BuildConfig) -> bool:
def get_packager_cmd(
build_config: BuildConfig,
packager_path: str,
packager_path: Path,
output_path: Path,
cargo_cache_dir: Path,
build_version: str,
@ -105,12 +104,12 @@ def build_clickhouse(
with TeePopen(packager_cmd, build_log_path) as process:
retcode = process.wait()
if build_output_path.exists():
build_results = os.listdir(build_output_path)
results_exists = any(build_output_path.iterdir())
else:
build_results = []
results_exists = False
if retcode == 0:
if len(build_results) > 0:
if results_exists:
success = True
logging.info("Built successfully")
else:
@ -130,7 +129,7 @@ def check_for_success_run(
) -> None:
# TODO: Remove after S3 artifacts
# the final empty argument is necessary to distinguish build and build_suffix
logged_prefix = os.path.join(S3_BUILDS_BUCKET, s3_prefix, "")
logged_prefix = "/".join((S3_BUILDS_BUCKET, s3_prefix, ""))
logging.info("Checking for artifacts in %s", logged_prefix)
try:
# Performance artifacts are now part of regular build, so we're safe
@ -223,11 +222,12 @@ def main():
build_config = CI_CONFIG.build_config[build_name]
temp_path = Path(TEMP_PATH)
os.makedirs(temp_path, exist_ok=True)
temp_path.mkdir(parents=True, exist_ok=True)
repo_path = Path(REPO_COPY)
pr_info = PRInfo()
logging.info("Repo copy path %s", REPO_COPY)
logging.info("Repo copy path %s", repo_path)
s3_helper = S3Helper()
@ -263,7 +263,7 @@ def main():
logging.info("Build short name %s", build_name)
build_output_path = temp_path / build_name
os.makedirs(build_output_path, exist_ok=True)
build_output_path.mkdir(parents=True, exist_ok=True)
cargo_cache = CargoCache(
temp_path / "cargo_cache" / "registry", temp_path, s3_helper
)
@ -271,7 +271,7 @@ def main():
packager_cmd = get_packager_cmd(
build_config,
os.path.join(REPO_COPY, "docker/packager"),
repo_path / "docker" / "packager",
build_output_path,
cargo_cache.directory,
version.string,
@ -282,7 +282,7 @@ def main():
logging.info("Going to run packager with %s", packager_cmd)
logs_path = temp_path / "build_log"
os.makedirs(logs_path, exist_ok=True)
logs_path.mkdir(parents=True, exist_ok=True)
start = time.time()
log_path, build_status = build_clickhouse(
@ -316,7 +316,7 @@ def main():
"Uploaded performance.tar.zst to %s, now delete to avoid duplication",
performance_urls[0],
)
os.remove(performance_path)
performance_path.unlink()
build_urls = (
s3_helper.upload_build_directory_to_s3(
@ -413,7 +413,7 @@ FORMAT JSONCompactEachRow"""
}
url = f"https://{ci_logs_credentials.host}/"
profiles_dir = temp_path / "profiles_source"
os.makedirs(profiles_dir, exist_ok=True)
profiles_dir.mkdir(parents=True, exist_ok=True)
logging.info("Processing profile JSON files from %s/build_docker", GIT_REPO_ROOT)
git_runner(
"./utils/prepare-time-trace/prepare-time-trace.sh "
@ -421,7 +421,7 @@ FORMAT JSONCompactEachRow"""
)
profile_data_file = temp_path / "profile.json"
with open(profile_data_file, "wb") as profile_fd:
for profile_sourse in os.listdir(profiles_dir):
for profile_sourse in profiles_dir.iterdir():
with open(profile_sourse, "rb") as ps_fd:
profile_fd.write(ps_fd.read())
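
Three `pathlib` idioms from this file are easy to misread, so here is a short sketch of each: `mkdir(parents=True, exist_ok=True)` is safe to repeat, `any(path.iterdir())` tests for a non-empty directory without building a list, and `iterdir()` yields paths that already include the directory. The S3 prefix is joined with `"/"` rather than `os.path.join`, since object keys are not OS paths. Directory, file, and bucket names below are made up for illustration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "build_output"
    out.mkdir(parents=True, exist_ok=True)
    out.mkdir(parents=True, exist_ok=True)  # second call is a no-op, no error

    # An empty directory yields nothing, so any() is False
    empty_has_results = any(out.iterdir())

    (out / "a.json").write_bytes(b"{}")
    (out / "b.json").write_bytes(b"[]")
    has_results = any(out.iterdir())

    # iterdir() yields full paths (out / name), not bare file names;
    # sorting makes the concatenation order deterministic
    entries = sorted(out.iterdir())
    entries_are_full_paths = all(entry.parent == out for entry in entries)
    merged = b"".join(entry.read_bytes() for entry in entries)

# S3 keys always use "/", whatever the local path separator is;
# the trailing empty element produces the final slash
logged_prefix = "/".join(("bucket-name", "some/prefix", ""))
```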

View File

@ -144,18 +144,18 @@ def download_build_with_progress(url: str, path: Path) -> None:
sys.stdout.write(f"\r[{eq_str}{space_str}] {percent}%")
sys.stdout.flush()
break
except Exception:
except Exception as e:
if sys.stdout.isatty():
sys.stdout.write("\n")
if i + 1 < DOWNLOAD_RETRIES_COUNT:
time.sleep(3)
if os.path.exists(path):
os.remove(path)
else:
raise DownloadException(
f"Cannot download dataset from {url}, all retries exceeded"
)
if i + 1 < DOWNLOAD_RETRIES_COUNT:
time.sleep(3)
else:
raise DownloadException(
f"Cannot download dataset from {url}, all retries exceeded"
) from e
if sys.stdout.isatty():
sys.stdout.write("\n")
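
The new `raise ... from e` keeps the last underlying error attached to the `DownloadException` as its `__cause__`. A minimal sketch of the retry loop under stated assumptions: the `fetch` callable, the `delay` parameter, and the value of `DOWNLOAD_RETRIES_COUNT` are hypothetical, and the exception class is redefined here only to make the example self-contained:

```python
import time
from typing import Callable

DOWNLOAD_RETRIES_COUNT = 3  # assumed value; the real constant is defined elsewhere


class DownloadException(Exception):
    pass


def download_with_retries(
    fetch: Callable[[], None], url: str, delay: float = 0.0
) -> int:
    """Call fetch() until it succeeds; return the number of attempts made."""
    for i in range(DOWNLOAD_RETRIES_COUNT):
        try:
            fetch()
            return i + 1
        except Exception as e:
            if i + 1 < DOWNLOAD_RETRIES_COUNT:
                time.sleep(delay)  # back off before the next attempt
            else:
                # Chain the last error so the traceback shows the real reason
                raise DownloadException(
                    f"Cannot download dataset from {url}, all retries exceeded"
                ) from e
    raise AssertionError("unreachable")


def always_fails() -> None:
    raise ConnectionError("network down")


cause = None
try:
    download_with_retries(always_fails, "https://example.invalid/file")
except DownloadException as exc:
    cause = exc.__cause__
```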

View File

@ -58,10 +58,9 @@ def get_ccache_if_not_exists(
download_build_with_progress(url, compressed_cache)
path_to_decompress = path_to_ccache_dir.parent
if not path_to_decompress.exists():
os.makedirs(path_to_decompress)
path_to_decompress.mkdir(parents=True, exist_ok=True)
if os.path.exists(path_to_ccache_dir):
if path_to_ccache_dir.exists():
shutil.rmtree(path_to_ccache_dir)
logging.info("Ccache already exists, removing it")
@ -74,7 +73,7 @@ def get_ccache_if_not_exists(
if not cache_found:
logging.info("ccache not found anywhere, cannot download anything :(")
if os.path.exists(path_to_ccache_dir):
if path_to_ccache_dir.exists():
logging.info("But at least we have some local cache")
else:
logging.info("ccache downloaded")

View File

@ -28,6 +28,7 @@ import logging
import os
from contextlib import contextmanager
from datetime import date, datetime, timedelta
from pathlib import Path
from subprocess import CalledProcessError
from typing import List, Optional
@ -617,8 +618,8 @@ def stash():
def main():
if not os.path.exists(TEMP_PATH):
os.makedirs(TEMP_PATH)
temp_path = Path(TEMP_PATH)
temp_path.mkdir(parents=True, exist_ok=True)
args = parse_args()
if args.debug_helpers:
@ -636,7 +637,7 @@ def main():
args.backport_created_label,
)
# https://github.com/python/mypy/issues/3004
bp.gh.cache_path = f"{TEMP_PATH}/gh_cache" # type: ignore
bp.gh.cache_path = temp_path / "gh_cache"
bp.receive_release_prs()
bp.update_local_release_branches()
bp.receive_prs_for_backport()

View File

@ -1,11 +1,11 @@
#!/usr/bin/env python3
import csv
import os
import time
from typing import Dict, List, Optional, Union
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional, Union
import csv
import logging
import time
from github import Github
from github.GithubObject import _NotSetType, NotSet as NotSet
@ -292,9 +292,9 @@ def create_ci_report(pr_info: PRInfo, statuses: CommitStatuses) -> str:
def post_commit_status_to_file(
file_path: str, description: str, state: str, report_url: str
file_path: Path, description: str, state: str, report_url: str
) -> None:
if os.path.exists(file_path):
if file_path.exists():
raise Exception(f'File "{file_path}" already exists!')
with open(file_path, "w", encoding="utf-8") as f:
out = csv.writer(f, delimiter="\t")
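
The status file written here is read back by `post_commit_status_from_file` in the bugfix validation script as a single tab-separated row. A hedged round-trip sketch: the column order `state, report_url, description` is inferred from how the reader unpacks the row, the function names `write_status` and `read_status` are hypothetical, and `FileExistsError` is used in place of the generic `Exception` raised by the original:

```python
import csv
import tempfile
from pathlib import Path
from typing import List


def write_status(
    file_path: Path, description: str, state: str, report_url: str
) -> None:
    # Refuse to overwrite an existing status file
    if file_path.exists():
        raise FileExistsError(f'File "{file_path}" already exists!')
    with open(file_path, "w", encoding="utf-8", newline="") as f:
        out = csv.writer(f, delimiter="\t")
        out.writerow([state, report_url, description])  # assumed column order


def read_status(file_path: Path) -> List[str]:
    with open(file_path, "r", encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    if len(rows) < 1:
        raise ValueError(f'Can\'t read status from "{file_path}"')
    return rows[0]


with tempfile.TemporaryDirectory() as tmp:
    status_file = Path(tmp) / "post_commit_status.tsv"
    write_status(
        status_file, "1 test failed", "failure", "https://example.invalid/report.html"
    )
    state, report_url, description = read_status(status_file)
    try:
        write_status(status_file, "again", "success", "null")
        second_write_refused = False
    except FileExistsError:
        second_write_refused = True
```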

Some files were not shown because too many files have changed in this diff