diff --git a/.gitignore b/.gitignore index 585a4074767..9816f1cbb6c 100644 --- a/.gitignore +++ b/.gitignore @@ -248,3 +248,6 @@ website/package-lock.json # Ignore files for locally disabled tests /dbms/tests/queries/**/*.disabled + +# cquery cache +/.cquery-cache diff --git a/CHANGELOG.md b/CHANGELOG.md index 00ae12339b2..784be3b4982 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,98 @@ +## ClickHouse release 18.16.0, 2018-12-14 + +### New features: + +* `DEFAULT` expressions are evaluated for missing fields when loading data in semi-structured input formats (`JSONEachRow`, `TSKV`). [#3555](https://github.com/yandex/ClickHouse/pull/3555) +* The `ALTER TABLE` query now has the `MODIFY ORDER BY` action for changing the sorting key when adding or removing a table column. This is useful for tables in the `MergeTree` family that perform additional tasks when merging based on this sorting key, such as `SummingMergeTree`, `AggregatingMergeTree`, and so on. [#3581](https://github.com/yandex/ClickHouse/pull/3581) [#3755](https://github.com/yandex/ClickHouse/pull/3755) +* For tables in the `MergeTree` family, now you can specify a different sorting key (`ORDER BY`) and index (`PRIMARY KEY`). The sorting key can be longer than the index. [#3581](https://github.com/yandex/ClickHouse/pull/3581) +* Added the `hdfs` table function and the `HDFS` table engine for importing and exporting data to HDFS. [chenxing-xc](https://github.com/yandex/ClickHouse/pull/3617) +* Added functions for working with base64: `base64Encode`, `base64Decode`, `tryBase64Decode`. [Alexander Krasheninnikov](https://github.com/yandex/ClickHouse/pull/3350) +* Now you can use a parameter to configure the precision of the `uniqCombined` aggregate function (select the number of HyperLogLog cells). [#3406](https://github.com/yandex/ClickHouse/pull/3406) +* Added the `system.contributors` table that contains the names of everyone who made commits in ClickHouse. [#3452](https://github.com/yandex/ClickHouse/pull/3452) +* Added the ability to omit the partition for the `ALTER TABLE ... FREEZE` query in order to back up all partitions at once. [#3514](https://github.com/yandex/ClickHouse/pull/3514) +* Added `dictGet` and `dictGetOrDefault` functions that don't require specifying the type of return value. The type is determined automatically from the dictionary description. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3564) +* Now you can specify comments for a column in the table description and change it using `ALTER`. [#3377](https://github.com/yandex/ClickHouse/pull/3377) +* Reading is supported for `Join` type tables with simple keys. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3728) +* Now you can specify the options `join_use_nulls`, `max_rows_in_join`, `max_bytes_in_join`, and `join_overflow_mode` when creating a `Join` type table. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3728) +* Added the `joinGet` function that allows you to use a `Join` type table like a dictionary. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3728) +* Added the `partition_key`, `sorting_key`, `primary_key`, and `sampling_key` columns to the `system.tables` table in order to provide information about table keys. [#3609](https://github.com/yandex/ClickHouse/pull/3609) +* Added the `is_in_partition_key`, `is_in_sorting_key`, `is_in_primary_key`, and `is_in_sampling_key` columns to the `system.columns` table. 
[#3609](https://github.com/yandex/ClickHouse/pull/3609) +* Added the `min_time` and `max_time` columns to the `system.parts` table. These columns are populated when the partitioning key is an expression consisting of `DateTime` columns. [Emmanuel Donin de Rosière](https://github.com/yandex/ClickHouse/pull/3800) + +### Bug fixes: + +* Fixes and performance improvements for the `LowCardinality` data type. `GROUP BY` using `LowCardinality(Nullable(...))`. Getting the values of `extremes`. Processing high-order functions. `LEFT ARRAY JOIN`. Distributed `GROUP BY`. Functions that return `Array`. Execution of `ORDER BY`. Writing to `Distributed` tables (nicelulu). Backward compatibility for `INSERT` queries from old clients that implement the `Native` protocol. Support for `LowCardinality` for `JOIN`. Improved performance when working in a single stream. [#3823](https://github.com/yandex/ClickHouse/pull/3823) [#3803](https://github.com/yandex/ClickHouse/pull/3803) [#3799](https://github.com/yandex/ClickHouse/pull/3799) [#3769](https://github.com/yandex/ClickHouse/pull/3769) [#3744](https://github.com/yandex/ClickHouse/pull/3744) [#3681](https://github.com/yandex/ClickHouse/pull/3681) [#3651](https://github.com/yandex/ClickHouse/pull/3651) [#3649](https://github.com/yandex/ClickHouse/pull/3649) [#3641](https://github.com/yandex/ClickHouse/pull/3641) [#3632](https://github.com/yandex/ClickHouse/pull/3632) [#3568](https://github.com/yandex/ClickHouse/pull/3568) [#3523](https://github.com/yandex/ClickHouse/pull/3523) [#3518](https://github.com/yandex/ClickHouse/pull/3518) +* Fixed how the `select_sequential_consistency` option works. Previously, when this setting was enabled, an incomplete result was sometimes returned after beginning to write to a new partition. [#2863](https://github.com/yandex/ClickHouse/pull/2863) +* Databases are correctly specified when executing DDL `ON CLUSTER` queries and `ALTER UPDATE/DELETE`. [#3772](https://github.com/yandex/ClickHouse/pull/3772) [#3460](https://github.com/yandex/ClickHouse/pull/3460) +* Databases are correctly specified for subqueries inside a VIEW. [#3521](https://github.com/yandex/ClickHouse/pull/3521) +* Fixed a bug in `PREWHERE` with `FINAL` for `VersionedCollapsingMergeTree`. [7167bfd7](https://github.com/yandex/ClickHouse/commit/7167bfd7b365538f7a91c4307ad77e552ab4e8c1) +* Now you can use `KILL QUERY` to cancel queries that have not started yet because they are waiting for the table to be locked. [#3517](https://github.com/yandex/ClickHouse/pull/3517) +* Corrected date and time calculations if the clocks were moved back at midnight (this happens in Iran, and happened in Moscow from 1981 to 1983). Previously, this led to the time being reset a day earlier than necessary, and also caused incorrect formatting of the date and time in text format. [#3819](https://github.com/yandex/ClickHouse/pull/3819) +* Fixed bugs in some cases of `VIEW` and subqueries that omit the database. [Winter Zhang](https://github.com/yandex/ClickHouse/pull/3521) +* Fixed a race condition when simultaneously reading from a `MATERIALIZED VIEW` and deleting a `MATERIALIZED VIEW` due to not locking the internal `MATERIALIZED VIEW`. [#3404](https://github.com/yandex/ClickHouse/pull/3404) [#3694](https://github.com/yandex/ClickHouse/pull/3694) +* Fixed the error `Lock handler cannot be nullptr.` [#3689](https://github.com/yandex/ClickHouse/pull/3689) +* Fixed query processing when the `compile_expressions` option is enabled (it's enabled by default). 
Nondeterministic constant expressions like the `now` function are no longer unfolded. [#3457](https://github.com/yandex/ClickHouse/pull/3457) +* Fixed a crash when specifying a non-constant scale argument in `toDecimal32/64/128` functions. +* Fixed an error when trying to insert an array with `NULL` elements in the `Values` format into a column of type `Array` without `Nullable` (if `input_format_values_interpret_expressions` = 1). [#3487](https://github.com/yandex/ClickHouse/pull/3487) [#3503](https://github.com/yandex/ClickHouse/pull/3503) +* Fixed continuous error logging in `DDLWorker` if ZooKeeper is not available. [8f50c620](https://github.com/yandex/ClickHouse/commit/8f50c620334988b28018213ec0092fe6423847e2) +* Fixed the return type for `quantile*` functions from `Date` and `DateTime` types of arguments. [#3580](https://github.com/yandex/ClickHouse/pull/3580) +* Fixed the `WITH` clause if it specifies a simple alias without expressions. [#3570](https://github.com/yandex/ClickHouse/pull/3570) +* Fixed processing of queries with named sub-queries and qualified column names when `enable_optimize_predicate_expression` is enabled. [Winter Zhang](https://github.com/yandex/ClickHouse/pull/3588) +* Fixed the error `Attempt to attach to nullptr thread group` when working with materialized views. [Marek Vavruša](https://github.com/yandex/ClickHouse/pull/3623) +* Fixed a crash when passing certain incorrect arguments to the `arrayReverse` function. [73e3a7b6](https://github.com/yandex/ClickHouse/commit/73e3a7b662161d6005e7727d8a711b930386b871) +* Fixed the buffer overflow in the `extractURLParameter` function. Improved performance. Added correct processing of strings containing zero bytes. [141e9799](https://github.com/yandex/ClickHouse/commit/141e9799e49201d84ea8e951d1bed4fb6d3dacb5) +* Fixed buffer overflow in the `lowerUTF8` and `upperUTF8` functions. Removed the ability to execute these functions over `FixedString` type arguments. [#3662](https://github.com/yandex/ClickHouse/pull/3662) +* Fixed a rare race condition when deleting `MergeTree` tables. [#3680](https://github.com/yandex/ClickHouse/pull/3680) +* Fixed a race condition when reading from `Buffer` tables and simultaneously performing `ALTER` or `DROP` on the target tables. [#3719](https://github.com/yandex/ClickHouse/pull/3719) +* Fixed a segfault if the `max_temporary_non_const_columns` limit was exceeded. [#3788](https://github.com/yandex/ClickHouse/pull/3788) + +### Improvements: + +* The server does not write the processed configuration files to the `/etc/clickhouse-server/` directory. Instead, it saves them in the `preprocessed_configs` directory inside `path`. This means that the `/etc/clickhouse-server/` directory doesn't have write access for the `clickhouse` user, which improves security. [#2443](https://github.com/yandex/ClickHouse/pull/2443) +* The `min_merge_bytes_to_use_direct_io` option is set to 10 GiB by default. A merge that forms large parts of tables from the MergeTree family will be performed in `O_DIRECT` mode, which prevents excessive page cache eviction. [#3504](https://github.com/yandex/ClickHouse/pull/3504) +* Accelerated server start when there is a very large number of tables. [#3398](https://github.com/yandex/ClickHouse/pull/3398) +* Added a connection pool and HTTP `Keep-Alive` for connections between replicas. [#3594](https://github.com/yandex/ClickHouse/pull/3594) +* If the query syntax is invalid, the `400 Bad Request` code is returned in the `HTTP` interface (500 was returned previously). 
[31bc680a](https://github.com/yandex/ClickHouse/commit/31bc680ac5f4bb1d0360a8ba4696fa84bb47d6ab) +* The `join_default_strictness` option is set to `ALL` by default for compatibility. [120e2cbe](https://github.com/yandex/ClickHouse/commit/120e2cbe2ff4fbad626c28042d9b28781c805afe) +* Removed logging to `stderr` from the `re2` library for invalid or complex regular expressions. [#3723](https://github.com/yandex/ClickHouse/pull/3723) +* Added for the `Kafka` table engine: checks for subscriptions before beginning to read from Kafka; the kafka_max_block_size setting for the table. [Marek Vavruša](https://github.com/yandex/ClickHouse/pull/3396) +* The `cityHash64`, `farmHash64`, `metroHash64`, `sipHash64`, `halfMD5`, `murmurHash2_32`, `murmurHash2_64`, `murmurHash3_32`, and `murmurHash3_64` functions now work for any number of arguments and for arguments in the form of tuples. [#3451](https://github.com/yandex/ClickHouse/pull/3451) [#3519](https://github.com/yandex/ClickHouse/pull/3519) +* The `arrayReverse` function now works with any types of arrays. [73e3a7b6](https://github.com/yandex/ClickHouse/commit/73e3a7b662161d6005e7727d8a711b930386b871) +* Added an optional parameter: the slot size for the `timeSlots` function. [Kirill Shvakov](https://github.com/yandex/ClickHouse/pull/3724) +* For `FULL` and `RIGHT JOIN`, the `max_block_size` setting is used for a stream of non-joined data from the right table. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3699) +* Added the `--secure` command line parameter in `clickhouse-benchmark` and `clickhouse-performance-test` to enable TLS. [#3688](https://github.com/yandex/ClickHouse/pull/3688) [#3690](https://github.com/yandex/ClickHouse/pull/3690) +* Type conversion when the structure of a `Buffer` type table does not match the structure of the destination table. [Vitaly Baranov](https://github.com/yandex/ClickHouse/pull/3603) +* Added the `tcp_keep_alive_timeout` option to enable keep-alive packets after inactivity for the specified time interval. [#3441](https://github.com/yandex/ClickHouse/pull/3441) +* Removed unnecessary quoting of values for the partition key in the `system.parts` table if it consists of a single column. [#3652](https://github.com/yandex/ClickHouse/pull/3652) +* The modulo function works for `Date` and `DateTime` data types. [#3385](https://github.com/yandex/ClickHouse/pull/3385) +* Added synonyms for the `POWER`, `LN`, `LCASE`, `UCASE`, `REPLACE`, `LOCATE`, `SUBSTR`, and `MID` functions. [#3774](https://github.com/yandex/ClickHouse/pull/3774) [#3763](https://github.com/yandex/ClickHouse/pull/3763) Some function names are case-insensitive for compatibility with the SQL standard. Added syntactic sugar `SUBSTRING(expr FROM start FOR length)` for compatibility with SQL. [#3804](https://github.com/yandex/ClickHouse/pull/3804) +* Added the ability to `mlock` memory pages corresponding to `clickhouse-server` executable code to prevent it from being forced out of memory. This feature is disabled by default. [#3553](https://github.com/yandex/ClickHouse/pull/3553) +* Improved performance when reading from `O_DIRECT` (with the `min_bytes_to_use_direct_io` option enabled). [#3405](https://github.com/yandex/ClickHouse/pull/3405) +* Improved performance of the `dictGet...OrDefault` function for a constant key argument and a non-constant default argument. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3563) +* The `firstSignificantSubdomain` function now processes the domains `gov`, `mil`, and `edu`. 
[Igor Hatarist](https://github.com/yandex/ClickHouse/pull/3601) Improved performance. [#3628](https://github.com/yandex/ClickHouse/pull/3628) +* Added the ability to specify custom environment variables for starting `clickhouse-server` using the `SYS-V init.d` script by defining `CLICKHOUSE_PROGRAM_ENV` in `/etc/default/clickhouse`.
+[Pavlo Bashynskyi](https://github.com/yandex/ClickHouse/pull/3612) +* Correct return code for the clickhouse-server init script. [#3516](https://github.com/yandex/ClickHouse/pull/3516) +* The `system.metrics` table now has the `VersionInteger` metric, and `system.build_options` now includes the `VERSION_INTEGER` line, which contains the numeric form of the ClickHouse version, such as `18016000`. [#3644](https://github.com/yandex/ClickHouse/pull/3644) +* Removed the ability to compare the `Date` type with a number to avoid potential errors like `date = 2018-12-17`, where quotes around the date are omitted by mistake. [#3687](https://github.com/yandex/ClickHouse/pull/3687) +* Fixed the behavior of stateful functions like `rowNumberInAllBlocks`. They previously produced a result that was one number larger because they started running during query analysis. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3729) +* If the `force_restore_data` file can't be deleted, an error message is displayed. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3794) + +### Build improvements: + +* Updated the `jemalloc` library, which fixes a potential memory leak. [Amos Bird](https://github.com/yandex/ClickHouse/pull/3557) +* Profiling with `jemalloc` is enabled by default in debug builds. [2cc82f5c](https://github.com/yandex/ClickHouse/commit/2cc82f5cbe266421cd4c1165286c2c47e5ffcb15) +* Added the ability to run integration tests when only `Docker` is installed on the system. [#3650](https://github.com/yandex/ClickHouse/pull/3650) +* Added fuzz testing of expressions in SELECT queries. [#3442](https://github.com/yandex/ClickHouse/pull/3442) +* Added a stress test for commits, which performs functional tests in parallel and in random order to detect more race conditions. [#3438](https://github.com/yandex/ClickHouse/pull/3438) +* Improved the method for starting clickhouse-server in a Docker image. [Elghazal Ahmed](https://github.com/yandex/ClickHouse/pull/3663) +* For a Docker image, added support for initializing databases using files in the `/docker-entrypoint-initdb.d` directory. [Konstantin Lebedev](https://github.com/yandex/ClickHouse/pull/3695) +* Fixes for builds on ARM. [#3709](https://github.com/yandex/ClickHouse/pull/3709) + +### Backward incompatible changes: + +* Removed the ability to compare the `Date` type with a number. Instead of `toDate('2018-12-18') = 17883`, you must use explicit type conversion `= toDate(17883)`. [#3687](https://github.com/yandex/ClickHouse/pull/3687) + ## ClickHouse release 18.14.18, 2018-12-04 ### Bug fixes: @@ -90,7 +185,7 @@ ### Improvements: -* Significantly reduced memory consumption for requests with `ORDER BY` and `LIMIT`. See the `max_bytes_before_remerge_sort` setting. [#3205](https://github.com/yandex/ClickHouse/pull/3205) +* Significantly reduced memory consumption for queries with `ORDER BY` and `LIMIT`. See the `max_bytes_before_remerge_sort` setting. [#3205](https://github.com/yandex/ClickHouse/pull/3205) * In the absence of `JOIN` (`LEFT`, `INNER`, ...), `INNER JOIN` is assumed. [#3147](https://github.com/yandex/ClickHouse/pull/3147) * Qualified asterisks work correctly in queries with `JOIN`.
[Winter Zhang](https://github.com/yandex/ClickHouse/pull/3202) * The `ODBC` table engine correctly chooses the method for quoting identifiers in the SQL dialect of a remote database. [Alexandr Krasheninnikov](https://github.com/yandex/ClickHouse/pull/3210) @@ -127,7 +222,7 @@ * If after merging data parts, the checksum for the resulting part differs from the result of the same merge in another replica, the result of the merge is deleted and the data part is downloaded from the other replica (this is the correct behavior). But after downloading the data part, it couldn't be added to the working set because of an error that the part already exists (because the data part was deleted with some delay after the merge). This led to cyclical attempts to download the same data. [#3194](https://github.com/yandex/ClickHouse/pull/3194) * Fixed incorrect calculation of total memory consumption by queries (because of incorrect calculation, the `max_memory_usage_for_all_queries` setting worked incorrectly and the `MemoryTracking` metric had an incorrect value). This error occurred in version 18.12.13. [Marek Vavruša](https://github.com/yandex/ClickHouse/pull/3344) * Fixed the functionality of `CREATE TABLE ... ON CLUSTER ... AS SELECT ...` This error occurred in version 18.12.13. [#3247](https://github.com/yandex/ClickHouse/pull/3247) -* Fixed unnecessary preparation of data structures for `JOIN`s on the server that initiates the request if the `JOIN` is only performed on remote servers. [#3340](https://github.com/yandex/ClickHouse/pull/3340) +* Fixed unnecessary preparation of data structures for `JOIN`s on the server that initiates the query if the `JOIN` is only performed on remote servers. [#3340](https://github.com/yandex/ClickHouse/pull/3340) * Fixed bugs in the `Kafka` engine: deadlocks after exceptions when starting to read data, and locks upon completion [Marek Vavruša](https://github.com/yandex/ClickHouse/pull/3215). * For `Kafka` tables, the optional `schema` parameter was not passed (the schema of the `Cap'n'Proto` format). [Vojtech Splichal](https://github.com/yandex/ClickHouse/pull/3150) * If the ensemble of ZooKeeper servers has servers that accept the connection but then immediately close it instead of responding to the handshake, ClickHouse chooses to connect another server. Previously, this produced the error `Cannot read all data. Bytes read: 0. Bytes expected: 4.` and the server couldn't start. [8218cf3a](https://github.com/yandex/ClickHouse/commit/8218cf3a5f39a43401953769d6d12a0bb8d29da9) @@ -208,7 +303,7 @@ * Added the `DECIMAL(digits, scale)` data type (`Decimal32(scale)`, `Decimal64(scale)`, `Decimal128(scale)`). To enable it, use the setting `allow_experimental_decimal_type`. [#2846](https://github.com/yandex/ClickHouse/pull/2846) [#2970](https://github.com/yandex/ClickHouse/pull/2970) [#3008](https://github.com/yandex/ClickHouse/pull/3008) [#3047](https://github.com/yandex/ClickHouse/pull/3047) * New `WITH ROLLUP` modifier for `GROUP BY` (alternative syntax: `GROUP BY ROLLUP(...)`). [#2948](https://github.com/yandex/ClickHouse/pull/2948) -* In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting `asterisk_left_columns_only` to 1 on the user configuration level. [Winter Zhang](https://github.com/yandex/ClickHouse/pull/2787) +* In queries with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. 
You can restore the old behavior by setting `asterisk_left_columns_only` to 1 on the user configuration level. [Winter Zhang](https://github.com/yandex/ClickHouse/pull/2787) * Added support for JOIN with table functions. [Winter Zhang](https://github.com/yandex/ClickHouse/pull/2907) * Autocomplete by pressing Tab in clickhouse-client. [Sergey Shcherbin](https://github.com/yandex/ClickHouse/pull/2447) * Ctrl+C in clickhouse-client clears a query that was entered. [#2877](https://github.com/yandex/ClickHouse/pull/2877) @@ -294,7 +389,7 @@ ### Backward incompatible changes: -* In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting `asterisk_left_columns_only` to 1 on the user configuration level. +* In queries with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting `asterisk_left_columns_only` to 1 on the user configuration level. ### Build changes: @@ -338,7 +433,7 @@ * Fixed an error for concurrent `Set` or `Join`. [Amos Bird](https://github.com/yandex/ClickHouse/pull/2823) * Fixed the `Block structure mismatch in UNION stream: different number of columns` error that occurred for `UNION ALL` queries inside a sub-query if one of the `SELECT` queries contains duplicate column names. [Winter Zhang](https://github.com/yandex/ClickHouse/pull/2094) * Fixed a memory leak if an exception occurred when connecting to a MySQL server. -* Fixed incorrect clickhouse-client response code in case of a request error. +* Fixed incorrect clickhouse-client response code in case of a query error. * Fixed incorrect behavior of materialized views containing DISTINCT. [#2795](https://github.com/yandex/ClickHouse/issues/2795) ### Backward incompatible changes @@ -452,7 +547,7 @@ The expression must be a chain of equalities joined by the AND operator. Each si * Fixed a problem with a very small timeout for sockets (one second) for reading and writing when sending and downloading replicated data, which made it impossible to download larger parts if there is a load on the network or disk (it resulted in cyclical attempts to download parts). This error occurred in version 1.1.54388. * Fixed issues when using chroot in ZooKeeper if you inserted duplicate data blocks in the table. * The `has` function now works correctly for an array with Nullable elements ([#2115](https://github.com/yandex/ClickHouse/issues/2115)). -* The `system.tables` table now works correctly when used in distributed queries. The `metadata_modification_time` and `engine_full` columns are now non-virtual. Fixed an error that occurred if only these columns were requested from the table. +* The `system.tables` table now works correctly when used in distributed queries. The `metadata_modification_time` and `engine_full` columns are now non-virtual. Fixed an error that occurred if only these columns were queried from the table. * Fixed how an empty `TinyLog` table works after inserting an empty data block ([#2563](https://github.com/yandex/ClickHouse/issues/2563)). * The `system.zookeeper` table works if the value of the node in ZooKeeper is NULL. @@ -701,7 +796,7 @@ The expression must be a chain of equalities joined by the AND operator. 
Each si * Added the `parseDateTimeBestEffort`, `parseDateTimeBestEffortOrZero`, and `parseDateTimeBestEffortOrNull` functions to read the DateTime from a string containing text in a wide variety of possible formats. * Data can be partially reloaded from external dictionaries during updating (load just the records in which the value of the specified field greater than in the previous download) (Arsen Hakobyan). * Added the `cluster` table function. Example: `cluster(cluster_name, db, table)`. The `remote` table function can accept the cluster name as the first argument, if it is specified as an identifier. -* The `remote` and `cluster` table functions can be used in `INSERT` requests. +* The `remote` and `cluster` table functions can be used in `INSERT` queries. * Added the `create_table_query` and `engine_full` virtual columns to the `system.tables`table . The `metadata_modification_time` column is virtual. * Added the `data_path` and `metadata_path` columns to `system.tables`and` system.databases` tables, and added the `path` column to the `system.parts` and `system.parts_columns` tables. * Added additional information about merges in the `system.part_log` table. @@ -1040,7 +1135,7 @@ This release contains bug fixes for the previous release 1.1.54310: ### Please note when upgrading: -* There is now a higher default value for the MergeTree setting `max_bytes_to_merge_at_max_space_in_pool` (the maximum total size of data parts to merge, in bytes): it has increased from 100 GiB to 150 GiB. This might result in large merges running after the server upgrade, which could cause an increased load on the disk subsystem. If the free space available on the server is less than twice the total amount of the merges that are running, this will cause all other merges to stop running, including merges of small data parts. As a result, INSERT requests will fail with the message "Merges are processing significantly slower than inserts." Use the ` SELECT * FROM system.merges` request to monitor the situation. You can also check the `DiskSpaceReservedForMerge` metric in the `system.metrics` table, or in Graphite. You don't need to do anything to fix this, since the issue will resolve itself once the large merges finish. If you find this unacceptable, you can restore the previous value for the `max_bytes_to_merge_at_max_space_in_pool` setting. To do this, go to the section in config.xml, set ```107374182400` and restart the server. +* There is now a higher default value for the MergeTree setting `max_bytes_to_merge_at_max_space_in_pool` (the maximum total size of data parts to merge, in bytes): it has increased from 100 GiB to 150 GiB. This might result in large merges running after the server upgrade, which could cause an increased load on the disk subsystem. If the free space available on the server is less than twice the total amount of the merges that are running, this will cause all other merges to stop running, including merges of small data parts. As a result, INSERT queries will fail with the message "Merges are processing significantly slower than inserts." Use the ` SELECT * FROM system.merges` query to monitor the situation. You can also check the `DiskSpaceReservedForMerge` metric in the `system.metrics` table, or in Graphite. You don't need to do anything to fix this, since the issue will resolve itself once the large merges finish. If you find this unacceptable, you can restore the previous value for the `max_bytes_to_merge_at_max_space_in_pool` setting. 
To do this, go to the `<merge_tree>` section in config.xml, set `<max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool>` and restart the server. ## ClickHouse release 1.1.54284, 2017-08-29 @@ -1133,7 +1228,7 @@ This release contains bug fixes for the previous release 1.1.54276: ### New features: * Distributed DDL (for example, `CREATE TABLE ON CLUSTER`) -* The replicated request `ALTER TABLE CLEAR COLUMN IN PARTITION.` +* The replicated query `ALTER TABLE CLEAR COLUMN IN PARTITION.` * The engine for Dictionary tables (access to dictionary data in the form of a table). * Dictionary database engine (this type of database automatically has Dictionary tables available for all the connected external dictionaries). * You can check for updates to the dictionary by sending a request to the source. diff --git a/cmake/find_base64.cmake b/cmake/find_base64.cmake index 9b6e28a8ccf..8e52c8463c8 100644 --- a/cmake/find_base64.cmake +++ b/cmake/find_base64.cmake @@ -1,4 +1,4 @@ -if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/base64/lib/lib.c") +if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/base64/lib/lib.c") set (MISSING_INTERNAL_BASE64_LIBRARY 1) message (WARNING "submodule contrib/base64 is missing. to fix try run: \n git submodule update --init --recursive") endif () diff --git a/contrib/CMakeLists.txt b/contrib/CMakeLists.txt index 66173322659..989761bfb67 100644 --- a/contrib/CMakeLists.txt +++ b/contrib/CMakeLists.txt @@ -2,7 +2,7 @@ if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-stringop-overflow") - set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -std=c++1z") + set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-but-set-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-maybe-uninitialized -Wno-format -Wno-misleading-indentation -Wno-implicit-fallthrough -Wno-class-memaccess -Wno-sign-compare -std=c++1z") elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-format -Wno-parentheses-equality") set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-old-style-cast -Wno-unused-function -Wno-unused-variable -Wno-unused-result -Wno-deprecated-declarations -Wno-non-virtual-dtor -Wno-format -std=c++1z") diff --git a/dbms/programs/compressor/Compressor.cpp b/dbms/programs/compressor/Compressor.cpp index 4a412d987b4..544238bf581 100644 --- a/dbms/programs/compressor/Compressor.cpp +++ b/dbms/programs/compressor/Compressor.cpp @@ -61,7 +61,7 @@ int mainEntryClickHouseCompressor(int argc, char ** argv) ("block-size,b", boost::program_options::value()->default_value(DBMS_DEFAULT_BUFFER_SIZE), "compress in blocks of specified size") ("hc", "use LZ4HC instead of LZ4") ("zstd", "use ZSTD instead of LZ4") - ("level", "compression level") + ("level", boost::program_options::value(), "compression level") ("none", "use no compression instead of LZ4") ("stat", "print block statistics of compressed data") ; @@ -94,7 +94,9 @@ int mainEntryClickHouseCompressor(int argc, char ** argv) else
if (use_none) method = DB::CompressionMethod::NONE; - DB::CompressionSettings settings(method, options.count("level") > 0 ? options["level"].as() : DB::CompressionSettings::getDefaultLevel(method)); + DB::CompressionSettings settings(method, options.count("level") + ? options["level"].as() + : DB::CompressionSettings::getDefaultLevel(method)); DB::ReadBufferFromFileDescriptor rb(STDIN_FILENO); DB::WriteBufferFromFileDescriptor wb(STDOUT_FILENO); diff --git a/dbms/programs/server/TCPHandler.cpp b/dbms/programs/server/TCPHandler.cpp index cfb0cd3cd58..e4126b6dd03 100644 --- a/dbms/programs/server/TCPHandler.cpp +++ b/dbms/programs/server/TCPHandler.cpp @@ -370,19 +370,7 @@ void TCPHandler::processInsertQuery(const Settings & global_settings) } /// Send block to the client - table structure. - Block block = state.io.out->getHeader(); - - /// Support insert from old clients without low cardinality type. - if (client_revision && client_revision < DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE) - { - for (auto & col : block) - { - col.type = recursiveRemoveLowCardinality(col.type); - col.column = recursiveRemoveLowCardinality(col.column); - } - } - - sendData(block); + sendData(state.io.out->getHeader()); readData(global_settings); state.io.out->writeSuffix(); @@ -399,16 +387,6 @@ void TCPHandler::processOrdinaryQuery() { Block header = state.io.in->getHeader(); - /// Send data to old clients without low cardinality type. - if (client_revision && client_revision < DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE) - { - for (auto & column : header) - { - column.column = recursiveRemoveLowCardinality(column.column); - column.type = recursiveRemoveLowCardinality(column.type); - } - } - if (header) sendData(header); } @@ -782,7 +760,8 @@ void TCPHandler::initBlockInput() state.block_in = std::make_shared( *state.maybe_compressed_in, header, - client_revision); + client_revision, + !connection_context.getSettingsRef().low_cardinality_allow_in_native_format); } } @@ -803,7 +782,8 @@ void TCPHandler::initBlockOutput(const Block & block) state.block_out = std::make_shared( *state.maybe_compressed_out, client_revision, - block.cloneEmpty()); + block.cloneEmpty(), + !connection_context.getSettingsRef().low_cardinality_allow_in_native_format); } } @@ -815,7 +795,8 @@ void TCPHandler::initLogsBlockOutput(const Block & block) state.logs_block_out = std::make_shared( *out, client_revision, - block.cloneEmpty()); + block.cloneEmpty(), + !connection_context.getSettingsRef().low_cardinality_allow_in_native_format); } } diff --git a/dbms/programs/server/TCPHandler.h b/dbms/programs/server/TCPHandler.h index 98b76268047..19641e88d25 100644 --- a/dbms/programs/server/TCPHandler.h +++ b/dbms/programs/server/TCPHandler.h @@ -25,7 +25,7 @@ namespace Poco { class Logger; } namespace DB { -class ColumnsDescription; +struct ColumnsDescription; /// State of query processing. 
struct QueryState diff --git a/dbms/programs/server/config.xml b/dbms/programs/server/config.xml index 514a081eaca..108e64e3387 100644 --- a/dbms/programs/server/config.xml +++ b/dbms/programs/server/config.xml @@ -187,6 +187,20 @@ + + + + localhost + 9000 + + + + + localhost + 1 + + + diff --git a/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.cpp b/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.cpp new file mode 100644 index 00000000000..88dc5bda29d --- /dev/null +++ b/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.cpp @@ -0,0 +1,36 @@ +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; +} + +namespace +{ + +AggregateFunctionPtr createAggregateFunctionRate(const std::string & name, const DataTypes & argument_types, const Array & parameters) +{ + assertNoParameters(name, parameters); + assertBinary(name, argument_types); + + if (argument_types.size() < 2) + throw Exception("Aggregate function " + name + " requires at least two arguments", + ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + return std::make_shared(argument_types); +} + +} + +void registerAggregateFunctionRate(AggregateFunctionFactory & factory) +{ + factory.registerFunction("boundingRatio", createAggregateFunctionRate, AggregateFunctionFactory::CaseInsensitive); +} + +} diff --git a/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.h b/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.h new file mode 100644 index 00000000000..f89943f1fc6 --- /dev/null +++ b/dbms/src/AggregateFunctions/AggregateFunctionBoundingRatio.h @@ -0,0 +1,162 @@ +#pragma once + +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int BAD_ARGUMENTS; +} + +/** Tracks the leftmost and rightmost (x, y) data points. + */ +struct AggregateFunctionBoundingRatioData +{ + struct Point + { + Float64 x; + Float64 y; + }; + + bool empty = true; + Point left; + Point right; + + void add(Float64 x, Float64 y) + { + Point point{x, y}; + + if (empty) + { + left = point; + right = point; + empty = false; + } + else if (point.x < left.x) + { + left = point; + } + else if (point.x > right.x) + { + right = point; + } + } + + void merge(const AggregateFunctionBoundingRatioData & other) + { + if (empty) + { + *this = other; + } + else + { + if (other.left.x < left.x) + left = other.left; + if (other.right.x > right.x) + right = other.right; + } + } + + void serialize(WriteBuffer & buf) const + { + writeBinary(empty, buf); + + if (!empty) + { + writePODBinary(left, buf); + writePODBinary(right, buf); + } + } + + void deserialize(ReadBuffer & buf) + { + readBinary(empty, buf); + + if (!empty) + { + readPODBinary(left, buf); + readPODBinary(right, buf); + } + } +}; + + +class AggregateFunctionBoundingRatio final : public IAggregateFunctionDataHelper +{ +private: + /** Calculates the slope of a line between leftmost and rightmost data points. 
+ * (y2 - y1) / (x2 - x1) + */ + Float64 getBoundingRatio(const AggregateFunctionBoundingRatioData & data) const + { + if (data.empty) + return std::numeric_limits<Float64>::quiet_NaN(); + + return (data.right.y - data.left.y) / (data.right.x - data.left.x); + } + +public: + String getName() const override + { + return "boundingRatio"; + } + + AggregateFunctionBoundingRatio(const DataTypes & arguments) + { + const auto x_arg = arguments.at(0).get(); + const auto y_arg = arguments.at(1).get(); + + if (!x_arg->isValueRepresentedByNumber() || !y_arg->isValueRepresentedByNumber()) + throw Exception("Illegal types of arguments of aggregate function " + getName() + ", must have number representation.", + ErrorCodes::BAD_ARGUMENTS); + } + + DataTypePtr getReturnType() const override + { + return std::make_shared<DataTypeFloat64>(); + } + + void add(AggregateDataPtr place, const IColumn ** columns, const size_t row_num, Arena *) const override + { + /// TODO Inefficient. + const auto x = applyVisitor(FieldVisitorConvertToNumber<Float64>(), (*columns[0])[row_num]); + const auto y = applyVisitor(FieldVisitorConvertToNumber<Float64>(), (*columns[1])[row_num]); + data(place).add(x, y); + } + + void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override + { + data(place).merge(data(rhs)); + } + + void serialize(ConstAggregateDataPtr place, WriteBuffer & buf) const override + { + data(place).serialize(buf); + } + + void deserialize(AggregateDataPtr place, ReadBuffer & buf, Arena *) const override + { + data(place).deserialize(buf); + } + + void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override + { + static_cast<ColumnFloat64 &>(to).getData().push_back(getBoundingRatio(data(place))); + } + + const char * getHeaderFilePath() const override + { + return __FILE__; + } +}; + +} diff --git a/dbms/src/AggregateFunctions/AggregateFunctionHistogram.cpp b/dbms/src/AggregateFunctions/AggregateFunctionHistogram.cpp index de58d7a36d3..eaacb10be01 100644 --- a/dbms/src/AggregateFunctions/AggregateFunctionHistogram.cpp +++ b/dbms/src/AggregateFunctions/AggregateFunctionHistogram.cpp @@ -17,6 +17,7 @@ namespace ErrorCodes extern const int PARAMETER_OUT_OF_BOUND; } + namespace { @@ -44,6 +45,8 @@ AggregateFunctionPtr createAggregateFunctionHistogram(const std::string & name, throw Exception("Illegal type " + arguments[0]->getName() + " of argument for aggregate function " + name, ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); return res; + + return nullptr; } } diff --git a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp index 800beda1d53..f5e15b6a887 100644 --- a/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp +++ b/dbms/src/AggregateFunctions/registerAggregateFunctions.cpp @@ -15,6 +15,7 @@ void registerAggregateFunctionGroupArrayInsertAt(AggregateFunctionFactory &); void registerAggregateFunctionsQuantile(AggregateFunctionFactory &); void registerAggregateFunctionsSequenceMatch(AggregateFunctionFactory &); void registerAggregateFunctionWindowFunnel(AggregateFunctionFactory &); +void registerAggregateFunctionRate(AggregateFunctionFactory &); void registerAggregateFunctionsMinMaxAny(AggregateFunctionFactory &); void registerAggregateFunctionsStatisticsStable(AggregateFunctionFactory &); void registerAggregateFunctionsStatisticsSimple(AggregateFunctionFactory &); @@ -50,6 +51,7 @@ void registerAggregateFunctions() registerAggregateFunctionsQuantile(factory); registerAggregateFunctionsSequenceMatch(factory); registerAggregateFunctionWindowFunnel(factory); +
registerAggregateFunctionRate(factory); registerAggregateFunctionsMinMaxAny(factory); registerAggregateFunctionsStatisticsStable(factory); registerAggregateFunctionsStatisticsSimple(factory); diff --git a/dbms/src/Core/Block.h b/dbms/src/Core/Block.h index a3198a0fb74..d8efc939ecd 100644 --- a/dbms/src/Core/Block.h +++ b/dbms/src/Core/Block.h @@ -97,8 +97,8 @@ public: /// Approximate number of allocated bytes in memory - for profiling and limits. size_t allocatedBytes() const; - operator bool() const { return !data.empty(); } - bool operator!() const { return data.empty(); } + operator bool() const { return !!columns(); } + bool operator!() const { return !this->operator bool(); } /** Get a list of column names separated by commas. */ std::string dumpNames() const; diff --git a/dbms/src/DataStreams/NativeBlockInputStream.cpp b/dbms/src/DataStreams/NativeBlockInputStream.cpp index 7cd4a571a60..7eeba3b9e50 100644 --- a/dbms/src/DataStreams/NativeBlockInputStream.cpp +++ b/dbms/src/DataStreams/NativeBlockInputStream.cpp @@ -29,8 +29,8 @@ NativeBlockInputStream::NativeBlockInputStream(ReadBuffer & istr_, UInt64 server { } -NativeBlockInputStream::NativeBlockInputStream(ReadBuffer & istr_, const Block & header_, UInt64 server_revision_) - : istr(istr_), header(header_), server_revision(server_revision_) +NativeBlockInputStream::NativeBlockInputStream(ReadBuffer & istr_, const Block & header_, UInt64 server_revision_, bool convert_types_to_low_cardinality_) + : istr(istr_), header(header_), server_revision(server_revision_), convert_types_to_low_cardinality(convert_types_to_low_cardinality_) { } @@ -154,7 +154,8 @@ Block NativeBlockInputStream::readImpl() column.column = std::move(read_column); /// Support insert from old clients without low cardinality type. - if (header && server_revision && server_revision < DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE) + bool revision_without_low_cardinality = server_revision && server_revision < DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE; + if (header && (convert_types_to_low_cardinality || revision_without_low_cardinality)) { column.column = recursiveLowCardinalityConversion(column.column, column.type, header.getByPosition(i).type); column.type = header.getByPosition(i).type; diff --git a/dbms/src/DataStreams/NativeBlockInputStream.h b/dbms/src/DataStreams/NativeBlockInputStream.h index f16b8c4a595..c9b565cd9d7 100644 --- a/dbms/src/DataStreams/NativeBlockInputStream.h +++ b/dbms/src/DataStreams/NativeBlockInputStream.h @@ -65,7 +65,7 @@ public: /// For cases when data structure (header) is known in advance. /// NOTE We may use header for data validation and/or type conversions. It is not implemented. - NativeBlockInputStream(ReadBuffer & istr_, const Block & header_, UInt64 server_revision_); + NativeBlockInputStream(ReadBuffer & istr_, const Block & header_, UInt64 server_revision_, bool convert_types_to_low_cardinality_ = false); /// For cases when we have an index. It allows to skip columns. Only columns specified in the index will be read. NativeBlockInputStream(ReadBuffer & istr_, UInt64 server_revision_, @@ -91,6 +91,8 @@ private: IndexForNativeFormat::Blocks::const_iterator index_block_end; IndexOfBlockForNativeFormat::Columns::const_iterator index_column_it; + bool convert_types_to_low_cardinality = false; + /// If an index is specified, then `istr` must be CompressedReadBufferFromFile. Unused otherwise. 
CompressedReadBufferFromFile * istr_concrete = nullptr; diff --git a/dbms/src/DataStreams/NativeBlockOutputStream.cpp b/dbms/src/DataStreams/NativeBlockOutputStream.cpp index c87d82b2506..1869badfe14 100644 --- a/dbms/src/DataStreams/NativeBlockOutputStream.cpp +++ b/dbms/src/DataStreams/NativeBlockOutputStream.cpp @@ -21,10 +21,10 @@ namespace ErrorCodes NativeBlockOutputStream::NativeBlockOutputStream( - WriteBuffer & ostr_, UInt64 client_revision_, const Block & header_, + WriteBuffer & ostr_, UInt64 client_revision_, const Block & header_, bool remove_low_cardinality_, WriteBuffer * index_ostr_, size_t initial_size_of_file_) : ostr(ostr_), client_revision(client_revision_), header(header_), - index_ostr(index_ostr_), initial_size_of_file(initial_size_of_file_) + index_ostr(index_ostr_), initial_size_of_file(initial_size_of_file_), remove_low_cardinality(remove_low_cardinality_) { if (index_ostr) { @@ -104,7 +104,7 @@ void NativeBlockOutputStream::write(const Block & block) ColumnWithTypeAndName column = block.safeGetByPosition(i); /// Send data to old clients without low cardinality type. - if (client_revision && client_revision < DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE) + if (remove_low_cardinality || (client_revision && client_revision < DBMS_MIN_REVISION_WITH_LOW_CARDINALITY_TYPE)) { column.column = recursiveRemoveLowCardinality(column.column); column.type = recursiveRemoveLowCardinality(column.type); diff --git a/dbms/src/DataStreams/NativeBlockOutputStream.h b/dbms/src/DataStreams/NativeBlockOutputStream.h index 7e3f14e06ea..9834b90ef2a 100644 --- a/dbms/src/DataStreams/NativeBlockOutputStream.h +++ b/dbms/src/DataStreams/NativeBlockOutputStream.h @@ -23,7 +23,7 @@ public: /** If non-zero client_revision is specified, additional block information can be written. */ NativeBlockOutputStream( - WriteBuffer & ostr_, UInt64 client_revision_, const Block & header_, + WriteBuffer & ostr_, UInt64 client_revision_, const Block & header_, bool remove_low_cardinality_ = false, WriteBuffer * index_ostr_ = nullptr, size_t initial_size_of_file_ = 0); Block getHeader() const override { return header; } @@ -42,6 +42,8 @@ private: size_t initial_size_of_file; /// The initial size of the data file, if `append` done. Used for the index. /// If you need to write index, then `ostr` must be a CompressedWriteBuffer. 
CompressedWriteBuffer * ostr_concrete = nullptr; + + bool remove_low_cardinality; }; } diff --git a/dbms/src/DataTypes/DataTypeInterval.cpp b/dbms/src/DataTypes/DataTypeInterval.cpp index ab2993b884a..c7ee3ede334 100644 --- a/dbms/src/DataTypes/DataTypeInterval.cpp +++ b/dbms/src/DataTypes/DataTypeInterval.cpp @@ -19,6 +19,7 @@ void registerDataTypeInterval(DataTypeFactory & factory) factory.registerSimpleDataType("IntervalDay", [] { return DataTypePtr(std::make_shared(DataTypeInterval::Day)); }); factory.registerSimpleDataType("IntervalWeek", [] { return DataTypePtr(std::make_shared(DataTypeInterval::Week)); }); factory.registerSimpleDataType("IntervalMonth", [] { return DataTypePtr(std::make_shared(DataTypeInterval::Month)); }); + factory.registerSimpleDataType("IntervalQuarter", [] { return DataTypePtr(std::make_shared(DataTypeInterval::Quarter)); }); factory.registerSimpleDataType("IntervalYear", [] { return DataTypePtr(std::make_shared(DataTypeInterval::Year)); }); } diff --git a/dbms/src/DataTypes/DataTypeInterval.h b/dbms/src/DataTypes/DataTypeInterval.h index afbcf2d6a45..6f4f08c16c0 100644 --- a/dbms/src/DataTypes/DataTypeInterval.h +++ b/dbms/src/DataTypes/DataTypeInterval.h @@ -25,6 +25,7 @@ public: Day, Week, Month, + Quarter, Year }; @@ -46,6 +47,7 @@ public: case Day: return "Day"; case Week: return "Week"; case Month: return "Month"; + case Quarter: return "Quarter"; case Year: return "Year"; default: __builtin_unreachable(); } diff --git a/dbms/src/Functions/FunctionDateOrDateTimeAddInterval.h b/dbms/src/Functions/FunctionDateOrDateTimeAddInterval.h index c4b7639908f..9b27282ec19 100644 --- a/dbms/src/Functions/FunctionDateOrDateTimeAddInterval.h +++ b/dbms/src/Functions/FunctionDateOrDateTimeAddInterval.h @@ -113,6 +113,21 @@ struct AddMonthsImpl } }; +struct AddQuartersImpl +{ + static constexpr auto name = "addQuarters"; + + static inline UInt32 execute(UInt32 t, Int64 delta, const DateLUTImpl & time_zone) + { + return time_zone.addQuarters(t, delta); + } + + static inline UInt16 execute(UInt16 d, Int64 delta, const DateLUTImpl & time_zone) + { + return time_zone.addQuarters(DayNum(d), delta); + } +}; + struct AddYearsImpl { static constexpr auto name = "addYears"; @@ -149,6 +164,7 @@ struct SubtractHoursImpl : SubtractIntervalImpl { static constexpr struct SubtractDaysImpl : SubtractIntervalImpl { static constexpr auto name = "subtractDays"; }; struct SubtractWeeksImpl : SubtractIntervalImpl { static constexpr auto name = "subtractWeeks"; }; struct SubtractMonthsImpl : SubtractIntervalImpl { static constexpr auto name = "subtractMonths"; }; +struct SubtractQuartersImpl : SubtractIntervalImpl { static constexpr auto name = "subtractQuarters"; }; struct SubtractYearsImpl : SubtractIntervalImpl { static constexpr auto name = "subtractYears"; }; diff --git a/dbms/src/Functions/FunctionsConversion.cpp b/dbms/src/Functions/FunctionsConversion.cpp index fdfc153f594..a83a756010c 100644 --- a/dbms/src/Functions/FunctionsConversion.cpp +++ b/dbms/src/Functions/FunctionsConversion.cpp @@ -89,6 +89,7 @@ void registerFunctionsConversion(FunctionFactory & factory) factory.registerFunction>(); factory.registerFunction>(); factory.registerFunction>(); + factory.registerFunction>(); factory.registerFunction>(); } diff --git a/dbms/src/Functions/FunctionsConversion.h b/dbms/src/Functions/FunctionsConversion.h index 1428fec4f48..e461a4542f8 100644 --- a/dbms/src/Functions/FunctionsConversion.h +++ b/dbms/src/Functions/FunctionsConversion.h @@ -738,6 +738,7 @@ DEFINE_NAME_TO_INTERVAL(Hour) 
DEFINE_NAME_TO_INTERVAL(Day) DEFINE_NAME_TO_INTERVAL(Week) DEFINE_NAME_TO_INTERVAL(Month) +DEFINE_NAME_TO_INTERVAL(Quarter) DEFINE_NAME_TO_INTERVAL(Year) #undef DEFINE_NAME_TO_INTERVAL @@ -1138,6 +1139,9 @@ struct ToIntMonotonicity static IFunction::Monotonicity get(const IDataType & type, const Field & left, const Field & right) { + if (!type.isValueRepresentedByNumber()) + return {}; + size_t size_of_type = type.getSizeOfValueInMemory(); /// If type is expanding @@ -1153,14 +1157,10 @@ struct ToIntMonotonicity } /// If type is same, too. (Enum has separate case, because it is different data type) - if (checkAndGetDataType>(&type) || + if (checkAndGetDataType>(&type) || checkAndGetDataType>(&type)) return { true, true, true }; - /// In other cases, if range is unbounded, we don't know, whether function is monotonic or not. - if (left.isNull() || right.isNull()) - return {}; - /// If converting from float, for monotonicity, arguments must fit in range of result type. if (WhichDataType(type).isFloat()) { diff --git a/dbms/src/Functions/FunctionsStringSearch.cpp b/dbms/src/Functions/FunctionsStringSearch.cpp index 337edbbc168..af7ea515f4e 100644 --- a/dbms/src/Functions/FunctionsStringSearch.cpp +++ b/dbms/src/Functions/FunctionsStringSearch.cpp @@ -1080,7 +1080,7 @@ void registerFunctionsStringSearch(FunctionFactory & factory) factory.registerFunction(); factory.registerFunction(); factory.registerFunction(); - factory.registerFunction(); + factory.registerFunction(FunctionFactory::CaseInsensitive); factory.registerFunction(); factory.registerFunction(); factory.registerFunction(); diff --git a/dbms/src/Functions/abs.cpp b/dbms/src/Functions/abs.cpp index e94be787cc4..872c8404176 100644 --- a/dbms/src/Functions/abs.cpp +++ b/dbms/src/Functions/abs.cpp @@ -48,7 +48,7 @@ template <> struct FunctionUnaryArithmeticMonotonicity void registerFunctionAbs(FunctionFactory & factory) { - factory.registerFunction(); + factory.registerFunction(FunctionFactory::CaseInsensitive); } } diff --git a/dbms/src/Functions/addQuarters.cpp b/dbms/src/Functions/addQuarters.cpp new file mode 100644 index 00000000000..c37fb5561c8 --- /dev/null +++ b/dbms/src/Functions/addQuarters.cpp @@ -0,0 +1,18 @@ +#include +#include +#include + + +namespace DB +{ + +using FunctionAddQuarters = FunctionDateOrDateTimeAddInterval; + +void registerFunctionAddQuarters(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +} + + diff --git a/dbms/src/Functions/rand.cpp b/dbms/src/Functions/rand.cpp index cd4ced96b7e..333396c1ecd 100644 --- a/dbms/src/Functions/rand.cpp +++ b/dbms/src/Functions/rand.cpp @@ -9,7 +9,7 @@ using FunctionRand = FunctionRandom; void registerFunctionRand(FunctionFactory & factory) { - factory.registerFunction(); + factory.registerFunction(FunctionFactory::CaseInsensitive); } } diff --git a/dbms/src/Functions/regexpQuoteMeta.cpp b/dbms/src/Functions/regexpQuoteMeta.cpp new file mode 100644 index 00000000000..cc8f1791578 --- /dev/null +++ b/dbms/src/Functions/regexpQuoteMeta.cpp @@ -0,0 +1,114 @@ +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_COLUMN; + extern const int ILLEGAL_TYPE_OF_ARGUMENT; +} + +class FunctionRegexpQuoteMeta : public IFunction +{ +public: + static constexpr auto name = "regexpQuoteMeta"; + + static FunctionPtr create(const Context &) + { + return std::make_shared(); + } + + String getName() const override + { + return name; + } + + size_t getNumberOfArguments() const override + { + return 1; + } + + 
bool useDefaultImplementationForConstants() const override + { + return true; + } + + DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override + { + if (!WhichDataType(arguments[0].type).isString()) + throw Exception( + "Illegal type " + arguments[0].type->getName() + " of 1 argument of function " + getName() + ". Must be String.", + ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + + return std::make_shared(); + } + + void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override + { + const ColumnPtr & column_string = block.getByPosition(arguments[0]).column; + const ColumnString * input = checkAndGetColumn(column_string.get()); + + if (!input) + throw Exception( + "Illegal column " + block.getByPosition(arguments[0]).column->getName() + " of first argument of function " + getName(), + ErrorCodes::ILLEGAL_COLUMN); + + auto dst_column = ColumnString::create(); + auto & dst_data = dst_column->getChars(); + auto & dst_offsets = dst_column->getOffsets(); + + dst_offsets.resize(input_rows_count); + + const ColumnString::Offsets & src_offsets = input->getOffsets(); + + auto src_begin = reinterpret_cast(input->getChars().data()); + auto src_pos = src_begin; + + for (size_t row_idx = 0; row_idx < input_rows_count; ++row_idx) + { + /// NOTE This implementation slightly differs from re2::RE2::QuoteMeta. + /// It escapes zero byte as \0 instead of \x00 + /// and it escapes only required characters. + /// This is Ok. Look at comments in re2.cc + + const char * src_end = src_begin + src_offsets[row_idx] - 1; + + while (true) + { + const char * next_src_pos = find_first_symbols<'\0', '\\', '|', '(', ')', '^', '$', '.', '[', '?', '*', '+', '{', ':', '-'>(src_pos, src_end); + + size_t bytes_to_copy = next_src_pos - src_pos; + size_t old_dst_size = dst_data.size(); + dst_data.resize(old_dst_size + bytes_to_copy); + memcpySmallAllowReadWriteOverflow15(dst_data.data() + old_dst_size, src_pos, bytes_to_copy); + src_pos = next_src_pos + 1; + + if (next_src_pos == src_end) + { + dst_data.emplace_back('\0'); + break; + } + + dst_data.emplace_back('\\'); + dst_data.emplace_back(*next_src_pos); + } + + dst_offsets[row_idx] = dst_data.size(); + } + + block.getByPosition(result).column = std::move(dst_column); + } + +}; + +void registerFunctionRegexpQuoteMeta(FunctionFactory & factory) +{ + factory.registerFunction(); +} +} diff --git a/dbms/src/Functions/registerFunctionsDateTime.cpp b/dbms/src/Functions/registerFunctionsDateTime.cpp index 3e7f2a6affd..5751fc800a9 100644 --- a/dbms/src/Functions/registerFunctionsDateTime.cpp +++ b/dbms/src/Functions/registerFunctionsDateTime.cpp @@ -47,6 +47,7 @@ void registerFunctionAddHours(FunctionFactory &); void registerFunctionAddDays(FunctionFactory &); void registerFunctionAddWeeks(FunctionFactory &); void registerFunctionAddMonths(FunctionFactory &); +void registerFunctionAddQuarters(FunctionFactory &); void registerFunctionAddYears(FunctionFactory &); void registerFunctionSubtractSeconds(FunctionFactory &); void registerFunctionSubtractMinutes(FunctionFactory &); @@ -54,6 +55,7 @@ void registerFunctionSubtractHours(FunctionFactory &); void registerFunctionSubtractDays(FunctionFactory &); void registerFunctionSubtractWeeks(FunctionFactory &); void registerFunctionSubtractMonths(FunctionFactory &); +void registerFunctionSubtractQuarters(FunctionFactory &); void registerFunctionSubtractYears(FunctionFactory &); void registerFunctionDateDiff(FunctionFactory &); void 
registerFunctionToTimeZone(FunctionFactory &); @@ -106,6 +108,7 @@ void registerFunctionsDateTime(FunctionFactory & factory) registerFunctionAddDays(factory); registerFunctionAddWeeks(factory); registerFunctionAddMonths(factory); + registerFunctionAddQuarters(factory); registerFunctionAddYears(factory); registerFunctionSubtractSeconds(factory); registerFunctionSubtractMinutes(factory); @@ -113,6 +116,7 @@ void registerFunctionsDateTime(FunctionFactory & factory) registerFunctionSubtractDays(factory); registerFunctionSubtractWeeks(factory); registerFunctionSubtractMonths(factory); + registerFunctionSubtractQuarters(factory); registerFunctionSubtractYears(factory); registerFunctionDateDiff(factory); registerFunctionToTimeZone(factory); diff --git a/dbms/src/Functions/registerFunctionsString.cpp b/dbms/src/Functions/registerFunctionsString.cpp index 3a07d8bbd65..15d37d939b0 100644 --- a/dbms/src/Functions/registerFunctionsString.cpp +++ b/dbms/src/Functions/registerFunctionsString.cpp @@ -21,6 +21,8 @@ void registerFunctionSubstringUTF8(FunctionFactory &); void registerFunctionAppendTrailingCharIfAbsent(FunctionFactory &); void registerFunctionStartsWith(FunctionFactory &); void registerFunctionEndsWith(FunctionFactory &); +void registerFunctionTrim(FunctionFactory &); +void registerFunctionRegexpQuoteMeta(FunctionFactory &); #if USE_BASE64 void registerFunctionBase64Encode(FunctionFactory &); @@ -46,6 +48,8 @@ void registerFunctionsString(FunctionFactory & factory) registerFunctionAppendTrailingCharIfAbsent(factory); registerFunctionStartsWith(factory); registerFunctionEndsWith(factory); + registerFunctionTrim(factory); + registerFunctionRegexpQuoteMeta(factory); #if USE_BASE64 registerFunctionBase64Encode(factory); registerFunctionBase64Decode(factory); diff --git a/dbms/src/Functions/reverse.cpp b/dbms/src/Functions/reverse.cpp index 065e1d28073..b7447a7882b 100644 --- a/dbms/src/Functions/reverse.cpp +++ b/dbms/src/Functions/reverse.cpp @@ -147,7 +147,7 @@ private: void registerFunctionReverse(FunctionFactory & factory) { - factory.registerFunction(); + factory.registerFunction(FunctionFactory::CaseInsensitive); } } diff --git a/dbms/src/Functions/subtractQuarters.cpp b/dbms/src/Functions/subtractQuarters.cpp new file mode 100644 index 00000000000..6c066ed17a1 --- /dev/null +++ b/dbms/src/Functions/subtractQuarters.cpp @@ -0,0 +1,18 @@ +#include +#include +#include + + +namespace DB +{ + +using FunctionSubtractQuarters = FunctionDateOrDateTimeAddInterval; + +void registerFunctionSubtractQuarters(FunctionFactory & factory) +{ + factory.registerFunction(); +} + +} + + diff --git a/dbms/src/Functions/trim.cpp b/dbms/src/Functions/trim.cpp new file mode 100644 index 00000000000..10c9b6557aa --- /dev/null +++ b/dbms/src/Functions/trim.cpp @@ -0,0 +1,142 @@ +#include +#include +#include + +#if __SSE4_2__ +#include +#endif + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int ILLEGAL_TYPE_OF_ARGUMENT; +} + +struct TrimModeLeft +{ + static constexpr auto name = "trimLeft"; + static constexpr bool trim_left = true; + static constexpr bool trim_right = false; +}; + +struct TrimModeRight +{ + static constexpr auto name = "trimRight"; + static constexpr bool trim_left = false; + static constexpr bool trim_right = true; +}; + +struct TrimModeBoth +{ + static constexpr auto name = "trimBoth"; + static constexpr bool trim_left = true; + static constexpr bool trim_right = true; +}; + +template +class FunctionTrimImpl +{ +public: + static void vector( + const ColumnString::Chars & data, + const 
ColumnString::Offsets & offsets, + ColumnString::Chars & res_data, + ColumnString::Offsets & res_offsets) + { + size_t size = offsets.size(); + res_offsets.resize(size); + res_data.reserve(data.size()); + + size_t prev_offset = 0; + size_t res_offset = 0; + + const UInt8 * start; + size_t length; + + for (size_t i = 0; i < size; ++i) + { + execute(reinterpret_cast(&data[prev_offset]), offsets[i] - prev_offset - 1, start, length); + + res_data.resize(res_data.size() + length + 1); + memcpy(&res_data[res_offset], start, length); + res_offset += length + 1; + res_data[res_offset - 1] = '\0'; + + res_offsets[i] = res_offset; + prev_offset = offsets[i]; + } + } + + static void vector_fixed(const ColumnString::Chars &, size_t, ColumnString::Chars &) + { + throw Exception("Functions trimLeft, trimRight and trimBoth cannot work with FixedString argument", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT); + } + +private: + static void execute(const UInt8 * data, size_t size, const UInt8 *& res_data, size_t & res_size) + { + size_t chars_to_trim_left = 0; + size_t chars_to_trim_right = 0; + char whitespace = ' '; +#if __SSE4_2__ + const auto bytes_sse = sizeof(__m128i); + const auto size_sse = size - (size % bytes_sse); + const auto whitespace_mask = _mm_set1_epi8(whitespace); + constexpr auto base_sse_mode = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_NEGATIVE_POLARITY; + auto mask = bytes_sse; +#endif + + if constexpr (mode::trim_left) + { +#if __SSE4_2__ + /// skip whitespace from left in blocks of up to 16 characters + constexpr auto left_sse_mode = base_sse_mode | _SIDD_LEAST_SIGNIFICANT; + while (mask == bytes_sse && chars_to_trim_left < size_sse) + { + const auto chars = _mm_loadu_si128(reinterpret_cast(data + chars_to_trim_left)); + mask = _mm_cmpistri(whitespace_mask, chars, left_sse_mode); + chars_to_trim_left += mask; + } +#endif + /// skip remaining whitespace from left, character by character + while (chars_to_trim_left < size && data[chars_to_trim_left] == whitespace) + ++chars_to_trim_left; + } + + if constexpr (mode::trim_right) + { + constexpr auto right_sse_mode = base_sse_mode | _SIDD_MOST_SIGNIFICANT; + const auto trim_right_size = size - chars_to_trim_left; +#if __SSE4_2__ + /// try to skip whitespace from right in blocks of up to 16 characters + const auto trim_right_size_sse = trim_right_size - (trim_right_size % bytes_sse); + while (mask == bytes_sse && chars_to_trim_right < trim_right_size_sse) + { + const auto chars = _mm_loadu_si128(reinterpret_cast(data + size - chars_to_trim_right - bytes_sse)); + mask = _mm_cmpistri(whitespace_mask, chars, right_sse_mode); + chars_to_trim_right += mask; + } +#endif + /// skip remaining whitespace from right, character by character + while (chars_to_trim_right < trim_right_size && data[size - chars_to_trim_right - 1] == whitespace) + ++chars_to_trim_right; + } + + res_data = data + chars_to_trim_left; + res_size = size - chars_to_trim_left - chars_to_trim_right; + } +}; + +using FunctionTrimLeft = FunctionStringToString, TrimModeLeft>; +using FunctionTrimRight = FunctionStringToString, TrimModeRight>; +using FunctionTrimBoth = FunctionStringToString, TrimModeBoth>; + +void registerFunctionTrim(FunctionFactory & factory) +{ + factory.registerFunction(); + factory.registerFunction(); + factory.registerFunction(); +} +} diff --git a/dbms/src/IO/WriteBuffer.h b/dbms/src/IO/WriteBuffer.h index 1d25d721ed0..e6a73ea90e3 100644 --- a/dbms/src/IO/WriteBuffer.h +++ b/dbms/src/IO/WriteBuffer.h @@ -76,7 +76,7 @@ public: { nextIfAtEnd(); size_t bytes_to_copy 
= std::min(static_cast(working_buffer.end() - pos), n - bytes_copied); - std::memcpy(pos, from + bytes_copied, bytes_to_copy); + memcpy(pos, from + bytes_copied, bytes_to_copy); pos += bytes_to_copy; bytes_copied += bytes_to_copy; } diff --git a/dbms/src/Interpreters/Aggregator.cpp b/dbms/src/Interpreters/Aggregator.cpp index 1fd614affc4..03f04d791a0 100644 --- a/dbms/src/Interpreters/Aggregator.cpp +++ b/dbms/src/Interpreters/Aggregator.cpp @@ -615,7 +615,6 @@ void NO_INLINE Aggregator::executeImplCase( AggregateDataPtr overflow_row) const { /// NOTE When editing this code, also pay attention to SpecializedAggregator.h. - /// TODO for low cardinality optimization. /// For all rows. typename Method::Key prev_key; diff --git a/dbms/src/Interpreters/Cluster.cpp b/dbms/src/Interpreters/Cluster.cpp index 0cd6dde2625..db81bc58061 100644 --- a/dbms/src/Interpreters/Cluster.cpp +++ b/dbms/src/Interpreters/Cluster.cpp @@ -397,14 +397,24 @@ void Cluster::initMisc() std::unique_ptr Cluster::getClusterWithSingleShard(size_t index) const { - return std::unique_ptr{ new Cluster(*this, index) }; + return std::unique_ptr{ new Cluster(*this, {index}) }; } -Cluster::Cluster(const Cluster & from, size_t index) - : shards_info{from.shards_info[index]} +std::unique_ptr Cluster::getClusterWithMultipleShards(const std::vector & indices) const { - if (!from.addresses_with_failover.empty()) - addresses_with_failover.emplace_back(from.addresses_with_failover[index]); + return std::unique_ptr{ new Cluster(*this, indices) }; +} + +Cluster::Cluster(const Cluster & from, const std::vector & indices) + : shards_info{} +{ + for (size_t index : indices) + { + shards_info.emplace_back(from.shards_info.at(index)); + + if (!from.addresses_with_failover.empty()) + addresses_with_failover.emplace_back(from.addresses_with_failover.at(index)); + } initMisc(); } diff --git a/dbms/src/Interpreters/Cluster.h b/dbms/src/Interpreters/Cluster.h index 2ef8b889160..f998ad8f912 100644 --- a/dbms/src/Interpreters/Cluster.h +++ b/dbms/src/Interpreters/Cluster.h @@ -143,6 +143,9 @@ public: /// Get a subcluster consisting of one shard - index by count (from 0) of the shard of this cluster. std::unique_ptr getClusterWithSingleShard(size_t index) const; + /// Get a subcluster consisting of one or multiple shards - indexes by count (from 0) of the shard of this cluster. + std::unique_ptr getClusterWithMultipleShards(const std::vector & indices) const; + private: using SlotToShard = std::vector; SlotToShard slot_to_shard; @@ -153,8 +156,8 @@ public: private: void initMisc(); - /// For getClusterWithSingleShard implementation. - Cluster(const Cluster & from, size_t index); + /// For getClusterWithMultipleShards implementation. + Cluster(const Cluster & from, const std::vector & indices); String hash_of_addresses; /// Description of the cluster shards. 
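A minimal usage sketch of the new sub-cluster API (illustration only, not part of the patch; the `cluster` variable and the chosen shard indices are hypothetical):

    // Build a cluster that contains only the shards a query can actually touch.
    // Indices are 0-based positions of shards in the original cluster definition.
    std::vector<size_t> used_shards = {0, 2};
    std::unique_ptr<Cluster> sub_cluster = cluster->getClusterWithMultipleShards(used_shards);
    // The private constructor copies only the selected shards_info entries (and the
    // matching addresses_with_failover, when present) and re-runs initMisc(), so the
    // result behaves like an ordinary Cluster for query execution.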
diff --git a/dbms/src/Interpreters/DDLWorker.cpp b/dbms/src/Interpreters/DDLWorker.cpp index 60183cd0c3a..808ff9b1cf9 100644 --- a/dbms/src/Interpreters/DDLWorker.cpp +++ b/dbms/src/Interpreters/DDLWorker.cpp @@ -204,7 +204,6 @@ static bool isSupportedAlterType(int type) ASTAlterCommand::ADD_COLUMN, ASTAlterCommand::DROP_COLUMN, ASTAlterCommand::MODIFY_COLUMN, - ASTAlterCommand::MODIFY_PRIMARY_KEY, ASTAlterCommand::DROP_PARTITION, ASTAlterCommand::DELETE, ASTAlterCommand::UPDATE, diff --git a/dbms/src/Interpreters/Settings.h b/dbms/src/Interpreters/Settings.h index fc8ea2c4630..5b55ebba908 100644 --- a/dbms/src/Interpreters/Settings.h +++ b/dbms/src/Interpreters/Settings.h @@ -89,6 +89,7 @@ struct Settings M(SettingBool, skip_unavailable_shards, false, "Silently skip unavailable shards.") \ \ M(SettingBool, distributed_group_by_no_merge, false, "Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.") \ + M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.") \ \ M(SettingUInt64, merge_tree_min_rows_for_concurrent_read, (20 * 8192), "If at least as many lines are read from one file, the reading can be parallelized.") \ M(SettingUInt64, merge_tree_min_rows_for_seek, 0, "You can skip reading more than that number of rows at the price of one seek per file.") \ @@ -294,7 +295,7 @@ struct Settings M(SettingBool, parallel_view_processing, false, "Enables pushing to attached views concurrently instead of sequentially.") \ M(SettingBool, enable_debug_queries, false, "Enables debug queries such as AST.") \ M(SettingBool, enable_unaligned_array_join, false, "Allow ARRAY JOIN with multiple arrays that have different sizes. When this settings is enabled, arrays will be resized to the longest one.") \ - + M(SettingBool, low_cardinality_allow_in_native_format, true, "Use LowCardinality type in Native format. Otherwise, convert LowCardinality columns to ordinary for select query, and convert ordinary columns to required LowCardinality for insert query.") \ #define DECLARE(TYPE, NAME, DEFAULT, DESCRIPTION) \ TYPE NAME {DEFAULT}; diff --git a/dbms/src/Interpreters/SpecializedAggregator.h b/dbms/src/Interpreters/SpecializedAggregator.h index 8ef66135a65..615911a7224 100644 --- a/dbms/src/Interpreters/SpecializedAggregator.h +++ b/dbms/src/Interpreters/SpecializedAggregator.h @@ -108,7 +108,10 @@ void NO_INLINE Aggregator::executeSpecialized( AggregateDataPtr overflow_row) const { typename Method::State state; - state.init(key_columns); + if constexpr (Method::low_cardinality_optimization) + state.init(key_columns, aggregation_state_cache); + else + state.init(key_columns); if (!no_more_keys) executeSpecializedCase( @@ -133,15 +136,19 @@ void NO_INLINE Aggregator::executeSpecializedCase( AggregateDataPtr overflow_row) const { /// For all rows. - typename Method::iterator it; typename Method::Key prev_key; + AggregateDataPtr value = nullptr; for (size_t i = 0; i < rows; ++i) { - bool inserted; /// Inserted a new key, or was this key already? - bool overflow = false; /// New key did not fit in the hash table because of no_more_keys. + bool inserted = false; /// Inserted a new key, or was this key already? /// Get the key to insert into the hash table. 
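/// When Method::low_cardinality_optimization is set, the key is resolved inside
/// `state` from the dictionary position, so it is materialized here only for
/// ordinary hash methods.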
- typename Method::Key key = state.getKey(key_columns, params.keys_size, i, key_sizes, keys, *aggregates_pool); + typename Method::Key key; + if constexpr (!Method::low_cardinality_optimization) + key = state.getKey(key_columns, params.keys_size, i, key_sizes, keys, *aggregates_pool); + + AggregateDataPtr * aggregate_data = nullptr; + typename Method::iterator it; /// Is not used if Method::low_cardinality_optimization if (!no_more_keys) /// Insert. { @@ -150,8 +157,6 @@ void NO_INLINE Aggregator::executeSpecializedCase( { if (i != 0 && key == prev_key) { - AggregateDataPtr value = Method::getAggregateData(it->second); - /// Add values into aggregate functions. AggregateFunctionsList::forEach(AggregateFunctionsUpdater( aggregate_functions, offsets_of_aggregate_states, aggregate_columns, value, i, aggregates_pool)); @@ -163,19 +168,29 @@ void NO_INLINE Aggregator::executeSpecializedCase( prev_key = key; } - method.data.emplace(key, it, inserted); + if constexpr (Method::low_cardinality_optimization) + aggregate_data = state.emplaceKeyFromRow(method.data, i, inserted, params.keys_size, keys, *aggregates_pool); + else + { + method.data.emplace(key, it, inserted); + aggregate_data = &Method::getAggregateData(it->second); + } } else { /// Add only if the key already exists. - inserted = false; - it = method.data.find(key); - if (method.data.end() == it) - overflow = true; + if constexpr (Method::low_cardinality_optimization) + aggregate_data = state.findFromRow(method.data, i); + else + { + it = method.data.find(key); + if (method.data.end() != it) + aggregate_data = &Method::getAggregateData(it->second); + } } /// If the key does not fit, and the data does not need to be aggregated in a separate row, then there's nothing to do. - if (no_more_keys && overflow && !overflow_row) + if (!aggregate_data && !overflow_row) { method.onExistingKey(key, keys, *aggregates_pool); continue; @@ -184,22 +199,25 @@ void NO_INLINE Aggregator::executeSpecializedCase( /// If a new key is inserted, initialize the states of the aggregate functions, and possibly some stuff related to the key. if (inserted) { - AggregateDataPtr & aggregate_data = Method::getAggregateData(it->second); - aggregate_data = nullptr; + *aggregate_data = nullptr; - method.onNewKey(*it, params.keys_size, keys, *aggregates_pool); + if constexpr (!Method::low_cardinality_optimization) + method.onNewKey(*it, params.keys_size, keys, *aggregates_pool); AggregateDataPtr place = aggregates_pool->alignedAlloc(total_size_of_aggregate_states, align_aggregate_states); AggregateFunctionsList::forEach(AggregateFunctionsCreator( aggregate_functions, offsets_of_aggregate_states, place)); - aggregate_data = place; + *aggregate_data = place; + + if constexpr (Method::low_cardinality_optimization) + state.cacheAggregateData(i, place); } else method.onExistingKey(key, keys, *aggregates_pool); - AggregateDataPtr value = (!no_more_keys || !overflow) ? Method::getAggregateData(it->second) : overflow_row; + value = aggregate_data ? *aggregate_data : overflow_row; /// Add values into the aggregate functions. 
AggregateFunctionsList::forEach(AggregateFunctionsUpdater( diff --git a/dbms/src/Interpreters/evaluateConstantExpression.cpp b/dbms/src/Interpreters/evaluateConstantExpression.cpp index 769f45f9c31..29753a4c637 100644 --- a/dbms/src/Interpreters/evaluateConstantExpression.cpp +++ b/dbms/src/Interpreters/evaluateConstantExpression.cpp @@ -1,18 +1,20 @@ -#include +#include + #include #include -#include -#include -#include -#include +#include #include #include -#include -#include +#include #include -#include -#include +#include +#include +#include +#include +#include +#include #include +#include namespace DB @@ -77,4 +79,236 @@ ASTPtr evaluateConstantExpressionOrIdentifierAsLiteral(const ASTPtr & node, cons return evaluateConstantExpressionAsLiteral(node, context); } +namespace +{ + using Conjunction = ColumnsWithTypeAndName; + using Disjunction = std::vector; + + Disjunction analyzeEquals(const ASTIdentifier * identifier, const ASTLiteral * literal, const ExpressionActionsPtr & expr) + { + if (!identifier || !literal) + { + return {}; + } + + for (const auto & name_and_type : expr->getRequiredColumnsWithTypes()) + { + const auto & name = name_and_type.name; + const auto & type = name_and_type.type; + + if (name == identifier->name) + { + ColumnWithTypeAndName column; + // FIXME: what to do if field is not convertable? + column.column = type->createColumnConst(1, convertFieldToType(literal->value, *type)); + column.name = name; + column.type = type; + return {{std::move(column)}}; + } + } + + return {}; + } + + Disjunction andDNF(const Disjunction & left, const Disjunction & right) + { + if (left.empty()) + { + return right; + } + + Disjunction result; + + for (const auto & conjunct1 : left) + { + for (const auto & conjunct2 : right) + { + Conjunction new_conjunct{conjunct1}; + new_conjunct.insert(new_conjunct.end(), conjunct2.begin(), conjunct2.end()); + result.emplace_back(new_conjunct); + } + } + + return result; + } + + Disjunction analyzeFunction(const ASTFunction * fn, const ExpressionActionsPtr & expr) + { + if (!fn) + { + return {}; + } + + // TODO: enumerate all possible function names! + + if (fn->name == "equals") + { + const auto * left = fn->arguments->children.front().get(); + const auto * right = fn->arguments->children.back().get(); + const auto * identifier = typeid_cast(left) ? typeid_cast(left) + : typeid_cast(right); + const auto * literal = typeid_cast(left) ? 
typeid_cast(left) + : typeid_cast(right); + + return analyzeEquals(identifier, literal, expr); + } + else if (fn->name == "in") + { + const auto * left = fn->arguments->children.front().get(); + const auto * right = fn->arguments->children.back().get(); + const auto * identifier = typeid_cast(left); + const auto * inner_fn = typeid_cast(right); + + if (!inner_fn) + { + return {}; + } + + const auto * tuple = typeid_cast(inner_fn->children.front().get()); + + if (!tuple) + { + return {}; + } + + Disjunction result; + + for (const auto & child : tuple->children) + { + const auto * literal = typeid_cast(child.get()); + const auto dnf = analyzeEquals(identifier, literal, expr); + + if (dnf.empty()) + { + return {}; + } + + result.insert(result.end(), dnf.begin(), dnf.end()); + } + + return result; + } + else if (fn->name == "or") + { + const auto * args = typeid_cast(fn->children.front().get()); + + if (!args) + { + return {}; + } + + Disjunction result; + + for (const auto & arg : args->children) + { + const auto dnf = analyzeFunction(typeid_cast(arg.get()), expr); + + if (dnf.empty()) + { + return {}; + } + + result.insert(result.end(), dnf.begin(), dnf.end()); + } + + return result; + } + else if (fn->name == "and") + { + const auto * args = typeid_cast(fn->children.front().get()); + + if (!args) + { + return {}; + } + + Disjunction result; + + for (const auto & arg : args->children) + { + const auto dnf = analyzeFunction(typeid_cast(arg.get()), expr); + + if (dnf.empty()) + { + continue; + } + + result = andDNF(result, dnf); + } + + return result; + } + + return {}; + } +} + +std::optional evaluateExpressionOverConstantCondition(const ASTPtr & node, const ExpressionActionsPtr & target_expr) +{ + Blocks result; + + // TODO: `node` may be always-false literal. + + if (const auto fn = typeid_cast(node.get())) + { + const auto dnf = analyzeFunction(fn, target_expr); + + if (dnf.empty()) + { + return {}; + } + + auto hasRequiredColumns = [&target_expr](const Block & block) -> bool + { + for (const auto & name : target_expr->getRequiredColumns()) + { + bool hasColumn = false; + for (const auto & column_name : block.getNames()) + { + if (column_name == name) + { + hasColumn = true; + break; + } + } + + if (!hasColumn) + return false; + } + + return true; + }; + + for (const auto & conjunct : dnf) + { + Block block(conjunct); + + // Block should contain all required columns from `target_expr` + if (!hasRequiredColumns(block)) + { + return {}; + } + + target_expr->execute(block); + + if (block.rows() == 1) + { + result.push_back(block); + } + else if (block.rows() == 0) + { + // filter out cases like "WHERE a = 1 AND a = 2" + continue; + } + else + { + // FIXME: shouldn't happen + return {}; + } + } + } + + return {result}; +} + } diff --git a/dbms/src/Interpreters/evaluateConstantExpression.h b/dbms/src/Interpreters/evaluateConstantExpression.h index c35b7177622..a901612040b 100644 --- a/dbms/src/Interpreters/evaluateConstantExpression.h +++ b/dbms/src/Interpreters/evaluateConstantExpression.h @@ -1,17 +1,22 @@ #pragma once -#include +#include #include #include #include +#include +#include + namespace DB { class Context; +class ExpressionActions; class IDataType; +using ExpressionActionsPtr = std::shared_ptr; /** Evaluate constant expression and its type. * Used in rare cases - for elements of set for IN, for data to INSERT. 
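An illustrative sketch of the folding performed above (not from the patch; `k` and `m` stand for columns required by `target_expr`):

    // analyzeFunction() decomposes the WHERE condition into a disjunction of
    // constant conjunctions (DNF):
    //   k = 1                ->  { {k=1} }
    //   k IN (1, 2)          ->  { {k=1}, {k=2} }
    //   k = 1 OR k = 2       ->  { {k=1}, {k=2} }
    //   k = 1 AND m = 2      ->  andDNF({{k=1}}, {{m=2}}) = { {k=1, m=2} }
    // evaluateExpressionOverConstantCondition() then turns each conjunction into a
    // one-row Block and executes target_expr over it, yielding one constant value
    // of the target expression per surviving block.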
@@ -20,17 +25,24 @@ class IDataType; std::pair> evaluateConstantExpression(const ASTPtr & node, const Context & context); -/** Evaluate constant expression - * and returns ASTLiteral with its value. +/** Evaluate constant expression and returns ASTLiteral with its value. */ ASTPtr evaluateConstantExpressionAsLiteral(const ASTPtr & node, const Context & context); -/** Evaluate constant expression - * and returns ASTLiteral with its value. +/** Evaluate constant expression and returns ASTLiteral with its value. * Also, if AST is identifier, then return string literal with its name. * Useful in places where some name may be specified as identifier, or as result of a constant expression. */ ASTPtr evaluateConstantExpressionOrIdentifierAsLiteral(const ASTPtr & node, const Context & context); +/** Try to fold condition to countable set of constant values. + * @param condition a condition that we try to fold. + * @param target_expr expression evaluated over a set of constants. + * @return optional blocks each with a single row and a single column for target expression, + * or empty blocks if condition is always false, + * or nothing if condition can't be folded to a set of constants. + */ +std::optional evaluateExpressionOverConstantCondition(const ASTPtr & condition, const ExpressionActionsPtr & target_expr); + } diff --git a/dbms/src/Parsers/ASTAlterQuery.cpp b/dbms/src/Parsers/ASTAlterQuery.cpp index 3577346df0f..feec84d6e98 100644 --- a/dbms/src/Parsers/ASTAlterQuery.cpp +++ b/dbms/src/Parsers/ASTAlterQuery.cpp @@ -25,11 +25,6 @@ ASTPtr ASTAlterCommand::clone() const res->column = column->clone(); res->children.push_back(res->column); } - if (primary_key) - { - res->primary_key = primary_key->clone(); - res->children.push_back(res->primary_key); - } if (order_by) { res->order_by = order_by->clone(); @@ -82,11 +77,6 @@ void ASTAlterCommand::formatImpl( settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << "MODIFY COLUMN " << (settings.hilite ? hilite_none : ""); col_decl->formatImpl(settings, state, frame); } - else if (type == ASTAlterCommand::MODIFY_PRIMARY_KEY) - { - settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << "MODIFY PRIMARY KEY " << (settings.hilite ? hilite_none : ""); - primary_key->formatImpl(settings, state, frame); - } else if (type == ASTAlterCommand::MODIFY_ORDER_BY) { settings.ostr << (settings.hilite ? hilite_keyword : "") << indent_str << "MODIFY ORDER BY " << (settings.hilite ? 
hilite_none : ""); diff --git a/dbms/src/Parsers/ASTAlterQuery.h b/dbms/src/Parsers/ASTAlterQuery.h index b73e1f38e2c..e1cd74cb4b5 100644 --- a/dbms/src/Parsers/ASTAlterQuery.h +++ b/dbms/src/Parsers/ASTAlterQuery.h @@ -26,7 +26,6 @@ public: DROP_COLUMN, MODIFY_COLUMN, COMMENT_COLUMN, - MODIFY_PRIMARY_KEY, MODIFY_ORDER_BY, DROP_PARTITION, @@ -55,10 +54,6 @@ public: */ ASTPtr column; - /** For MODIFY PRIMARY KEY - */ - ASTPtr primary_key; - /** For MODIFY ORDER BY */ ASTPtr order_by; diff --git a/dbms/src/Parsers/CommonParsers.h b/dbms/src/Parsers/CommonParsers.h index 414f4ceccbc..44c8ab17fb7 100644 --- a/dbms/src/Parsers/CommonParsers.h +++ b/dbms/src/Parsers/CommonParsers.h @@ -46,4 +46,98 @@ protected: } }; +class ParserInterval: public IParserBase +{ +public: + enum class IntervalKind + { + Incorrect, + Second, + Minute, + Hour, + Day, + Week, + Month, + Quarter, + Year + }; + + IntervalKind interval_kind; + + ParserInterval() : interval_kind(IntervalKind::Incorrect) {} + + const char * getToIntervalKindFunctionName() + { + switch (interval_kind) + { + case ParserInterval::IntervalKind::Second: + return "toIntervalSecond"; + case ParserInterval::IntervalKind::Minute: + return "toIntervalMinute"; + case ParserInterval::IntervalKind::Hour: + return "toIntervalHour"; + case ParserInterval::IntervalKind::Day: + return "toIntervalDay"; + case ParserInterval::IntervalKind::Week: + return "toIntervalWeek"; + case ParserInterval::IntervalKind::Month: + return "toIntervalMonth"; + case ParserInterval::IntervalKind::Quarter: + return "toIntervalQuarter"; + case ParserInterval::IntervalKind::Year: + return "toIntervalYear"; + default: + return nullptr; + } + } + +protected: + const char * getName() const override { return "interval"; } + + bool parseImpl(Pos & pos, ASTPtr & /*node*/, Expected & expected) override + { + if (ParserKeyword("SECOND").ignore(pos, expected) || ParserKeyword("SQL_TSI_SECOND").ignore(pos, expected) + || ParserKeyword("SS").ignore(pos, expected) || ParserKeyword("S").ignore(pos, expected)) + interval_kind = IntervalKind::Second; + else if ( + ParserKeyword("MINUTE").ignore(pos, expected) || ParserKeyword("SQL_TSI_MINUTE").ignore(pos, expected) + || ParserKeyword("MI").ignore(pos, expected) || ParserKeyword("N").ignore(pos, expected)) + interval_kind = IntervalKind::Minute; + else if ( + ParserKeyword("HOUR").ignore(pos, expected) || ParserKeyword("SQL_TSI_HOUR").ignore(pos, expected) + || ParserKeyword("HH").ignore(pos, expected)) + interval_kind = IntervalKind::Hour; + else if ( + ParserKeyword("DAY").ignore(pos, expected) || ParserKeyword("SQL_TSI_DAY").ignore(pos, expected) + || ParserKeyword("DD").ignore(pos, expected) || ParserKeyword("D").ignore(pos, expected)) + interval_kind = IntervalKind::Day; + else if ( + ParserKeyword("WEEK").ignore(pos, expected) || ParserKeyword("SQL_TSI_WEEK").ignore(pos, expected) + || ParserKeyword("WK").ignore(pos, expected) || ParserKeyword("WW").ignore(pos, expected)) + interval_kind = IntervalKind::Week; + else if ( + ParserKeyword("MONTH").ignore(pos, expected) || ParserKeyword("SQL_TSI_MONTH").ignore(pos, expected) + || ParserKeyword("MM").ignore(pos, expected) || ParserKeyword("M").ignore(pos, expected)) + interval_kind = IntervalKind::Month; + else if ( + ParserKeyword("QUARTER").ignore(pos, expected) || ParserKeyword("SQL_TSI_QUARTER").ignore(pos, expected) + || ParserKeyword("QQ").ignore(pos, expected) || ParserKeyword("Q").ignore(pos, expected)) + interval_kind = IntervalKind::Quarter; + else if ( + 
ParserKeyword("YEAR").ignore(pos, expected) || ParserKeyword("SQL_TSI_YEAR").ignore(pos, expected) + || ParserKeyword("YYYY").ignore(pos, expected) || ParserKeyword("YY").ignore(pos, expected)) + interval_kind = IntervalKind::Year; + else + interval_kind = IntervalKind::Incorrect; + + if (interval_kind == IntervalKind::Incorrect) + { + expected.add(pos, "YEAR, QUARTER, MONTH, WEEK, DAY, HOUR, MINUTE or SECOND"); + return false; + } + /// one of ParserKeyword already made ++pos + return true; + } +}; + } diff --git a/dbms/src/Parsers/ExpressionElementParsers.cpp b/dbms/src/Parsers/ExpressionElementParsers.cpp index 0912d2a5b7b..d01dfd1b7a8 100644 --- a/dbms/src/Parsers/ExpressionElementParsers.cpp +++ b/dbms/src/Parsers/ExpressionElementParsers.cpp @@ -388,6 +388,255 @@ bool ParserSubstringExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & e return true; } +bool ParserTrimExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) +{ + /// Handles all possible TRIM/LTRIM/RTRIM call variants + + std::string func_name; + bool trim_left = false; + bool trim_right = false; + bool char_override = false; + ASTPtr expr_node; + ASTPtr pattern_node; + ASTPtr to_remove; + + if (ParserKeyword("LTRIM").ignore(pos, expected)) + { + if (pos->type != TokenType::OpeningRoundBracket) + return false; + ++pos; + trim_left = true; + } + else if (ParserKeyword("RTRIM").ignore(pos, expected)) + { + if (pos->type != TokenType::OpeningRoundBracket) + return false; + ++pos; + trim_right = true; + } + else if (ParserKeyword("TRIM").ignore(pos, expected)) + { + if (pos->type != TokenType::OpeningRoundBracket) + return false; + ++pos; + + if (ParserKeyword("BOTH").ignore(pos, expected)) + { + trim_left = true; + trim_right = true; + char_override = true; + } + else if (ParserKeyword("LEADING").ignore(pos, expected)) + { + trim_left = true; + char_override = true; + } + else if (ParserKeyword("TRAILING").ignore(pos, expected)) + { + trim_right = true; + char_override = true; + } + else + { + trim_left = true; + trim_right = true; + } + + if (char_override) + { + if (!ParserExpression().parse(pos, to_remove, expected)) + return false; + if (!ParserKeyword("FROM").ignore(pos, expected)) + return false; + + auto quote_meta_func_node = std::make_shared(); + auto quote_meta_list_args = std::make_shared(); + quote_meta_list_args->children = {to_remove}; + + quote_meta_func_node->name = "regexpQuoteMeta"; + quote_meta_func_node->arguments = std::move(quote_meta_list_args); + quote_meta_func_node->children.push_back(quote_meta_func_node->arguments); + + to_remove = std::move(quote_meta_func_node); + } + } + + if (!(trim_left || trim_right)) + return false; + + if (!ParserExpression().parse(pos, expr_node, expected)) + return false; + + if (pos->type != TokenType::ClosingRoundBracket) + return false; + ++pos; + + /// Convert to regexp replace function call + + if (char_override) + { + auto pattern_func_node = std::make_shared(); + auto pattern_list_args = std::make_shared(); + if (trim_left && trim_right) + { + pattern_list_args->children = { + std::make_shared("^["), + to_remove, + std::make_shared("]*|["), + to_remove, + std::make_shared("]*$") + }; + func_name = "replaceRegexpAll"; + } + else + { + if (trim_left) + { + pattern_list_args->children = { + std::make_shared("^["), + to_remove, + std::make_shared("]*") + }; + } + else + { + /// trim_right == false not possible + pattern_list_args->children = { + std::make_shared("["), + to_remove, + std::make_shared("]*$") + }; + } + func_name = 
"replaceRegexpOne"; + } + + pattern_func_node->name = "concat"; + pattern_func_node->arguments = std::move(pattern_list_args); + pattern_func_node->children.push_back(pattern_func_node->arguments); + + pattern_node = std::move(pattern_func_node); + } + else + { + if (trim_left && trim_right) + { + func_name = "trimBoth"; + } + else + { + if (trim_left) + { + func_name = "trimLeft"; + } + else + { + /// trim_right == false not possible + func_name = "trimRight"; + } + } + } + + auto expr_list_args = std::make_shared(); + if (char_override) + expr_list_args->children = {expr_node, pattern_node, std::make_shared("")}; + else + expr_list_args->children = {expr_node}; + + auto func_node = std::make_shared(); + func_node->name = func_name; + func_node->arguments = std::move(expr_list_args); + func_node->children.push_back(func_node->arguments); + + node = std::move(func_node); + return true; +} + +bool ParserLeftExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) +{ + /// Rewrites left(expr, length) to SUBSTRING(expr, 1, length) + + ASTPtr expr_node; + ASTPtr start_node; + ASTPtr length_node; + + if (!ParserKeyword("LEFT").ignore(pos, expected)) + return false; + + if (pos->type != TokenType::OpeningRoundBracket) + return false; + ++pos; + + if (!ParserExpression().parse(pos, expr_node, expected)) + return false; + + ParserToken(TokenType::Comma).ignore(pos, expected); + + if (!ParserExpression().parse(pos, length_node, expected)) + return false; + + if (pos->type != TokenType::ClosingRoundBracket) + return false; + ++pos; + + auto expr_list_args = std::make_shared(); + start_node = std::make_shared(1); + expr_list_args->children = {expr_node, start_node, length_node}; + + auto func_node = std::make_shared(); + func_node->name = "substring"; + func_node->arguments = std::move(expr_list_args); + func_node->children.push_back(func_node->arguments); + + node = std::move(func_node); + return true; +} + +bool ParserRightExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) +{ + /// Rewrites RIGHT(expr, length) to substring(expr, -length) + + ASTPtr expr_node; + ASTPtr length_node; + + if (!ParserKeyword("RIGHT").ignore(pos, expected)) + return false; + + if (pos->type != TokenType::OpeningRoundBracket) + return false; + ++pos; + + if (!ParserExpression().parse(pos, expr_node, expected)) + return false; + + ParserToken(TokenType::Comma).ignore(pos, expected); + + if (!ParserExpression().parse(pos, length_node, expected)) + return false; + + if (pos->type != TokenType::ClosingRoundBracket) + return false; + ++pos; + + auto start_expr_list_args = std::make_shared(); + start_expr_list_args->children = {length_node}; + + auto start_node = std::make_shared(); + start_node->name = "negate"; + start_node->arguments = std::move(start_expr_list_args); + start_node->children.push_back(start_node->arguments); + + auto expr_list_args = std::make_shared(); + expr_list_args->children = {expr_node, start_node}; + + auto func_node = std::make_shared(); + func_node->name = "substring"; + func_node->arguments = std::move(expr_list_args); + func_node->children.push_back(func_node->arguments); + + node = std::move(func_node); + return true; +} + bool ParserExtractExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) { auto begin = pos; @@ -402,26 +651,42 @@ bool ParserExtractExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & exp ASTPtr expr; const char * function_name = nullptr; - if (ParserKeyword("SECOND").ignore(pos, expected)) - function_name = "toSecond"; - else 
if (ParserKeyword("MINUTE").ignore(pos, expected)) - function_name = "toMinute"; - else if (ParserKeyword("HOUR").ignore(pos, expected)) - function_name = "toHour"; - else if (ParserKeyword("DAY").ignore(pos, expected)) - function_name = "toDayOfMonth"; - - // TODO: SELECT toRelativeWeekNum(toDate('2017-06-15')) - toRelativeWeekNum(toStartOfYear(toDate('2017-06-15'))) - // else if (ParserKeyword("WEEK").ignore(pos, expected)) - // function_name = "toRelativeWeekNum"; - - else if (ParserKeyword("MONTH").ignore(pos, expected)) - function_name = "toMonth"; - else if (ParserKeyword("YEAR").ignore(pos, expected)) - function_name = "toYear"; - else + ParserInterval interval_parser; + if (!interval_parser.ignore(pos, expected)) return false; + switch (interval_parser.interval_kind) + { + case ParserInterval::IntervalKind::Second: + function_name = "toSecond"; + break; + case ParserInterval::IntervalKind::Minute: + function_name = "toMinute"; + break; + case ParserInterval::IntervalKind::Hour: + function_name = "toHour"; + break; + case ParserInterval::IntervalKind::Day: + function_name = "toDayOfMonth"; + break; + case ParserInterval::IntervalKind::Week: + // TODO: SELECT toRelativeWeekNum(toDate('2017-06-15')) - toRelativeWeekNum(toStartOfYear(toDate('2017-06-15'))) + // else if (ParserKeyword("WEEK").ignore(pos, expected)) + // function_name = "toRelativeWeekNum"; + return false; + case ParserInterval::IntervalKind::Month: + function_name = "toMonth"; + break; + case ParserInterval::IntervalKind::Quarter: + function_name = "toQuarter"; + break; + case ParserInterval::IntervalKind::Year: + function_name = "toYear"; + break; + default: + return false; + } + ParserKeyword s_from("FROM"); if (!s_from.ignore(pos, expected)) return false; @@ -449,6 +714,168 @@ bool ParserExtractExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & exp return true; } +bool ParserDateAddExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) +{ + const char * function_name = nullptr; + ASTPtr timestamp_node; + ASTPtr offset_node; + + if (ParserKeyword("DATEADD").ignore(pos, expected) || ParserKeyword("DATE_ADD").ignore(pos, expected) + || ParserKeyword("TIMESTAMPADD").ignore(pos, expected) || ParserKeyword("TIMESTAMP_ADD").ignore(pos, expected)) + function_name = "plus"; + else if (ParserKeyword("DATESUB").ignore(pos, expected) || ParserKeyword("DATE_SUB").ignore(pos, expected) + || ParserKeyword("TIMESTAMPSUB").ignore(pos, expected) || ParserKeyword("TIMESTAMP_SUB").ignore(pos, expected)) + function_name = "minus"; + else + return false; + + if (pos->type != TokenType::OpeningRoundBracket) + return false; + ++pos; + + ParserInterval interval_parser; + if (interval_parser.ignore(pos, expected)) + { + /// function(unit, offset, timestamp) + if (pos->type != TokenType::Comma) + return false; + ++pos; + + if (!ParserExpression().parse(pos, offset_node, expected)) + return false; + + if (pos->type != TokenType::Comma) + return false; + ++pos; + + if (!ParserExpression().parse(pos, timestamp_node, expected)) + return false; + } + else + { + /// function(timestamp, INTERVAL offset unit) + if (!ParserExpression().parse(pos, timestamp_node, expected)) + return false; + + if (pos->type != TokenType::Comma) + return false; + ++pos; + + if (!ParserKeyword("INTERVAL").ignore(pos, expected)) + return false; + + if (!ParserExpression().parse(pos, offset_node, expected)) + return false; + + interval_parser.ignore(pos, expected); + + } + if (pos->type != TokenType::ClosingRoundBracket) + return false; + ++pos; + + 
const char * interval_function_name = interval_parser.getToIntervalKindFunctionName(); + + auto interval_expr_list_args = std::make_shared(); + interval_expr_list_args->children = {offset_node}; + + auto interval_func_node = std::make_shared(); + interval_func_node->name = interval_function_name; + interval_func_node->arguments = std::move(interval_expr_list_args); + interval_func_node->children.push_back(interval_func_node->arguments); + + auto expr_list_args = std::make_shared(); + expr_list_args->children = {timestamp_node, interval_func_node}; + + auto func_node = std::make_shared(); + func_node->name = function_name; + func_node->arguments = std::move(expr_list_args); + func_node->children.push_back(func_node->arguments); + + node = std::move(func_node); + + return true; +} + +bool ParserDateDiffExpression::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) +{ + const char * interval_name = nullptr; + ASTPtr left_node; + ASTPtr right_node; + + if (!(ParserKeyword("DATEDIFF").ignore(pos, expected) || ParserKeyword("DATE_DIFF").ignore(pos, expected) + || ParserKeyword("TIMESTAMPDIFF").ignore(pos, expected) || ParserKeyword("TIMESTAMP_DIFF").ignore(pos, expected))) + return false; + + if (pos->type != TokenType::OpeningRoundBracket) + return false; + ++pos; + + ParserInterval interval_parser; + if (!interval_parser.ignore(pos, expected)) + return false; + + switch (interval_parser.interval_kind) + { + case ParserInterval::IntervalKind::Second: + interval_name = "second"; + break; + case ParserInterval::IntervalKind::Minute: + interval_name = "minute"; + break; + case ParserInterval::IntervalKind::Hour: + interval_name = "hour"; + break; + case ParserInterval::IntervalKind::Day: + interval_name = "day"; + break; + case ParserInterval::IntervalKind::Week: + interval_name = "week"; + break; + case ParserInterval::IntervalKind::Month: + interval_name = "month"; + break; + case ParserInterval::IntervalKind::Quarter: + interval_name = "quarter"; + break; + case ParserInterval::IntervalKind::Year: + interval_name = "year"; + break; + default: + return false; + } + + if (pos->type != TokenType::Comma) + return false; + ++pos; + + if (!ParserExpression().parse(pos, left_node, expected)) + return false; + + if (pos->type != TokenType::Comma) + return false; + ++pos; + + if (!ParserExpression().parse(pos, right_node, expected)) + return false; + + if (pos->type != TokenType::ClosingRoundBracket) + return false; + ++pos; + + auto expr_list_args = std::make_shared(); + expr_list_args->children = {std::make_shared(interval_name), left_node, right_node}; + + auto func_node = std::make_shared(); + func_node->name = "dateDiff"; + func_node->arguments = std::move(expr_list_args); + func_node->children.push_back(func_node->arguments); + + node = std::move(func_node); + + return true; +} + bool ParserNull::parseImpl(Pos & pos, ASTPtr & node, Expected & expected) { @@ -750,7 +1177,12 @@ bool ParserExpressionElement::parseImpl(Pos & pos, ASTPtr & node, Expected & exp || ParserLiteral().parse(pos, node, expected) || ParserCastExpression().parse(pos, node, expected) || ParserExtractExpression().parse(pos, node, expected) + || ParserDateAddExpression().parse(pos, node, expected) + || ParserDateDiffExpression().parse(pos, node, expected) || ParserSubstringExpression().parse(pos, node, expected) + || ParserTrimExpression().parse(pos, node, expected) + || ParserLeftExpression().parse(pos, node, expected) + || ParserRightExpression().parse(pos, node, expected) || ParserCase().parse(pos, node, expected) || 
ParserFunction().parse(pos, node, expected) || ParserQualifiedAsterisk().parse(pos, node, expected) diff --git a/dbms/src/Parsers/ExpressionElementParsers.h b/dbms/src/Parsers/ExpressionElementParsers.h index a52864d97d1..c6afbd171e4 100644 --- a/dbms/src/Parsers/ExpressionElementParsers.h +++ b/dbms/src/Parsers/ExpressionElementParsers.h @@ -103,6 +103,27 @@ protected: bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; }; +class ParserTrimExpression : public IParserBase +{ +protected: + const char * getName() const override { return "TRIM expression"; } + bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; +}; + +class ParserLeftExpression : public IParserBase +{ +protected: + const char * getName() const override { return "LEFT expression"; } + bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; +}; + +class ParserRightExpression : public IParserBase +{ +protected: + const char * getName() const override { return "RIGHT expression"; } + bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; +}; + class ParserExtractExpression : public IParserBase { protected: @@ -110,6 +131,19 @@ protected: bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; }; +class ParserDateAddExpression : public IParserBase +{ +protected: + const char * getName() const override { return "DATE_ADD expression"; } + bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; +}; + +class ParserDateDiffExpression : public IParserBase +{ +protected: + const char * getName() const override { return "DATE_DIFF expression"; } + bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override; +}; /** NULL literal. */ diff --git a/dbms/src/Parsers/ExpressionListParsers.cpp b/dbms/src/Parsers/ExpressionListParsers.cpp index ef75267cffe..de6fc2dc129 100644 --- a/dbms/src/Parsers/ExpressionListParsers.cpp +++ b/dbms/src/Parsers/ExpressionListParsers.cpp @@ -607,25 +607,13 @@ bool ParserIntervalOperatorExpression::parseImpl(Pos & pos, ASTPtr & node, Expec if (!ParserExpressionWithOptionalAlias(false).parse(pos, expr, expected)) return false; - const char * function_name = nullptr; - if (ParserKeyword("SECOND").ignore(pos, expected)) - function_name = "toIntervalSecond"; - else if (ParserKeyword("MINUTE").ignore(pos, expected)) - function_name = "toIntervalMinute"; - else if (ParserKeyword("HOUR").ignore(pos, expected)) - function_name = "toIntervalHour"; - else if (ParserKeyword("DAY").ignore(pos, expected)) - function_name = "toIntervalDay"; - else if (ParserKeyword("WEEK").ignore(pos, expected)) - function_name = "toIntervalWeek"; - else if (ParserKeyword("MONTH").ignore(pos, expected)) - function_name = "toIntervalMonth"; - else if (ParserKeyword("YEAR").ignore(pos, expected)) - function_name = "toIntervalYear"; - else + ParserInterval interval_parser; + if (!interval_parser.ignore(pos, expected)) return false; + const char * function_name = interval_parser.getToIntervalKindFunctionName(); + /// the function corresponding to the operator auto function = std::make_shared(); diff --git a/dbms/src/Parsers/ParserAlterQuery.cpp b/dbms/src/Parsers/ParserAlterQuery.cpp index 859c9b6af51..83ad42ebbcb 100644 --- a/dbms/src/Parsers/ParserAlterQuery.cpp +++ b/dbms/src/Parsers/ParserAlterQuery.cpp @@ -24,7 +24,6 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected ParserKeyword s_clear_column("CLEAR COLUMN"); ParserKeyword s_modify_column("MODIFY COLUMN"); ParserKeyword 
s_comment_column("COMMENT COLUMN"); - ParserKeyword s_modify_primary_key("MODIFY PRIMARY KEY"); ParserKeyword s_modify_order_by("MODIFY ORDER BY"); ParserKeyword s_attach_partition("ATTACH PARTITION"); @@ -196,13 +195,6 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected command->type = ASTAlterCommand::MODIFY_COLUMN; } - else if (s_modify_primary_key.ignore(pos, expected)) - { - if (!parser_exp_elem.parse(pos, command->primary_key, expected)) - return false; - - command->type = ASTAlterCommand::MODIFY_PRIMARY_KEY; - } else if (s_modify_order_by.ignore(pos, expected)) { if (!parser_exp_elem.parse(pos, command->order_by, expected)) @@ -247,14 +239,16 @@ bool ParserAlterCommand::parseImpl(Pos & pos, ASTPtr & node, Expected & expected command->children.push_back(command->col_decl); if (command->column) command->children.push_back(command->column); - if (command->primary_key) - command->children.push_back(command->primary_key); if (command->partition) command->children.push_back(command->partition); + if (command->order_by) + command->children.push_back(command->order_by); if (command->predicate) command->children.push_back(command->predicate); if (command->update_assignments) command->children.push_back(command->update_assignments); + if (command->comment) + command->children.push_back(command->comment); return true; } diff --git a/dbms/src/Storages/AlterCommands.cpp b/dbms/src/Storages/AlterCommands.cpp index 37ac0ed64a6..ba5ed9e21bd 100644 --- a/dbms/src/Storages/AlterCommands.cpp +++ b/dbms/src/Storages/AlterCommands.cpp @@ -101,13 +101,6 @@ std::optional AlterCommand::parse(const ASTAlterCommand * command_ command.comment = ast_comment.value.get(); return command; } - else if (command_ast->type == ASTAlterCommand::MODIFY_PRIMARY_KEY) - { - AlterCommand command; - command.type = AlterCommand::MODIFY_PRIMARY_KEY; - command.primary_key = command_ast->primary_key; - return command; - } else if (command_ast->type == ASTAlterCommand::MODIFY_ORDER_BY) { AlterCommand command; @@ -271,13 +264,6 @@ void AlterCommand::apply(ColumnsDescription & columns_description, ASTPtr & orde /// both old and new columns have default expression, update it columns_description.defaults[column_name].expression = default_expression; } - else if (type == MODIFY_PRIMARY_KEY) - { - if (!primary_key_ast) - order_by_ast = primary_key; - else - primary_key_ast = primary_key; - } else if (type == MODIFY_ORDER_BY) { if (!primary_key_ast) diff --git a/dbms/src/Storages/AlterCommands.h b/dbms/src/Storages/AlterCommands.h index af606aa84ef..f1adbdaf9b0 100644 --- a/dbms/src/Storages/AlterCommands.h +++ b/dbms/src/Storages/AlterCommands.h @@ -22,7 +22,6 @@ struct AlterCommand DROP_COLUMN, MODIFY_COLUMN, COMMENT_COLUMN, - MODIFY_PRIMARY_KEY, MODIFY_ORDER_BY, UKNOWN_TYPE, }; @@ -44,9 +43,6 @@ struct AlterCommand /// For ADD - after which column to add a new one. If an empty string, add to the end. To add to the beginning now it is impossible. String after_column; - /// For MODIFY_PRIMARY_KEY - ASTPtr primary_key; - /// For MODIFY_ORDER_BY ASTPtr order_by; @@ -73,7 +69,7 @@ class AlterCommands : public std::vector public: void apply(ColumnsDescription & columns_description, ASTPtr & order_by_ast, ASTPtr & primary_key_ast) const; - /// For storages that don't support MODIFY_PRIMARY_KEY or MODIFY_ORDER_BY. + /// For storages that don't support MODIFY_ORDER_BY. 
void apply(ColumnsDescription & columns_description) const; void validate(const IStorage & table, const Context & context); diff --git a/dbms/src/Storages/MergeTree/KeyCondition.cpp b/dbms/src/Storages/MergeTree/KeyCondition.cpp index 9484bd8c3cc..31a4e08707f 100644 --- a/dbms/src/Storages/MergeTree/KeyCondition.cpp +++ b/dbms/src/Storages/MergeTree/KeyCondition.cpp @@ -313,7 +313,7 @@ bool KeyCondition::addCondition(const String & column, const Range & range) return true; } -/** Computes value of constant expression and it data type. +/** Computes value of constant expression and its data type. * Returns false, if expression isn't constant. */ static bool getConstant(const ASTPtr & expr, Block & block_with_constants, Field & out_value, DataTypePtr & out_type) diff --git a/dbms/src/Storages/MergeTree/KeyCondition.h b/dbms/src/Storages/MergeTree/KeyCondition.h index d025f70bf09..1d700ad80d9 100644 --- a/dbms/src/Storages/MergeTree/KeyCondition.h +++ b/dbms/src/Storages/MergeTree/KeyCondition.h @@ -253,7 +253,7 @@ public: /// Get the maximum number of the key element used in the condition. size_t getMaxKeyColumn() const; - /// Impose an additional condition: the value in the column column must be in the `range` range. + /// Impose an additional condition: the value in the column `column` must be in the range `range`. /// Returns whether there is such a column in the key. bool addCondition(const String & column, const Range & range); diff --git a/dbms/src/Storages/MergeTree/MergeTreeData.cpp b/dbms/src/Storages/MergeTree/MergeTreeData.cpp index fd2d9d9d50d..b65d23f47e1 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeData.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeData.cpp @@ -1229,7 +1229,6 @@ void MergeTreeData::createConvertExpression(const DataPartPtr & part, const Name MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart( const DataPartPtr & part, const NamesAndTypesList & new_columns, - const ASTPtr & new_primary_key_expr_list, bool skip_sanity_checks) { ExpressionActionsPtr expression; @@ -1290,63 +1289,6 @@ MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart( DataPart::Checksums add_checksums; - /// Update primary key if needed. - size_t new_primary_key_file_size{}; - MergeTreeDataPartChecksum::uint128 new_primary_key_hash{}; - - if (new_primary_key_expr_list) - { - ASTPtr query = new_primary_key_expr_list; - auto syntax_result = SyntaxAnalyzer(context, {}).analyze(query, new_columns); - ExpressionActionsPtr new_primary_expr = ExpressionAnalyzer(query, syntax_result, context).getActions(true); - Block new_primary_key_sample = new_primary_expr->getSampleBlock(); - size_t new_key_size = new_primary_key_sample.columns(); - - Columns new_index(new_key_size); - - /// Copy the existing primary key columns. Fill new columns with default values. - /// NOTE default expressions are not supported. 
- - ssize_t prev_position_of_existing_column = -1; - for (size_t i = 0; i < new_key_size; ++i) - { - const String & column_name = new_primary_key_sample.safeGetByPosition(i).name; - - if (primary_key_sample.has(column_name)) - { - ssize_t position_of_existing_column = primary_key_sample.getPositionByName(column_name); - - if (position_of_existing_column < prev_position_of_existing_column) - throw Exception("Permuting of columns of primary key is not supported", ErrorCodes::BAD_ARGUMENTS); - - new_index[i] = part->index.at(position_of_existing_column); - prev_position_of_existing_column = position_of_existing_column; - } - else - { - const IDataType & type = *new_primary_key_sample.safeGetByPosition(i).type; - new_index[i] = type.createColumnConstWithDefaultValue(part->marks_count)->convertToFullColumnIfConst(); - } - } - - if (prev_position_of_existing_column == -1) - throw Exception("No common columns while modifying primary key", ErrorCodes::BAD_ARGUMENTS); - - String index_tmp_path = full_path + part->name + "/primary.idx.tmp"; - WriteBufferFromFile index_file(index_tmp_path); - HashingWriteBuffer index_stream(index_file); - - for (size_t i = 0, marks_count = part->marks_count; i < marks_count; ++i) - for (size_t j = 0; j < new_key_size; ++j) - new_primary_key_sample.getByPosition(j).type->serializeBinary(*new_index[j].get(), i, index_stream); - - transaction->rename_map["primary.idx.tmp"] = "primary.idx"; - - index_stream.next(); - new_primary_key_file_size = index_stream.count(); - new_primary_key_hash = index_stream.getHash(); - } - if (transaction->rename_map.empty() && !force_update_metadata) { transaction->clear(); @@ -1395,12 +1337,6 @@ MergeTreeData::AlterDataPartTransactionPtr MergeTreeData::alterDataPart( new_checksums.files[it.second] = add_checksums.files[it.first]; } - if (new_primary_key_file_size) - { - new_checksums.files["primary.idx"].file_size = new_primary_key_file_size; - new_checksums.files["primary.idx"].file_hash = new_primary_key_hash; - } - /// Write the checksums to the temporary file. if (!part->checksums.empty()) { diff --git a/dbms/src/Storages/MergeTree/MergeTreeData.h b/dbms/src/Storages/MergeTree/MergeTreeData.h index 4670f8b9560..b8f01c40077 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeData.h +++ b/dbms/src/Storages/MergeTree/MergeTreeData.h @@ -479,13 +479,11 @@ public: /// Performs ALTER of the data part, writes the result to temporary files. /// Returns an object allowing to rename temporary files to permanent files. - /// If new_primary_key_expr_list is not nullptr, will prepare the new primary.idx file. /// If the number of affected columns is suspiciously high and skip_sanity_checks is false, throws an exception. /// If no data transformations are necessary, returns nullptr. AlterDataPartTransactionPtr alterDataPart( const DataPartPtr & part, const NamesAndTypesList & new_columns, - const ASTPtr & new_primary_key_expr_list, bool skip_sanity_checks); /// Freezes all parts. diff --git a/dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp b/dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp index 6f30e27bc4f..e7fe9dba256 100644 --- a/dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp +++ b/dbms/src/Storages/MergeTree/ReplicatedMergeTreeAlterThread.cpp @@ -150,7 +150,7 @@ void ReplicatedMergeTreeAlterThread::run() /// Update the part and write result to temporary files. /// TODO: You can skip checking for too large changes if ZooKeeper has, for example, /// node /flags/force_alter. 
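/// alterDataPart() returns nullptr when the part needs no data transformation;
/// such parts are simply skipped below.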
- auto transaction = storage.data.alterDataPart(part, columns_for_parts, nullptr, false); + auto transaction = storage.data.alterDataPart(part, columns_for_parts, false); if (!transaction) continue; diff --git a/dbms/src/Storages/StorageDistributed.cpp b/dbms/src/Storages/StorageDistributed.cpp index 9feeeb5bcf9..9086d1bf321 100644 --- a/dbms/src/Storages/StorageDistributed.cpp +++ b/dbms/src/Storages/StorageDistributed.cpp @@ -1,38 +1,41 @@ +#include + #include #include #include #include +#include -#include -#include #include +#include #include #include #include #include -#include -#include -#include -#include -#include -#include -#include -#include #include +#include #include +#include +#include +#include +#include +#include +#include +#include -#include +#include +#include +#include #include #include +#include #include -#include +#include #include -#include -#include #include #include @@ -58,6 +61,7 @@ namespace ErrorCodes extern const int INFINITE_LOOP; extern const int TYPE_MISMATCH; extern const int NO_SUCH_COLUMN_IN_TABLE; + extern const int TOO_MANY_ROWS; } @@ -133,6 +137,29 @@ void initializeFileNamesIncrement(const std::string & path, SimpleIncrement & in increment.set(getMaximumFileNumber(path)); } +/// the same as DistributedBlockOutputStream::createSelector, should it be static? +IColumn::Selector createSelector(const ClusterPtr cluster, const ColumnWithTypeAndName & result) +{ + const auto & slot_to_shard = cluster->getSlotToShard(); + +#define CREATE_FOR_TYPE(TYPE) \ + if (typeid_cast(result.type.get())) \ + return createBlockSelector(*result.column, slot_to_shard); + + CREATE_FOR_TYPE(UInt8) + CREATE_FOR_TYPE(UInt16) + CREATE_FOR_TYPE(UInt32) + CREATE_FOR_TYPE(UInt64) + CREATE_FOR_TYPE(Int8) + CREATE_FOR_TYPE(Int16) + CREATE_FOR_TYPE(Int32) + CREATE_FOR_TYPE(Int64) + +#undef CREATE_FOR_TYPE + + throw Exception{"Sharding key expression does not evaluate to an integer type", ErrorCodes::TYPE_MISMATCH}; +} + } @@ -267,6 +294,14 @@ BlockInputStreams StorageDistributed::read( : ClusterProxy::SelectStreamFactory( header, processed_stage, QualifiedTableName{remote_database, remote_table}, context.getExternalTables()); + if (settings.optimize_skip_unused_shards) + { + auto smaller_cluster = skipUnusedShards(cluster, query_info); + + if (smaller_cluster) + cluster = smaller_cluster; + } + return ClusterProxy::executeQuery( select_stream_factory, cluster, modified_query_ast, context, settings); } @@ -425,6 +460,41 @@ void StorageDistributed::ClusterNodeData::shutdownAndDropAllData() directory_monitor->shutdownAndDropAllData(); } +/// Returns a new cluster with fewer shards if constant folding for `sharding_key_expr` is possible +/// using constraints from "WHERE" condition, otherwise returns `nullptr` +ClusterPtr StorageDistributed::skipUnusedShards(ClusterPtr cluster, const SelectQueryInfo & query_info) +{ + const auto & select = typeid_cast(*query_info.query); + + if (!select.where_expression) + { + return nullptr; + } + + const auto & blocks = evaluateExpressionOverConstantCondition(select.where_expression, sharding_key_expr); + + // Can't get definite answer if we can skip any shards + if (!blocks) + { + return nullptr; + } + + std::set shards; + + for (const auto & block : *blocks) + { + if (!block.has(sharding_key_column_name)) + throw Exception("sharding_key_expr should evaluate as a single row", ErrorCodes::TOO_MANY_ROWS); + + const auto result = block.getByName(sharding_key_column_name); + const auto selector = createSelector(cluster, result); + + 
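+        /// Each surviving block holds a single constant-folded row of the sharding
+        /// key, so the selector maps that row to one shard index; accumulate the
+        /// distinct indices across all conjunctions.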
shards.insert(selector.begin(), selector.end()); + } + + return cluster->getClusterWithMultipleShards({shards.begin(), shards.end()}); +} + void registerStorageDistributed(StorageFactory & factory) { diff --git a/dbms/src/Storages/StorageDistributed.h b/dbms/src/Storages/StorageDistributed.h index 1ae53f5637c..e14d9f7081f 100644 --- a/dbms/src/Storages/StorageDistributed.h +++ b/dbms/src/Storages/StorageDistributed.h @@ -166,6 +166,8 @@ protected: const ASTPtr & sharding_key_, const String & data_path_, bool attach); + + ClusterPtr skipUnusedShards(ClusterPtr cluster, const SelectQueryInfo & query_info); }; } diff --git a/dbms/src/Storages/StorageMergeTree.cpp b/dbms/src/Storages/StorageMergeTree.cpp index f4f69e3ac87..6ee1e7ca9c9 100644 --- a/dbms/src/Storages/StorageMergeTree.cpp +++ b/dbms/src/Storages/StorageMergeTree.cpp @@ -210,24 +210,12 @@ void StorageMergeTree::alter( ASTPtr new_primary_key_ast = data.primary_key_ast; params.apply(new_columns, new_order_by_ast, new_primary_key_ast); - ASTPtr primary_expr_list_for_altering_parts; - for (const AlterCommand & param : params) - { - if (param.type == AlterCommand::MODIFY_PRIMARY_KEY) - { - if (supportsSampling()) - throw Exception("MODIFY PRIMARY KEY only supported for tables without sampling key", ErrorCodes::BAD_ARGUMENTS); - - primary_expr_list_for_altering_parts = MergeTreeData::extractKeyExpressionList(param.primary_key); - } - } - auto parts = data.getDataParts({MergeTreeDataPartState::PreCommitted, MergeTreeDataPartState::Committed, MergeTreeDataPartState::Outdated}); auto columns_for_parts = new_columns.getAllPhysical(); std::vector transactions; for (const MergeTreeData::DataPartPtr & part : parts) { - if (auto transaction = data.alterDataPart(part, columns_for_parts, primary_expr_list_for_altering_parts, false)) + if (auto transaction = data.alterDataPart(part, columns_for_parts, false)) transactions.push_back(std::move(transaction)); } @@ -238,19 +226,7 @@ void StorageMergeTree::alter( auto & storage_ast = typeid_cast(ast); if (new_order_by_ast.get() != data.order_by_ast.get()) - { - if (storage_ast.order_by) - { - /// The table was created using the "new" syntax (with key expressions in separate clauses). - storage_ast.set(storage_ast.order_by, new_order_by_ast); - } - else - { - /// Primary key is in the second place in table engine description and can be represented as a tuple. - /// TODO: Not always in second place. If there is a sampling key, then the third one. Fix it. - storage_ast.engine->arguments->children.at(1) = new_order_by_ast; - } - } + storage_ast.set(storage_ast.order_by, new_order_by_ast); if (new_primary_key_ast.get() != data.primary_key_ast.get()) storage_ast.set(storage_ast.primary_key, new_primary_key_ast); @@ -266,9 +242,6 @@ void StorageMergeTree::alter( /// Columns sizes could be changed data.recalculateColumnSizes(); - - if (primary_expr_list_for_altering_parts) - data.loadDataParts(false); } @@ -725,7 +698,7 @@ void StorageMergeTree::clearColumnInPartition(const ASTPtr & partition, const Fi if (part->info.partition_id != partition_id) throw Exception("Unexpected partition ID " + part->info.partition_id + ". 
This is a bug.", ErrorCodes::LOGICAL_ERROR); - if (auto transaction = data.alterDataPart(part, columns_for_parts, nullptr, false)) + if (auto transaction = data.alterDataPart(part, columns_for_parts, false)) transactions.push_back(std::move(transaction)); LOG_DEBUG(log, "Removing column " << get(column_name) << " from part " << part->name); diff --git a/dbms/src/Storages/StorageReplicatedMergeTree.cpp b/dbms/src/Storages/StorageReplicatedMergeTree.cpp index 10981823b66..afe8cbc02ab 100644 --- a/dbms/src/Storages/StorageReplicatedMergeTree.cpp +++ b/dbms/src/Storages/StorageReplicatedMergeTree.cpp @@ -1504,7 +1504,7 @@ void StorageReplicatedMergeTree::executeClearColumnInPartition(const LogEntry & LOG_DEBUG(log, "Clearing column " << entry.column_name << " in part " << part->name); - auto transaction = data.alterDataPart(part, columns_for_parts, nullptr, false); + auto transaction = data.alterDataPart(part, columns_for_parts, false); if (!transaction) continue; @@ -3059,12 +3059,6 @@ void StorageReplicatedMergeTree::alter(const AlterCommands & params, data.checkAlter(params); - for (const AlterCommand & param : params) - { - if (param.type == AlterCommand::MODIFY_PRIMARY_KEY) - throw Exception("Modification of primary key is not supported for replicated tables", ErrorCodes::NOT_IMPLEMENTED); - } - ColumnsDescription new_columns = data.getColumns(); ASTPtr new_order_by_ast = data.order_by_ast; ASTPtr new_primary_key_ast = data.primary_key_ast; diff --git a/dbms/src/Storages/StorageStripeLog.cpp b/dbms/src/Storages/StorageStripeLog.cpp index 018001db8dc..b86731f6f3f 100644 --- a/dbms/src/Storages/StorageStripeLog.cpp +++ b/dbms/src/Storages/StorageStripeLog.cpp @@ -139,7 +139,7 @@ public: data_out(data_out_compressed, CompressionSettings(CompressionMethod::LZ4), storage.max_compress_block_size), index_out_compressed(storage.full_path() + "index.mrk", INDEX_BUFFER_SIZE, O_WRONLY | O_APPEND | O_CREAT), index_out(index_out_compressed), - block_out(data_out, 0, storage.getSampleBlock(), &index_out, Poco::File(storage.full_path() + "data.bin").getSize()) + block_out(data_out, 0, storage.getSampleBlock(), false, &index_out, Poco::File(storage.full_path() + "data.bin").getSize()) { } diff --git a/dbms/tests/performance/no_data/bounding_ratio.xml b/dbms/tests/performance/no_data/bounding_ratio.xml new file mode 100644 index 00000000000..269a7e21e51 --- /dev/null +++ b/dbms/tests/performance/no_data/bounding_ratio.xml @@ -0,0 +1,19 @@ + + bounding_ratio + once + + + + + 1000 + 10000 + + + + + + + + SELECT boundingRatio(number, number) FROM system.numbers + SELECT (argMax(number, number) - argMin(number, number)) / (max(number) - min(number)) FROM system.numbers + diff --git a/dbms/tests/performance/right/right.xml b/dbms/tests/performance/right/right.xml new file mode 100644 index 00000000000..7622210133f --- /dev/null +++ b/dbms/tests/performance/right/right.xml @@ -0,0 +1,34 @@ + + right + loop + + + hits_100m_single + + + + + 10000 + + + 5000 + 20000 + + + + + + + + + + func + + right(URL, 16) + substring(URL, greatest(minus(plus(length(URL), 1), 16), 1)) + + + + + SELECT count() FROM hits_100m_single WHERE NOT ignore({func}) + diff --git a/dbms/tests/performance/trim/trim_numbers.xml b/dbms/tests/performance/trim/trim_numbers.xml new file mode 100644 index 00000000000..07587c024ac --- /dev/null +++ b/dbms/tests/performance/trim/trim_numbers.xml @@ -0,0 +1,34 @@ + + trim_numbers + loop + + + + 10000 + + + 5000 + 20000 + + + + + + + + + + func + + trim( + ltrim( + rtrim( + trim(LEADING '012345' FROM 
+ trim(TRAILING '012345' FROM + trim(BOTH '012345' FROM + + + + + SELECT count() FROM numbers(10000000) WHERE NOT ignore({func}toString(number))) + diff --git a/dbms/tests/performance/trim/trim_urls.xml b/dbms/tests/performance/trim/trim_urls.xml new file mode 100644 index 00000000000..3687068f086 --- /dev/null +++ b/dbms/tests/performance/trim/trim_urls.xml @@ -0,0 +1,38 @@ + + trim_urls + loop + + + hits_100m_single + + + + + 10000 + + + 5000 + 20000 + + + + + + + + + + func + + trim( + ltrim( + rtrim( + trim(LEADING 'htpsw:/' FROM + trim(TRAILING '/' FROM + trim(BOTH 'htpsw:/' FROM + + + + + SELECT count() FROM hits_100m_single WHERE NOT ignore({func}URL)) + diff --git a/dbms/tests/performance/trim/trim_whitespace.xml b/dbms/tests/performance/trim/trim_whitespace.xml new file mode 100644 index 00000000000..d7fc5d967a6 --- /dev/null +++ b/dbms/tests/performance/trim/trim_whitespace.xml @@ -0,0 +1,35 @@ + + trim_whitespaces + loop + + + whitespaces + + + + + 30000 + + + + + + + + + + func + + value + trimLeft(value) + trimRight(value) + trimBoth(value) + replaceRegexpOne(value, '^ *', '') + replaceRegexpOne(value, ' *$', '') + replaceRegexpAll(value, '^ *| *$', '') + + + + + SELECT count() FROM whitespaces WHERE NOT ignore({func}) + diff --git a/dbms/tests/performance/trim/whitespaces.sql b/dbms/tests/performance/trim/whitespaces.sql new file mode 100644 index 00000000000..653bd2e7a5a --- /dev/null +++ b/dbms/tests/performance/trim/whitespaces.sql @@ -0,0 +1,17 @@ +CREATE TABLE whitespaces +( + value String +) +ENGINE = MergeTree() +PARTITION BY tuple() +ORDER BY tuple() + +INSERT INTO whitespaces SELECT value +FROM +( + SELECT + arrayStringConcat(groupArray(' ')) AS spaces, + concat(spaces, toString(any(number)), spaces) AS value + FROM numbers(100000000) + GROUP BY pow(number, intHash32(number) % 4) % 12345678 +) -- repeat something like this multiple times and/or just copy whitespaces table into itself diff --git a/dbms/tests/queries/0_stateless/00329_alter_primary_key.reference b/dbms/tests/queries/0_stateless/00329_alter_primary_key.reference deleted file mode 100644 index 2b32543ce55..00000000000 --- a/dbms/tests/queries/0_stateless/00329_alter_primary_key.reference +++ /dev/null @@ -1,130 +0,0 @@ -1 -2 -3 -2 -3 -1 -2 -3 -2 -3 -1 -2 -3 -2 -3 -2 -3 -1 -1 Hello -2 -2 World -3 -3 abc -4 def -2 -2 World -3 -3 abc -4 def -2 World -3 abc -4 def -2 -2 World -3 -3 abc -4 def -1 -2 -3 -1 -1 Hello -2 -2 World -3 -3 abc -4 def -2 -2 World -3 -3 abc -4 def -2 World -3 abc -4 def -2 -2 World -3 -3 abc -4 def -1 -2 -3 -1 -1 Hello -2 -2 World -3 -3 abc -4 def -1 -1 Hello -2 -2 World -3 -3 abc -4 def -1 -1 Hello -2 -2 World -3 -3 abc -4 def -2 -2 World -3 -3 abc -4 def -2 World -3 abc -4 def -2 -2 World -3 -3 abc -4 def -1 -2 -3 -1 -1 Hello -2 -2 World -3 -3 abc -4 def -2 -2 World -3 -3 abc -4 def -2 World -3 abc -4 def -2 -2 World -3 -3 abc -4 def -1 -2 -3 -*** Check table creation statement *** -CREATE TABLE test.pk2 ( x UInt32, y UInt32, z UInt32) ENGINE = MergeTree PRIMARY KEY (x, y) ORDER BY (x, y, z) SETTINGS index_granularity = 8192 -*** Check that the inserted values were correctly sorted *** -100 20 1 -100 20 2 -100 30 1 -100 30 2 diff --git a/dbms/tests/queries/0_stateless/00329_alter_primary_key.sql b/dbms/tests/queries/0_stateless/00329_alter_primary_key.sql deleted file mode 100644 index 0d0ad6d2f96..00000000000 --- a/dbms/tests/queries/0_stateless/00329_alter_primary_key.sql +++ /dev/null @@ -1,83 +0,0 @@ -SET send_logs_level = 'none'; - -DROP TABLE IF EXISTS test.pk; -CREATE TABLE 
test.pk (d Date DEFAULT '2000-01-01', x UInt64) ENGINE = MergeTree(d, x, 1); - -INSERT INTO test.pk (x) VALUES (1), (2), (3); - -SELECT x FROM test.pk ORDER BY x; -SELECT x FROM test.pk WHERE x >= 2 ORDER BY x; - -ALTER TABLE test.pk MODIFY PRIMARY KEY (x); - -SELECT x FROM test.pk ORDER BY x; -SELECT x FROM test.pk WHERE x >= 2 ORDER BY x; - -ALTER TABLE test.pk ADD COLUMN y String, MODIFY PRIMARY KEY (x, y); - -SELECT x, y FROM test.pk ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y = '' ORDER BY x, y; - -INSERT INTO test.pk (x, y) VALUES (1, 'Hello'), (2, 'World'), (3, 'abc'), (4, 'def'); - -SELECT x, y FROM test.pk ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y > '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y >= '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x > 2 AND y > 'z' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE y < 'A' ORDER BY x, y; - -DETACH TABLE test.pk; -ATTACH TABLE test.pk (d Date DEFAULT '2000-01-01', x UInt64, y String) ENGINE = MergeTree(d, (x, y), 1); - -SELECT x, y FROM test.pk ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y > '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y >= '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x > 2 AND y > 'z' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE y < 'A' ORDER BY x, y; - -SET max_rows_to_read = 3; -SELECT x, y FROM test.pk WHERE x > 2 AND y > 'z' ORDER BY x, y; -SET max_rows_to_read = 0; - -OPTIMIZE TABLE test.pk; -SELECT x, y FROM test.pk; -SELECT x, y FROM test.pk ORDER BY x, y; - -ALTER TABLE test.pk MODIFY PRIMARY KEY (x); - -SELECT x, y FROM test.pk ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y > '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y >= '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x > 2 AND y > 'z' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE y < 'A' ORDER BY x, y; - -DETACH TABLE test.pk; -ATTACH TABLE test.pk (d Date DEFAULT '2000-01-01', x UInt64, y String) ENGINE = MergeTree(d, (x), 1); - -SELECT x, y FROM test.pk ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y > '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x >= 2 AND y >= '' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE x > 2 AND y > 'z' ORDER BY x, y; -SELECT x, y FROM test.pk WHERE y < 'A' ORDER BY x, y; - -DROP TABLE test.pk; - -DROP TABLE IF EXISTS test.pk2; -CREATE TABLE test.pk2 (x UInt32) ENGINE MergeTree ORDER BY x; - -ALTER TABLE test.pk2 ADD COLUMN y UInt32, ADD COLUMN z UInt32, MODIFY ORDER BY (x, y, z); -ALTER TABLE test.pk2 MODIFY PRIMARY KEY (y); -- { serverError 36 } -ALTER TABLE test.pk2 MODIFY PRIMARY KEY (x, y); -SELECT '*** Check table creation statement ***'; -SHOW CREATE TABLE test.pk2; - -INSERT INTO test.pk2 VALUES (100, 30, 2), (100, 30, 1), (100, 20, 2), (100, 20, 1); -SELECT '*** Check that the inserted values were correctly sorted ***'; -SELECT * FROM test.pk2; - -DROP TABLE test.pk2; diff --git a/dbms/tests/queries/0_stateless/00514_interval_operators.reference b/dbms/tests/queries/0_stateless/00514_interval_operators.reference index 8af8f56eb87..43238eecb3d 100644 --- a/dbms/tests/queries/0_stateless/00514_interval_operators.reference +++ b/dbms/tests/queries/0_stateless/00514_interval_operators.reference @@ -36,3 +36,4 @@ 2029-02-28 
01:02:03 2017-03-29 01:02:03 2030-02-28 01:02:03 2017-04-29 01:02:03 2031-02-28 01:02:03 2017-05-29 01:02:03 +2015-11-29 01:02:03 diff --git a/dbms/tests/queries/0_stateless/00514_interval_operators.sql b/dbms/tests/queries/0_stateless/00514_interval_operators.sql index 9dc2f67322b..a4b6c983abf 100644 --- a/dbms/tests/queries/0_stateless/00514_interval_operators.sql +++ b/dbms/tests/queries/0_stateless/00514_interval_operators.sql @@ -2,3 +2,4 @@ SELECT toDateTime('2017-10-30 08:18:19') + INTERVAL 1 DAY + INTERVAL 1 MONTH - I SELECT toDateTime('2017-10-30 08:18:19') + INTERVAL 1 HOUR + INTERVAL 1000 MINUTE + INTERVAL 10 SECOND; SELECT toDateTime('2017-10-30 08:18:19') + INTERVAL 1 DAY + INTERVAL number MONTH FROM system.numbers LIMIT 20; SELECT toDateTime('2016-02-29 01:02:03') + INTERVAL number YEAR, toDateTime('2016-02-29 01:02:03') + INTERVAL number MONTH FROM system.numbers LIMIT 16; +SELECT toDateTime('2016-02-29 01:02:03') - INTERVAL 1 QUARTER; diff --git a/dbms/tests/queries/0_stateless/00619_extract.sql b/dbms/tests/queries/0_stateless/00619_extract.sql index 78ec812dad6..034ae55b5e3 100644 --- a/dbms/tests/queries/0_stateless/00619_extract.sql +++ b/dbms/tests/queries/0_stateless/00619_extract.sql @@ -13,7 +13,7 @@ SELECT EXTRACT(year FROM toDateTime('2017-12-31 18:59:58')); DROP TABLE IF EXISTS test.Orders; CREATE TABLE test.Orders (OrderId UInt64, OrderName String, OrderDate DateTime) engine = Log; insert into test.Orders values (1, 'Jarlsberg Cheese', toDateTime('2008-10-11 13:23:44')); -SELECT EXTRACT(YEAR FROM OrderDate) AS OrderYear, EXTRACT(MONTH FROM OrderDate) AS OrderMonth, EXTRACT(DAY FROM OrderDate) AS OrderDay, +SELECT EXTRACT(YYYY FROM OrderDate) AS OrderYear, EXTRACT(MONTH FROM OrderDate) AS OrderMonth, EXTRACT(DAY FROM OrderDate) AS OrderDay, EXTRACT(HOUR FROM OrderDate), EXTRACT(MINUTE FROM OrderDate), EXTRACT(SECOND FROM OrderDate) FROM test.Orders WHERE OrderId=1; DROP TABLE test.Orders; diff --git a/dbms/tests/queries/0_stateless/00653_monotonic_integer_cast.sql b/dbms/tests/queries/0_stateless/00653_monotonic_integer_cast.sql index 99025e59b89..29a44a4aa22 100644 --- a/dbms/tests/queries/0_stateless/00653_monotonic_integer_cast.sql +++ b/dbms/tests/queries/0_stateless/00653_monotonic_integer_cast.sql @@ -2,4 +2,3 @@ drop table if exists test.table; create table test.table (val Int32) engine = MergeTree order by val; insert into test.table values (-2), (0), (2); select count() from test.table where toUInt64(val) == 0; - diff --git a/dbms/tests/queries/0_stateless/00653_verification_monotonic_data_load.reference b/dbms/tests/queries/0_stateless/00653_verification_monotonic_data_load.reference new file mode 100644 index 00000000000..8900af059b8 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00653_verification_monotonic_data_load.reference @@ -0,0 +1,26 @@ +no monotonic int case: String -> UInt64 +no monotonic int case: FixedString -> UInt64 +monotonic int case: Int32 -> Int64 +monotonic int case: Int32 -> UInt64 +monotonic int case: Int32 -> Int32 +monotonic int case: Int32 -> UInt32 +monotonic int case: Int32 -> Int16 +monotonic int case: Int32 -> UInt16 +monotonic int case: UInt32 -> Int64 +monotonic int case: UInt32 -> UInt64 +monotonic int case: UInt32 -> Int32 +monotonic int case: UInt32 -> UInt32 +monotonic int case: UInt32 -> Int16 +monotonic int case: UInt32 -> UInt16 +monotonic int case: Enum16 -> Int32 +monotonic int case: Enum16 -> UInt32 +monotonic int case: Enum16 -> Int16 +monotonic int case: Enum16 -> UInt16 +monotonic int case: Enum16 -> Int8 
+monotonic int case: Enum16 -> UInt8 +monotonic int case: Date -> Int32 +monotonic int case: Date -> UInt32 +monotonic int case: Date -> Int16 +monotonic int case: Date -> UInt16 +monotonic int case: Date -> Int8 +monotonic int case: Date -> UInt8 diff --git a/dbms/tests/queries/0_stateless/00653_verification_monotonic_data_load.sh b/dbms/tests/queries/0_stateless/00653_verification_monotonic_data_load.sh new file mode 100755 index 00000000000..325f19dc9ec --- /dev/null +++ b/dbms/tests/queries/0_stateless/00653_verification_monotonic_data_load.sh @@ -0,0 +1,85 @@ +#!/usr/bin/env bash + +#-------------------------------------------- +# Description of test result: +# Test the correctness of the optimization +# by asserting read marks in the log. +# Relation of read marks and optimization: +# read marks = +# the number of monotonic marks filtered through predicates +# + no monotonic marks count +#-------------------------------------------- + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +. $CURDIR/../shell_config.sh + +${CLICKHOUSE_CLIENT} --query="SYSTEM STOP MERGES;" + +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.string_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.fixed_string_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.signed_integer_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.unsigned_integer_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.enum_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.date_test_table;" + +${CLICKHOUSE_CLIENT} --query="CREATE TABLE test.string_test_table (val String) ENGINE = MergeTree ORDER BY val SETTINGS index_granularity = 1;" +${CLICKHOUSE_CLIENT} --query="CREATE TABLE test.fixed_string_test_table (val FixedString(1)) ENGINE = MergeTree ORDER BY val SETTINGS index_granularity = 1;" +${CLICKHOUSE_CLIENT} --query="CREATE TABLE test.signed_integer_test_table (val Int32) ENGINE = MergeTree ORDER BY val SETTINGS index_granularity = 1;" +${CLICKHOUSE_CLIENT} --query="CREATE TABLE test.unsigned_integer_test_table (val UInt32) ENGINE = MergeTree ORDER BY val SETTINGS index_granularity = 1;" +${CLICKHOUSE_CLIENT} --query="CREATE TABLE test.enum_test_table (val Enum16('hello' = 1, 'world' = 2, 'yandex' = 256, 'clickhouse' = 257)) ENGINE = MergeTree ORDER BY val SETTINGS index_granularity = 1;" +${CLICKHOUSE_CLIENT} --query="CREATE TABLE test.date_test_table (val Date) ENGINE = MergeTree ORDER BY val SETTINGS index_granularity = 1;" + + +${CLICKHOUSE_CLIENT} --query="INSERT INTO test.string_test_table VALUES ('0'), ('2'), ('2');" +${CLICKHOUSE_CLIENT} --query="INSERT INTO test.fixed_string_test_table VALUES ('0'), ('2'), ('2');" +# 131072 -> 17 bit is 1 +${CLICKHOUSE_CLIENT} --query="INSERT INTO test.signed_integer_test_table VALUES (-2), (0), (2), (2), (131072), (131073), (131073);" +${CLICKHOUSE_CLIENT} --query="INSERT INTO test.unsigned_integer_test_table VALUES (0), (2), (2), (131072), (131073), (131073);" +${CLICKHOUSE_CLIENT} --query="INSERT INTO test.enum_test_table VALUES ('hello'), ('world'), ('world'), ('yandex'), ('clickhouse'), ('clickhouse');" +${CLICKHOUSE_CLIENT} --query="INSERT INTO test.date_test_table VALUES (1), (2), (2), (256), (257), (257);" + +export CLICKHOUSE_CLIENT=`echo ${CLICKHOUSE_CLIENT} |sed 's/'"${CLICKHOUSE_CLIENT_SERVER_LOGS_LEVEL}"'/debug/g'` + +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.string_test_table WHERE toUInt64(val) == 0;" 2>&1 |grep -q "3 marks to read from 1 ranges" && echo 
"no monotonic int case: String -> UInt64" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.fixed_string_test_table WHERE toUInt64(val) == 0;" 2>&1 |grep -q "3 marks to read from 1 ranges" && echo "no monotonic int case: FixedString -> UInt64" + +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.signed_integer_test_table WHERE toInt64(val) == 0;" 2>&1 |grep -q "2 marks to read from" && echo "monotonic int case: Int32 -> Int64" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.signed_integer_test_table WHERE toUInt64(val) == 0;" 2>&1 |grep -q "2 marks to read from" && echo "monotonic int case: Int32 -> UInt64" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.signed_integer_test_table WHERE toInt32(val) == 0;" 2>&1 |grep -q "2 marks to read from" && echo "monotonic int case: Int32 -> Int32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.signed_integer_test_table WHERE toUInt32(val) == 0;" 2>&1 |grep -q "2 marks to read from" && echo "monotonic int case: Int32 -> UInt32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.signed_integer_test_table WHERE toInt16(val) == 0;" 2>&1 |grep -q "5 marks to read from" && echo "monotonic int case: Int32 -> Int16" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.signed_integer_test_table WHERE toUInt16(val) == 0;" 2>&1 |grep -q "5 marks to read from" && echo "monotonic int case: Int32 -> UInt16" + +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.unsigned_integer_test_table WHERE toInt64(val) == 0;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: UInt32 -> Int64" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.unsigned_integer_test_table WHERE toUInt64(val) == 0;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: UInt32 -> UInt64" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.unsigned_integer_test_table WHERE toInt32(val) == 0;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: UInt32 -> Int32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.unsigned_integer_test_table WHERE toUInt32(val) == 0;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: UInt32 -> UInt32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.unsigned_integer_test_table WHERE toInt16(val) == 0;" 2>&1 |grep -q "4 marks to read from" && echo "monotonic int case: UInt32 -> Int16" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.unsigned_integer_test_table WHERE toUInt16(val) == 0;" 2>&1 |grep -q "4 marks to read from" && echo "monotonic int case: UInt32 -> UInt16" + + +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.enum_test_table WHERE toInt32(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Enum16 -> Int32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.enum_test_table WHERE toUInt32(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Enum16 -> UInt32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.enum_test_table WHERE toInt16(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Enum16 -> Int16" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.enum_test_table WHERE toUInt16(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Enum16 -> UInt16" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.enum_test_table WHERE toInt8(val) == 1;" 2>&1 |grep -q "5 marks to read from" && echo "monotonic int case: Enum16 -> Int8" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.enum_test_table WHERE 
toUInt8(val) == 1;" 2>&1 |grep -q "5 marks to read from" && echo "monotonic int case: Enum16 -> UInt8" + + +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.date_test_table WHERE toInt32(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Date -> Int32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.date_test_table WHERE toUInt32(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Date -> UInt32" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.date_test_table WHERE toInt16(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Date -> Int16" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.date_test_table WHERE toUInt16(val) == 1;" 2>&1 |grep -q "1 marks to read from" && echo "monotonic int case: Date -> UInt16" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.date_test_table WHERE toInt8(val) == 1;" 2>&1 |grep -q "5 marks to read from" && echo "monotonic int case: Date -> Int8" +${CLICKHOUSE_CLIENT} --query="SELECT count() FROM test.date_test_table WHERE toUInt8(val) == 1;" 2>&1 |grep -q "5 marks to read from" && echo "monotonic int case: Date -> UInt8" + +export CLICKHOUSE_CLIENT=`echo ${CLICKHOUSE_CLIENT} |sed 's/debug/'"${CLICKHOUSE_CLIENT_SERVER_LOGS_LEVEL}"'/g'` + +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.string_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.fixed_string_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.signed_integer_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.unsigned_integer_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.enum_test_table;" +${CLICKHOUSE_CLIENT} --query="DROP TABLE IF EXISTS test.date_test_table;" + +${CLICKHOUSE_CLIENT} --query="SYSTEM START MERGES;" diff --git a/dbms/tests/queries/0_stateless/00715_bounding_ratio.reference b/dbms/tests/queries/0_stateless/00715_bounding_ratio.reference new file mode 100644 index 00000000000..f1e96af83a9 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00715_bounding_ratio.reference @@ -0,0 +1,21 @@ +1 +1 +1.5 +1.5 +1.5 +0 1.5 +1 1.5 +2 1.5 +3 1.5 +4 1.5 +5 1.5 +6 1.5 +7 1.5 +8 1.5 +9 1.5 + +0 1.5 +1.5 +nan +nan +1 diff --git a/dbms/tests/queries/0_stateless/00715_bounding_ratio.sql b/dbms/tests/queries/0_stateless/00715_bounding_ratio.sql new file mode 100644 index 00000000000..ff3cd4c606b --- /dev/null +++ b/dbms/tests/queries/0_stateless/00715_bounding_ratio.sql @@ -0,0 +1,26 @@ +drop table if exists rate_test; + +create table rate_test (timestamp UInt32, event UInt32) engine=Memory; +insert into rate_test values (0,1000),(1,1001),(2,1002),(3,1003),(4,1004),(5,1005),(6,1006),(7,1007),(8,1008); + +select 1.0 = boundingRatio(timestamp, event) from rate_test; + +drop table if exists rate_test2; +create table rate_test2 (uid UInt32 default 1,timestamp DateTime, event UInt32) engine=Memory; +insert into rate_test2(timestamp, event) values ('2018-01-01 01:01:01',1001),('2018-01-01 01:01:02',1002),('2018-01-01 01:01:03',1003),('2018-01-01 01:01:04',1004),('2018-01-01 01:01:05',1005),('2018-01-01 01:01:06',1006),('2018-01-01 01:01:07',1007),('2018-01-01 01:01:08',1008); + +select 1.0 = boundingRatio(timestamp, event) from rate_test2; + +drop table rate_test; +drop table rate_test2; + + +SELECT boundingRatio(number, number * 1.5) FROM numbers(10); +SELECT boundingRatio(1000 + number, number * 1.5) FROM numbers(10); +SELECT boundingRatio(1000 + number, number * 1.5 - 111) FROM numbers(10); +SELECT number 
% 10 AS k, boundingRatio(1000 + number, number * 1.5 - 111) FROM numbers(100) GROUP BY k WITH TOTALS ORDER BY k; + +SELECT boundingRatio(1000 + number, number * 1.5 - 111) FROM numbers(2); +SELECT boundingRatio(1000 + number, number * 1.5 - 111) FROM numbers(1); +SELECT boundingRatio(1000 + number, number * 1.5 - 111) FROM numbers(1) WHERE 0; +SELECT boundingRatio(number, exp(number)) = e() - 1 FROM numbers(2); diff --git a/dbms/tests/queries/0_stateless/000732_base64_functions.reference b/dbms/tests/queries/0_stateless/00732_base64_functions.reference similarity index 100% rename from dbms/tests/queries/0_stateless/000732_base64_functions.reference rename to dbms/tests/queries/0_stateless/00732_base64_functions.reference diff --git a/dbms/tests/queries/0_stateless/000732_base64_functions.sql b/dbms/tests/queries/0_stateless/00732_base64_functions.sql similarity index 100% rename from dbms/tests/queries/0_stateless/000732_base64_functions.sql rename to dbms/tests/queries/0_stateless/00732_base64_functions.sql diff --git a/dbms/tests/queries/0_stateless/00754_distributed_optimize_skip_select_on_unused_shards.reference b/dbms/tests/queries/0_stateless/00754_distributed_optimize_skip_select_on_unused_shards.reference new file mode 100644 index 00000000000..add8c239ade --- /dev/null +++ b/dbms/tests/queries/0_stateless/00754_distributed_optimize_skip_select_on_unused_shards.reference @@ -0,0 +1,16 @@ +OK +OK +1 +OK +0 +4 +2 +1 +1 +1 +4 +OK +OK +OK +OK +OK diff --git a/dbms/tests/queries/0_stateless/00754_distributed_optimize_skip_select_on_unused_shards.sh b/dbms/tests/queries/0_stateless/00754_distributed_optimize_skip_select_on_unused_shards.sh new file mode 100755 index 00000000000..6adcec4a14e --- /dev/null +++ b/dbms/tests/queries/0_stateless/00754_distributed_optimize_skip_select_on_unused_shards.sh @@ -0,0 +1,105 @@ +#!/usr/bin/env bash + +CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd) +. 
$CURDIR/../shell_config.sh + +${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS test.mergetree;" +${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS test.distributed;" + +${CLICKHOUSE_CLIENT} --query "CREATE TABLE test.mergetree (a Int64, b Int64, c Int64) ENGINE = MergeTree ORDER BY (a, b);" +${CLICKHOUSE_CLIENT} --query "CREATE TABLE test.distributed AS test.mergetree ENGINE = Distributed(test_unavailable_shard, test, mergetree, jumpConsistentHash(a+b, 2));" + +${CLICKHOUSE_CLIENT} --query "INSERT INTO test.mergetree VALUES (0, 0, 0);" +${CLICKHOUSE_CLIENT} --query "INSERT INTO test.mergetree VALUES (1, 0, 0);" +${CLICKHOUSE_CLIENT} --query "INSERT INTO test.mergetree VALUES (0, 1, 1);" +${CLICKHOUSE_CLIENT} --query "INSERT INTO test.mergetree VALUES (1, 1, 1);" + +# Should fail because second shard is unavailable +${CLICKHOUSE_CLIENT} --query "SELECT count(*) FROM test.distributed;" 2>&1 \ +| fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' + +# Should fail without setting `optimize_skip_unused_shards` +${CLICKHOUSE_CLIENT} --query "SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0;" 2>&1 \ +| fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' + +# Should pass now +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0; +" + +# Should still fail because of matching unavailable shard +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 2 AND b = 2; +" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' + +# Try more complex expressions for constant folding - all should pass. + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 1 AND a = 0 AND b = 0; +" + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a IN (0, 1) AND b IN (0, 1); +" + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0 OR a = 1 AND b = 1; +" + +# TODO: should pass one day. +#${CLICKHOUSE_CLIENT} -n --query=" +# SET optimize_skip_unused_shards = 1; +# SELECT count(*) FROM test.distributed WHERE a = 0 AND b >= 0 AND b <= 1; +#" + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0 AND c = 0; +" + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0 AND c != 10; +" + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0 AND (a+b)*b != 12; +" + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE (a = 0 OR a = 1) AND (b = 0 OR b = 1); +" + +# These ones should fail.
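+# They should fail either because the WHERE condition cannot be folded to a definite set of values for the sharding key expression (ranges, or conditions that leave a or b unconstrained), or because the folded values still cover the unavailable shard, so the query is still sent to the shard that is down.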
+ +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b <= 1; +" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND c = 0; +" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 OR a = 1 AND b = 0; +" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0 OR a = 2 AND b = 2; +" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' + +${CLICKHOUSE_CLIENT} -n --query=" + SET optimize_skip_unused_shards = 1; + SELECT count(*) FROM test.distributed WHERE a = 0 AND b = 0 OR c = 0; +" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL' diff --git a/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.reference b/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.reference index 7a70e443c1b..6a2a0523476 100644 --- a/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.reference +++ b/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.reference @@ -10,3 +10,19 @@ o 1 oo o +fo +foo +r +bar + +foo + foo +xxfoo +fooabba +fooabbafoo +foo* +-11 +-3 +2021-01-01 +2018-07-18 01:02:03 +2018-04-01 diff --git a/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.sql b/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.sql index 248514d134b..a7f1f3ad98a 100644 --- a/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.sql +++ b/dbms/tests/queries/0_stateless/00765_sql_compatibility_aliases.sql @@ -12,3 +12,19 @@ select mid('foo', 3); select IF(3>2, 1, 0); select substring('foo' from 1 + 1); select SUBSTRING('foo' FROM 2 FOR 1); +select left('foo', 2); +select LEFT('foo', 123); +select RIGHT('bar', 1); +select right('bar', 123); +select ltrim('') || rtrim('') || trim(''); +select ltrim(' foo'); +select RTRIM(' foo '); +select trim(TRAILING 'x' FROM 'xxfooxx'); +select Trim(LEADING 'ab' FROM 'abbafooabba'); +select TRIM(both 'ab' FROM 'abbafooabbafooabba'); +select trim(LEADING '*[]{}|\\' FROM '\\|[[[}}}*foo*'); +select DATE_DIFF(MONTH, toDate('2018-12-18'), toDate('2018-01-01')); +select DATE_DIFF(QQ, toDate('2018-12-18'), toDate('2018-01-01')); +select DATE_ADD(YEAR, 3, toDate('2018-01-01')); +select timestamp_sub(SQL_TSI_MONTH, 5, toDateTime('2018-12-18 01:02:03')); +select timestamp_ADD(toDate('2018-01-01'), INTERVAL 3 MONTH); diff --git a/dbms/tests/queries/0_stateless/00807_regexp_quote_meta.reference b/dbms/tests/queries/0_stateless/00807_regexp_quote_meta.reference new file mode 100644 index 00000000000..f58c91433e3 --- /dev/null +++ b/dbms/tests/queries/0_stateless/00807_regexp_quote_meta.reference @@ -0,0 +1,12 @@ +hello +hel\\\\lo +h\\{ell}o +\\(h\\{ell}o\\) + +\\( +Hello\\( +\\(Hello +\\(\\(\\(\\(\\(\\(\\(\\(\\( +\\\\ +\\\0\\\\\\|\\(\\)\\^\\$\\.\\[\\?\\*\\+\\{ +1 diff --git a/dbms/tests/queries/0_stateless/00807_regexp_quote_meta.sql b/dbms/tests/queries/0_stateless/00807_regexp_quote_meta.sql new file mode 100644 index 00000000000..afac2d7bece --- /dev/null +++ b/dbms/tests/queries/0_stateless/00807_regexp_quote_meta.sql @@ -0,0 +1,12 @@ 
+SELECT regexpQuoteMeta('hello'); +SELECT regexpQuoteMeta('hel\\lo'); +SELECT regexpQuoteMeta('h{ell}o'); +SELECT regexpQuoteMeta('(h{ell}o)'); +SELECT regexpQuoteMeta(''); +SELECT regexpQuoteMeta('('); +SELECT regexpQuoteMeta('Hello('); +SELECT regexpQuoteMeta('(Hello'); +SELECT regexpQuoteMeta('((((((((('); +SELECT regexpQuoteMeta('\\'); +SELECT regexpQuoteMeta('\0\\|()^$.[?*+{'); +SELECT DISTINCT regexpQuoteMeta(toString(number)) = toString(number) FROM numbers(100000); diff --git a/dbms/tests/server-test.xml b/dbms/tests/server-test.xml index 82b76f62fa4..c20d34cce3f 100644 --- a/dbms/tests/server-test.xml +++ b/dbms/tests/server-test.xml @@ -53,6 +53,20 @@ Europe/Moscow + + + + localhost + 59000 + + + + + localhost + 1 + + + diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md index 8ede8c5507d..a21e8b53d2a 100644 --- a/docs/en/interfaces/formats.md +++ b/docs/en/interfaces/formats.md @@ -169,7 +169,7 @@ When formatting, rows are enclosed in double quotes. A double quote inside a str clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO test.csv FORMAT CSV" < data.csv ``` -*By default, the delimiter is `,`. See the [format_csv_delimiter](/operations/settings/settings/#format_csv_delimiter) setting for more information. +*By default, the delimiter is `,`. See the [format_csv_delimiter](../operations/settings/settings.md#format_csv_delimiter) setting for more information. When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Rows can also be arranged without quotes. In this case, they are parsed up to the delimiter character or line feed (CR or LF). In violation of the RFC, when parsing rows without quotes, the leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) types are all supported. diff --git a/docs/en/operations/server_settings/settings.md b/docs/en/operations/server_settings/settings.md index 4275b5514c0..cc65063c70b 100644 --- a/docs/en/operations/server_settings/settings.md +++ b/docs/en/operations/server_settings/settings.md @@ -681,8 +681,20 @@ For more information, see the section "[Replication](../../operations/table_engi **Example** ```xml - + + + example1 + 2181 + + + example2 + 2181 + + + example3 + 2181 + + ``` - [Original article](https://clickhouse.yandex/docs/en/operations/server_settings/settings/) diff --git a/docs/en/operations/settings/settings.md b/docs/en/operations/settings/settings.md index 2d63c3e5e9a..3b4cf268579 100644 --- a/docs/en/operations/settings/settings.md +++ b/docs/en/operations/settings/settings.md @@ -149,7 +149,7 @@ Default value: 0 (off). Used when performing `SELECT` from a distributed table that points to replicated tables. -## max_threads +## max_threads {#max_threads} The maximum number of query processing threads diff --git a/docs/en/query_language/misc.md b/docs/en/query_language/misc.md index 148f4fe69f9..159a7611206 100644 --- a/docs/en/query_language/misc.md +++ b/docs/en/query_language/misc.md @@ -4,8 +4,8 @@ This query is exactly the same as `CREATE`, but -- instead of the word `CREATE` it uses the word `ATTACH`. -- The query doesn't create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table to the server. +- Instead of the word `CREATE` it uses the word `ATTACH`. 
+- The query does not create data on the disk, but assumes that data is already in the appropriate places, and just adds information about the table to the server. After executing an ATTACH query, the server will know about the existence of the table. If the table was previously detached (``DETACH``), meaning that its structure is known, you can use shorthand without defining the structure. @@ -16,6 +16,41 @@ ATTACH TABLE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster] This query is used when starting the server. The server stores table metadata as files with `ATTACH` queries, which it simply runs at launch (with the exception of system tables, which are explicitly created on the server). +## CHECK TABLE + +Checks if the data in the table is corrupted. + +``` sql +CHECK TABLE [db.]name +``` + +The `CHECK TABLE` query compares actual file sizes with the expected values which are stored on the server. If the file sizes do not match the stored values, it means the data is corrupted. This can be caused, for example, by a system crash during query execution. + +The query response contains the `result` column with a single row. The row has a value of + [Boolean](../data_types/boolean.md) type: + +- 0 - The data in the table is corrupted. +- 1 - The data maintains integrity. + +The `CHECK TABLE` query is only supported for the following table engines: + +- [Log](../operations/table_engines/log.md) +- [TinyLog](../operations/table_engines/tinylog.md) +- StripeLog + +These engines do not provide automatic data recovery on failure. Use the `CHECK TABLE` query to track data loss in a timely manner. + +To avoid data loss, use tables from the [MergeTree](../operations/table_engines/mergetree.md) family. + +**If the data is corrupted** + +If the table is corrupted, you can copy the non-corrupted data to another table (a short SQL sketch of these steps is shown below). To do this: + +1. Create a new table with the same structure as the damaged table. To do this execute the query `CREATE TABLE <new_table_name> AS <damaged_table_name>`. +2. Set the [max_threads](../operations/settings/settings.md#max_threads) value to 1 to process the next query in a single thread. To do this run the query `SET max_threads = 1`. +3. Execute the query `INSERT INTO <new_table_name> SELECT * FROM <damaged_table_name>`. This query copies the non-corrupted data from the damaged table to another table. Only the data before the corrupted part will be copied. +4. Restart the `clickhouse-client` to reset the `max_threads` value. + ## DESCRIBE TABLE ``` sql @@ -198,8 +233,8 @@ SHOW [TEMPORARY] TABLES [FROM db] [LIKE 'pattern'] [INTO OUTFILE filename] [FORM Displays a list of tables -- tables from the current database, or from the 'db' database if "FROM db" is specified. -- all tables, or tables whose name matches the pattern, if "LIKE 'pattern'" is specified. +- Tables from the current database, or from the 'db' database if "FROM db" is specified. +- All tables, or tables whose name matches the pattern, if "LIKE 'pattern'" is specified. This query is identical to: `SELECT name FROM system.tables WHERE database = 'db' [AND name LIKE 'pattern'] [INTO OUTFILE filename] [FORMAT format]`. @@ -207,7 +242,7 @@ See also the section "LIKE operator".
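To make the `CHECK TABLE` recovery procedure described above concrete, here is a minimal sketch. The table names `broken_log` and `broken_log_copy` are hypothetical, and the final step (restarting `clickhouse-client` so that `max_threads` returns to its default) is not shown:

``` sql
-- Verify that the data is corrupted: `result` is 0 for a corrupted table.
CHECK TABLE broken_log;

-- 1. Create a new table with the same structure as the damaged one.
CREATE TABLE broken_log_copy AS broken_log;

-- 2. Copy in a single thread so that reading stops at the corrupted part.
SET max_threads = 1;

-- 3. Copy the non-corrupted data; only rows before the corrupted part are transferred.
INSERT INTO broken_log_copy SELECT * FROM broken_log;
```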
## TRUNCATE -```sql +``` sql TRUNCATE TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster] ``` diff --git a/docs/ru/operations/server_settings/settings.md b/docs/ru/operations/server_settings/settings.md index b7e9095ebac..4996a283f8d 100644 --- a/docs/ru/operations/server_settings/settings.md +++ b/docs/ru/operations/server_settings/settings.md @@ -683,7 +683,20 @@ ClickHouse использует ZooKeeper для хранения метадан **Пример** ```xml - + + + example1 + 2181 + + + example2 + 2181 + + + example3 + 2181 + + ``` [Оригинальная статья](https://clickhouse.yandex/docs/ru/operations/server_settings/settings/) diff --git a/docs/zh/operations/server_settings/settings.md b/docs/zh/operations/server_settings/settings.md index 6e1e5e44e59..24a384a9087 100644 --- a/docs/zh/operations/server_settings/settings.md +++ b/docs/zh/operations/server_settings/settings.md @@ -680,7 +680,20 @@ For more information, see the section "[Replication](../../operations/table_engi **Example** ```xml - + + + example1 + 2181 + + + example2 + 2181 + + + example3 + 2181 + + ``` diff --git a/libs/libcommon/include/common/DateLUTImpl.h b/libs/libcommon/include/common/DateLUTImpl.h index 56d9cc04dd1..55a94f3733a 100644 --- a/libs/libcommon/include/common/DateLUTImpl.h +++ b/libs/libcommon/include/common/DateLUTImpl.h @@ -584,6 +584,16 @@ public: } } + inline time_t addQuarters(time_t t, Int64 delta) const + { + return addMonths(t, delta * 3); + } + + inline DayNum addQuarters(DayNum d, Int64 delta) const + { + return addMonths(d, delta * 3); + } + /// Saturation can occur if 29 Feb is mapped to non-leap year. inline time_t addYears(time_t t, Int64 delta) const {