Merge remote-tracking branch 'upstream/master'

This commit is contained in:
BayoNet 2018-07-26 20:03:23 +03:00
commit 5b9dfcb486
732 changed files with 11462 additions and 7342 deletions

11
.gitignore vendored
View File

@ -10,12 +10,13 @@
*.logrt
/build
/docs/en_single_page/
/docs/ru_single_page/
/docs/venv/
/docs/build/
/docs/build
/docs/edit
/docs/tools/venv/
/docs/en/development/build/
/docs/ru/development/build/
/docs/en/single.md
/docs/ru/single.md
# callgrind files
callgrind.out.*
@ -240,3 +241,5 @@ node_modules
public
website/docs
website/presentations
.DS_Store
*/.DS_Store

View File

@ -1,4 +1 @@
# en:
# ru:

View File

@ -1,28 +1,152 @@
# ClickHouse release 1.1.54385, 2018-06-01
## ClickHouse release 1.1.54394, 2018-07-12
## Bug fixes:
### New features:
* Added the `histogram` aggregate function ([Mikhail Surin](https://github.com/yandex/ClickHouse/pull/2521)).
* Now `OPTIMIZE TABLE ... FINAL` can be used without specifying partitions for `ReplicatedMergeTree` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2600)).
### Bug fixes:
* Fixed a problem with a very small timeout for sockets (one second) for reading and writing when sending and downloading replicated data, which made it impossible to download larger parts if there is a load on the network or disk (it resulted in cyclical attempts to download parts). This error occurred in version 1.1.54388.
* Fixed issues when using chroot in ZooKeeper if you inserted duplicate data blocks in the table.
* The `has` function now works correctly for an array with Nullable elements ([#2115](https://github.com/yandex/ClickHouse/issues/2115)).
* The `system.tables` table now works correctly when used in distributed queries. The `metadata_modification_time` and `engine_full` columns are now non-virtual. Fixed an error that occurred if only these columns were requested from the table.
* Fixed how an empty `TinyLog` table works after inserting an empty data block ([#2563](https://github.com/yandex/ClickHouse/issues/2563)).
* The `system.zookeeper` table works if the value of the node in ZooKeeper is NULL.
## ClickHouse release 1.1.54390, 2018-07-06
### New features:
* Queries can be sent in `multipart/form-data` format (in the `query` field), which is useful if external data is also sent for query processing ([Olga Hvostikova](https://github.com/yandex/ClickHouse/pull/2490)).
* Added the ability to enable or disable processing single or double quotes when reading data in CSV format. You can configure this in the `format_csv_allow_single_quotes` and `format_csv_allow_double_quotes` settings ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2574)).
* Now `OPTIMIZE TABLE ... FINAL` can be used without specifying the partition for non-replicated variants of `MergeTree` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2599)).
### Improvements:
* Improved performance, reduced memory consumption, and correct tracking of memory consumption with use of the IN operator when a table index could be used ([#2584](https://github.com/yandex/ClickHouse/pull/2584)).
* Removed redundant checking of checksums when adding a data part. This is important when there are a large number of replicas, because in these cases the total number of checks was equal to N^2.
* Added support for `Array(Tuple(...))` arguments for the `arrayEnumerateUniq` function ([#2573](https://github.com/yandex/ClickHouse/pull/2573)).
* Added `Nullable` support for the `runningDifference` function ([#2594](https://github.com/yandex/ClickHouse/pull/2594)).
* Improved query analysis performance when there is a very large number of expressions ([#2572](https://github.com/yandex/ClickHouse/pull/2572)).
* Faster selection of data parts for merging in `ReplicatedMergeTree` tables. Faster recovery of the ZooKeeper session ([#2597](https://github.com/yandex/ClickHouse/pull/2597)).
* The `format_version.txt` file for `MergeTree` tables is re-created if it is missing, which makes sense if ClickHouse is launched after copying the directory structure without files ([Ciprian Hacman](https://github.com/yandex/ClickHouse/pull/2593)).
### Bug fixes:
* Fixed a bug when working with ZooKeeper that could make it impossible to recover the session and readonly states of tables before restarting the server.
* Fixed a bug when working with ZooKeeper that could result in old nodes not being deleted if the session is interrupted.
* Fixed an error in the `quantileTDigest` function for Float arguments (this bug was introduced in version 1.1.54388) ([Mikhail Surin](https://github.com/yandex/ClickHouse/pull/2553)).
* Fixed a bug in the index for MergeTree tables if the primary key column is located inside the function for converting types between signed and unsigned integers of the same size ([#2603](https://github.com/yandex/ClickHouse/pull/2603)).
* Fixed segfault if `macros` are used but they aren't in the config file ([#2570](https://github.com/yandex/ClickHouse/pull/2570)).
* Fixed switching to the default database when reconnecting the client ([#2583](https://github.com/yandex/ClickHouse/pull/2583)).
* Fixed a bug that occurred when the `use_index_for_in_with_subqueries` setting was disabled.
### Security fix:
* Sending files is no longer possible when connected to MySQL (`LOAD DATA LOCAL INFILE`).
## ClickHouse release 1.1.54388, 2018-06-28
### New features:
* Support for the `ALTER TABLE t DELETE WHERE` query for replicated tables. Added the `system.mutations` table to track progress of this type of queries.
* Support for the `ALTER TABLE t [REPLACE|ATTACH] PARTITION` query for MergeTree tables.
* Support for the `TRUNCATE TABLE` query ([Winter Zhang](https://github.com/yandex/ClickHouse/pull/2260)).
* Several new `SYSTEM` queries for replicated tables (`RESTART REPLICAS`, `SYNC REPLICA`, `[STOP|START] [MERGES|FETCHES|SENDS REPLICATED|REPLICATION QUEUES]`).
* Added the ability to write to a table with the MySQL engine and the corresponding table function ([sundy-li](https://github.com/yandex/ClickHouse/pull/2294)).
* Added the `url()` table function and the `URL` table engine ([Alexander Sapin](https://github.com/yandex/ClickHouse/pull/2501)).
* Added the `windowFunnel` aggregate function ([sundy-li](https://github.com/yandex/ClickHouse/pull/2352)).
* New `startsWith` and `endsWith` functions for strings ([Vadim Plakhtinsky](https://github.com/yandex/ClickHouse/pull/2429)).
* The `numbers()` table function now allows you to specify the offset ([Winter Zhang](https://github.com/yandex/ClickHouse/pull/2535)).
* The password to `clickhouse-client` can be entered interactively.
* Server logs can now be sent to syslog ([Alexander Krasheninnikov](https://github.com/yandex/ClickHouse/pull/2459)).
* Support for logging in dictionaries with a shared library source ([Alexander Sapin](https://github.com/yandex/ClickHouse/pull/2472)).
* Support for custom CSV delimiters ([Ivan Zhukov](https://github.com/yandex/ClickHouse/pull/2263)).
* Added the `date_time_input_format` setting. If you switch this setting to `'best_effort'`, DateTime values will be read in a wide range of formats.
* Added the `clickhouse-obfuscator` utility for data obfuscation. Usage example: publishing data used in performance tests.
### Experimental features:
* Added the ability to calculate `and` arguments only where they are needed ([Anastasia Tsarkova](https://github.com/yandex/ClickHouse/pull/2272)).
* JIT compilation to native code is now available for some expressions ([pyos](https://github.com/yandex/ClickHouse/pull/2277)).
### Bug fixes:
* Duplicates no longer appear for a query with `DISTINCT` and `ORDER BY`.
* Queries with `ARRAY JOIN` and `arrayFilter` no longer return an incorrect result.
* Fixed an error when reading an array column from a Nested structure ([#2066](https://github.com/yandex/ClickHouse/issues/2066)).
* Fixed an error when analyzing queries with a HAVING section like `HAVING tuple IN (...)`.
* Fixed an error when analyzing queries with recursive aliases.
* Fixed an error when reading from ReplacingMergeTree with a condition in PREWHERE that filters all rows ([#2525](https://github.com/yandex/ClickHouse/issues/2525)).
* User profile settings were not applied when using sessions in the HTTP interface.
* Fixed how settings are applied from the command line parameters in `clickhouse-local`.
* The ZooKeeper client library now uses the session timeout received from the server.
* Fixed a bug in the ZooKeeper client library when the client waited for the server response longer than the timeout.
* Fixed pruning of parts for queries with conditions on partition key columns ([#2342](https://github.com/yandex/ClickHouse/issues/2342)).
* Merges are now possible after `CLEAR COLUMN IN PARTITION` ([#2315](https://github.com/yandex/ClickHouse/issues/2315)).
* Type mapping in the ODBC table function has been fixed ([sundy-li](https://github.com/yandex/ClickHouse/pull/2268)).
* Type comparisons have been fixed for `DateTime` with and without the time zone ([Alexander Bocharov](https://github.com/yandex/ClickHouse/pull/2400)).
* Fixed syntactic parsing and formatting of the `CAST` operator.
* Fixed insertion into a materialized view for the Distributed table engine ([Babacar Diassé](https://github.com/yandex/ClickHouse/pull/2411)).
* Fixed a race condition when writing data from the `Kafka` engine to materialized views ([Yangkuan Liu](https://github.com/yandex/ClickHouse/pull/2448)).
* Fixed SSRF in the `remote()` table function.
* Fixed exit behavior of `clickhouse-client` in multiline mode ([#2510](https://github.com/yandex/ClickHouse/issues/2510)).
### Improvements:
* Background tasks in replicated tables are now performed in a thread pool instead of in separate threads ([Silviu Caragea](https://github.com/yandex/ClickHouse/pull/1722)).
* Improved LZ4 compression performance.
* Faster analysis for queries with a large number of JOINs and sub-queries.
* The DNS cache is now updated automatically when there are too many network errors.
* Table inserts no longer occur if the insert into one of the materialized views is not possible because it has too many parts.
* Corrected the discrepancy in the event counters `Query`, `SelectQuery`, and `InsertQuery`.
* Expressions like `tuple IN (SELECT tuple)` are allowed if the tuple types match.
* A server with replicated tables can start even if you haven't configured ZooKeeper.
* When calculating the number of available CPU cores, limits on cgroups are now taken into account ([Atri Sharma](https://github.com/yandex/ClickHouse/pull/2325)).
* Added chown for config directories in the systemd config file ([Mikhail Shiryaev](https://github.com/yandex/ClickHouse/pull/2421)).
### Build changes:
* The gcc8 compiler can be used for builds.
* Added the ability to build llvm from a submodule.
* The version of the librdkafka library has been updated to v0.11.4.
* Added the ability to use the system libcpuid library. The library version has been updated to 0.4.0.
* Fixed the build using the vectorclass library ([Babacar Diassé](https://github.com/yandex/ClickHouse/pull/2274)).
* Cmake now generates files for ninja by default (like when using `-G Ninja`).
* Added the ability to use the libtinfo library instead of libtermcap ([Georgy Kondratiev](https://github.com/yandex/ClickHouse/pull/2519)).
* Fixed a header file conflict in Fedora Rawhide ([#2520](https://github.com/yandex/ClickHouse/issues/2520)).
### Backward incompatible changes:
* Removed escaping in `Vertical` and `Pretty*` formats and deleted the `VerticalRaw` format.
* If servers with version 1.1.54388 (or newer) and servers with older version are used simultaneously in distributed query and the query has `cast(x, 'Type')` expression in the form without `AS` keyword and with `cast` not in uppercase, then the exception with message like `Not found column cast(0, 'UInt8') in block` will be thrown. Solution: update server on all cluster nodes.
## ClickHouse release 1.1.54385, 2018-06-01
### Bug fixes:
* Fixed an error that in some cases caused ZooKeeper operations to block.
# ClickHouse release 1.1.54383, 2018-05-22
## ClickHouse release 1.1.54383, 2018-05-22
## Bug fixes:
### Bug fixes:
* Fixed a slowdown of replication queue if a table has many replicas.
# ClickHouse release 1.1.54381, 2018-05-14
## ClickHouse release 1.1.54381, 2018-05-14
## Bug fixes:
### Bug fixes:
* Fixed a nodes leak in ZooKeeper when ClickHouse loses connection to ZooKeeper server.
# ClickHouse release 1.1.54380, 2018-04-21
## ClickHouse release 1.1.54380, 2018-04-21
## New features:
### New features:
* Added table function `file(path, format, structure)`. An example reading bytes from `/dev/urandom`: `ln -s /dev/urandom /var/lib/clickhouse/user_files/random` `clickhouse-client -q "SELECT * FROM file('random', 'RowBinary', 'd UInt8') LIMIT 10"`.
## Improvements:
### Improvements:
* Subqueries could be wrapped by `()` braces (to enhance queries readability). For example, `(SELECT 1) UNION ALL (SELECT 1)`.
* Simple `SELECT` queries from table `system.processes` are not counted in `max_concurrent_queries` limit.
## Bug fixes:
### Bug fixes:
* Fixed incorrect behaviour of `IN` operator when select from `MATERIALIZED VIEW`.
* Fixed incorrect filtering by partition index in expressions like `WHERE partition_key_column IN (...)`
* Fixed inability to execute `OPTIMIZE` query on non-leader replica if the table was `REANAME`d.
@ -30,11 +154,11 @@
* Fixed freezing of `KILL QUERY` queries.
* Fixed an error in ZooKeeper client library which led to watches loses, freezing of distributed DDL queue and slowing replication queue if non-empty `chroot` prefix is used in ZooKeeper configuration.
## Backward incompatible changes:
### Backward incompatible changes:
* Removed support of expressions like `(a, b) IN (SELECT (a, b))` (instead of them you can use their equivalent `(a, b) IN (SELECT a, b)`). In previous releases, these expressions led to undetermined data filtering or caused errors.
# ClickHouse release 1.1.54378, 2018-04-16
## New features:
## ClickHouse release 1.1.54378, 2018-04-16
### New features:
* Logging level can be changed without restarting the server.
* Added the `SHOW CREATE DATABASE` query.
@ -48,7 +172,7 @@
* Multiple comma-separated `topics` can be specified for the `Kafka` engine (Tobias Adamson).
* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was cancelled` exception instead of an incomplete response.
## Improvements:
### Improvements:
* `ALTER TABLE ... DROP/DETACH PARTITION` queries are run at the front of the replication queue.
* `SELECT ... FINAL` and `OPTIMIZE ... FINAL` can be used even when the table has a single data part.
@ -59,7 +183,7 @@
* More robust crash recovery for asynchronous insertion into `Distributed` tables.
* The return type of the `countEqual` function changed from `UInt32` to `UInt64` (谢磊).
## Bug fixes:
### Bug fixes:
* Fixed an error with `IN` when the left side of the expression is `Nullable`.
* Correct results are now returned when using tuples with `IN` when some of the tuple components are in the table index.
@ -75,31 +199,31 @@
* `SummingMergeTree` now works correctly for summation of nested data structures with a composite key.
* Fixed the possibility of a race condition when choosing the leader for `ReplicatedMergeTree` tables.
## Build changes:
### Build changes:
* The build supports `ninja` instead of `make` and uses it by default for building releases.
* Renamed packages: `clickhouse-server-base` is now `clickhouse-common-static`; `clickhouse-server-common` is now `clickhouse-server`; `clickhouse-common-dbg` is now `clickhouse-common-static-dbg`. To install, use `clickhouse-server clickhouse-client`. Packages with the old names will still load in the repositories for backward compatibility.
## Backward-incompatible changes:
### Backward-incompatible changes:
* Removed the special interpretation of an IN expression if an array is specified on the left side. Previously, the expression `arr IN (set)` was interpreted as "at least one `arr` element belongs to the `set`". To get the same behavior in the new version, write `arrayExists(x -> x IN (set), arr)`.
* Disabled the incorrect use of the socket option `SO_REUSEPORT`, which was incorrectly enabled by default in the Poco library. Note that on Linux there is no longer any reason to simultaneously specify the addresses `::` and `0.0.0.0` for listen use just `::`, which allows listening to the connection both over IPv4 and IPv6 (with the default kernel config settings). You can also revert to the behavior from previous versions by specifying `<listen_reuse_port>1</listen_reuse_port>` in the config.
# ClickHouse release 1.1.54370, 2018-03-16
## ClickHouse release 1.1.54370, 2018-03-16
## New features:
### New features:
* Added the `system.macros` table and auto updating of macros when the config file is changed.
* Added the `SYSTEM RELOAD CONFIG` query.
* Added the `maxIntersections(left_col, right_col)` aggregate function, which returns the maximum number of simultaneously intersecting intervals `[left; right]`. The `maxIntersectionsPosition(left, right)` function returns the beginning of the "maximum" interval. ([Michael Furmur](https://github.com/yandex/ClickHouse/pull/2012)).
## Improvements:
### Improvements:
* When inserting data in a `Replicated` table, fewer requests are made to `ZooKeeper` (and most of the user-level errors have disappeared from the `ZooKeeper` log).
* Added the ability to create aliases for sets. Example: `WITH (1, 2, 3) AS set SELECT number IN set FROM system.numbers LIMIT 10`.
## Bug fixes:
### Bug fixes:
* Fixed the `Illegal PREWHERE` error when reading from `Merge` tables over `Distributed` tables.
* Added fixes that allow you to run `clickhouse-server` in IPv4-only Docker containers.
@ -113,9 +237,9 @@
* Restored the behavior for queries like `SELECT * FROM remote('server2', default.table) WHERE col IN (SELECT col2 FROM default.table)` when the right side argument of the `IN` should use a remote `default.table` instead of a local one. This behavior was broken in version 1.1.54358.
* Removed extraneous error-level logging of `Not found column ... in block`.
# ClickHouse release 1.1.54356, 2018-03-06
## ClickHouse release 1.1.54356, 2018-03-06
## New features:
### New features:
* Aggregation without `GROUP BY` for an empty set (such as `SELECT count(*) FROM table WHERE 0`) now returns a result with one row with null values for aggregate functions, in compliance with the SQL standard. To restore the old behavior (return an empty result), set `empty_result_for_aggregation_by_empty_set` to 1.
* Added type conversion for `UNION ALL`. Different alias names are allowed in `SELECT` positions in `UNION ALL`, in compliance with the SQL standard.
@ -150,7 +274,7 @@
* `RENAME TABLE` can be performed for `VIEW`.
* Added the `odbc_default_field_size` option, which allows you to extend the maximum size of the value loaded from an ODBC source (by default, it is 1024).
## Improvements:
### Improvements:
* Limits and quotas on the result are no longer applied to intermediate data for `INSERT SELECT` queries or for `SELECT` subqueries.
* Fewer false triggers of `force_restore_data` when checking the status of `Replicated` tables when the server starts.
@ -166,7 +290,7 @@
* `Enum` values can be used in `min`, `max`, `sum` and some other functions. In these cases, it uses the corresponding numeric values. This feature was previously available but was lost in the release 1.1.54337.
* Added `max_expanded_ast_elements` to restrict the size of the AST after recursively expanding aliases.
## Bug fixes:
### Bug fixes:
* Fixed cases when unnecessary columns were removed from subqueries in error, or not removed from subqueries containing `UNION ALL`.
* Fixed a bug in merges for `ReplacingMergeTree` tables.
@ -192,18 +316,18 @@
* Fixed a crash when passing arrays of different sizes to an `arrayReduce` function when using aggregate functions from multiple arguments.
* Prohibited the use of queries with `UNION ALL` in a `MATERIALIZED VIEW`.
## Backward incompatible changes:
### Backward incompatible changes:
* Removed the `distributed_ddl_allow_replicated_alter` option. This behavior is enabled by default.
* Removed the `UnsortedMergeTree` engine.
# ClickHouse release 1.1.54343, 2018-02-05
## ClickHouse release 1.1.54343, 2018-02-05
* Added macros support for defining cluster names in distributed DDL queries and constructors of Distributed tables: `CREATE TABLE distr ON CLUSTER '{cluster}' (...) ENGINE = Distributed('{cluster}', 'db', 'table')`.
* Now the table index is used for conditions like `expr IN (subquery)`.
* Improved processing of duplicates when inserting to Replicated tables, so they no longer slow down execution of the replication queue.
# ClickHouse release 1.1.54342, 2018-01-22
## ClickHouse release 1.1.54342, 2018-01-22
This release contains bug fixes for the previous release 1.1.54337:
* Fixed a regression in 1.1.54337: if the default user has readonly access, then the server refuses to start up with the message `Cannot create database in readonly mode`.
@ -214,9 +338,9 @@ This release contains bug fixes for the previous release 1.1.54337:
* Buffer tables now work correctly when MATERIALIZED columns are present in the destination table (by zhang2014).
* Fixed a bug in implementation of NULL.
# ClickHouse release 1.1.54337, 2018-01-18
## ClickHouse release 1.1.54337, 2018-01-18
## New features:
### New features:
* Added support for storage of multidimensional arrays and tuples (`Tuple` data type) in tables.
* Added support for table functions in `DESCRIBE` and `INSERT` queries. Added support for subqueries in `DESCRIBE`. Examples: `DESC TABLE remote('host', default.hits)`; `DESC TABLE (SELECT 1)`; `INSERT INTO TABLE FUNCTION remote('host', default.hits)`. Support for `INSERT INTO TABLE` syntax in addition to `INSERT INTO`.
@ -247,7 +371,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Added the `--silent` option for the `clickhouse-local` tool. It suppresses printing query execution info in stderr.
* Added support for reading values of type `Date` from text in a format where the month and/or day of the month is specified using a single digit instead of two digits (Amos Bird).
## Performance optimizations:
### Performance optimizations:
* Improved performance of `min`, `max`, `any`, `anyLast`, `anyHeavy`, `argMin`, `argMax` aggregate functions for String arguments.
* Improved performance of `isInfinite`, `isFinite`, `isNaN`, `roundToExp2` functions.
@ -256,7 +380,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Lowered memory usage for `JOIN` in the case when the left and right parts have columns with identical names that are not contained in `USING`.
* Improved performance of `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, and `corr` aggregate functions by reducing computational stability. The old functions are available under the names: `varSampStable`, `varPopStable`, `stddevSampStable`, `stddevPopStable`, `covarSampStable`, `covarPopStable`, `corrStable`.
## Bug fixes:
### Bug fixes:
* Fixed data deduplication after running a `DROP PARTITION` query. In the previous version, dropping a partition and INSERTing the same data again was not working because INSERTed blocks were considered duplicates.
* Fixed a bug that could lead to incorrect interpretation of the `WHERE` clause for `CREATE MATERIALIZED VIEW` queries with `POPULATE`.
@ -295,7 +419,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Fixed the `SYSTEM DROP DNS CACHE` query: the cache was flushed but addresses of cluster nodes were not updated.
* Fixed the behavior of `MATERIALIZED VIEW` after executing `DETACH TABLE` for the table under the view (Marek Vavruša).
## Build improvements:
### Build improvements:
* Builds use `pbuilder`. The build process is almost completely independent of the build host environment.
* A single build is used for different OS versions. Packages and binaries have been made compatible with a wide range of Linux systems.
@ -309,7 +433,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Removed usage of GNU extensions from the code. Enabled the `-Wextra` option. When building with `clang`, `libc++` is used instead of `libstdc++`.
* Extracted `clickhouse_parsers` and `clickhouse_common_io` libraries to speed up builds of various tools.
## Backward incompatible changes:
### Backward incompatible changes:
* The format for marks in `Log` type tables that contain `Nullable` columns was changed in a backward incompatible way. If you have these tables, you should convert them to the `TinyLog` type before starting up the new server version. To do this, replace `ENGINE = Log` with `ENGINE = TinyLog` in the corresponding `.sql` file in the `metadata` directory. If your table doesn't have `Nullable` columns or if the type of your table is not `Log`, then you don't need to do anything.
* Removed the `experimental_allow_extended_storage_definition_syntax` setting. Now this feature is enabled by default.
@ -320,16 +444,16 @@ This release contains bug fixes for the previous release 1.1.54337:
* In previous server versions there was an undocumented feature: if an aggregate function depends on parameters, you can still specify it without parameters in the AggregateFunction data type. Example: `AggregateFunction(quantiles, UInt64)` instead of `AggregateFunction(quantiles(0.5, 0.9), UInt64)`. This feature was lost. Although it was undocumented, we plan to support it again in future releases.
* Enum data types cannot be used in min/max aggregate functions. The possibility will be returned back in future release.
## Please note when upgrading:
### Please note when upgrading:
* When doing a rolling update on a cluster, at the point when some of the replicas are running the old version of ClickHouse and some are running the new version, replication is temporarily stopped and the message `unknown parameter 'shard'` appears in the log. Replication will continue after all replicas of the cluster are updated.
* If you have different ClickHouse versions on the cluster, you can get incorrect results for distributed queries with the aggregate functions `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, and `corr`. You should update all cluster nodes.
# ClickHouse release 1.1.54327, 2017-12-21
## ClickHouse release 1.1.54327, 2017-12-21
This release contains bug fixes for the previous release 1.1.54318:
* Fixed bug with possible race condition in replication that could lead to data loss. This issue affects versions 1.1.54310 and 1.1.54318. If you use one of these versions with Replicated tables, the update is strongly recommended. This issue shows in logs in Warning messages like `Part ... from own log doesn't exist.` The issue is relevant even if you don't see these messages in logs.
# ClickHouse release 1.1.54318, 2017-11-30
## ClickHouse release 1.1.54318, 2017-11-30
This release contains bug fixes for the previous release 1.1.54310:
* Fixed incorrect row deletions during merges in the SummingMergeTree engine
@ -338,9 +462,9 @@ This release contains bug fixes for the previous release 1.1.54310:
* Fixed an issue that was causing the replication queue to stop running
* Fixed rotation and archiving of server logs
# ClickHouse release 1.1.54310, 2017-11-01
## ClickHouse release 1.1.54310, 2017-11-01
## New features:
### New features:
* Custom partitioning key for the MergeTree family of table engines.
* [Kafka](https://clickhouse.yandex/docs/en/single/index.html#document-table_engines/kafka) table engine.
* Added support for loading [CatBoost](https://catboost.yandex/) models and applying them to data stored in ClickHouse.
@ -356,12 +480,12 @@ This release contains bug fixes for the previous release 1.1.54310:
* Added support for the Cap'n Proto input format.
* You can now customize compression level when using the zstd algorithm.
## Backward incompatible changes:
### Backward incompatible changes:
* Creation of temporary tables with an engine other than Memory is forbidden.
* Explicit creation of tables with the View or MaterializedView engine is forbidden.
* During table creation, a new check verifies that the sampling key expression is included in the primary key.
## Bug fixes:
### Bug fixes:
* Fixed hangups when synchronously inserting into a Distributed table.
* Fixed nonatomic adding and removing of parts in Replicated tables.
* Data inserted into a materialized view is not subjected to unnecessary deduplication.
@ -371,15 +495,15 @@ This release contains bug fixes for the previous release 1.1.54310:
* Fixed hangups when the disk volume containing server logs is full.
* Fixed an overflow in the `toRelativeWeekNum` function for the first week of the Unix epoch.
## Build improvements:
### Build improvements:
* Several third-party libraries (notably Poco) were updated and converted to git submodules.
# ClickHouse release 1.1.54304, 2017-10-19
## ClickHouse release 1.1.54304, 2017-10-19
## New features:
### New features:
* TLS support in the native protocol (to enable, set `tcp_ssl_port` in `config.xml`)
## Bug fixes:
### Bug fixes:
* `ALTER` for replicated tables now tries to start running as soon as possible
* Fixed crashing when reading data with the setting `preferred_block_size_bytes=0`
* Fixed crashes of `clickhouse-client` when `Page Down` is pressed
@ -392,16 +516,16 @@ This release contains bug fixes for the previous release 1.1.54310:
* Users are updated correctly when `users.xml` is invalid
* Correct handling when an executable dictionary returns a non-zero response code
# ClickHouse release 1.1.54292, 2017-09-20
## ClickHouse release 1.1.54292, 2017-09-20
## New features:
### New features:
* Added the `pointInPolygon` function for working with coordinates on a coordinate plane.
* Added the `sumMap` aggregate function for calculating the sum of arrays, similar to `SummingMergeTree`.
* Added the `trunc` function. Improved performance of the rounding functions (`round`, `floor`, `ceil`, `roundToExp2`) and corrected the logic of how they work. Changed the logic of the `roundToExp2` function for fractions and negative numbers.
* The ClickHouse executable file is now less dependent on the libc version. The same ClickHouse executable file can run on a wide variety of Linux systems. Note: There is still a dependency when using compiled queries (with the setting `compile = 1`, which is not used by default).
* Reduced the time needed for dynamic compilation of queries.
## Bug fixes:
### Bug fixes:
* Fixed an error that sometimes produced `part ... intersects previous part` messages and weakened replica consistency.
* Fixed an error that caused the server to lock up if ZooKeeper was unavailable during shutdown.
* Removed excessive logging when restoring replicas.
@ -409,9 +533,9 @@ This release contains bug fixes for the previous release 1.1.54310:
* Fixed an error in the concat function that occurred if the first column in a block has the Array type.
* Progress is now displayed correctly in the system.merges table.
# ClickHouse release 1.1.54289, 2017-09-13
## ClickHouse release 1.1.54289, 2017-09-13
## New features:
### New features:
* `SYSTEM` queries for server administration: `SYSTEM RELOAD DICTIONARY`, `SYSTEM RELOAD DICTIONARIES`, `SYSTEM DROP DNS CACHE`, `SYSTEM SHUTDOWN`, `SYSTEM KILL`.
* Added functions for working with arrays: `concat`, `arraySlice`, `arrayPushBack`, `arrayPushFront`, `arrayPopBack`, `arrayPopFront`.
* Added the `root` and `identity` parameters for the ZooKeeper configuration. This allows you to isolate individual users on the same ZooKeeper cluster.
@ -426,7 +550,7 @@ This release contains bug fixes for the previous release 1.1.54310:
* Option to set `umask` in the config file.
* Improved performance for queries with `DISTINCT`.
## Bug fixes:
### Bug fixes:
* Improved the process for deleting old nodes in ZooKeeper. Previously, old nodes sometimes didn't get deleted if there were very frequent inserts, which caused the server to be slow to shut down, among other things.
* Fixed randomization when choosing hosts for the connection to ZooKeeper.
* Fixed the exclusion of lagging replicas in distributed queries if the replica is localhost.
@ -439,28 +563,28 @@ This release contains bug fixes for the previous release 1.1.54310:
* Resolved the appearance of zombie processes when using a dictionary with an `executable` source.
* Fixed segfault for the HEAD query.
## Improvements to development workflow and ClickHouse build:
### Improvements to development workflow and ClickHouse build:
* You can use `pbuilder` to build ClickHouse.
* You can use `libc++` instead of `libstdc++` for builds on Linux.
* Added instructions for using static code analysis tools: `Coverity`, `clang-tidy`, and `cppcheck`.
## Please note when upgrading:
### Please note when upgrading:
* There is now a higher default value for the MergeTree setting `max_bytes_to_merge_at_max_space_in_pool` (the maximum total size of data parts to merge, in bytes): it has increased from 100 GiB to 150 GiB. This might result in large merges running after the server upgrade, which could cause an increased load on the disk subsystem. If the free space available on the server is less than twice the total amount of the merges that are running, this will cause all other merges to stop running, including merges of small data parts. As a result, INSERT requests will fail with the message "Merges are processing significantly slower than inserts." Use the `SELECT * FROM system.merges` request to monitor the situation. You can also check the `DiskSpaceReservedForMerge` metric in the `system.metrics` table, or in Graphite. You don't need to do anything to fix this, since the issue will resolve itself once the large merges finish. If you find this unacceptable, you can restore the previous value for the `max_bytes_to_merge_at_max_space_in_pool` setting (to do this, go to the `<merge_tree>` section in config.xml, set `<max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool>` and restart the server).
# ClickHouse release 1.1.54284, 2017-08-29
## ClickHouse release 1.1.54284, 2017-08-29
* This is bugfix release for previous 1.1.54282 release. It fixes ZooKeeper nodes leak in `parts/` directory.
# ClickHouse release 1.1.54282, 2017-08-23
## ClickHouse release 1.1.54282, 2017-08-23
This is a bugfix release. The following bugs were fixed:
* `DB::Exception: Assertion violation: !_path.empty()` error when inserting into a Distributed table.
* Error when parsing inserted data in RowBinary format if the data begins with ';' character.
* Errors during runtime compilation of certain aggregate functions (e.g. `groupArray()`).
# ClickHouse release 1.1.54276, 2017-08-16
## ClickHouse release 1.1.54276, 2017-08-16
## New features:
### New features:
* You can use an optional WITH clause in a SELECT query. Example query: `WITH 1+1 AS a SELECT a, a*a`
* INSERT can be performed synchronously in a Distributed table: OK is returned only after all the data is saved on all the shards. This is activated by the setting insert_distributed_sync=1.
@ -471,7 +595,7 @@ This is a bugfix release. The following bugs were fixed:
* Added support for non-constant arguments and negative offsets in the function `substring(str, pos, len).`
* Added the max_size parameter for the `groupArray(max_size)(column)` aggregate function, and optimized its performance.
## Major changes:
### Major changes:
* Improved security: all server files are created with 0640 permissions (can be changed via <umask> config parameter).
* Improved error messages for queries with invalid syntax.
@ -479,11 +603,11 @@ This is a bugfix release. The following bugs were fixed:
* Significantly increased the performance of data merges for the ReplacingMergeTree engine.
* Improved performance for asynchronous inserts from a Distributed table by batching multiple source inserts. To enable this functionality, use the setting distributed_directory_monitor_batch_inserts=1.
## Backward incompatible changes:
### Backward incompatible changes:
* Changed the binary format of aggregate states of `groupArray(array_column)` functions for arrays.
## Complete list of changes:
### Complete list of changes:
* Added the `output_format_json_quote_denormals` setting, which enables outputting nan and inf values in JSON format.
* Optimized thread allocation when reading from a Distributed table.
@ -502,7 +626,7 @@ This is a bugfix release. The following bugs were fixed:
* It is possible to connect to MySQL through a socket in the file system.
* The `system.parts` table has a new column with information about the size of marks, in bytes.
## Bug fixes:
### Bug fixes:
* Distributed tables using a Merge table now work correctly for a SELECT query with a condition on the _table field.
* Fixed a rare race condition in ReplicatedMergeTree when checking data parts.
@ -526,15 +650,15 @@ This is a bugfix release. The following bugs were fixed:
* Fixed the "Cannot mremap" error when using arrays in IN and JOIN clauses with more than 2 billion elements.
* Fixed the failover for dictionaries with MySQL as the source.
## Improved workflow for developing and assembling ClickHouse:
### Improved workflow for developing and assembling ClickHouse:
* Builds can be assembled in Arcadia.
* You can use gcc 7 to compile ClickHouse.
* Parallel builds using ccache+distcc are faster now.
# ClickHouse release 1.1.54245, 2017-07-04
## ClickHouse release 1.1.54245, 2017-07-04
## New features:
### New features:
* Distributed DDL (for example, `CREATE TABLE ON CLUSTER`).
* The replicated request `ALTER TABLE CLEAR COLUMN IN PARTITION.`
@ -546,16 +670,16 @@ This is a bugfix release. The following bugs were fixed:
* Sessions in the HTTP interface.
* The OPTIMIZE query for a Replicated table can can run not only on the leader.
## Backward incompatible changes:
### Backward incompatible changes:
* Removed SET GLOBAL.
## Minor changes:
### Minor changes:
* If an alert is triggered, the full stack trace is printed into the log.
* Relaxed the verification of the number of damaged or extra data parts at startup (there were too many false positives).
## Bug fixes:
### Bug fixes:
* Fixed a bad connection "sticking" when inserting into a Distributed table.
* GLOBAL IN now works for a query from a Merge table that looks at a Distributed table.

View File

@ -1,6 +1,79 @@
# ClickHouse release 1.1.54388, 2018-06-28
## ClickHouse release 18.1.0, 2018-07-23
## Новые возможности:
### Новые возможности:
* Поддержка запроса `ALTER TABLE t DELETE WHERE` для нереплицированных MergeTree-таблиц ([#2634](https://github.com/yandex/ClickHouse/pull/2634)).
* Поддержка произвольных типов для семейства агрегатных функций `uniq*` ([#2010](https://github.com/yandex/ClickHouse/issues/2010)).
* Поддержка произвольных типов в операторах сравнения ([#2026](https://github.com/yandex/ClickHouse/issues/2026)).
* Возможность в `users.xml` указывать маску подсети в формате `10.0.0.1/255.255.255.0`. Это необходимо для использования "дырявых" масок IPv6 сетей ([#2637](https://github.com/yandex/ClickHouse/pull/2637)).
* Добавлена функция `arrayDistinct` ([#2670](https://github.com/yandex/ClickHouse/pull/2670)).
* Движок SummingMergeTree теперь может работать со столбцами типа AggregateFunction ([Constantin S. Pan](https://github.com/yandex/ClickHouse/pull/2566)).
### Улучшения:
* Изменена схема версионирования релизов. Теперь первый компонент содержит год релиза (A.D.; по московскому времени; из номера вычитается 2000), второй - номер крупных изменений (увеличивается для большинства релизов), третий - патч-версия. Релизы по-прежнему обратно совместимы, если другое не указано в changelog.
* Ускорено преобразование чисел с плавающей точкой в строку ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2664)).
* Теперь, если при вставке из-за ошибок парсинга пропущено некоторое количество строк (такое возможно про включённых настройках `input_allow_errors_num`, `input_allow_errors_ratio`), это количество пишется в лог сервера ([Leonardo Cecchi](https://github.com/yandex/ClickHouse/pull/2669)).
### Исправление ошибок:
* Исправлена работа команды TRUNCATE для временных таблиц ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2624)).
* Исправлен редкий deadlock в клиентской библиотеке ZooKeeper, который возникал при сетевой ошибке во время вычитывания ответа ([c315200](https://github.com/yandex/ClickHouse/commit/c315200e64b87e44bdf740707fc857d1fdf7e947)).
* Исправлена ошибка при CAST в Nullable типы ([#1322](https://github.com/yandex/ClickHouse/issues/1322)).
* Исправлен неправильный результат функции `maxIntersection()` в случае совпадения границ отрезков ([Michael Furmur](https://github.com/yandex/ClickHouse/pull/2657)).
* Исправлено неверное преобразование цепочки OR-выражений в аргументе функции ([chenxing-xc](https://github.com/yandex/ClickHouse/pull/2663)).
* Исправлена деградация производительности запросов, содержащих выражение `IN (подзапрос)` внутри другого подзапроса ([#2571](https://github.com/yandex/ClickHouse/issues/2571)).
* Исправлена несовместимость серверов разных версий при распределённых запросах, использующих функцию `CAST` не в верхнем регистре ([fe8c4d6](https://github.com/yandex/ClickHouse/commit/fe8c4d64e434cacd4ceef34faa9005129f2190a5)).
* Добавлено недостающее квотирование идентификаторов при запросах к внешним СУБД ([#2635](https://github.com/yandex/ClickHouse/issues/2635)).
### Обратно несовместимые изменения:
* Не работает преобразование строки, содержащей число ноль, в DateTime. Пример: `SELECT toDateTime('0')`. По той же причине не работает `DateTime DEFAULT '0'` в таблицах, а также `<null_value>0</null_value>` в словарях. Решение: заменить `0` на `0000-00-00 00:00:00`.
## ClickHouse release 1.1.54394, 2018-07-12
### Новые возможности:
* Добавлена агрегатная функция `histogram` ([Михаил Сурин](https://github.com/yandex/ClickHouse/pull/2521)).
* Возможность использования `OPTIMIZE TABLE ... FINAL` без указания партиции для `ReplicatedMergeTree` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2600)).
### Исправление ошибок:
* Исправлена ошибка - выставление слишком маленького таймаута у сокетов (одна секунда) для чтения и записи при отправке и скачивании реплицируемых данных, что приводило к невозможности скачать куски достаточно большого размера при наличии некоторой нагрузки на сеть или диск (попытки скачивания кусков циклически повторяются). Ошибка возникла в версии 1.1.54388.
* Исправлена работа при использовании chroot в ZooKeeper, в случае вставки дублирующихся блоков данных в таблицу.
* Исправлена работа функции `has` для случая массива с Nullable элементами ([#2115](https://github.com/yandex/ClickHouse/issues/2521)).
* Исправлена работа таблицы `system.tables` при её использовании в распределённых запросах; столбцы `metadata_modification_time` и `engine_full` сделаны невиртуальными; исправлена ошибка в случае, если из таблицы были запрошены только эти столбцы.
* Исправлена работа пустой таблицы типа `TinyLog` после вставки в неё пустого блока данных ([#2563](https://github.com/yandex/ClickHouse/issues/2563)).
* Таблица `system.zookeeper` работает в случае, если значение узла в ZooKeeper равно NULL.
## ClickHouse release 1.1.54390, 2018-07-06
### Новые возможности:
* Возможность отправки запроса в формате `multipart/form-data` (в поле `query`), что полезно, если при этом также отправляются внешние данные для обработки запроса ([Ольга Хвостикова](https://github.com/yandex/ClickHouse/pull/2490)).
* Добавлена возможность включить или отключить обработку одинарных или двойных кавычек при чтении данных в формате CSV. Это задаётся настройками `format_csv_allow_single_quotes` и `format_csv_allow_double_quotes` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2574))
* Возможность использования `OPTIMIZE TABLE ... FINAL` без указания партиции для не реплицированных вариантов`MergeTree` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2599)).
### Улучшения:
* Увеличена производительность, уменьшено потребление памяти, добавлен корректный учёт потребления памяти, при использовании оператора IN в случае, когда для его работы может использоваться индекс таблицы ([#2584](https://github.com/yandex/ClickHouse/pull/2584)).
* Убраны избыточные проверки чексумм при добавлении куска. Это важно в случае большого количества реплик, так как в этом случае суммарное количество проверок было равно N^2.
* Добавлена поддержка аргументов типа `Array(Tuple(...))` для функции `arrayEnumerateUniq` ([#2573](https://github.com/yandex/ClickHouse/pull/2573)).
* Добавлена поддержка `Nullable` для функции `runningDifference`. ([#2594](https://github.com/yandex/ClickHouse/pull/2594))
* Увеличена производительность анализа запроса в случае очень большого количества выражений ([#2572](https://github.com/yandex/ClickHouse/pull/2572)).
* Более быстрый выбор кусков для слияния в таблицах типа `ReplicatedMergeTree`. Более быстрое восстановление сессии с ZooKeeper. ([#2597](https://github.com/yandex/ClickHouse/pull/2597)).
* Файл `format_version.txt` для таблиц семейства `MergeTree` создаётся заново при его отсутствии, что имеет смысл в случае запуска ClickHouse после копирования структуры директорий без файлов ([Ciprian Hacman](https://github.com/yandex/ClickHouse/pull/2593)).
### Исправление ошибок:
* Исправлена ошибка при работе с ZooKeeper, которая могла приводить к невозможности восстановления сессии и readonly состояниям таблиц до перезапуска сервера.
* Исправлена ошибка при работе с ZooKeeper, которая могла приводить к неудалению старых узлов при разрыве сессии.
* Исправлена ошибка в функции `quantileTDigest` для Float аргументов (ошибка появилась в версии 1.1.54388) ([Михаил Сурин](https://github.com/yandex/ClickHouse/pull/2553)).
* Исправлена ошибка работы индекса таблиц типа MergeTree, если в условии, столбец первичного ключа расположен внутри функции преобразования типов между знаковым и беззнаковым целым одного размера ([#2603](https://github.com/yandex/ClickHouse/pull/2603)).
* Исправлен segfault, если в конфигурационном файле нет `macros`, но они используются ([#2570](https://github.com/yandex/ClickHouse/pull/2570)).
* Исправлено переключение на базу данных по-умолчанию при переподключении клиента ([#2583](https://github.com/yandex/ClickHouse/pull/2583)).
* Исправлена ошибка в случае отключенной настройки `use_index_for_in_with_subqueries`.
### Исправления безопасности:
* При соединениях с MySQL удалена возможность отправки файлов (`LOAD DATA LOCAL INFILE`).
## ClickHouse release 1.1.54388, 2018-06-28
### Новые возможности:
* Добавлена поддержка запроса `ALTER TABLE t DELETE WHERE` для реплицированных таблиц и таблица `system.mutations`.
* Добавлена поддержка запроса `ALTER TABLE t [REPLACE|ATTACH] PARTITION` для *MergeTree-таблиц.
* Добавлена поддержка запроса `TRUNCATE TABLE` ([Winter Zhang](https://github.com/yandex/ClickHouse/pull/2260))
@ -17,11 +90,11 @@
* Добавлена настройка `date_time_input_format`. Если переключить эту настройку в значение `'best_effort'`, значения DateTime будут читаться в широком диапазоне форматов.
* Добавлена утилита `clickhouse-obfuscator` для обфускации данных. Пример использования: публикация данных, используемых в тестах производительности.
## Экспериментальные возможности:
### Экспериментальные возможности:
* Добавлена возможность вычислять аргументы функции `and` только там, где они нужны ([Анастасия Царькова](https://github.com/yandex/ClickHouse/pull/2272))
* Добавлена возможность JIT-компиляции в нативный код некоторых выражений ([pyos](https://github.com/yandex/ClickHouse/pull/2277)).
## Исправление ошибок:
### Исправление ошибок:
* Исправлено появление дублей в запросе с `DISTINCT` и `ORDER BY`.
* Запросы с `ARRAY JOIN` и `arrayFilter` раньше возвращали некорректный результат.
* Исправлена ошибка при чтении столбца-массива из Nested-структуры ([#2066](https://github.com/yandex/ClickHouse/issues/2066)).
@ -42,7 +115,7 @@
* Исправлена SSRF в табличной функции remote().
* Исправлен выход из `clickhouse-client` в multiline-режиме ([#2510](https://github.com/yandex/ClickHouse/issues/2510)).
## Улучшения:
### Улучшения:
* Фоновые задачи в реплицированных таблицах теперь выполняются не в отдельных потоках, а в пуле потоков ([Silviu Caragea](https://github.com/yandex/ClickHouse/pull/1722))
* Улучшена производительность разжатия LZ4.
* Ускорен анализ запроса с большим числом JOIN-ов и подзапросов.
@ -54,7 +127,7 @@
* При расчёте количества доступных ядер CPU теперь учитываются ограничения cgroups ([Atri Sharma](https://github.com/yandex/ClickHouse/pull/2325)).
* Добавлен chown директорий конфигов в конфигурационном файле systemd ([Михаил Ширяев](https://github.com/yandex/ClickHouse/pull/2421)).
## Изменения сборки:
### Изменения сборки:
* Добавлена возможность сборки компилятором gcc8.
* Добавлена возможность сборки llvm из submodule.
* Используемая версия библиотеки librdkafka обновлена до v0.11.4.
@ -64,33 +137,34 @@
* Добавлена возможность использования библиотеки libtinfo вместо libtermcap ([Георгий Кондратьев](https://github.com/yandex/ClickHouse/pull/2519)).
* Исправлен конфликт заголовочных файлов в Fedora Rawhide ([#2520](https://github.com/yandex/ClickHouse/issues/2520)).
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Убран escaping в форматах `Vertical` и `Pretty*`, удалён формат `VerticalRaw`.
* Если в распределённых запросах одновременно участвуют серверы версии 1.1.54388 или новее и более старые, то при использовании выражения `cast(x, 'Type')`, записанного без указания `AS`, если слово `cast` указано не в верхнем регистре, возникает ошибка вида `Not found column cast(0, 'UInt8') in block`. Решение: обновить сервер на всём кластере.
# ClickHouse release 1.1.54385, 2018-06-01
## Исправление ошибок:
## ClickHouse release 1.1.54385, 2018-06-01
### Исправление ошибок:
* Исправлена ошибка, которая в некоторых случаях приводила к блокировке операций с ZooKeeper.
# ClickHouse release 1.1.54383, 2018-05-22
## Исправление ошибок:
## ClickHouse release 1.1.54383, 2018-05-22
### Исправление ошибок:
* Исправлена деградация скорости выполнения очереди репликации при большом количестве реплик
# ClickHouse release 1.1.54381, 2018-05-14
## ClickHouse release 1.1.54381, 2018-05-14
## Исправление ошибок:
### Исправление ошибок:
* Исправлена ошибка, приводящая к "утеканию" метаданных в ZooKeeper при потере соединения с сервером ZooKeeper.
# ClickHouse release 1.1.54380, 2018-04-21
## ClickHouse release 1.1.54380, 2018-04-21
## Новые возможности:
### Новые возможности:
* Добавлена табличная функция `file(path, format, structure)`. Пример, читающий байты из `/dev/urandom`: `ln -s /dev/urandom /var/lib/clickhouse/user_files/random` `clickhouse-client -q "SELECT * FROM file('random', 'RowBinary', 'd UInt8') LIMIT 10"`.
## Улучшения:
### Улучшения:
* Добавлена возможность оборачивать подзапросы скобками `()` для повышения читаемости запросов. Например: `(SELECT 1) UNION ALL (SELECT 1)`.
* Простые запросы `SELECT` из таблицы `system.processes` не учитываются в ограничении `max_concurrent_queries`.
## Исправление ошибок:
### Исправление ошибок:
* Исправлена неправильная работа оператора `IN` в `MATERIALIZED VIEW`.
* Исправлена неправильная работа индекса по ключу партиционирования в выражениях типа `partition_key_column IN (...)`.
* Исправлена невозможность выполнить `OPTIMIZE` запрос на лидирующей реплике после выполнения `RENAME` таблицы.
@ -98,13 +172,13 @@
* Исправлены зависания запросов `KILL QUERY`.
* Исправлена ошибка в клиентской библиотеке ZooKeeper, которая при использовании непустого префикса `chroot` в конфигурации приводила к потере watch'ей, остановке очереди distributed DDL запросов и замедлению репликации.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Убрана поддержка выражений типа `(a, b) IN (SELECT (a, b))` (можно использовать эквивалентные выражение `(a, b) IN (SELECT a, b)`). Раньше такие запросы могли приводить к недетерминированной фильтрации в `WHERE`.
# ClickHouse release 1.1.54378, 2018-04-16
## ClickHouse release 1.1.54378, 2018-04-16
## Новые возможности:
### Новые возможности:
* Возможность изменения уровня логгирования без перезагрузки сервера.
* Добавлен запрос `SHOW CREATE DATABASE`.
@ -118,7 +192,7 @@
* Возможность указания нескольких `topics` через запятую для движка `Kafka` (Tobias Adamson)
* При остановке запроса по причине `KILL QUERY` или `replace_running_query`, клиент получает исключение `Query was cancelled` вместо неполного результата.
## Улучшения:
### Улучшения:
* Запросы вида `ALTER TABLE ... DROP/DETACH PARTITION` выполняются впереди очереди репликации.
* Возможность использовать `SELECT ... FINAL` и `OPTIMIZE ... FINAL` даже в случае, если данные в таблице представлены одним куском.
@ -129,7 +203,7 @@
* Более надёжное восстановление после сбоев при асинхронной вставке в `Distributed` таблицы.
* Возвращаемый тип функции `countEqual` изменён с `UInt32` на `UInt64` (谢磊)
## Исправление ошибок:
### Исправление ошибок:
* Исправлена ошибка c `IN` где левая часть выражения `Nullable`.
* Исправлен неправильный результат при использовании кортежей с `IN` в случае, если часть компоненнтов кортежа есть в индексе таблицы.
@ -145,31 +219,31 @@
* Исправлена работа `SummingMergeTree` в случае суммирования вложенных структур данных с составным ключом.
* Исправлена возможность возникновения race condition при выборе лидера таблиц `ReplicatedMergeTree`.
## Изменения сборки:
### Изменения сборки:
* Поддержка `ninja` вместо `make` при сборке. `ninja` используется по-умолчанию при сборке релизов.
* Переименованы пакеты `clickhouse-server-base` в `clickhouse-common-static`; `clickhouse-server-common` в `clickhouse-server`; `clickhouse-common-dbg` в `clickhouse-common-static-dbg`. Для установки используйте `clickhouse-server clickhouse-client`. Для совместимости, пакеты со старыми именами продолжают загружаться в репозиторий.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Удалена специальная интерпретация выражения IN, если слева указан массив. Ранее выражение вида `arr IN (set)` воспринималось как "хотя бы один элемент `arr` принадлежит множеству `set`". Для получения такого же поведения в новой версии, напишите `arrayExists(x -> x IN (set), arr)`.
* Отключено ошибочное использование опции сокета `SO_REUSEPORT` (которая по ошибке включена по-умолчанию в библиотеке Poco). Стоит обратить внимание, что на Linux системах теперь не имеет смысла указывать одновременно адреса `::` и `0.0.0.0` для listen - следует использовать лишь адрес `::`, который (с настройками ядра по-умолчанию) позволяет слушать соединения как по IPv4 так и по IPv6. Также вы можете вернуть поведение старых версий, указав в конфиге `<listen_reuse_port>1</listen_reuse_port>`.
# ClickHouse release 1.1.54370, 2018-03-16
## ClickHouse release 1.1.54370, 2018-03-16
## Новые возможности:
### Новые возможности:
* Добавлена системная таблица `system.macros` и автоматическое обновление макросов при изменении конфигурационного файла.
* Добавлен запрос `SYSTEM RELOAD CONFIG`.
* Добавлена агрегатная функция `maxIntersections(left_col, right_col)`, возвращающая максимальное количество одновременно пересекающихся интервалов `[left; right]`. Функция `maxIntersectionsPosition(left, right)` возвращает начало такого "максимального" интервала. ([Michael Furmur](https://github.com/yandex/ClickHouse/pull/2012)).
## Улучшения:
### Улучшения:
* При вставке данных в `Replicated`-таблицу делается меньше обращений к `ZooKeeper` (также из лога `ZooKeeper` исчезло большинство user-level ошибок).
* Добавлена возможность создавать алиасы для множеств. Пример: `WITH (1, 2, 3) AS set SELECT number IN set FROM system.numbers LIMIT 10`.
## Исправление ошибок:
### Исправление ошибок:
* Исправлена ошибка `Illegal PREWHERE` при чтении из Merge-таблицы над `Distributed`-таблицами.
* Добавлены исправления, позволяющие запускать clickhouse-server в IPv4-only docker-контейнерах.
@ -184,9 +258,9 @@
* Устранено ненужное Error-level логирование `Not found column ... in block`.
# Релиз ClickHouse 1.1.54362, 2018-03-11
## Релиз ClickHouse 1.1.54362, 2018-03-11
## Новые возможности:
### Новые возможности:
* Агрегация без `GROUP BY` по пустому множеству (как например, `SELECT count(*) FROM table WHERE 0`) теперь возвращает результат из одной строки с нулевыми значениями агрегатных функций, в соответствии со стандартом SQL. Вы можете вернуть старое поведение (возвращать пустой результат), выставив настройку `empty_result_for_aggregation_by_empty_set` в значение 1.
* Добавлено приведение типов при `UNION ALL`. Допустимо использование столбцов с разными алиасами в соответствующих позициях `SELECT` в `UNION ALL`, что соответствует стандарту SQL.
@ -224,7 +298,7 @@
* Добавлена настройка `odbc_default_field_size`, позволяющая расширить максимальный размер значения, загружаемого из ODBC источника (по-умолчанию - 1024).
* В таблицу `system.processes` и в `SHOW PROCESSLIST` добавлены столбцы `is_cancelled` и `peak_memory_usage`.
## Улучшения:
### Улучшения:
* Ограничения на результат и квоты на результат теперь не применяются к промежуточным данным для запросов `INSERT SELECT` и для подзапросов в `SELECT`.
* Уменьшено количество ложных срабатываний при проверке состояния `Replicated` таблиц при запуске сервера, приводивших к необходимости выставления флага `force_restore_data`.
@ -240,7 +314,7 @@
* Значения типа `Enum` можно использовать в функциях `min`, `max`, `sum` и некоторых других - в этих случаях используются соответствующие числовые значения. Эта возможность присутствовала ранее, но была потеряна в релизе 1.1.54337.
* Добавлено ограничение `max_expanded_ast_elements` действующее на размер AST после рекурсивного раскрытия алиасов.
## Исправление ошибок:
### Исправление ошибок:
* Исправлены случаи ошибочного удаления ненужных столбцов из подзапросов, а также отсутствие удаления ненужных столбцов из подзапросов, содержащих `UNION ALL`.
* Исправлена ошибка в слияниях для таблиц типа `ReplacingMergeTree`.
@ -268,19 +342,19 @@
* Запрещено использование запросов с `UNION ALL` в `MATERIALIZED VIEW`.
* Исправлена ошибка, которая может возникать при инициализации системной таблицы `part_log` при старте сервера (по-умолчанию `part_log` выключен).
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Удалена настройка `distributed_ddl_allow_replicated_alter`. Соответствующее поведение включено по-умолчанию.
* Удалена настройка `strict_insert_defaults`. Если вы использовали эту функциональность, напишите на `clickhouse-feedback@yandex-team.com`.
* Удалён движок таблиц `UnsortedMergeTree`.
# Релиз ClickHouse 1.1.54343, 2018-02-05
## Релиз ClickHouse 1.1.54343, 2018-02-05
* Добавлена возможность использовать макросы при задании имени кластера в распределенных DLL запросах и создании Distributed-таблиц: `CREATE TABLE distr ON CLUSTER '{cluster}' (...) ENGINE = Distributed('{cluster}', 'db', 'table')`.
* Теперь при вычислении запросов вида `SELECT ... FROM table WHERE expr IN (subquery)` используется индекс таблицы `table`.
* Улучшена обработка дубликатов при вставке в Replicated-таблицы, теперь они не приводят к излишнему замедлению выполнения очереди репликации.
# Релиз ClickHouse 1.1.54342, 2018-01-22
## Релиз ClickHouse 1.1.54342, 2018-01-22
Релиз содержит исправление к предыдущему релизу 1.1.54337:
* Исправлена регрессия в версии 1.1.54337: если пользователь по-умолчанию имеет readonly доступ, то сервер отказывался стартовать с сообщением `Cannot create database in readonly mode`.
@ -291,9 +365,9 @@
* Таблицы типа Buffer теперь работают при наличии MATERIALIZED столбцов в таблице назначения (by zhang2014).
* Исправлена одна из ошибок в реализации NULL.
# Релиз ClickHouse 1.1.54337, 2018-01-18
## Релиз ClickHouse 1.1.54337, 2018-01-18
## Новые возможности:
### Новые возможности:
* Добавлена поддержка хранения многомерных массивов и кортежей (тип данных `Tuple`) в таблицах.
* Поддержка табличных функций для запросов `DESCRIBE` и `INSERT`. Поддержка подзапроса в запросе `DESCRIBE`. Примеры: `DESC TABLE remote('host', default.hits)`; `DESC TABLE (SELECT 1)`; `INSERT INTO TABLE FUNCTION remote('host', default.hits)`. Возможность писать `INSERT INTO TABLE` вместо `INSERT INTO`.
@ -322,9 +396,9 @@
* Добавлена поддержка `ALTER` для таблиц типа `Null` (Anastasiya Tsarkova).
* Функция `reinterpretAsString` расширена на все типы данных, значения которых хранятся в памяти непрерывно.
* Для программы `clickhouse-local` добавлена опция `--silent` для подавления вывода информации о выполнении запроса в stderr.
* Добавлена поддержка чтения `Date` в текстовом виде в формате, где месяц и день месяца могут быть указаны одной цифрой вместо двух (Amos Bird).
* Добавлена поддержка чтения `Date` в текстовом виде в формате, где месяц и день месяца могут быть указаны одной цифрой вместо двух (Amos Bird).
## Увеличение производительности:
### Увеличение производительности:
* Увеличена производительность агрегатных функций `min`, `max`, `any`, `anyLast`, `anyHeavy`, `argMin`, `argMax` от строковых аргументов.
* Увеличена производительность функций `isInfinite`, `isFinite`, `isNaN`, `roundToExp2`.
@ -333,7 +407,7 @@
* Уменьшено потребление памяти при `JOIN`, если левая и правая часть содержали столбцы с одинаковым именем, не входящие в `USING`.
* Увеличена производительность агрегатных функций `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr` за счёт уменьшения стойкости к вычислительной погрешности. Старые версии функций добавлены под именами `varSampStable`, `varPopStable`, `stddevSampStable`, `stddevPopStable`, `covarSampStable`, `covarPopStable`, `corrStable`.
## Исправления ошибок:
### Исправления ошибок:
* Исправлена работа дедупликации блоков после `DROP` или `DETATH PARTITION`. Раньше удаление партиции и вставка тех же самых данных заново не работала, так как вставленные заново блоки считались дубликатами.
* Исправлена ошибка, в связи с которой может неправильно обрабатываться `WHERE` для запросов на создание `MATERIALIZED VIEW` с указанием `POPULATE`.
@ -343,7 +417,7 @@
* Добавлена недостающая поддержка типа данных `UUID` для `DISTINCT`, `JOIN`, в агрегатных функциях `uniq` и во внешних словарях (Иванов Евгений). Поддержка `UUID` всё ещё остаётся не полной.
* Исправлено поведение `SummingMergeTree` для строк, в которых все значения после суммирования равны нулю.
* Многочисленные доработки для движка таблиц `Kafka` (Marek Vavruša).
* Исправлена некорректная работа движка таблиц `Join` (Amos Bird).
* Исправлена некорректная работа движка таблиц `Join` (Amos Bird).
* Исправлена работа аллокатора под FreeBSD и OS X.
* Функция `extractAll` теперь может доставать пустые вхождения.
* Исправлена ошибка, не позволяющая подключить при сборке `libressl` вместо `openssl`.
@ -367,12 +441,12 @@
* Исправлена работа `DISTINCT` при условии, что все столбцы константные.
* Исправлено форматирование запроса в случае наличия функции `tupleElement` со сложным константным выражением в качестве номера элемента.
* Исправлена работа `Dictionary` таблиц для словарей типа `range_hashed`.
* Исправлена ошибка, приводящая к появлению лишних строк при `FULL` и `RIGHT JOIN` (Amos Bird).
* Исправлена ошибка, приводящая к появлению лишних строк при `FULL` и `RIGHT JOIN` (Amos Bird).
* Исправлено падение сервера в случае создания и удаления временных файлов в `config.d` директориях в момент перечитывания конфигурации.
* Исправлена работа запроса `SYSTEM DROP DNS CACHE`: ранее сброс DNS кэша не приводил к повторному резолвингу имён хостов кластера.
* Исправлено поведение `MATERIALIZED VIEW` после `DETACH TABLE` таблицы, на которую он смотрит (Marek Vavruša).
## Улучшения сборки:
### Улучшения сборки:
* Для сборки используется `pbuilder`. Сборка максимально независима от окружения на сборочной машине.
* Для разных версий систем выкладывается один и тот же пакет, который совместим с широким диапазоном Linux систем.
@ -386,27 +460,27 @@
* Удалено использование расширений GNU из кода и включена опция `-Wextra`. При сборке с помощью `clang` по-умолчанию используется `libc++` вместо `libstdc++`.
* Выделены библиотеки `clickhouse_parsers` и `clickhouse_common_io` для более быстрой сборки утилит.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Формат засечек (marks) для таблиц типа `Log`, содержащих `Nullable` столбцы, изменён обратно-несовместимым образом. В случае наличия таких таблиц, вы можете преобразовать их в `TinyLog` до запуска новой версии сервера. Для этого в соответствующем таблице файле `.sql` в директории `metadata`, замените `ENGINE = Log` на `ENGINE = TinyLog`. Если в таблице нет `Nullable` столбцов или тип таблицы не `Log`, то ничего делать не нужно.
* Удалена настройка `experimental_allow_extended_storage_definition_syntax`. Соответствующая функциональность включена по-умолчанию.
* Функция `runningIncome` переименована в `runningDifferenceStartingWithFirstValue` во избежание путаницы.
* Удалена возможность написания `FROM ARRAY JOIN arr` без указания таблицы после FROM (Amos Bird).
* Удалена возможность написания `FROM ARRAY JOIN arr` без указания таблицы после FROM (Amos Bird).
* Удалён формат `BlockTabSeparated`, использовавшийся лишь для демонстрационных целей.
* Изменён формат состояния агрегатных функций `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr`. Если вы использовали эти состояния для хранения в таблицах (тип данных `AggregateFunction` от этих функций или материализованные представления, хранящие эти состояния), напишите на clickhouse-feedback@yandex-team.com.
* В предыдущих версиях существовала недокументированная возможность: в типе данных AggregateFunction можно было не указывать параметры для агрегатной функции, которая зависит от параметров. Пример: `AggregateFunction(quantiles, UInt64)` вместо `AggregateFunction(quantiles(0.5, 0.9), UInt64)`. Эта возможность потеряна. Не смотря на то, что возможность не документирована, мы собираемся вернуть её в ближайших релизах.
* Значения типа данных Enum не могут быть переданы в агрегатные функции min/max. Возможность будет возвращена обратно в следующем релизе.
## На что обратить внимание при обновлении:
### На что обратить внимание при обновлении:
* При обновлении кластера, на время, когда на одних репликах работает новая версия сервера, а на других - старая, репликация будет приостановлена и в логе появятся сообщения вида `unknown parameter 'shard'`. Репликация продолжится после обновления всех реплик кластера.
* Если на серверах кластера работают разные версии ClickHouse, то возможен неправильный результат распределённых запросов, использующих функции `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr`. Необходимо обновить все серверы кластера.
# Релиз ClickHouse 1.1.54327, 2017-12-21
## Релиз ClickHouse 1.1.54327, 2017-12-21
Релиз содержит исправление к предыдущему релизу 1.1.54318:
* Исправлена проблема с возможным race condition при репликации, которая может приводить к потере данных. Проблеме подвержены версии 1.1.54310 и 1.1.54318. Если вы их используете и у вас есть Replicated таблицы, то обновление обязательно. Понять, что эта проблема существует, можно по сообщениям в логе Warning вида `Part ... from own log doesn't exist.` Даже если таких сообщений нет, проблема всё-равно актуальна.
# Релиз ClickHouse 1.1.54318, 2017-11-30
## Релиз ClickHouse 1.1.54318, 2017-11-30
Релиз содержит изменения к предыдущему релизу 1.1.54310 с исправлением следующих багов:
* Исправлено некорректное удаление строк при слияниях в движке SummingMergeTree
@ -415,9 +489,9 @@
* Исправлена проблема, приводящая к остановке выполнения очереди репликации
* Исправлено ротирование и архивация логов сервера
# Релиз ClickHouse 1.1.54310, 2017-11-01
## Релиз ClickHouse 1.1.54310, 2017-11-01
## Новые возможности:
### Новые возможности:
* Произвольный ключ партиционирования для таблиц семейства MergeTree.
* Движок таблиц [Kafka](https://clickhouse.yandex/docs/en/single/index.html#document-table_engines/kafka).
* Возможность загружать модели [CatBoost](https://catboost.yandex/) и применять их к данным, хранящимся в ClickHouse.
@ -433,12 +507,12 @@
* Поддержка входного формата Capn Proto.
* Возможность задавать уровень сжатия при использовании алгоритма zstd.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Запрещено создание временных таблиц с движком, отличным от Memory.
* Запрещено явное создание таблиц с движком View и MaterializedView.
* При создании таблицы теперь проверяется, что ключ сэмплирования входит в первичный ключ.
## Исправления ошибок:
### Исправления ошибок:
* Исправлено зависание при синхронной вставке в Distributed таблицу.
* Исправлена неатомарность при добавлении/удалении кусков в реплицированных таблицах.
* Данные, вставляемые в материализованное представление, теперь не подвергаются излишней дедупликации.
@ -448,14 +522,14 @@
* Исправлено зависание при недостатке места на диске в разделе с логами.
* Исправлено переполнение в функции toRelativeWeekNum для первой недели Unix-эпохи.
## Улучшения сборки:
### Улучшения сборки:
* Несколько сторонних библиотек (в частности, Poco) обновлены и переведены на git submodules.
# Релиз ClickHouse 1.1.54304, 2017-10-19
## Новые возможности:
## Релиз ClickHouse 1.1.54304, 2017-10-19
### Новые возможности:
* Добавлена поддержка TLS в нативном протоколе (включается заданием `tcp_ssl_port` в `config.xml`)
## Исправления ошибок:
### Исправления ошибок:
* `ALTER` для реплицированных таблиц теперь пытается начать выполнение как можно быстрее
* Исправлены падения при чтении данных с настройкой `preferred_block_size_bytes=0`
* Исправлено падение `clickhouse-client` при нажатии `Page Down`
@ -468,16 +542,16 @@
* Корректное обновление пользователей при невалидном `users.xml`
* Корректная обработка случаев, когда executable-словарь возвращает ненулевой код ответа
# Релиз ClickHouse 1.1.54292, 2017-09-20
## Релиз ClickHouse 1.1.54292, 2017-09-20
## Новые возможности:
### Новые возможности:
* Добавлена функция `pointInPolygon` для работы с координатами на плоскости.
* Добавлена агрегатная функция `sumMap`, обеспечивающая суммирование массивов аналогично `SummingMergeTree`.
* Добавлена функция `trunc`. Увеличена производительность функций округления `round`, `floor`, `ceil`, `roundToExp2`. Исправлена логика работы функций округления. Изменена логика работы функции `roundToExp2` для дробных и отрицательных чисел.
* Ослаблена зависимость исполняемого файла ClickHouse от версии libc. Один и тот же исполняемый файл ClickHouse может запускаться и работать на широком множестве Linux систем. Замечание: зависимость всё ещё присутствует при использовании скомпилированных запросов (настройка `compile = 1`, по-умолчанию не используется).
* Уменьшено время динамической компиляции запросов.
## Исправления ошибок:
### Исправления ошибок:
* Исправлена ошибка, которая могла приводить к сообщениям `part ... intersects previous part` и нарушению консистентности реплик.
* Исправлена ошибка, приводящая к блокировке при завершении работы сервера, если в это время ZooKeeper недоступен.
* Удалено избыточное логгирование при восстановлении реплик.
@ -485,9 +559,9 @@
* Исправлена ошибка в функции concat, возникающая в случае, если первый столбец блока имеет тип Array.
* Исправлено отображение прогресса в таблице system.merges.
# Релиз ClickHouse 1.1.54289, 2017-09-13
## Релиз ClickHouse 1.1.54289, 2017-09-13
## Новые возможности:
### Новые возможности:
* Запросы `SYSTEM` для административных действий с сервером: `SYSTEM RELOAD DICTIONARY`, `SYSTEM RELOAD DICTIONARIES`, `SYSTEM DROP DNS CACHE`, `SYSTEM SHUTDOWN`, `SYSTEM KILL`.
* Добавлены функции для работы с массивами: `concat`, `arraySlice`, `arrayPushBack`, `arrayPushFront`, `arrayPopBack`, `arrayPopFront`.
* Добавлены параметры `root` и `identity` для конфигурации ZooKeeper. Это позволяет изолировать разных пользователей одного ZooKeeper кластера.
@ -502,7 +576,7 @@
* Возможность задать `umask` в конфигурационном файле.
* Увеличена производительность запросов с `DISTINCT`.
## Исправления ошибок:
### Исправления ошибок:
* Более оптимальная процедура удаления старых нод в ZooKeeper. Ранее в случае очень частых вставок, старые ноды могли не успевать удаляться, что приводило, в том числе, к очень долгому завершению сервера.
* Исправлена рандомизация при выборе хостов для соединения с ZooKeeper.
* Исправлено отключение отстающей реплики при распределённых запросах, если реплика является localhost.
@ -515,28 +589,28 @@
* Исправлено появление zombie процессов при работе со словарём с источником `executable`.
* Исправлен segfault при запросе HEAD.
## Улучшения процесса разработки и сборки ClickHouse:
### Улучшения процесса разработки и сборки ClickHouse:
* Возможность сборки с помощью `pbuilder`.
* Возможность сборки с использованием `libc++` вместо `libstdc++` под Linux.
* Добавлены инструкции для использования статических анализаторов кода `Coverity`, `clang-tidy`, `cppcheck`.
## На что обратить внимание при обновлении:
### На что обратить внимание при обновлении:
* Увеличено значение по-умолчанию для настройки MergeTree `max_bytes_to_merge_at_max_space_in_pool` (максимальный суммарный размер кусков в байтах для мержа) со 100 GiB до 150 GiB. Это может привести к запуску больших мержей после обновления сервера, что может вызвать повышенную нагрузку на дисковую подсистему. Если же на серверах, где это происходит, количество свободного места менее чем в два раза больше суммарного объёма выполняющихся мержей, то в связи с этим перестанут выполняться какие-либо другие мержи, включая мержи мелких кусков. Это приведёт к тому, что INSERT-ы будут отклоняться с сообщением "Merges are processing significantly slower than inserts". Для наблюдения, используйте запрос `SELECT * FROM system.merges`. Вы также можете смотреть на метрику `DiskSpaceReservedForMerge` в таблице `system.metrics` или в Graphite. Для исправления этой ситуации можно ничего не делать, так как она нормализуется сама после завершения больших мержей. Если же вас это не устраивает, вы можете вернуть настройку `max_bytes_to_merge_at_max_space_in_pool` в старое значение, прописав в config.xml в секции `<merge_tree>` `<max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool>` и перезапустить сервер.
# Релиз ClickHouse 1.1.54284, 2017-08-29
## Релиз ClickHouse 1.1.54284, 2017-08-29
* Релиз содержит изменения к предыдущему релизу 1.1.54282, которые исправляют утечку записей о кусках в ZooKeeper
# Релиз ClickHouse 1.1.54282, 2017-08-23
## Релиз ClickHouse 1.1.54282, 2017-08-23
Релиз содержит исправления к предыдущему релизу 1.1.54276:
* Исправлена ошибка `DB::Exception: Assertion violation: !_path.empty()` при вставке в Distributed таблицу.
* Исправлен парсинг при вставке в формате RowBinary, если входные данные начинаются с ';'.
* Исправлена ошибка при рантайм-компиляции некоторых агрегатных функций (например, `groupArray()`).
# Релиз ClickHouse 1.1.54276, 2017-08-16
## Релиз ClickHouse 1.1.54276, 2017-08-16
## Новые возможности:
### Новые возможности:
* Добавлена опциональная секция WITH запроса SELECT. Пример запроса: `WITH 1+1 AS a SELECT a, a*a`
* Добавлена возможность синхронной вставки в Distributed таблицу: выдается Ok только после того как все данные записались на все шарды. Активируется настройкой insert_distributed_sync=1
* Добавлен тип данных UUID для работы с 16-байтовыми идентификаторами
@ -546,17 +620,17 @@
* Добавлена поддержка неконстантных аргументов и отрицательных смещений в функции `substring(str, pos, len)`
* Добавлен параметр max_size для агрегатной функции `groupArray(max_size)(column)`, и оптимизирована её производительность
## Основные изменения:
### Основные изменения:
* Улучшение безопасности: все файлы сервера создаются с правами 0640 (можно поменять, через параметр <umask> в конфиге).
* Улучшены сообщения об ошибках в случае синтаксически неверных запросов
* Значительно уменьшен расход оперативной памяти и улучшена производительность слияний больших MergeTree-кусков данных
* Значительно увеличена производительность слияний данных для движка ReplacingMergeTree
* Улучшена производительность асинхронных вставок из Distributed таблицы за счет объединения нескольких исходных вставок. Функциональность включается настройкой distributed_directory_monitor_batch_inserts=1.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Изменился бинарный формат агрегатных состояний функции `groupArray(array_column)` для массивов
## Полный список изменений:
### Полный список изменений:
* Добавлена настройка `output_format_json_quote_denormals`, включающая вывод nan и inf значений в формате JSON
* Более оптимальное выделение потоков при чтении из Distributed таблиц
* Разрешено задавать настройки в режиме readonly, если их значение не изменяется
@ -574,7 +648,7 @@
* Возможность подключения к MySQL через сокет на файловой системе
* В таблицу system.parts добавлен столбец с информацией о размере marks в байтах
## Исправления багов:
### Исправления багов:
* Исправлена некорректная работа Distributed таблиц, использующих Merge таблицы, при SELECT с условием на поле _table
* Исправлен редкий race condition в ReplicatedMergeTree при проверке кусков данных
* Исправлено возможное зависание процедуры leader election при старте сервера
@ -597,15 +671,15 @@
* Исправлена ошибка "Cannot mremap" при использовании множеств в секциях IN, JOIN, содержащих более 2 млрд. элементов
* Исправлен failover для словарей с источником MySQL
## Улучшения процесса разработки и сборки ClickHouse:
### Улучшения процесса разработки и сборки ClickHouse:
* Добавлена возмозможность сборки в Arcadia
* Добавлена возможность сборки с помощью gcc 7
* Ускорена параллельная сборка с помощью ccache+distcc
# Релиз ClickHouse 1.1.54245, 2017-07-04
## Релиз ClickHouse 1.1.54245, 2017-07-04
## Новые возможности:
### Новые возможности:
* Распределённые DDL (например, `CREATE TABLE ON CLUSTER`)
* Реплицируемый запрос `ALTER TABLE CLEAR COLUMN IN PARTITION`
* Движок таблиц Dictionary (доступ к данным словаря в виде таблицы)
@ -616,14 +690,14 @@
* Сессии в HTTP интерфейсе
* Запрос OPTIMIZE для Replicated таблицы теперь можно выполнять не только на лидере
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Убрана команда SET GLOBAL
## Мелкие изменения:
### Мелкие изменения:
* Теперь после получения сигнала в лог печатается полный стектрейс
* Ослаблена проверка на количество повреждённых/лишних кусков при старте (было слишком много ложных срабатываний)
## Исправления багов:
### Исправления багов:
* Исправлено залипание плохого соединения при вставке в Distributed таблицу
* GLOBAL IN теперь работает при запросе из таблицы Merge, смотрящей в Distributed
* Теперь правильно определяется количество ядер на виртуалках Google Compute Engine

View File

@ -43,7 +43,8 @@ include (cmake/arch.cmake)
if (CMAKE_GENERATOR STREQUAL "Ninja")
# Turn on colored output. https://github.com/ninja-build/ninja/wiki/FAQ
set (COMPILER_FLAGS "${COMPILER_FLAGS} -fdiagnostics-color=always")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fdiagnostics-color=always")
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fdiagnostics-color=always")
endif ()
if (NOT MSVC)
@ -274,6 +275,9 @@ include (cmake/find_rdkafka.cmake)
include (cmake/find_capnp.cmake)
include (cmake/find_llvm.cmake)
include (cmake/find_cpuid.cmake)
if (ENABLE_TESTS)
include (cmake/find_gtest.cmake)
endif ()
include (cmake/find_contrib_lib.cmake)
find_contrib_lib(cityhash)

View File

@ -1,39 +0,0 @@
## How to increase maxfiles on macOS
To increase maxfiles on macOS, create the following file:
(Note: you'll need to use sudo)
/Library/LaunchDaemons/limit.maxfiles.plist:
```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>limit.maxfiles</string>
<key>ProgramArguments</key>
<array>
<string>launchctl</string>
<string>limit</string>
<string>maxfiles</string>
<string>524288</string>
<string>524288</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>ServiceIPC</key>
<false/>
</dict>
</plist>
```
Execute the following command:
```
sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist
```
Reboot.
To check if it's working, you can use `ulimit -n` command.

View File

@ -1,6 +1,13 @@
# ClickHouse
ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real time.
Learn more about ClickHouse at [https://clickhouse.yandex/](https://clickhouse.yandex/)
[![Build Status](https://travis-ci.org/yandex/ClickHouse.svg?branch=master)](https://travis-ci.org/yandex/ClickHouse)
## Useful links
* [Official website](https://clickhouse.yandex/) has quick high-level overview of ClickHouse on main page.
* [Tutorial](https://clickhouse.yandex/tutorial.html) shows how to set up and query small ClickHouse cluster.
* [Documentation](https://clickhouse.yandex/docs/en/) provides more in-depth information.
* [Contacts](https://clickhouse.yandex/#contacts) can help to get your questions answered if there are any.

View File

@ -1,4 +1,14 @@
option (USE_INTERNAL_CPUID_LIBRARY "Set to FALSE to use system cpuid library instead of bundled" ${NOT_UNBUNDLED})
# Freebsd: /usr/local/include/libcpuid/libcpuid_types.h:61:29: error: conflicting declaration 'typedef long long int int64_t'
# TODO: test new libcpuid - maybe already fixed
if (NOT ARCH_ARM)
if (ARCH_FREEBSD)
set (DEFAULT_USE_INTERNAL_CPUID_LIBRARY 1)
else ()
set (DEFAULT_USE_INTERNAL_CPUID_LIBRARY ${NOT_UNBUNDLED})
endif ()
option (USE_INTERNAL_CPUID_LIBRARY "Set to FALSE to use system cpuid library instead of bundled" ${DEFAULT_USE_INTERNAL_CPUID_LIBRARY})
endif ()
#if (USE_INTERNAL_CPUID_LIBRARY AND NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/libcpuid/include/cpuid/libcpuid.h")
# message (WARNING "submodule contrib/libcpuid is missing. to fix try run: \n git submodule update --init --recursive")

View File

@ -24,6 +24,15 @@ if (ENABLE_EMBEDDED_COMPILER)
endif ()
endif ()
if (LLVM_FOUND)
find_library (LLD_LIBRARY_TEST lldCore PATHS ${LLVM_LIBRARY_DIRS})
find_path (LLD_INCLUDE_DIR_TEST NAMES lld/Core/AbsoluteAtom.h PATHS ${LLVM_INCLUDE_DIRS})
if (NOT LLD_LIBRARY_TEST OR NOT LLD_INCLUDE_DIR_TEST)
set (LLVM_FOUND 0)
message(WARNING "liblld (${LLD_LIBRARY_TEST}, ${LLD_INCLUDE_DIR_TEST}) not found in ${LLVM_INCLUDE_DIRS} ${LLVM_LIBRARY_DIRS}. Disabling internal compiler.")
endif ()
endif ()
if (LLVM_FOUND)
# Remove dynamically-linked zlib and libedit from LLVM's dependencies:
set_target_properties(LLVMSupport PROPERTIES INTERFACE_LINK_LIBRARIES "-lpthread;LLVMDemangle;${ZLIB_LIBRARIES}")

View File

@ -75,7 +75,7 @@ if (ENABLE_TCMALLOC AND USE_INTERNAL_GPERFTOOLS_LIBRARY)
add_subdirectory (libtcmalloc)
endif ()
if (NOT ARCH_ARM)
if (USE_INTERNAL_CPUID_LIBRARY)
add_subdirectory (libcpuid)
endif ()
@ -149,5 +149,9 @@ if (USE_INTERNAL_POCO_LIBRARY)
endif ()
if (USE_INTERNAL_LLVM_LIBRARY)
# ld: unknown option: --color-diagnostics
if (APPLE AND COMPILER_GCC)
set (LINKER_SUPPORTS_COLOR_DIAGNOSTICS 0 CACHE INTERNAL "")
endif ()
add_subdirectory (llvm/llvm)
endif ()

View File

@ -144,6 +144,7 @@ target_link_libraries (clickhouse_common_io
${EXECINFO_LIBRARY}
${ELF_LIBRARY}
${Boost_SYSTEM_LIBRARY}
apple_rt
${CMAKE_DL_LIBS}
)
@ -244,8 +245,6 @@ add_subdirectory (programs)
add_subdirectory (tests)
if (ENABLE_TESTS)
include (${ClickHouse_SOURCE_DIR}/cmake/find_gtest.cmake)
if (USE_INTERNAL_GTEST_LIBRARY)
# Google Test from sources
add_subdirectory(${ClickHouse_SOURCE_DIR}/contrib/googletest/googletest ${CMAKE_CURRENT_BINARY_DIR}/googletest)

View File

@ -1,24 +1,25 @@
# This strings autochanged from release_lib.sh:
set(VERSION_DESCRIBE v1.1.54388-testing)
set(VERSION_REVISION 54388)
set(VERSION_GITHASH 2447755700f40af317cb80ba8800b94d6350d148)
set(VERSION_REVISION 54397 CACHE STRING "")
set(VERSION_MAJOR 18 CACHE STRING "")
set(VERSION_MINOR 2 CACHE STRING "")
set(VERSION_PATCH 0 CACHE STRING "")
set(VERSION_GITHASH 6ad677d7d6961a0c9088ccd9eff55779cfdaa654 CACHE STRING "")
set(VERSION_DESCRIBE v18.2.0-testing CACHE STRING "")
set(VERSION_STRING 18.2.0 CACHE STRING "")
# end of autochange
set (VERSION_MAJOR 1)
set (VERSION_MINOR 1)
set (VERSION_PATCH ${VERSION_REVISION})
set (VERSION_EXTRA "")
set (VERSION_TWEAK "")
set(VERSION_EXTRA "" CACHE STRING "")
set(VERSION_TWEAK "" CACHE STRING "")
set (VERSION_STRING "${VERSION_MAJOR}.${VERSION_MINOR}.${VERSION_PATCH}")
if (VERSION_TWEAK)
set(VERSION_STRING "${VERSION_STRING}.${VERSION_TWEAK}")
string(CONCAT VERSION_STRING ${VERSION_STRING} "." ${VERSION_TWEAK})
endif ()
if (VERSION_EXTRA)
set(VERSION_STRING "${VERSION_STRING}${VERSION_EXTRA}")
string(CONCAT VERSION_STRING ${VERSION_STRING} "." ${VERSION_EXTRA})
endif ()
set (VERSION_FULL "${PROJECT_NAME} ${VERSION_STRING}")
set (VERSION_NAME "${PROJECT_NAME}")
set (VERSION_FULL "${VERSION_NAME} ${VERSION_STRING}")
if (APPLE)
# dirty hack: ld: malformed 64-bit a.b.c.d.e version number: 1.1.54160

View File

@ -17,7 +17,8 @@ set(TMP_HEADERS_DIR "${CMAKE_CURRENT_BINARY_DIR}/headers")
# Make and install empty dir for debian package if compiler disabled
add_custom_target(make-headers-directory ALL COMMAND ${CMAKE_COMMAND} -E make_directory ${TMP_HEADERS_DIR})
install(DIRECTORY ${TMP_HEADERS_DIR} DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/clickhouse COMPONENT clickhouse)
if (USE_EMBEDDED_COMPILER)
# TODO: fix on macos copy_headers.sh: sed --posix
if (USE_EMBEDDED_COMPILER AND NOT APPLE)
add_custom_target(copy-headers ALL env CLANG=${CMAKE_CURRENT_BINARY_DIR}/../clickhouse-clang BUILD_PATH=${ClickHouse_BINARY_DIR} DESTDIR=${ClickHouse_SOURCE_DIR} ${ClickHouse_SOURCE_DIR}/copy_headers.sh ${ClickHouse_SOURCE_DIR} ${TMP_HEADERS_DIR} DEPENDS clickhouse-clang WORKING_DIRECTORY ${ClickHouse_SOURCE_DIR} SOURCES ${ClickHouse_SOURCE_DIR}/copy_headers.sh)
if (USE_INTERNAL_LLVM_LIBRARY)

View File

@ -1,6 +1,8 @@
add_library (clickhouse-client-lib Client.cpp)
target_link_libraries (clickhouse-client-lib clickhouse_functions clickhouse_aggregate_functions ${LINE_EDITING_LIBS} ${Boost_PROGRAM_OPTIONS_LIBRARY})
target_include_directories (clickhouse-client-lib SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
if (READLINE_INCLUDE_DIR)
target_include_directories (clickhouse-client-lib SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
endif ()
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-client clickhouse-client.cpp)

View File

@ -28,6 +28,7 @@
#include <Common/StringUtils/StringUtils.h>
#include <Common/typeid_cast.h>
#include <Common/Config/ConfigProcessor.h>
#include <Common/config_version.h>
#include <Core/Types.h>
#include <Core/QueryProcessingStage.h>
#include <IO/ReadBufferFromFileDescriptor.h>
@ -374,7 +375,6 @@ private:
echo_queries = config().getBool("echo", false);
}
connection_parameters = ConnectionParameters(config());
connect();
/// Initialize DateLUT here to avoid counting time spent here as query execution time.
@ -492,6 +492,8 @@ private:
void connect()
{
connection_parameters = ConnectionParameters(config());
if (is_interactive)
std::cout << "Connecting to "
<< (!connection_parameters.default_database.empty() ? "database " + connection_parameters.default_database + " at " : "")
@ -1315,10 +1317,7 @@ private:
void showClientVersion()
{
std::cout << "ClickHouse client version " << DBMS_VERSION_MAJOR
<< "." << DBMS_VERSION_MINOR
<< "." << ClickHouseRevision::get()
<< "." << std::endl;
std::cout << DBMS_NAME << " client version " << VERSION_STRING << "." << std::endl;
}
public:

View File

@ -15,17 +15,18 @@
</openSSL>
<!--
It's a custom prompt settings for the clickhouse-client
Possible macros:
Possible macros:
{host}
{port}
{user}
{database}
{database}
{display_name}
Terminal colors: https://misc.flogisoft.com/bash/tip_colors_and_formatting
See also: https://wiki.hackzine.org/development/misc/readline-color-prompt.html
-->
<prompt_by_server_display_name>
<default>{display_name} :) </default>
<test>{display_name} \e[1;32m:)\e[0m </test> <!-- if it matched to the substring "test" in the server display name - -->
<production>{display_name} \e[1;31m:)\e[0m </production> <!-- if it matched to the substring "production" in the server display name -->
<test>{display_name} \x01\e[1;32m\x02:)\x01\e[0m\x02 </test> <!-- if it matched to the substring "test" in the server display name - -->
<production>{display_name} \x01\e[1;31m\x02:)\x01\e[0m\x02 </production> <!-- if it matched to the substring "production" in the server display name -->
</prompt_by_server_display_name>
</config>

View File

@ -17,6 +17,7 @@
#include <Common/Config/ConfigProcessor.h>
#include <Common/escapeForFileName.h>
#include <Common/ClickHouseRevision.h>
#include <Common/config_version.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromString.h>
#include <IO/WriteBufferFromFileDescriptor.h>
@ -355,10 +356,7 @@ void LocalServer::setupUsers()
static void showClientVersion()
{
std::cout << "ClickHouse client version " << DBMS_VERSION_MAJOR
<< "." << DBMS_VERSION_MINOR
<< "." << ClickHouseRevision::get()
<< "." << std::endl;
std::cout << DBMS_NAME << " client version " << VERSION_STRING << "." << std::endl;
}
std::string LocalServer::getHelpHeader() const

View File

@ -58,13 +58,13 @@ It is designed to retain the following properties of data:
Most of the properties above are viable for performance testing:
- reading data, filtering, aggregation and sorting will work at almost the same speed
as on original data due to saved cardinalities, magnitudes, compression ratios, etc.
as on original data due to saved cardinalities, magnitudes, compression ratios, etc.
It works in deterministic fashion: you define a seed value and transform is totally determined by input data and by seed.
Some transforms are one to one and could be reversed, so you need to have large enough seed and keep it in secret.
It use some cryptographic primitives to transform data, but from the cryptographic point of view,
it doesn't do anything properly and you should never consider the result as secure, unless you have other reasons for it.
it doesn't do anything properly and you should never consider the result as secure, unless you have other reasons for it.
It may retain some data you don't want to publish.
@ -74,7 +74,7 @@ So, the user will be able to count exact ratio of mobile traffic.
Another example, suppose you have some private data in your table, like user email and you don't want to publish any single email address.
If your table is large enough and contain multiple different emails and there is no email that have very high frequency than all others,
it will perfectly anonymize all data. But if you have small amount of different values in a column, it can possibly reproduce some of them.
it will perfectly anonymize all data. But if you have small amount of different values in a column, it can possibly reproduce some of them.
And you should take care and look at exact algorithm, how this tool works, and probably fine tune some of it command line parameters.
This tool works fine only with reasonable amount of data (at least 1000s of rows).
@ -87,6 +87,7 @@ namespace DB
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
extern const int NOT_IMPLEMENTED;
extern const int CANNOT_SEEK_THROUGH_FILE;
}
@ -682,7 +683,7 @@ public:
}
if (table.end() == it)
throw Exception("Logical error in markov model");
throw Exception("Logical error in markov model", ErrorCodes::LOGICAL_ERROR);
size_t offset_from_begin_of_string = pos - data;
size_t determinator_sliding_window_size = params.determinator_sliding_window_size;
@ -703,7 +704,8 @@ public:
/// If string is greater than desired_size, increase probability of end.
double end_probability_multiplier = 0;
Int64 num_bytes_after_desired_size = (pos - data) - desired_size;
if (num_bytes_after_desired_size)
if (num_bytes_after_desired_size > 0)
end_probability_multiplier = std::pow(1.25, num_bytes_after_desired_size);
CodePoint code = it->second.sample(determinator, end_probability_multiplier);
@ -711,6 +713,14 @@ public:
if (code == END)
break;
if (num_bytes_after_desired_size > 0)
{
/// Heuristic: break at ASCII non-alnum code point.
/// This allows to be close to desired_size but not break natural looking words.
if (code < 128 && !isAlphaNumericASCII(code))
break;
}
if (!writeCodePoint(code, pos, end))
break;
@ -884,7 +894,7 @@ public:
if (auto type = typeid_cast<const DataTypeNullable *>(&data_type))
return std::make_unique<NullableModel>(get(*type->getNestedType(), seed, markov_model_params));
throw Exception("Unsupported data type");
throw Exception("Unsupported data type", ErrorCodes::NOT_IMPLEMENTED);
}
};

View File

@ -3,6 +3,7 @@
#include <memory>
#include <sys/resource.h>
#include <errno.h>
#include <Poco/Version.h>
#include <Poco/DirectoryIterator.h>
#include <Poco/Net/HTTPServer.h>
#include <Poco/Net/NetException.h>
@ -341,7 +342,7 @@ int Server::main(const std::vector<std::string> & /*args*/)
Poco::ThreadPool server_pool(3, config().getUInt("max_connections", 1024));
Poco::Net::HTTPServerParams::Ptr http_params = new Poco::Net::HTTPServerParams;
http_params->setTimeout(settings.receive_timeout);
http_params->setTimeout(settings.http_receive_timeout);
http_params->setKeepAliveTimeout(keep_alive_timeout);
std::vector<std::unique_ptr<Poco::Net::TCPServer>> servers;

View File

@ -1,36 +1,27 @@
#include "TCPHandler.h"
#include <iomanip>
#include <Poco/Net/NetException.h>
#include <Common/ClickHouseRevision.h>
#include <Common/Stopwatch.h>
#include <IO/Progress.h>
#include <IO/CompressedReadBuffer.h>
#include <IO/CompressedWriteBuffer.h>
#include <IO/ReadBufferFromPocoSocket.h>
#include <IO/WriteBufferFromPocoSocket.h>
#include <IO/CompressionSettings.h>
#include <IO/copyData.h>
#include <DataStreams/AsynchronousBlockInputStream.h>
#include <DataStreams/NativeBlockInputStream.h>
#include <DataStreams/NativeBlockOutputStream.h>
#include <Interpreters/executeQuery.h>
#include <Interpreters/Quota.h>
#include <Interpreters/TablesStatus.h>
#include <Storages/StorageMemory.h>
#include <Storages/StorageReplicatedMergeTree.h>
#include <Common/ClickHouseRevision.h>
#include <Common/Stopwatch.h>
#include <Common/ExternalTable.h>
#include "TCPHandler.h"
#include <Common/NetException.h>
#include <Common/config_version.h>
#include <ext/scope_guard.h>

View File

@ -55,7 +55,8 @@
<ip>127.0.0.1</ip>
Each element of list has one of the following forms:
<ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 2a02:6b8::3 or 2a02:6b8::3/64.
<ip> IP-address or network mask. Examples: 213.180.204.3 or 10.0.0.1/8 or 10.0.0.1/255.255.255.0
2a02:6b8::3 or 2a02:6b8::3/64 or 2a02:6b8::3/ffff:ffff:ffff:ffff::.
<host> Hostname. Example: server01.yandex.ru.
To check access, DNS query is performed, and all received addresses compared to peer address.
<host_regexp> Regular expression for host names. Example, ^server\d\d-\d\d-\d\.yandex\.ru$

View File

@ -18,6 +18,9 @@ public:
DataTypes transformArguments(const DataTypes & arguments) const override
{
if (0 == arguments.size())
throw Exception("-Array aggregate functions require at least one argument", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
DataTypes nested_arguments;
for (const auto & type : arguments)
{

View File

@ -38,9 +38,9 @@ void registerAggregateFunctionsBitwise(AggregateFunctionFactory & factory)
factory.registerFunction("groupBitXor", createAggregateFunctionBitwise<AggregateFunctionGroupBitXorData>);
/// Aliases for compatibility with MySQL.
factory.registerFunction("BIT_OR", createAggregateFunctionBitwise<AggregateFunctionGroupBitOrData>, AggregateFunctionFactory::CaseInsensitive);
factory.registerFunction("BIT_AND", createAggregateFunctionBitwise<AggregateFunctionGroupBitAndData>, AggregateFunctionFactory::CaseInsensitive);
factory.registerFunction("BIT_XOR", createAggregateFunctionBitwise<AggregateFunctionGroupBitXorData>, AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("BIT_OR", "groupBitOr", AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("BIT_AND", "groupBitAnd", AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("BIT_XOR", "groupBitXor", AggregateFunctionFactory::CaseInsensitive);
}
}

View File

@ -15,6 +15,10 @@ namespace DB
*/
class AggregateFunctionCombinatorFactory final: public ext::singleton<AggregateFunctionCombinatorFactory>
{
private:
using Dict = std::unordered_map<std::string, AggregateFunctionCombinatorPtr>;
Dict dict;
public:
/// Not thread safe. You must register before using tryGet.
void registerCombinator(const AggregateFunctionCombinatorPtr & value);
@ -22,8 +26,10 @@ public:
/// Example: if the name is 'avgIf', it will return combinator -If.
AggregateFunctionCombinatorPtr tryFindSuffix(const std::string & name) const;
private:
std::unordered_map<std::string, AggregateFunctionCombinatorPtr> dict;
const Dict & getAllAggregateFunctionCombinators() const
{
return dict;
}
};
}

View File

@ -78,11 +78,12 @@ AggregateFunctionPtr AggregateFunctionFactory::get(
AggregateFunctionPtr AggregateFunctionFactory::getImpl(
const String & name,
const String & name_param,
const DataTypes & argument_types,
const Array & parameters,
int recursion_level) const
{
String name = getAliasToOrName(name_param);
/// Find by exact match.
auto it = aggregate_functions.find(name);
if (it != aggregate_functions.end())
@ -103,8 +104,8 @@ AggregateFunctionPtr AggregateFunctionFactory::getImpl(
if (AggregateFunctionCombinatorPtr combinator = AggregateFunctionCombinatorFactory::instance().tryFindSuffix(name))
{
if (combinator->getName() == "Null")
throw Exception("Aggregate function combinator 'Null' is only for internal usage", ErrorCodes::UNKNOWN_AGGREGATE_FUNCTION);
if (combinator->isForInternalUsageOnly())
throw Exception("Aggregate function combinator '" + combinator->getName() + "' is only for internal usage", ErrorCodes::UNKNOWN_AGGREGATE_FUNCTION);
String nested_name = name.substr(0, name.size() - combinator->getName().size());
DataTypes nested_types = combinator->transformArguments(argument_types);
@ -126,10 +127,11 @@ AggregateFunctionPtr AggregateFunctionFactory::tryGet(const String & name, const
bool AggregateFunctionFactory::isAggregateFunctionName(const String & name, int recursion_level) const
{
if (aggregate_functions.count(name))
if (aggregate_functions.count(name) || isAlias(name))
return true;
if (recursion_level == 0 && case_insensitive_aggregate_functions.count(Poco::toLower(name)))
String name_lowercase = Poco::toLower(name);
if (recursion_level == 0 && (case_insensitive_aggregate_functions.count(name_lowercase) || isAlias(name_lowercase)))
return true;
if (AggregateFunctionCombinatorPtr combinator = AggregateFunctionCombinatorFactory::instance().tryFindSuffix(name))

View File

@ -1,6 +1,7 @@
#pragma once
#include <AggregateFunctions/IAggregateFunction.h>
#include <Common/IFactoryWithAliases.h>
#include <ext/singleton.h>
@ -20,27 +21,18 @@ class IDataType;
using DataTypePtr = std::shared_ptr<const IDataType>;
using DataTypes = std::vector<DataTypePtr>;
/** Creator have arguments: name of aggregate function, types of arguments, values of parameters.
* Parameters are for "parametric" aggregate functions.
* For example, in quantileWeighted(0.9)(x, weight), 0.9 is "parameter" and x, weight are "arguments".
*/
using AggregateFunctionCreator = std::function<AggregateFunctionPtr(const String &, const DataTypes &, const Array &)>;
/** Creates an aggregate function by name.
*/
class AggregateFunctionFactory final : public ext::singleton<AggregateFunctionFactory>
class AggregateFunctionFactory final : public ext::singleton<AggregateFunctionFactory>, public IFactoryWithAliases<AggregateFunctionCreator>
{
friend class StorageSystemFunctions;
public:
/** Creator have arguments: name of aggregate function, types of arguments, values of parameters.
* Parameters are for "parametric" aggregate functions.
* For example, in quantileWeighted(0.9)(x, weight), 0.9 is "parameter" and x, weight are "arguments".
*/
using Creator = std::function<AggregateFunctionPtr(const String &, const DataTypes &, const Array &)>;
/// For compatibility with SQL, it's possible to specify that certain aggregate function name is case insensitive.
enum CaseSensitiveness
{
CaseSensitive,
CaseInsensitive
};
/// Register a function by its name.
/// No locking, you must register all functions before usage of get.
void registerFunction(
@ -77,6 +69,13 @@ private:
/// Case insensitive aggregate functions will be additionally added here with lowercased name.
AggregateFunctions case_insensitive_aggregate_functions;
const AggregateFunctions & getCreatorMap() const override { return aggregate_functions; }
const AggregateFunctions & getCaseInsensitiveCreatorMap() const override { return case_insensitive_aggregate_functions; }
String getFactoryName() const override { return "AggregateFunctionFactory"; }
};
}

View File

@ -0,0 +1,56 @@
#include <AggregateFunctions/AggregateFunctionHistogram.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/FactoryHelpers.h>
#include <AggregateFunctions/Helpers.h>
#include <Common/FieldVisitors.h>
namespace DB
{
namespace ErrorCodes
{
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int ILLEGAL_TYPE_OF_ARGUMENT;
extern const int BAD_ARGUMENTS;
extern const int UNSUPPORTED_PARAMETER;
extern const int PARAMETER_OUT_OF_BOUND;
}
namespace
{
AggregateFunctionPtr createAggregateFunctionHistogram(const std::string & name, const DataTypes & arguments, const Array & params)
{
if (params.size() != 1)
throw Exception("Function " + name + " requires single parameter: bins count", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
if (params[0].getType() != Field::Types::UInt64)
throw Exception("Invalid type for bins count", ErrorCodes::UNSUPPORTED_PARAMETER);
UInt32 bins_count = applyVisitor(FieldVisitorConvertToNumber<UInt32>(), params[0]);
auto limit = AggregateFunctionHistogramData::bins_count_limit;
if (bins_count > limit)
throw Exception("Unsupported bins count. Should not be greater than " + std::to_string(limit), ErrorCodes::PARAMETER_OUT_OF_BOUND);
if (bins_count == 0)
throw Exception("Bin count should be positive", ErrorCodes::BAD_ARGUMENTS);
assertUnary(name, arguments);
AggregateFunctionPtr res(createWithNumericType<AggregateFunctionHistogram>(*arguments[0], bins_count));
if (!res)
throw Exception("Illegal type " + arguments[0]->getName() + " of argument for aggregate function " + name, ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
return res;
}
}
void registerAggregateFunctionHistogram(AggregateFunctionFactory & factory)
{
factory.registerFunction("histogram", createAggregateFunctionHistogram);
}
}

View File

@ -0,0 +1,376 @@
#pragma once
#include <Common/Arena.h>
#include <Common/NaNUtils.h>
#include <Columns/ColumnVector.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnArray.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <IO/WriteBuffer.h>
#include <IO/ReadBuffer.h>
#include <IO/VarInt.h>
#include <AggregateFunctions/IAggregateFunction.h>
#include <math.h>
#include <queue>
#include <stddef.h>
namespace DB
{
namespace ErrorCodes
{
extern const int TOO_LARGE_ARRAY_SIZE;
extern const int INCORRECT_DATA;
}
/**
* distance compression algorigthm implementation
* http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf
*/
class AggregateFunctionHistogramData
{
public:
using Mean = Float64;
using Weight = Float64;
constexpr static size_t bins_count_limit = 250;
private:
struct WeightedValue
{
Mean mean;
Weight weight;
WeightedValue operator+ (const WeightedValue& other)
{
return {mean + other.weight * (other.mean - mean) / (other.weight + weight), other.weight + weight};
}
};
private:
// quantity of stored weighted-values
UInt32 size;
// calculated lower and upper bounds of seen points
Mean lower_bound;
Mean upper_bound;
// Weighted values representation of histogram.
WeightedValue points[0];
private:
void sort()
{
std::sort(points, points + size,
[](const WeightedValue & first, const WeightedValue & second)
{
return first.mean < second.mean;
});
}
template <typename T>
struct PriorityQueueStorage
{
size_t size = 0;
T * data_ptr;
PriorityQueueStorage(T * value)
: data_ptr(value)
{
}
void push_back(T val)
{
data_ptr[size] = std::move(val);
++size;
}
void pop_back() { --size; }
T * begin() { return data_ptr; }
T * end() const { return data_ptr + size; }
bool empty() const { return size == 0; }
T & front() { return *data_ptr; }
const T & front() const { return *data_ptr; }
using value_type = T;
using reference = T&;
using const_reference = const T&;
using size_type = size_t;
};
/**
* Repeatedly fuse most close values until max_bins bins left
*/
void compress(UInt32 max_bins)
{
sort();
auto new_size = size;
if (size <= max_bins)
return;
// Maintain doubly-linked list of "active" points
// and store neighbour pairs in priority queue by distance
UInt32 previous[size + 1];
UInt32 next[size + 1];
bool active[size + 1];
std::fill(active, active + size, true);
active[size] = false;
auto delete_node = [&](UInt32 i)
{
previous[next[i]] = previous[i];
next[previous[i]] = next[i];
active[i] = false;
};
for (size_t i = 0; i <= size; ++i)
{
previous[i] = i - 1;
next[i] = i + 1;
}
next[size] = 0;
previous[0] = size;
using QueueItem = std::pair<Mean, UInt32>;
QueueItem storage[2 * size - max_bins];
std::priority_queue<
QueueItem,
PriorityQueueStorage<QueueItem>,
std::greater<QueueItem>>
queue{std::greater<QueueItem>(),
PriorityQueueStorage<QueueItem>(storage)};
auto quality = [&](UInt32 i) { return points[next[i]].mean - points[i].mean; };
for (size_t i = 0; i + 1 < size; ++i)
queue.push({quality(i), i});
while (new_size > max_bins && !queue.empty())
{
auto min_item = queue.top();
queue.pop();
auto left = min_item.second;
auto right = next[left];
if (!active[left] || !active[right] || quality(left) > min_item.first)
continue;
points[left] = points[left] + points[right];
delete_node(right);
if (active[next[left]])
queue.push({quality(left), left});
if (active[previous[left]])
queue.push({quality(previous[left]), previous[left]});
--new_size;
}
size_t left = 0;
for (size_t right = 0; right < size; ++right)
{
if (active[right])
{
points[left] = points[right];
++left;
}
}
size = new_size;
}
/***
* Delete too close points from histogram.
* Assumes that points are sorted.
*/
void unique()
{
if (size == 0)
return;
size_t left = 0;
for (auto right = left + 1; right < size; ++right)
{
// Fuse points if their text representations differ only in last digit
auto min_diff = 10 * (points[left].mean + points[right].mean) * std::numeric_limits<Mean>::epsilon();
if (points[left].mean + min_diff >= points[right].mean)
{
points[left] = points[left] + points[right];
}
else
{
++left;
points[left] = points[right];
}
}
size = left + 1;
}
public:
AggregateFunctionHistogramData()
: size(0)
, lower_bound(std::numeric_limits<Mean>::max())
, upper_bound(std::numeric_limits<Mean>::lowest())
{
static_assert(offsetof(AggregateFunctionHistogramData, points) == sizeof(AggregateFunctionHistogramData), "points should be last member");
}
static size_t structSize(size_t max_bins)
{
return sizeof(AggregateFunctionHistogramData) + max_bins * 2 * sizeof(WeightedValue);
}
void insertResultInto(ColumnVector<Mean> & to_lower, ColumnVector<Mean> & to_upper, ColumnVector<Weight> & to_weights, UInt32 max_bins)
{
compress(max_bins);
unique();
for (size_t i = 0; i < size; ++i)
{
to_lower.insert((i == 0) ? lower_bound : (points[i].mean + points[i - 1].mean) / 2);
to_upper.insert((i + 1 == size) ? upper_bound : (points[i].mean + points[i + 1].mean) / 2);
// linear density approximation
Weight lower_weight = (i == 0) ? points[i].weight : ((points[i - 1].weight) + points[i].weight * 3) / 4;
Weight upper_weight = (i + 1 == size) ? points[i].weight : (points[i + 1].weight + points[i].weight * 3) / 4;
to_weights.insert((lower_weight + upper_weight) / 2);
}
}
void add(Mean value, Weight weight, UInt32 max_bins)
{
// nans break sort and compression
// infs don't fit in bins partition method
if (!isFinite(value))
throw Exception("Invalid value (inf or nan) for aggregation by 'histogram' function", ErrorCodes::INCORRECT_DATA);
points[size] = {value, weight};
++size;
lower_bound = std::min(lower_bound, value);
upper_bound = std::max(upper_bound, value);
if (size >= max_bins * 2)
compress(max_bins);
}
void merge(const AggregateFunctionHistogramData& other, UInt32 max_bins)
{
lower_bound = std::min(lower_bound, other.lower_bound);
upper_bound = std::max(lower_bound, other.upper_bound);
for (size_t i = 0; i < other.size; i++)
{
add(other.points[i].mean, other.points[i].weight, max_bins);
}
}
void write(WriteBuffer & buf) const
{
buf.write(reinterpret_cast<const char *>(&lower_bound), sizeof(lower_bound));
buf.write(reinterpret_cast<const char *>(&upper_bound), sizeof(upper_bound));
writeVarUInt(size, buf);
buf.write(reinterpret_cast<const char *>(points), size * sizeof(WeightedValue));
}
void read(ReadBuffer & buf, UInt32 max_bins)
{
buf.read(reinterpret_cast<char *>(&lower_bound), sizeof(lower_bound));
buf.read(reinterpret_cast<char *>(&upper_bound), sizeof(upper_bound));
readVarUInt(size, buf);
if (size > max_bins * 2)
throw Exception("Too many bins", ErrorCodes::TOO_LARGE_ARRAY_SIZE);
buf.read(reinterpret_cast<char *>(points), size * sizeof(WeightedValue));
}
};
template <typename T>
class AggregateFunctionHistogram final: public IAggregateFunctionDataHelper<AggregateFunctionHistogramData, AggregateFunctionHistogram<T>>
{
private:
using Data = AggregateFunctionHistogramData;
const UInt32 max_bins;
public:
AggregateFunctionHistogram(UInt32 max_bins)
: max_bins(max_bins)
{
}
size_t sizeOfData() const override
{
return Data::structSize(max_bins);
}
DataTypePtr getReturnType() const override
{
DataTypes types;
auto mean = std::make_shared<DataTypeNumber<Data::Mean>>();
auto weight = std::make_shared<DataTypeNumber<Data::Weight>>();
// lower bound
types.emplace_back(mean);
// upper bound
types.emplace_back(mean);
// weight
types.emplace_back(weight);
auto tuple = std::make_shared<DataTypeTuple>(types);
return std::make_shared<DataTypeArray>(tuple);
}
void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
{
auto val = static_cast<const ColumnVector<T> &>(*columns[0]).getData()[row_num];
this->data(place).add(static_cast<Data::Mean>(val), 1, max_bins);
}
void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override
{
this->data(place).merge(this->data(rhs), max_bins);
}
void serialize(ConstAggregateDataPtr place, WriteBuffer & buf) const override
{
this->data(place).write(buf);
}
void deserialize(AggregateDataPtr place, ReadBuffer & buf, Arena *) const override
{
this->data(place).read(buf, max_bins);
}
void insertResultInto(ConstAggregateDataPtr place, IColumn & to) const override
{
auto& data = this->data(const_cast<AggregateDataPtr>(place));
auto & to_array = static_cast<ColumnArray &>(to);
ColumnArray::Offsets & offsets_to = to_array.getOffsets();
auto & to_tuple = static_cast<ColumnTuple &>(to_array.getData());
auto & to_lower = static_cast<ColumnVector<Data::Mean> &>(to_tuple.getColumn(0));
auto & to_upper = static_cast<ColumnVector<Data::Mean> &>(to_tuple.getColumn(1));
auto & to_weights = static_cast<ColumnVector<Data::Weight> &>(to_tuple.getColumn(2));
data.insertResultInto(to_lower, to_upper, to_weights, max_bins);
offsets_to.push_back(to_tuple.size());
}
const char * getHeaderFilePath() const override { return __FILE__; }
String getName() const override { return "histogram"; }
};
}

View File

@ -137,7 +137,8 @@ public:
/// const_cast because we will sort the array
auto & array = const_cast<typename MaxIntersectionsData<PointType>::Array &>(this->data(place).value);
std::sort(array.begin(), array.end(), [](const auto & a, const auto & b) { return a.first < b.first; });
/// Sort by position; for equal position, sort by weight to get deterministic result.
std::sort(array.begin(), array.end());
for (const auto & point_weight : array)
{

View File

@ -18,6 +18,8 @@ class AggregateFunctionCombinatorNull final : public IAggregateFunctionCombinato
public:
String getName() const override { return "Null"; }
bool isForInternalUsageOnly() const override { return true; }
DataTypes transformArguments(const DataTypes & arguments) const override
{
size_t size = arguments.size();

View File

@ -93,30 +93,14 @@ void registerAggregateFunctionsQuantile(AggregateFunctionFactory & factory)
createAggregateFunctionQuantile<QuantileTDigest, NameQuantilesTDigestWeighted, true, Float32, true>);
/// 'median' is an alias for 'quantile'
factory.registerFunction("median",
createAggregateFunctionQuantile<QuantileReservoirSampler, NameQuantile, false, Float64, false>);
factory.registerFunction("medianDeterministic",
createAggregateFunctionQuantile<QuantileReservoirSamplerDeterministic, NameQuantileDeterministic, true, Float64, false>);
factory.registerFunction("medianExact",
createAggregateFunctionQuantile<QuantileExact, NameQuantileExact, false, void, false>);
factory.registerFunction("medianExactWeighted",
createAggregateFunctionQuantile<QuantileExactWeighted, NameQuantileExactWeighted, true, void, false>);
factory.registerFunction("medianTiming",
createAggregateFunctionQuantile<QuantileTiming, NameQuantileTiming, false, Float32, false>);
factory.registerFunction("medianTimingWeighted",
createAggregateFunctionQuantile<QuantileTiming, NameQuantileTimingWeighted, true, Float32, false>);
factory.registerFunction("medianTDigest",
createAggregateFunctionQuantile<QuantileTDigest, NameQuantileTDigest, false, Float32, false>);
factory.registerFunction("medianTDigestWeighted",
createAggregateFunctionQuantile<QuantileTDigest, NameQuantileTDigestWeighted, true, Float32, false>);
factory.registerAlias("median", NameQuantile::name);
factory.registerAlias("medianDeterministic", NameQuantileDeterministic::name);
factory.registerAlias("medianExact", NameQuantileExact::name);
factory.registerAlias("medianExactWeighted", NameQuantileExactWeighted::name);
factory.registerAlias("medianTiming", NameQuantileTiming::name);
factory.registerAlias("medianTimingWeighted", NameQuantileTimingWeighted::name);
factory.registerAlias("medianTDigest", NameQuantileTDigest::name);
factory.registerAlias("medianTDigestWeighted", NameQuantileTDigestWeighted::name);
}
}

View File

@ -24,10 +24,10 @@ namespace ErrorCodes
namespace
{
/** `DataForVariadic` is a data structure that will be used for `uniq` aggregate function of multiple arguments.
* It differs, for example, in that it uses a trivial hash function, since `uniq` of many arguments first hashes them out itself.
*/
template <typename Data, typename DataForVariadic>
AggregateFunctionPtr createAggregateFunctionUniq(const std::string & name, const DataTypes & argument_types, const Array & params)
{
@ -37,6 +37,8 @@ AggregateFunctionPtr createAggregateFunctionUniq(const std::string & name, const
throw Exception("Incorrect number of arguments for aggregate function " + name,
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
bool use_exact_hash_function = !isAllArgumentsContiguousInMemory(argument_types);
if (argument_types.size() == 1)
{
const IDataType & argument_type = *argument_types[0];
@ -51,25 +53,25 @@ AggregateFunctionPtr createAggregateFunctionUniq(const std::string & name, const
return std::make_shared<AggregateFunctionUniq<DataTypeDateTime::FieldType, Data>>();
else if (typeid_cast<const DataTypeString *>(&argument_type) || typeid_cast<const DataTypeFixedString *>(&argument_type))
return std::make_shared<AggregateFunctionUniq<String, Data>>();
else if (typeid_cast<const DataTypeTuple *>(&argument_type))
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, true>>(argument_types);
else if (typeid_cast<const DataTypeUUID *>(&argument_type))
return std::make_shared<AggregateFunctionUniq<DataTypeUUID::FieldType, Data>>();
}
else
{
/// If there are several arguments, then no tuples allowed among them.
for (const auto & type : argument_types)
if (typeid_cast<const DataTypeTuple *>(type.get()))
throw Exception("Tuple argument of function " + name + " must be the only argument",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
else if (typeid_cast<const DataTypeTuple *>(&argument_type))
{
if (use_exact_hash_function)
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, true, true>>(argument_types);
else
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, false, true>>(argument_types);
}
}
/// "Variadic" method also works as a fallback generic case for single argument.
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, false>>(argument_types);
if (use_exact_hash_function)
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, true, false>>(argument_types);
else
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, false, false>>(argument_types);
}
template <template <typename> class Data, typename DataForVariadic>
template <bool is_exact, template <typename> class Data, typename DataForVariadic>
AggregateFunctionPtr createAggregateFunctionUniq(const std::string & name, const DataTypes & argument_types, const Array & params)
{
assertNoParameters(name, params);
@ -78,6 +80,10 @@ AggregateFunctionPtr createAggregateFunctionUniq(const std::string & name, const
throw Exception("Incorrect number of arguments for aggregate function " + name,
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
/// We use exact hash function if the user wants it;
/// or if the arguments are not contiguous in memory, because only exact hash function have support for this case.
bool use_exact_hash_function = is_exact || !isAllArgumentsContiguousInMemory(argument_types);
if (argument_types.size() == 1)
{
const IDataType & argument_type = *argument_types[0];
@ -92,22 +98,22 @@ AggregateFunctionPtr createAggregateFunctionUniq(const std::string & name, const
return std::make_shared<AggregateFunctionUniq<DataTypeDateTime::FieldType, Data<DataTypeDateTime::FieldType>>>();
else if (typeid_cast<const DataTypeString *>(&argument_type) || typeid_cast<const DataTypeFixedString *>(&argument_type))
return std::make_shared<AggregateFunctionUniq<String, Data<String>>>();
else if (typeid_cast<const DataTypeTuple *>(&argument_type))
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, true>>(argument_types);
else if (typeid_cast<const DataTypeUUID *>(&argument_type))
return std::make_shared<AggregateFunctionUniq<DataTypeUUID::FieldType, Data<DataTypeUUID::FieldType>>>();
}
else
{
/// If there are several arguments, then no tuples allowed among them.
for (const auto & type : argument_types)
if (typeid_cast<const DataTypeTuple *>(type.get()))
throw Exception("Tuple argument of function " + name + " must be the only argument",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
else if (typeid_cast<const DataTypeTuple *>(&argument_type))
{
if (use_exact_hash_function)
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, true, true>>(argument_types);
else
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, false, true>>(argument_types);
}
}
/// "Variadic" method also works as a fallback generic case for single argument.
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, false>>(argument_types);
if (use_exact_hash_function)
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, true, false>>(argument_types);
else
return std::make_shared<AggregateFunctionUniqVariadic<DataForVariadic, false, false>>(argument_types);
}
}
@ -118,13 +124,13 @@ void registerAggregateFunctionsUniq(AggregateFunctionFactory & factory)
createAggregateFunctionUniq<AggregateFunctionUniqUniquesHashSetData, AggregateFunctionUniqUniquesHashSetDataForVariadic>);
factory.registerFunction("uniqHLL12",
createAggregateFunctionUniq<AggregateFunctionUniqHLL12Data, AggregateFunctionUniqHLL12DataForVariadic>);
createAggregateFunctionUniq<false, AggregateFunctionUniqHLL12Data, AggregateFunctionUniqHLL12DataForVariadic>);
factory.registerFunction("uniqExact",
createAggregateFunctionUniq<AggregateFunctionUniqExactData, AggregateFunctionUniqExactData<String>>);
createAggregateFunctionUniq<true, AggregateFunctionUniqExactData, AggregateFunctionUniqExactData<String>>);
factory.registerFunction("uniqCombined",
createAggregateFunctionUniq<AggregateFunctionUniqCombinedData, AggregateFunctionUniqCombinedData<UInt64>>);
createAggregateFunctionUniq<false, AggregateFunctionUniqCombinedData, AggregateFunctionUniqCombinedData<UInt64>>);
}
}

View File

@ -337,12 +337,10 @@ public:
* You can pass multiple arguments as is; You can also pass one argument - a tuple.
* But (for the possibility of efficient implementation), you can not pass several arguments, among which there are tuples.
*/
template <typename Data, bool argument_is_tuple>
class AggregateFunctionUniqVariadic final : public IAggregateFunctionDataHelper<Data, AggregateFunctionUniqVariadic<Data, argument_is_tuple>>
template <typename Data, bool is_exact, bool argument_is_tuple>
class AggregateFunctionUniqVariadic final : public IAggregateFunctionDataHelper<Data, AggregateFunctionUniqVariadic<Data, is_exact, argument_is_tuple>>
{
private:
static constexpr bool is_exact = std::is_same_v<Data, AggregateFunctionUniqExactData<String>>;
size_t num_args = 0;
public:
@ -363,7 +361,7 @@ public:
void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
{
this->data(place).set.insert(UniqVariadicHash<is_exact, argument_is_tuple>::apply(num_args, columns, row_num));
this->data(place).set.insert(typename Data::Set::value_type(UniqVariadicHash<is_exact, argument_is_tuple>::apply(num_args, columns, row_num)));
}
void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override

View File

@ -46,6 +46,8 @@ AggregateFunctionPtr createAggregateFunctionUniqUpTo(const std::string & name, c
throw Exception("Incorrect number of arguments for aggregate function " + name,
ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);
bool use_exact_hash_function = !isAllArgumentsContiguousInMemory(argument_types);
if (argument_types.size() == 1)
{
const IDataType & argument_type = *argument_types[0];
@ -60,22 +62,22 @@ AggregateFunctionPtr createAggregateFunctionUniqUpTo(const std::string & name, c
return std::make_shared<AggregateFunctionUniqUpTo<DataTypeDateTime::FieldType>>(threshold);
else if (typeid_cast<const DataTypeString *>(&argument_type) || typeid_cast<const DataTypeFixedString*>(&argument_type))
return std::make_shared<AggregateFunctionUniqUpTo<String>>(threshold);
else if (typeid_cast<const DataTypeTuple *>(&argument_type))
return std::make_shared<AggregateFunctionUniqUpToVariadic<true>>(argument_types, threshold);
else if (typeid_cast<const DataTypeUUID *>(&argument_type))
return std::make_shared<AggregateFunctionUniqUpTo<DataTypeUUID::FieldType>>(threshold);
}
else
{
/// If there are several arguments, then no tuples allowed among them.
for (const auto & type : argument_types)
if (typeid_cast<const DataTypeTuple *>(type.get()))
throw Exception("Tuple argument of function " + name + " must be the only argument",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
else if (typeid_cast<const DataTypeTuple *>(&argument_type))
{
if (use_exact_hash_function)
return std::make_shared<AggregateFunctionUniqUpToVariadic<true, true>>(argument_types, threshold);
else
return std::make_shared<AggregateFunctionUniqUpToVariadic<false, true>>(argument_types, threshold);
}
}
/// "Variadic" method also works as a fallback generic case for single argument.
return std::make_shared<AggregateFunctionUniqUpToVariadic<false>>(argument_types, threshold);
if (use_exact_hash_function)
return std::make_shared<AggregateFunctionUniqUpToVariadic<true, false>>(argument_types, threshold);
else
return std::make_shared<AggregateFunctionUniqUpToVariadic<false, false>>(argument_types, threshold);
}
}

View File

@ -180,9 +180,9 @@ public:
* You can pass multiple arguments as is; You can also pass one argument - a tuple.
* But (for the possibility of effective implementation), you can not pass several arguments, among which there are tuples.
*/
template <bool argument_is_tuple>
template <bool is_exact, bool argument_is_tuple>
class AggregateFunctionUniqUpToVariadic final
: public IAggregateFunctionDataHelper<AggregateFunctionUniqUpToData<UInt64>, AggregateFunctionUniqUpToVariadic<argument_is_tuple>>
: public IAggregateFunctionDataHelper<AggregateFunctionUniqUpToData<UInt64>, AggregateFunctionUniqUpToVariadic<is_exact, argument_is_tuple>>
{
private:
size_t num_args = 0;
@ -212,7 +212,7 @@ public:
void add(AggregateDataPtr place, const IColumn ** columns, size_t row_num, Arena *) const override
{
this->data(place).insert(UniqVariadicHash<false, argument_is_tuple>::apply(num_args, columns, row_num), threshold);
this->data(place).insert(UInt64(UniqVariadicHash<is_exact, argument_is_tuple>::apply(num_args, columns, row_num)), threshold);
}
void merge(AggregateDataPtr place, ConstAggregateDataPtr rhs, Arena *) const override

View File

@ -56,12 +56,12 @@ void registerAggregateFunctionsStatisticsSimple(AggregateFunctionFactory & facto
factory.registerFunction("corr", createAggregateFunctionStatisticsBinary<AggregateFunctionCorrSimple>, AggregateFunctionFactory::CaseInsensitive);
/// Synonims for compatibility.
factory.registerFunction("VAR_SAMP", createAggregateFunctionStatisticsUnary<AggregateFunctionVarSampSimple>, AggregateFunctionFactory::CaseInsensitive);
factory.registerFunction("VAR_POP", createAggregateFunctionStatisticsUnary<AggregateFunctionVarPopSimple>, AggregateFunctionFactory::CaseInsensitive);
factory.registerFunction("STDDEV_SAMP", createAggregateFunctionStatisticsUnary<AggregateFunctionStddevSampSimple>, AggregateFunctionFactory::CaseInsensitive);
factory.registerFunction("STDDEV_POP", createAggregateFunctionStatisticsUnary<AggregateFunctionStddevPopSimple>, AggregateFunctionFactory::CaseInsensitive);
factory.registerFunction("COVAR_SAMP", createAggregateFunctionStatisticsBinary<AggregateFunctionCovarSampSimple>, AggregateFunctionFactory::CaseInsensitive);
factory.registerFunction("COVAR_POP", createAggregateFunctionStatisticsBinary<AggregateFunctionCovarPopSimple>, AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("VAR_SAMP", "varSamp", AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("VAR_POP", "varPop", AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("STDDEV_SAMP", "stddevSamp", AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("STDDEV_POP", "stddevPop", AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("COVAR_SAMP", "covarSamp", AggregateFunctionFactory::CaseInsensitive);
factory.registerAlias("COVAR_POP", "covarPop", AggregateFunctionFactory::CaseInsensitive);
}
}

View File

@ -32,6 +32,8 @@ class IAggregateFunctionCombinator
public:
virtual String getName() const = 0;
virtual bool isForInternalUsageOnly() const { return false; }
/** From the arguments for combined function (ex: UInt64, UInt8 for sumIf),
* get the arguments for nested function (ex: UInt64 for sum).
* If arguments are not suitable for combined function, throw an exception.

View File

@ -0,0 +1,32 @@
#include <AggregateFunctions/UniqVariadicHash.h>
#include <DataTypes/DataTypeTuple.h>
#include <Common/typeid_cast.h>
namespace DB
{
/// If some arguments are not contiguous, we cannot use simple hash function,
/// because it requires method IColumn::getDataAt to work.
/// Note that we treat single tuple argument in the same way as multiple arguments.
bool isAllArgumentsContiguousInMemory(const DataTypes & argument_types)
{
auto check_all_arguments_are_contiguous_in_memory = [](const DataTypes & types)
{
for (const auto & type : types)
if (!type->isValueUnambiguouslyRepresentedInContiguousMemoryRegion())
return false;
return true;
};
const DataTypeTuple * single_argument_as_tuple = nullptr;
if (argument_types.size() == 1)
single_argument_as_tuple = typeid_cast<const DataTypeTuple *>(argument_types[0].get());
if (single_argument_as_tuple)
return check_all_arguments_are_contiguous_in_memory(single_argument_as_tuple->getElements());
else
return check_all_arguments_are_contiguous_in_memory(argument_types);
}
}

View File

@ -27,6 +27,12 @@ template <bool exact, bool for_tuple>
struct UniqVariadicHash;
/// If some arguments are not contiguous, we cannot use simple hash function,
/// because it requires method IColumn::getDataAt to work.
/// Note that we treat single tuple argument in the same way as multiple arguments.
bool isAllArgumentsContiguousInMemory(const DataTypes & argument_types);
template <>
struct UniqVariadicHash<false, false>
{

View File

@ -74,8 +74,8 @@ template <typename Hash = UniquesHashSetDefaultHash>
class UniquesHashSet : private HashTableAllocatorWithStackMemory<(1ULL << UNIQUES_HASH_SET_INITIAL_SIZE_DEGREE) * sizeof(UInt32)>
{
private:
using Value_t = UInt64;
using HashValue_t = UInt32;
using Value = UInt64;
using HashValue = UInt32;
using Allocator = HashTableAllocatorWithStackMemory<(1ULL << UNIQUES_HASH_SET_INITIAL_SIZE_DEGREE) * sizeof(UInt32)>;
UInt32 m_size; /// Number of elements
@ -83,7 +83,7 @@ private:
UInt8 skip_degree; /// Skip elements not divisible by 2 ^ skip_degree
bool has_zero; /// The hash table contains an element with a hash value of 0.
HashValue_t * buf;
HashValue * buf;
#ifdef UNIQUES_HASH_SET_COUNT_COLLISIONS
/// For profiling.
@ -92,7 +92,7 @@ private:
void alloc(UInt8 new_size_degree)
{
buf = reinterpret_cast<HashValue_t *>(Allocator::alloc((1ULL << new_size_degree) * sizeof(buf[0])));
buf = reinterpret_cast<HashValue *>(Allocator::alloc((1ULL << new_size_degree) * sizeof(buf[0])));
size_degree = new_size_degree;
}
@ -108,15 +108,15 @@ private:
inline size_t buf_size() const { return 1ULL << size_degree; }
inline size_t max_fill() const { return 1ULL << (size_degree - 1); }
inline size_t mask() const { return buf_size() - 1; }
inline size_t place(HashValue_t x) const { return (x >> UNIQUES_HASH_BITS_FOR_SKIP) & mask(); }
inline size_t place(HashValue x) const { return (x >> UNIQUES_HASH_BITS_FOR_SKIP) & mask(); }
/// The value is divided by 2 ^ skip_degree
inline bool good(HashValue_t hash) const
inline bool good(HashValue hash) const
{
return hash == ((hash >> skip_degree) << skip_degree);
}
HashValue_t hash(Value_t key) const
HashValue hash(Value key) const
{
return Hash()(key);
}
@ -141,7 +141,7 @@ private:
{
if (unlikely(buf[i] && i != place(buf[i])))
{
HashValue_t x = buf[i];
HashValue x = buf[i];
buf[i] = 0;
reinsertImpl(x);
}
@ -157,7 +157,7 @@ private:
new_size_degree = size_degree + 1;
/// Expand the space.
buf = reinterpret_cast<HashValue_t *>(Allocator::realloc(buf, old_size * sizeof(buf[0]), (1ULL << new_size_degree) * sizeof(buf[0])));
buf = reinterpret_cast<HashValue *>(Allocator::realloc(buf, old_size * sizeof(buf[0]), (1ULL << new_size_degree) * sizeof(buf[0])));
size_degree = new_size_degree;
/** Now some items may need to be moved to a new location.
@ -174,7 +174,7 @@ private:
*/
for (size_t i = 0; i < old_size || buf[i]; ++i)
{
HashValue_t x = buf[i];
HashValue x = buf[i];
if (!x)
continue;
@ -204,7 +204,7 @@ private:
}
/// Insert a value.
void insertImpl(HashValue_t x)
void insertImpl(HashValue x)
{
if (x == 0)
{
@ -234,7 +234,7 @@ private:
/** Insert a value into the new buffer that was in the old buffer.
* Used when increasing the size of the buffer, as well as when reading from a file.
*/
void reinsertImpl(HashValue_t x)
void reinsertImpl(HashValue x)
{
size_t place_value = place(x);
while (buf[place_value])
@ -272,6 +272,8 @@ private:
public:
using value_type = Value;
UniquesHashSet() :
m_size(0),
skip_degree(0),
@ -312,9 +314,9 @@ public:
free();
}
void insert(Value_t x)
void insert(Value x)
{
HashValue_t hash_value = hash(x);
HashValue hash_value = hash(x);
if (!good(hash_value))
return;
@ -380,7 +382,7 @@ public:
if (has_zero)
{
HashValue_t x = 0;
HashValue x = 0;
DB::writeIntBinary(x, wb);
}
@ -409,7 +411,7 @@ public:
for (size_t i = 0; i < m_size; ++i)
{
HashValue_t x = 0;
HashValue x = 0;
DB::readIntBinary(x, rb);
if (x == 0)
has_zero = true;
@ -443,7 +445,7 @@ public:
for (size_t i = 0; i < rhs_size; ++i)
{
HashValue_t x = 0;
HashValue x = 0;
DB::readIntBinary(x, rb);
insertHash(x);
}
@ -459,7 +461,7 @@ public:
if (size > UNIQUES_HASH_MAX_SIZE)
throw Poco::Exception("Cannot read UniquesHashSet: too large size_degree.");
rb.ignore(sizeof(HashValue_t) * size);
rb.ignore(sizeof(HashValue) * size);
}
void writeText(DB::WriteBuffer & wb) const
@ -505,7 +507,7 @@ public:
for (size_t i = 0; i < m_size; ++i)
{
HashValue_t x = 0;
HashValue x = 0;
DB::assertChar(',', rb);
DB::readIntText(x, rb);
if (x == 0)
@ -515,7 +517,7 @@ public:
}
}
void insertHash(HashValue_t hash_value)
void insertHash(HashValue hash_value)
{
if (!good(hash_value))
return;

View File

@ -33,6 +33,7 @@ void registerAggregateFunctionCombinatorState(AggregateFunctionCombinatorFactory
void registerAggregateFunctionCombinatorMerge(AggregateFunctionCombinatorFactory &);
void registerAggregateFunctionCombinatorNull(AggregateFunctionCombinatorFactory &);
void registerAggregateFunctionHistogram(AggregateFunctionFactory & factory);
void registerAggregateFunctions()
{
@ -57,6 +58,7 @@ void registerAggregateFunctions()
registerAggregateFunctionTopK(factory);
registerAggregateFunctionsBitwise(factory);
registerAggregateFunctionsMaxIntersections(factory);
registerAggregateFunctionHistogram(factory);
}
{

View File

@ -19,6 +19,7 @@
#include <Common/CurrentMetrics.h>
#include <Common/DNSResolver.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/config_version.h>
#include <Interpreters/ClientInfo.h>
#include <Common/config.h>

View File

@ -87,3 +87,14 @@ const std::string & Collator::getLocale() const
{
return locale;
}
std::vector<std::string> Collator::getAvailableCollations()
{
std::vector<std::string> result;
#if USE_ICU
size_t available_locales_count = ucol_countAvailable();
for (size_t i = 0; i < available_locales_count; ++i)
result.push_back(ucol_getAvailable(i));
#endif
return result;
}

View File

@ -1,6 +1,7 @@
#pragma once
#include <string>
#include <vector>
#include <boost/noncopyable.hpp>
struct UCollator;
@ -15,6 +16,8 @@ public:
const std::string & getLocale() const;
static std::vector<std::string> getAvailableCollations();
private:
std::string locale;
UCollator * collator;

View File

@ -118,8 +118,8 @@ void BackgroundSchedulePool::TaskInfo::execute()
executing = false;
/// In case was scheduled while executing (including a scheduleAfter which expired) we schedule the task
/// on the queue. We don't call the function again here because this way all tasks
/// will have their chance to execute
/// on the queue. We don't call the function again here because this way all tasks
/// will have their chance to execute
if (scheduled)
pool.queue.enqueueNotification(new TaskNotification(shared_from_this()));
@ -128,7 +128,8 @@ void BackgroundSchedulePool::TaskInfo::execute()
zkutil::WatchCallback BackgroundSchedulePool::TaskInfo::getWatchCallback()
{
return [t=shared_from_this()](const ZooKeeperImpl::ZooKeeper::WatchResponse &) {
return [t = shared_from_this()](const ZooKeeperImpl::ZooKeeper::WatchResponse &)
{
t->schedule();
};
}

View File

@ -57,6 +57,8 @@ public:
DenominatorType
>;
using value_type = Key;
private:
using Small = SmallSet<Key, small_set_size_max>;
using Medium = HashContainer;

View File

@ -377,6 +377,7 @@ namespace ErrorCodes
extern const int CANNOT_STAT = 400;
extern const int FEATURE_IS_NOT_ENABLED_AT_BUILD_TIME = 401;
extern const int CANNOT_IOSETUP = 402;
extern const int INVALID_JOIN_ON_EXPRESSION = 403;
extern const int KEEPER_EXCEPTION = 999;

View File

@ -287,12 +287,13 @@ private:
/// Size of counter's rank in bits.
static constexpr UInt8 rank_width = details::RankWidth<HashValueType>::get();
private:
using Value_t = UInt64;
using Value = UInt64;
using RankStore = DB::CompactArray<HashValueType, rank_width, bucket_count>;
public:
void insert(Value_t value)
using value_type = Value;
void insert(Value value)
{
HashValueType hash = getHash(value);
@ -413,7 +414,7 @@ private:
return zeros_plus_one;
}
inline HashValueType getHash(Value_t key) const
inline HashValueType getHash(Value key) const
{
return Hash::operator()(key);
}

View File

@ -51,6 +51,8 @@ private:
}
public:
using value_type = Key;
~HyperLogLogWithSmallSetOptimization()
{
if (isLarge())

View File

@ -0,0 +1,125 @@
#pragma once
#include <Common/Exception.h>
#include <Core/Types.h>
#include <Poco/String.h>
#include <unordered_map>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
/** If stored objects may have several names (aliases)
* this interface may be helpful
* template parameter is available as Creator
*/
template <typename CreatorFunc>
class IFactoryWithAliases
{
protected:
using Creator = CreatorFunc;
String getAliasToOrName(const String & name) const
{
if (aliases.count(name))
return aliases.at(name);
else if (String name_lowercase = Poco::toLower(name); case_insensitive_aliases.count(name_lowercase))
return case_insensitive_aliases.at(name_lowercase);
else
return name;
}
public:
/// For compatibility with SQL, it's possible to specify that certain function name is case insensitive.
enum CaseSensitiveness
{
CaseSensitive,
CaseInsensitive
};
/** Register additional name for creator
* real_name have to be already registered.
*/
void registerAlias(const String & alias_name, const String & real_name, CaseSensitiveness case_sensitiveness = CaseSensitive)
{
const auto & creator_map = getCreatorMap();
const auto & case_insensitive_creator_map = getCaseInsensitiveCreatorMap();
const String factory_name = getFactoryName();
String real_dict_name;
if (creator_map.count(real_name))
real_dict_name = real_name;
else if (auto real_name_lowercase = Poco::toLower(real_name); case_insensitive_creator_map.count(real_name_lowercase))
real_dict_name = real_name_lowercase;
else
throw Exception(factory_name + ": can't create alias '" + alias_name + "', the real name '" + real_name + "' is not registered",
ErrorCodes::LOGICAL_ERROR);
String alias_name_lowercase = Poco::toLower(alias_name);
if (creator_map.count(alias_name) || case_insensitive_creator_map.count(alias_name_lowercase))
throw Exception(
factory_name + ": the alias name '" + alias_name + "' is already registered as real name", ErrorCodes::LOGICAL_ERROR);
if (case_sensitiveness == CaseInsensitive)
if (!case_insensitive_aliases.emplace(alias_name_lowercase, real_dict_name).second)
throw Exception(
factory_name + ": case insensitive alias name '" + alias_name + "' is not unique", ErrorCodes::LOGICAL_ERROR);
if (!aliases.emplace(alias_name, real_dict_name).second)
throw Exception(factory_name + ": alias name '" + alias_name + "' is not unique", ErrorCodes::LOGICAL_ERROR);
}
std::vector<String> getAllRegisteredNames() const
{
std::vector<String> result;
auto getter = [](const auto & pair) { return pair.first; };
std::transform(getCreatorMap().begin(), getCreatorMap().end(), std::back_inserter(result), getter);
std::transform(aliases.begin(), aliases.end(), std::back_inserter(result), getter);
return result;
}
bool isCaseInsensitive(const String & name) const
{
String name_lowercase = Poco::toLower(name);
return getCaseInsensitiveCreatorMap().count(name_lowercase) || case_insensitive_aliases.count(name_lowercase);
}
const String & aliasTo(const String & name) const
{
if (auto it = aliases.find(name); it != aliases.end())
return it->second;
else if (auto it = case_insensitive_aliases.find(Poco::toLower(name)); it != case_insensitive_aliases.end())
return it->second;
throw Exception(getFactoryName() + ": name '" + name + "' is not alias", ErrorCodes::LOGICAL_ERROR);
}
bool isAlias(const String & name) const
{
return aliases.count(name) || case_insensitive_aliases.count(name);
}
virtual ~IFactoryWithAliases() {}
private:
using InnerMap = std::unordered_map<String, Creator>; // name -> creator
using AliasMap = std::unordered_map<String, String>; // alias -> original type
virtual const InnerMap & getCreatorMap() const = 0;
virtual const InnerMap & getCaseInsensitiveCreatorMap() const = 0;
virtual String getFactoryName() const = 0;
/// Alias map to data_types from previous two maps
AliasMap aliases;
/// Case insensitive aliases
AliasMap case_insensitive_aliases;
};
}

View File

@ -47,7 +47,7 @@ public:
KeeperMultiException(int32_t code, const Requests & requests, const Responses & responses);
private:
size_t getFailedOpIndex(int32_t code, const Responses & responses) const;
static size_t getFailedOpIndex(int32_t code, const Responses & responses);
};
};

View File

@ -815,7 +815,7 @@ int32_t ZooKeeper::tryMultiNoThrow(const Requests & requests, Responses & respon
}
size_t KeeperMultiException::getFailedOpIndex(int32_t code, const Responses & responses) const
size_t KeeperMultiException::getFailedOpIndex(int32_t code, const Responses & responses)
{
if (responses.empty())
throw DB::Exception("Responses for multi transaction is empty", DB::ErrorCodes::LOGICAL_ERROR);
@ -833,15 +833,16 @@ size_t KeeperMultiException::getFailedOpIndex(int32_t code, const Responses & re
KeeperMultiException::KeeperMultiException(int32_t code, const Requests & requests, const Responses & responses)
: KeeperException("Transaction failed at op #" + std::to_string(getFailedOpIndex(code, responses)), code),
: KeeperException("Transaction failed", code),
requests(requests), responses(responses), failed_op_index(getFailedOpIndex(code, responses))
{
addMessage("Op #" + std::to_string(failed_op_index) + ", path: " + getPathForFirstFailedOp());
}
std::string KeeperMultiException::getPathForFirstFailedOp() const
{
return requests[failed_op_index]->getPath();
}
void KeeperMultiException::check(int32_t code, const Requests & requests, const Responses & responses)

View File

@ -367,10 +367,20 @@ void read(String & s, ReadBuffer & in)
static constexpr int32_t max_string_size = 1 << 20;
int32_t size = 0;
read(size, in);
if (size < 0) /// TODO Actually it means that zookeeper node has NULL value. Maybe better to treat it like empty string.
if (size == -1)
{
/// It means that zookeeper node has NULL value. We will treat it like empty string.
s.clear();
return;
}
if (size < 0)
throw Exception("Negative size while reading string from ZooKeeper", ZooKeeper::ZMARSHALLINGERROR);
if (size > max_string_size)
throw Exception("Too large string size while reading from ZooKeeper", ZooKeeper::ZMARSHALLINGERROR);
s.resize(size);
in.read(&s[0], size);
}
@ -875,6 +885,18 @@ ZooKeeper::ResponsePtr ZooKeeper::MultiRequest::makeResponse() const { return st
ZooKeeper::ResponsePtr ZooKeeper::CloseRequest::makeResponse() const { return std::make_shared<CloseResponse>(); }
ZooKeeper::RequestPtr ZooKeeper::MultiRequest::clone() const
{
auto res = std::make_shared<MultiRequest>();
res->requests.reserve(requests.size());
for (const auto & request : requests)
res->requests.emplace_back(request->clone());
return res;
}
void ZooKeeper::CreateRequest::addRootPath(const String & root_path) { ZooKeeperImpl::addRootPath(path, root_path); }
void ZooKeeper::RemoveRequest::addRootPath(const String & root_path) { ZooKeeperImpl::addRootPath(path, root_path); }
void ZooKeeper::ExistsRequest::addRootPath(const String & root_path) { ZooKeeperImpl::addRootPath(path, root_path); }
@ -965,30 +987,52 @@ void ZooKeeper::receiveEvent()
if (it == operations.end())
throw Exception("Received response for unknown xid", ZRUNTIMEINCONSISTENCY);
/// After this point, we must invoke callback, that we've grabbed from 'operations'.
/// Invariant: all callbacks are invoked either in case of success or in case of error.
/// (all callbacks in 'operations' are guaranteed to be invoked)
request_info = std::move(it->second);
operations.erase(it);
CurrentMetrics::sub(CurrentMetrics::ZooKeeperRequest);
}
response = request_info.request->makeResponse();
auto elapsed_microseconds = std::chrono::duration_cast<std::chrono::microseconds>(clock::now() - request_info.time).count();
ProfileEvents::increment(ProfileEvents::ZooKeeperWaitMicroseconds, elapsed_microseconds);
}
if (err)
response->error = err;
else
try
{
response->readImpl(*in);
response->removeRootPath(root_path);
if (!response)
response = request_info.request->makeResponse();
if (err)
response->error = err;
else
{
response->readImpl(*in);
response->removeRootPath(root_path);
}
int32_t actual_length = in->count() - count_before_event;
if (length != actual_length)
throw Exception("Response length doesn't match. Expected: " + toString(length) + ", actual: " + toString(actual_length), ZMARSHALLINGERROR);
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
/// Unrecoverable. Don't leave incorrect state in memory.
if (!response)
std::terminate();
response->error = ZMARSHALLINGERROR;
if (request_info.callback)
request_info.callback(*response);
throw;
}
int32_t actual_length = in->count() - count_before_event;
if (length != actual_length)
throw Exception("Response length doesn't match. Expected: " + toString(length) + ", actual: " + toString(actual_length), ZMARSHALLINGERROR);
/// NOTE: Exception in callback will propagate to receiveThread and will lead to session expiration. This is Ok.
/// Exception in callback will propagate to receiveThread and will lead to session expiration. This is Ok.
if (request_info.callback)
request_info.callback(*response);
@ -1507,7 +1551,11 @@ void ZooKeeper::multi(
MultiCallback callback)
{
MultiRequest request;
request.requests = requests;
/// Deep copy to avoid modifying path in presence of chroot prefix.
request.requests.reserve(requests.size());
for (const auto & elem : requests)
request.requests.emplace_back(elem->clone());
for (auto & elem : request.requests)
if (CreateRequest * create = typeid_cast<CreateRequest *>(elem.get()))

View File

@ -156,6 +156,10 @@ public:
using XID = int32_t;
using OpNum = int32_t;
struct Response;
using ResponsePtr = std::shared_ptr<Response>;
using Responses = std::vector<ResponsePtr>;
using ResponseCallback = std::function<void(const Response &)>;
struct Response
{
@ -166,9 +170,9 @@ public:
virtual void removeRootPath(const String & /* root_path */) {}
};
using ResponsePtr = std::shared_ptr<Response>;
using Responses = std::vector<ResponsePtr>;
using ResponseCallback = std::function<void(const Response &)>;
struct Request;
using RequestPtr = std::shared_ptr<Request>;
using Requests = std::vector<RequestPtr>;
struct Request
{
@ -176,6 +180,8 @@ public:
bool has_watch = false;
virtual ~Request() {}
virtual RequestPtr clone() const = 0;
virtual OpNum getOpNum() const = 0;
/// Writes length, xid, op_num, then the rest.
@ -188,11 +194,9 @@ public:
virtual String getPath() const = 0;
};
using RequestPtr = std::shared_ptr<Request>;
using Requests = std::vector<RequestPtr>;
struct HeartbeatRequest final : Request
{
RequestPtr clone() const override { return std::make_shared<HeartbeatRequest>(*this); }
OpNum getOpNum() const override { return 11; }
void writeImpl(WriteBuffer &) const override {}
ResponsePtr makeResponse() const override;
@ -222,6 +226,7 @@ public:
String scheme;
String data;
RequestPtr clone() const override { return std::make_shared<AuthRequest>(*this); }
OpNum getOpNum() const override { return 100; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -235,6 +240,7 @@ public:
struct CloseRequest final : Request
{
RequestPtr clone() const override { return std::make_shared<CloseRequest>(*this); }
OpNum getOpNum() const override { return -11; }
void writeImpl(WriteBuffer &) const override {}
ResponsePtr makeResponse() const override;
@ -254,6 +260,7 @@ public:
bool is_sequential = false;
ACLs acls;
RequestPtr clone() const override { return std::make_shared<CreateRequest>(*this); }
OpNum getOpNum() const override { return 1; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -274,6 +281,7 @@ public:
String path;
int32_t version = -1;
RequestPtr clone() const override { return std::make_shared<RemoveRequest>(*this); }
OpNum getOpNum() const override { return 2; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -290,6 +298,7 @@ public:
{
String path;
RequestPtr clone() const override { return std::make_shared<ExistsRequest>(*this); }
OpNum getOpNum() const override { return 3; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -308,6 +317,7 @@ public:
{
String path;
RequestPtr clone() const override { return std::make_shared<GetRequest>(*this); }
OpNum getOpNum() const override { return 4; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -329,6 +339,7 @@ public:
String data;
int32_t version = -1;
RequestPtr clone() const override { return std::make_shared<SetRequest>(*this); }
OpNum getOpNum() const override { return 5; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -347,6 +358,7 @@ public:
{
String path;
RequestPtr clone() const override { return std::make_shared<ListRequest>(*this); }
OpNum getOpNum() const override { return 12; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -367,6 +379,7 @@ public:
String path;
int32_t version = -1;
RequestPtr clone() const override { return std::make_shared<CheckRequest>(*this); }
OpNum getOpNum() const override { return 13; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;
@ -383,6 +396,7 @@ public:
{
Requests requests;
RequestPtr clone() const override;
OpNum getOpNum() const override { return 14; }
void writeImpl(WriteBuffer &) const override;
ResponsePtr makeResponse() const override;

View File

@ -7,6 +7,7 @@ const char * auto_config_build[]
"VERSION_FULL", "@VERSION_FULL@",
"VERSION_DESCRIBE", "@VERSION_DESCRIBE@",
"VERSION_GITHASH", "@VERSION_GITHASH@",
"VERSION_REVISION", "@VERSION_REVISION@",
"BUILD_DATE", "@BUILD_DATE@",
"BUILD_TYPE", "@CMAKE_BUILD_TYPE@",
"SYSTEM", "@CMAKE_SYSTEM@",

View File

@ -13,7 +13,31 @@
#cmakedefine VERSION_REVISION @VERSION_REVISION@
#endif
#cmakedefine VERSION_NAME "@VERSION_NAME@"
#define DBMS_NAME VERSION_NAME
#cmakedefine VERSION_MAJOR @VERSION_MAJOR@
#cmakedefine VERSION_MINOR @VERSION_MINOR@
#cmakedefine VERSION_PATCH @VERSION_PATCH@
#cmakedefine VERSION_STRING "@VERSION_STRING@"
#cmakedefine VERSION_FULL "@VERSION_FULL@"
#cmakedefine VERSION_DESCRIBE "@VERSION_DESCRIBE@"
#cmakedefine VERSION_GITHASH "@VERSION_GITHASH@"
#if defined(VERSION_MAJOR)
#define DBMS_VERSION_MAJOR VERSION_MAJOR
#else
#define DBMS_VERSION_MAJOR 0
#endif
#if defined(VERSION_MINOR)
#define DBMS_VERSION_MINOR VERSION_MINOR
#else
#define DBMS_VERSION_MINOR 0
#endif
#if defined(VERSION_PATCH)
#define DBMS_VERSION_PATCH VERSION_PATCH
#else
#define DBMS_VERSION_PATCH 0
#endif

View File

@ -1,9 +1,5 @@
#pragma once
#define DBMS_NAME "ClickHouse"
#define DBMS_VERSION_MAJOR 1
#define DBMS_VERSION_MINOR 1
#define DBMS_DEFAULT_HOST "localhost"
#define DBMS_DEFAULT_PORT 9000
#define DBMS_DEFAULT_SECURE_PORT 9440
@ -19,13 +15,6 @@
/// The size of the I/O buffer by default.
#define DBMS_DEFAULT_BUFFER_SIZE 1048576ULL
/// When writing data, a buffer of `max_compress_block_size` size is allocated for compression. When the buffer overflows or if into the buffer
/// more or equal data is written than `min_compress_block_size`, then with the next mark, the data will also compressed
/// As a result, for small columns (numbers 1-8 bytes), with index_granularity = 8192, the block size will be 64 KB.
/// And for large columns (Title - string ~100 bytes), the block size will be ~819 KB. Due to this, the compression ratio almost does not get worse.
#define DEFAULT_MIN_COMPRESS_BLOCK_SIZE 65536
#define DEFAULT_MAX_COMPRESS_BLOCK_SIZE 1048576
/** Which blocks by default read the data (by number of rows).
* Smaller values give better cache locality, less consumption of RAM, but more overhead to process the query.
*/
@ -43,17 +32,12 @@
*/
#define DEFAULT_MERGE_BLOCK_SIZE 8192
#define DEFAULT_MAX_QUERY_SIZE 262144
#define SHOW_CHARS_ON_SYNTAX_ERROR ptrdiff_t(160)
#define DEFAULT_MAX_DISTRIBUTED_CONNECTIONS 1024
#define DEFAULT_INTERACTIVE_DELAY 100000
#define DBMS_DEFAULT_DISTRIBUTED_CONNECTIONS_POOL_SIZE 1024
#define DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES 3
/// each period reduces the error counter by 2 times
/// too short a period can cause errors to disappear immediately after creation.
#define DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_DECREASE_ERROR_PERIOD (2 * DBMS_DEFAULT_SEND_TIMEOUT_SEC)
#define DEFAULT_QUERIES_QUEUE_WAIT_TIME_MS 5000 /// Maximum waiting time in the request queue.
#define DBMS_DEFAULT_BACKGROUND_POOL_SIZE 16
#define DBMS_MIN_REVISION_WITH_CLIENT_INFO 54032
#define DBMS_MIN_REVISION_WITH_SERVER_TIMEZONE 54058
@ -65,8 +49,6 @@
/// Version of ClickHouse TCP protocol. Set to git tag with latest protocol change.
#define DBMS_TCP_PROTOCOL_VERSION 54226
#define DBMS_DISTRIBUTED_DIRECTORY_MONITOR_SLEEP_TIME_MS 100
/// The boundary on which the blocks for asynchronous file operations should be aligned.
#define DEFAULT_AIO_FILE_BLOCK_SIZE 4096

View File

@ -1,20 +0,0 @@
#include <sstream>
#include <Core/SortDescription.h>
#include <Columns/Collator.h>
#include <IO/Operators.h>
#include <IO/WriteBufferFromString.h>
namespace DB
{
std::string SortColumnDescription::getID() const
{
WriteBufferFromOwnString out;
out << column_name << ", " << column_number << ", " << direction << ", " << nulls_direction;
if (collator)
out << ", collation locale: " << collator->getLocale();
return out.str();
}
}

View File

@ -26,9 +26,6 @@ struct SortColumnDescription
SortColumnDescription(const std::string & column_name_, int direction_, int nulls_direction_, const std::shared_ptr<Collator> & collator_ = nullptr)
: column_name(column_name_), column_number(0), direction(direction_), nulls_direction(nulls_direction_), collator(collator_) {}
/// For IBlockInputStream.
std::string getID() const;
};
/// Description of the sorting rule for several columns.

View File

@ -4,6 +4,7 @@
#include <DataTypes/NestedUtils.h>
#include <DataTypes/DataTypeArray.h>
#include <Columns/ColumnArray.h>
#include <Interpreters/evaluateMissingDefaults.h>
#include <Core/Block.h>

View File

@ -4,7 +4,6 @@
#include <Columns/ColumnConst.h>
#include <Storages/ColumnDefault.h>
#include <Interpreters/Context.h>
#include <Interpreters/evaluateMissingDefaults.h>
namespace DB

View File

@ -119,7 +119,7 @@ void CreatingSetsBlockInputStream::createOne(SubqueryForSet & subquery)
if (!done_with_set)
{
if (!subquery.set->insertFromBlock(block, /*fill_set_elements=*/false))
if (!subquery.set->insertFromBlock(block))
done_with_set = true;
}

View File

@ -63,7 +63,8 @@ void NativeBlockInputStream::readData(const IDataType & type, IColumn & column,
type.deserializeBinaryBulkWithMultipleStreams(column, input_stream_getter, rows, avg_value_size_hint, false, {});
if (column.size() != rows)
throw Exception("Cannot read all data in NativeBlockInputStream.", ErrorCodes::CANNOT_READ_ALL_DATA);
throw Exception("Cannot read all data in NativeBlockInputStream. Rows read: " + toString(column.size()) + ". Rows expected: " + toString(rows) + ".",
ErrorCodes::CANNOT_READ_ALL_DATA);
}

View File

@ -71,7 +71,7 @@ void PushingToViewsBlockOutputStream::write(const Block & block)
try
{
BlockInputStreamPtr from = std::make_shared<OneBlockInputStream>(block);
InterpreterSelectQuery select(view.query, *views_context, {}, QueryProcessingStage::Complete, 0, from);
InterpreterSelectQuery select(view.query, *views_context, from);
BlockInputStreamPtr in = std::make_shared<MaterializingBlockInputStream>(select.execute().in);
/// Squashing is needed here because the materialized view query can generate a lot of blocks
/// even when only one block is inserted into the parent table (e.g. if the query is a GROUP BY

View File

@ -3,6 +3,8 @@
#include <DataTypes/NestedUtils.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include <Columns/ColumnAggregateFunction.h>
#include <Columns/ColumnTuple.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/FieldVisitors.h>
@ -74,7 +76,8 @@ SummingSortedBlockInputStream::SummingSortedBlockInputStream(
}
else
{
if (!column.type->isSummable())
bool is_agg_func = checkDataType<DataTypeAggregateFunction>(column.type.get());
if (!column.type->isSummable() && !is_agg_func)
{
column_numbers_not_to_aggregate.push_back(i);
continue;
@ -93,8 +96,14 @@ SummingSortedBlockInputStream::SummingSortedBlockInputStream(
{
// Create aggregator to sum this column
AggregateDescription desc;
desc.is_agg_func_type = is_agg_func;
desc.column_numbers = {i};
desc.init("sumWithOverflow", {column.type});
if (!is_agg_func)
{
desc.init("sumWithOverflow", {column.type});
}
columns_to_aggregate.emplace_back(std::move(desc));
}
else
@ -193,27 +202,34 @@ void SummingSortedBlockInputStream::insertCurrentRowIfNeeded(MutableColumns & me
// Do not insert if the aggregation state hasn't been created
if (desc.created)
{
try
if (desc.is_agg_func_type)
{
desc.function->insertResultInto(desc.state.data(), *desc.merged_column);
/// Update zero status of current row
if (desc.column_numbers.size() == 1)
{
// Flag row as non-empty if at least one column number if non-zero
current_row_is_zero = current_row_is_zero && desc.merged_column->get64(desc.merged_column->size() - 1) == 0;
}
else
{
/// It is sumMap aggregate function.
/// Assume that the row isn't empty in this case (just because it is compatible with previous version)
current_row_is_zero = false;
}
current_row_is_zero = false;
}
catch (...)
else
{
desc.destroyState();
throw;
try
{
desc.function->insertResultInto(desc.state.data(), *desc.merged_column);
/// Update zero status of current row
if (desc.column_numbers.size() == 1)
{
// Flag row as non-empty if at least one column number if non-zero
current_row_is_zero = current_row_is_zero && desc.merged_column->get64(desc.merged_column->size() - 1) == 0;
}
else
{
/// It is sumMap aggregate function.
/// Assume that the row isn't empty in this case (just because it is compatible with previous version)
current_row_is_zero = false;
}
}
catch (...)
{
desc.destroyState();
throw;
}
}
desc.destroyState();
}
@ -258,7 +274,7 @@ Block SummingSortedBlockInputStream::readImpl()
for (auto & desc : columns_to_aggregate)
{
// Wrap aggregated columns in a tuple to match function signature
if (checkDataType<DataTypeTuple>(desc.function->getReturnType().get()))
if (!desc.is_agg_func_type && checkDataType<DataTypeTuple>(desc.function->getReturnType().get()))
{
size_t tuple_size = desc.column_numbers.size();
MutableColumns tuple_columns(tuple_size);
@ -277,7 +293,7 @@ Block SummingSortedBlockInputStream::readImpl()
/// Place aggregation results into block.
for (auto & desc : columns_to_aggregate)
{
if (checkDataType<DataTypeTuple>(desc.function->getReturnType().get()))
if (!desc.is_agg_func_type && checkDataType<DataTypeTuple>(desc.function->getReturnType().get()))
{
/// Unpack tuple into block.
size_t tuple_size = desc.column_numbers.size();
@ -307,22 +323,24 @@ void SummingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
if (current_key.empty()) /// The first key encountered.
{
setPrimaryKeyRef(current_key, current);
key_differs = true;
current_row_is_zero = true;
}
else
key_differs = next_key != current_key;
/// if there are enough rows and the last one is calculated completely
if (key_differs && merged_rows >= max_block_size)
return;
queue.pop();
if (key_differs)
{
/// Write the data for the previous group.
insertCurrentRowIfNeeded(merged_columns, false);
if (!current_key.empty())
/// Write the data for the previous group.
insertCurrentRowIfNeeded(merged_columns, false);
if (merged_rows >= max_block_size)
{
/// The block is now full and the last row is calculated completely.
current_key.reset();
return;
}
current_key.swap(next_key);
@ -359,6 +377,8 @@ void SummingSortedBlockInputStream::merge(MutableColumns & merged_columns, std::
current_row_is_zero = false;
}
queue.pop();
if (!current->isLast())
{
current->next();
@ -468,20 +488,29 @@ void SummingSortedBlockInputStream::addRow(SortCursor & cursor)
if (!desc.created)
throw Exception("Logical error in SummingSortedBlockInputStream, there are no description", ErrorCodes::LOGICAL_ERROR);
// Specialized case for unary functions
if (desc.column_numbers.size() == 1)
if (desc.is_agg_func_type)
{
// desc.state is not used for AggregateFunction types
auto & col = cursor->all_columns[desc.column_numbers[0]];
desc.add_function(desc.function.get(), desc.state.data(), &col, cursor->pos, nullptr);
static_cast<ColumnAggregateFunction &>(*desc.merged_column).insertMergeFrom(*col, cursor->pos);
}
else
{
// Gather all source columns into a vector
ColumnRawPtrs columns(desc.column_numbers.size());
for (size_t i = 0; i < desc.column_numbers.size(); ++i)
columns[i] = cursor->all_columns[desc.column_numbers[i]];
// Specialized case for unary functions
if (desc.column_numbers.size() == 1)
{
auto & col = cursor->all_columns[desc.column_numbers[0]];
desc.add_function(desc.function.get(), desc.state.data(), &col, cursor->pos, nullptr);
}
else
{
// Gather all source columns into a vector
ColumnRawPtrs columns(desc.column_numbers.size());
for (size_t i = 0; i < desc.column_numbers.size(); ++i)
columns[i] = cursor->all_columns[desc.column_numbers[i]];
desc.add_function(desc.function.get(), desc.state.data(), columns.data(), cursor->pos, nullptr);
desc.add_function(desc.function.get(), desc.state.data(), columns.data(), cursor->pos, nullptr);
}
}
}
}

View File

@ -69,6 +69,7 @@ private:
/// Stores aggregation function, state, and columns to be used as function arguments
struct AggregateDescription
{
/// An aggregate function 'sumWithOverflow' or 'sumMap' for summing.
AggregateFunctionPtr function;
IAggregateFunction::AddFunc add_function = nullptr;
std::vector<size_t> column_numbers;
@ -76,6 +77,9 @@ private:
std::vector<char> state;
bool created = false;
/// In case when column has type AggregateFunction: use the aggregate function from itself instead of 'function' above.
bool is_agg_func_type = false;
void init(const char * function_name, const DataTypes & argument_types)
{
function = AggregateFunctionFactory::instance().get(function_name, argument_types);
@ -87,7 +91,10 @@ private:
{
if (created)
return;
function->create(state.data());
if (is_agg_func_type)
merged_column->insertDefault();
else
function->create(state.data());
created = true;
}
@ -95,7 +102,8 @@ private:
{
if (!created)
return;
function->destroy(state.data());
if (!is_agg_func_type)
function->destroy(state.data());
created = false;
}

View File

@ -241,7 +241,7 @@ void DataTypeAggregateFunction::serializeTextCSV(const IColumn & column, size_t
void DataTypeAggregateFunction::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
String s;
readCSV(s, istr, settings.csv.delimiter);
readCSV(s, istr, settings.csv);
deserializeFromString(function, column, s);
}

View File

@ -415,7 +415,7 @@ void DataTypeArray::serializeTextCSV(const IColumn & column, size_t row_num, Wri
void DataTypeArray::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
String s;
readCSV(s, istr, settings.csv.delimiter);
readCSV(s, istr, settings.csv);
ReadBufferFromString rb(s);
deserializeText(column, rb, settings);
}

View File

@ -194,7 +194,7 @@ template <typename Type>
void DataTypeEnum<Type>::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
std::string name;
readCSVString(name, istr, settings.csv.delimiter);
readCSVString(name, istr, settings.csv);
static_cast<ColumnType &>(column).getData().push_back(getValue(StringRef(name)));
}

View File

@ -51,16 +51,19 @@ DataTypePtr DataTypeFactory::get(const ASTPtr & ast) const
throw Exception("Unexpected AST element for data type.", ErrorCodes::UNEXPECTED_AST_STRUCTURE);
}
DataTypePtr DataTypeFactory::get(const String & family_name, const ASTPtr & parameters) const
DataTypePtr DataTypeFactory::get(const String & family_name_param, const ASTPtr & parameters) const
{
String family_name = getAliasToOrName(family_name_param);
{
DataTypesDictionary::const_iterator it = data_types.find(family_name);
if (data_types.end() != it)
return it->second(parameters);
}
String family_name_lowercase = Poco::toLower(family_name);
{
String family_name_lowercase = Poco::toLower(family_name);
DataTypesDictionary::const_iterator it = case_insensitive_data_types.find(family_name_lowercase);
if (case_insensitive_data_types.end() != it)
return it->second(parameters);
@ -76,11 +79,16 @@ void DataTypeFactory::registerDataType(const String & family_name, Creator creat
throw Exception("DataTypeFactory: the data type family " + family_name + " has been provided "
" a null constructor", ErrorCodes::LOGICAL_ERROR);
String family_name_lowercase = Poco::toLower(family_name);
if (isAlias(family_name) || isAlias(family_name_lowercase))
throw Exception("DataTypeFactory: the data type family name '" + family_name + "' is already registered as alias",
ErrorCodes::LOGICAL_ERROR);
if (!data_types.emplace(family_name, creator).second)
throw Exception("DataTypeFactory: the data type family name '" + family_name + "' is not unique",
ErrorCodes::LOGICAL_ERROR);
String family_name_lowercase = Poco::toLower(family_name);
if (case_sensitiveness == CaseInsensitive
&& !case_insensitive_data_types.emplace(family_name_lowercase, creator).second)
@ -88,7 +96,6 @@ void DataTypeFactory::registerDataType(const String & family_name, Creator creat
ErrorCodes::LOGICAL_ERROR);
}
void DataTypeFactory::registerSimpleDataType(const String & name, SimpleCreator creator, CaseSensitiveness case_sensitiveness)
{
if (creator == nullptr)
@ -103,7 +110,6 @@ void DataTypeFactory::registerSimpleDataType(const String & name, SimpleCreator
}, case_sensitiveness);
}
void registerDataTypeNumbers(DataTypeFactory & factory);
void registerDataTypeDate(DataTypeFactory & factory);
void registerDataTypeDateTime(DataTypeFactory & factory);

View File

@ -3,6 +3,7 @@
#include <memory>
#include <functional>
#include <unordered_map>
#include <Common/IFactoryWithAliases.h>
#include <DataTypes/IDataType.h>
#include <ext/singleton.h>
@ -19,10 +20,9 @@ using ASTPtr = std::shared_ptr<IAST>;
/** Creates a data type by name of data type family and parameters.
*/
class DataTypeFactory final : public ext::singleton<DataTypeFactory>
class DataTypeFactory final : public ext::singleton<DataTypeFactory>, public IFactoryWithAliases<std::function<DataTypePtr(const ASTPtr & parameters)>>
{
private:
using Creator = std::function<DataTypePtr(const ASTPtr & parameters)>;
using SimpleCreator = std::function<DataTypePtr()>;
using DataTypesDictionary = std::unordered_map<String, Creator>;
@ -31,13 +31,6 @@ public:
DataTypePtr get(const String & family_name, const ASTPtr & parameters) const;
DataTypePtr get(const ASTPtr & ast) const;
/// For compatibility with SQL, it's possible to specify that certain data type name is case insensitive.
enum CaseSensitiveness
{
CaseSensitive,
CaseInsensitive
};
/// Register a type family by its name.
void registerDataType(const String & family_name, Creator creator, CaseSensitiveness case_sensitiveness = CaseSensitive);
@ -51,6 +44,13 @@ private:
DataTypesDictionary case_insensitive_data_types;
DataTypeFactory();
const DataTypesDictionary & getCreatorMap() const override { return data_types; }
const DataTypesDictionary & getCaseInsensitiveCreatorMap() const override { return case_insensitive_data_types; }
String getFactoryName() const override { return "DataTypeFactory"; }
friend class ext::singleton<DataTypeFactory>;
};

View File

@ -102,7 +102,7 @@ void DataTypeFixedString::deserializeBinaryBulk(IColumn & column, ReadBuffer & i
size_t read_bytes = istr.readBig(reinterpret_cast<char *>(&data[initial_size]), max_bytes);
if (read_bytes % n != 0)
throw Exception("Cannot read all data of type FixedString",
throw Exception("Cannot read all data of type FixedString. Bytes read:" + toString(read_bytes) + ". String size:" + toString(n) + ".",
ErrorCodes::CANNOT_READ_ALL_DATA);
data.resize(initial_size + read_bytes);
@ -197,7 +197,7 @@ void DataTypeFixedString::serializeTextCSV(const IColumn & column, size_t row_nu
void DataTypeFixedString::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
read(*this, column, [&istr, delimiter = settings.csv.delimiter](ColumnFixedString::Chars_t & data) { readCSVStringInto(data, istr, delimiter); });
read(*this, column, [&istr, &csv = settings.csv](ColumnFixedString::Chars_t & data) { readCSVStringInto(data, istr, csv); });
}
@ -231,7 +231,7 @@ void registerDataTypeFixedString(DataTypeFactory & factory)
factory.registerDataType("FixedString", create);
/// Compatibility alias.
factory.registerDataType("BINARY", create, DataTypeFactory::CaseInsensitive);
factory.registerAlias("BINARY", "FixedString", DataTypeFactory::CaseInsensitive);
}
}

View File

@ -245,6 +245,12 @@ bool DataTypeNumberBase<T>::isValueRepresentedByInteger() const
return std::is_integral_v<T>;
}
template <typename T>
bool DataTypeNumberBase<T>::isValueRepresentedByUnsignedInteger() const
{
return std::is_integral_v<T> && std::is_unsigned_v<T>;
}
/// Explicit template instantiations - to avoid code bloat in headers.
template class DataTypeNumberBase<UInt8>;

View File

@ -46,6 +46,7 @@ public:
bool isComparable() const override { return true; }
bool isValueRepresentedByNumber() const override { return true; }
bool isValueRepresentedByInteger() const override;
bool isValueRepresentedByUnsignedInteger() const override;
bool isValueUnambiguouslyRepresentedInContiguousMemoryRegion() const override { return true; }
bool haveMaximumSizeOfValue() const override { return true; }
size_t getSizeOfValueInMemory() const override { return sizeof(T); }

View File

@ -288,7 +288,7 @@ void DataTypeString::serializeTextCSV(const IColumn & column, size_t row_num, Wr
void DataTypeString::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
read(column, [&](ColumnString::Chars_t & data) { readCSVStringInto(data, istr, settings.csv.delimiter); });
read(column, [&](ColumnString::Chars_t & data) { readCSVStringInto(data, istr, settings.csv); });
}
@ -312,16 +312,16 @@ void registerDataTypeString(DataTypeFactory & factory)
/// These synonims are added for compatibility.
factory.registerSimpleDataType("CHAR", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("VARCHAR", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("TEXT", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("TINYTEXT", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("MEDIUMTEXT", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("LONGTEXT", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("BLOB", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("TINYBLOB", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("MEDIUMBLOB", creator, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("LONGBLOB", creator, DataTypeFactory::CaseInsensitive);
factory.registerAlias("CHAR", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("VARCHAR", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("TEXT", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("TINYTEXT", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("MEDIUMTEXT", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("LONGTEXT", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("BLOB", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("TINYBLOB", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("MEDIUMBLOB", "String", DataTypeFactory::CaseInsensitive);
factory.registerAlias("LONGBLOB", "String", DataTypeFactory::CaseInsensitive);
}
}

View File

@ -22,13 +22,13 @@ void registerDataTypeNumbers(DataTypeFactory & factory)
/// These synonims are added for compatibility.
factory.registerSimpleDataType("TINYINT", [] { return DataTypePtr(std::make_shared<DataTypeInt8>()); }, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("SMALLINT", [] { return DataTypePtr(std::make_shared<DataTypeInt16>()); }, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("INT", [] { return DataTypePtr(std::make_shared<DataTypeInt32>()); }, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("INTEGER", [] { return DataTypePtr(std::make_shared<DataTypeInt32>()); }, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("BIGINT", [] { return DataTypePtr(std::make_shared<DataTypeInt64>()); }, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("FLOAT", [] { return DataTypePtr(std::make_shared<DataTypeFloat32>()); }, DataTypeFactory::CaseInsensitive);
factory.registerSimpleDataType("DOUBLE", [] { return DataTypePtr(std::make_shared<DataTypeFloat64>()); }, DataTypeFactory::CaseInsensitive);
factory.registerAlias("TINYINT", "Int8", DataTypeFactory::CaseInsensitive);
factory.registerAlias("SMALLINT", "Int16", DataTypeFactory::CaseInsensitive);
factory.registerAlias("INT", "Int32", DataTypeFactory::CaseInsensitive);
factory.registerAlias("INTEGER", "Int32", DataTypeFactory::CaseInsensitive);
factory.registerAlias("BIGINT", "Int64", DataTypeFactory::CaseInsensitive);
factory.registerAlias("FLOAT", "Float32", DataTypeFactory::CaseInsensitive);
factory.registerAlias("DOUBLE", "Float64", DataTypeFactory::CaseInsensitive);
}
}

View File

@ -1,17 +0,0 @@
#pragma once
namespace DB
{
struct FormatSettingsJSON
{
bool force_quoting_64bit_integers = true;
bool output_format_json_quote_denormals = true;
FormatSettingsJSON() = default;
FormatSettingsJSON(bool force_quoting_64bit_integers_, bool output_format_json_quote_denormals_)
: force_quoting_64bit_integers(force_quoting_64bit_integers_), output_format_json_quote_denormals(output_format_json_quote_denormals_) {}
};
}

View File

@ -307,6 +307,10 @@ public:
*/
virtual bool isValueRepresentedByInteger() const { return false; }
/** Unsigned Integers, Date, DateTime. Not nullable.
*/
virtual bool isValueRepresentedByUnsignedInteger() const { return false; }
/** Values are unambiguously identified by contents of contiguous memory region,
* that can be obtained by IColumn::getDataAt method.
* Examples: numbers, Date, DateTime, String, FixedString,

View File

@ -50,7 +50,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(
table{config.getString(config_prefix + ".table")},
where{config.getString(config_prefix + ".where", "")},
update_field{config.getString(config_prefix + ".update_field", "")},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
sample_block{sample_block}, context(context),
is_local{isLocalAddress({ host, port }, config.getInt("tcp_port", 0))},
pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)},
@ -67,7 +67,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(const ClickHouseDictionar
db{other.db}, table{other.table},
where{other.where},
update_field{other.update_field},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
sample_block{other.sample_block}, context(other.context),
is_local{other.is_local},
pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)},

View File

@ -41,7 +41,6 @@ void RegionsHierarchy::reload()
RegionID max_region_id = 0;
auto regions_reader = data_source->createReader();
RegionEntry region_entry;

View File

@ -22,7 +22,7 @@ ExternalQueryBuilder::ExternalQueryBuilder(
const std::string & db,
const std::string & table,
const std::string & where,
QuotingStyle quoting_style)
IdentifierQuotingStyle quoting_style)
: dict_struct(dict_struct), db(db), table(table), where(where), quoting_style(quoting_style)
{
}
@ -32,15 +32,15 @@ void ExternalQueryBuilder::writeQuoted(const std::string & s, WriteBuffer & out)
{
switch (quoting_style)
{
case None:
case IdentifierQuotingStyle::None:
writeString(s, out);
break;
case Backticks:
case IdentifierQuotingStyle::Backticks:
writeBackQuotedString(s, out);
break;
case DoubleQuotes:
case IdentifierQuotingStyle::DoubleQuotes:
writeDoubleQuotedString(s, out);
break;
}
@ -138,7 +138,7 @@ std::string ExternalQueryBuilder::composeLoadAllQuery() const
}
std::string ExternalQueryBuilder::composeUpdateQuery(const std::string &update_field, const std::string &time_point) const
std::string ExternalQueryBuilder::composeUpdateQuery(const std::string & update_field, const std::string & time_point) const
{
std::string out = composeLoadAllQuery();
std::string update_query;

View File

@ -3,6 +3,7 @@
#include <string>
#include <Formats/FormatSettings.h>
#include <Columns/IColumn.h>
#include <Parsers/IdentifierQuotingStyle.h>
namespace DB
@ -21,16 +22,7 @@ struct ExternalQueryBuilder
const std::string & table;
const std::string & where;
/// Method to quote identifiers.
/// NOTE There could be differences in escaping rules inside quotes. Escaping rules may not match that required by specific external DBMS.
enum QuotingStyle
{
None, /// Write as-is, without quotes.
Backticks, /// `mysql` style
DoubleQuotes /// "postgres" style
};
QuotingStyle quoting_style;
IdentifierQuotingStyle quoting_style;
ExternalQueryBuilder(
@ -38,7 +30,7 @@ struct ExternalQueryBuilder
const std::string & db,
const std::string & table,
const std::string & where,
QuotingStyle quoting_style);
IdentifierQuotingStyle quoting_style);
/** Generate a query to load all data. */
std::string composeLoadAllQuery() const;

View File

@ -208,25 +208,30 @@ BlockInputStreamPtr LibraryDictionarySource::loadKeys(const Columns & key_column
{
LOG_TRACE(log, "loadKeys " << toString() << " size = " << requested_rows.size());
auto columns_holder = std::make_unique<ClickHouseLibrary::CString[]>(key_columns.size());
ClickHouseLibrary::CStrings columns_pass{
static_cast<decltype(ClickHouseLibrary::CStrings::data)>(columns_holder.get()), key_columns.size()};
size_t key_columns_n = 0;
for (auto & column : key_columns)
auto holder = std::make_unique<ClickHouseLibrary::Row[]>(key_columns.size());
std::vector<std::unique_ptr<ClickHouseLibrary::Field[]>> column_data_holders;
for (size_t i = 0; i < key_columns.size(); ++i)
{
columns_pass.data[key_columns_n] = column->getName().c_str();
++key_columns_n;
}
const ClickHouseLibrary::VectorUInt64 requested_rows_c{
ext::bit_cast<decltype(ClickHouseLibrary::VectorUInt64::data)>(requested_rows.data()), requested_rows.size()};
void * data_ptr = nullptr;
auto cell_holder = std::make_unique<ClickHouseLibrary::Field[]>(requested_rows.size());
for (size_t j = 0; j < requested_rows.size(); ++j)
{
auto data_ref = key_columns[i]->getDataAt(requested_rows[j]);
cell_holder[j] = ClickHouseLibrary::Field{.data = static_cast<const void *>(data_ref.data), .size = data_ref.size};
}
holder[i]
= ClickHouseLibrary::Row{.data = static_cast<ClickHouseLibrary::Field *>(cell_holder.get()), .size = requested_rows.size()};
column_data_holders.push_back(std::move(cell_holder));
}
ClickHouseLibrary::Table request_cols{.data = static_cast<ClickHouseLibrary::Row *>(holder.get()), .size = key_columns.size()};
void * data_ptr = nullptr;
/// Get function pointer before dataNew call because library->get may throw.
auto func_loadKeys
= library->get<void * (*)(decltype(data_ptr), decltype(&settings->strings), decltype(&columns_pass), decltype(&requested_rows_c))>(
"ClickHouseDictionary_v3_loadKeys");
auto func_loadKeys = library->get<void * (*)(decltype(data_ptr), decltype(&settings->strings), decltype(&request_cols))>(
"ClickHouseDictionary_v3_loadKeys");
data_ptr = library->get<decltype(data_ptr) (*)(decltype(lib_data))>("ClickHouseDictionary_v3_dataNew")(lib_data);
auto data = func_loadKeys(data_ptr, &settings->strings, &columns_pass, &requested_rows_c);
auto data = func_loadKeys(data_ptr, &settings->strings, &request_cols);
auto block = dataToBlock(description.sample_block, data);
SCOPE_EXIT(library->get<void (*)(decltype(lib_data), decltype(data_ptr))>("ClickHouseDictionary_v3_dataDelete")(lib_data, data_ptr));
return std::make_shared<OneBlockInputStream>(block);

View File

@ -35,7 +35,7 @@ MySQLDictionarySource::MySQLDictionarySource(const DictionaryStructure & dict_st
dont_check_update_time{config.getBool(config_prefix + ".dont_check_update_time", false)},
sample_block{sample_block},
pool{config, config_prefix},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
load_all_query{query_builder.composeLoadAllQuery()},
invalidate_query{config.getString(config_prefix + ".invalidate_query", "")}
{
@ -53,7 +53,7 @@ MySQLDictionarySource::MySQLDictionarySource(const MySQLDictionarySource & other
dont_check_update_time{other.dont_check_update_time},
sample_block{other.sample_block},
pool{other.pool},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
load_all_query{other.load_all_query}, last_modification{other.last_modification},
invalidate_query{other.invalidate_query}, invalidate_query_response{other.invalidate_query_response}
{

View File

@ -29,7 +29,7 @@ ODBCDictionarySource::ODBCDictionarySource(const DictionaryStructure & dict_stru
where{config.getString(config_prefix + ".where", "")},
update_field{config.getString(config_prefix + ".update_field", "")},
sample_block{sample_block},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::None}, /// NOTE Better to obtain quoting style via ODBC interface.
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::None}, /// NOTE Better to obtain quoting style via ODBC interface.
load_all_query{query_builder.composeLoadAllQuery()},
invalidate_query{config.getString(config_prefix + ".invalidate_query", "")}
{
@ -58,7 +58,7 @@ ODBCDictionarySource::ODBCDictionarySource(const ODBCDictionarySource & other)
update_field{other.update_field},
sample_block{other.sample_block},
pool{other.pool},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::None},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::None},
load_all_query{other.load_all_query},
invalidate_query{other.invalidate_query}, invalidate_query_response{other.invalidate_query_response}
{

View File

@ -1,6 +1,7 @@
#include <Common/Exception.h>
#include <IO/WriteHelpers.h>
#include <Formats/BlockInputStreamFromRowInputStream.h>
#include <common/logger_useful.h>
namespace DB
@ -128,4 +129,16 @@ Block BlockInputStreamFromRowInputStream::readImpl()
return sample.cloneWithColumns(std::move(columns));
}
void BlockInputStreamFromRowInputStream::readSuffix()
{
if (allow_errors_num > 0 || allow_errors_ratio > 0)
{
Logger * log = &Logger::get("BlockInputStreamFromRowInputStream");
LOG_TRACE(log, "Skipped " << num_errors << " rows with errors while reading the input stream");
}
row_input->readSuffix();
}
}

View File

@ -25,7 +25,7 @@ public:
const FormatSettings & settings);
void readPrefix() override { row_input->readPrefix(); }
void readSuffix() override { row_input->readSuffix(); }
void readSuffix() override;
String getName() const override { return "BlockInputStreamFromRowInputStream"; }

View File

@ -83,16 +83,16 @@ static inline void skipWhitespacesAndTabs(ReadBuffer & buf)
}
static void skipRow(ReadBuffer & istr, const char delimiter, size_t num_columns)
static void skipRow(ReadBuffer & istr, const FormatSettings::CSV & settings, size_t num_columns)
{
String tmp;
for (size_t i = 0; i < num_columns; ++i)
{
skipWhitespacesAndTabs(istr);
readCSVString(tmp, istr, delimiter);
readCSVString(tmp, istr, settings);
skipWhitespacesAndTabs(istr);
skipDelimiter(istr, delimiter, i + 1 == num_columns);
skipDelimiter(istr, settings.delimiter, i + 1 == num_columns);
}
}
@ -107,7 +107,7 @@ void CSVRowInputStream::readPrefix()
String tmp;
if (with_names)
skipRow(istr, format_settings.csv.delimiter, num_columns);
skipRow(istr, format_settings.csv, num_columns);
}

View File

@ -43,10 +43,10 @@ private:
/* Action for state machine for traversing nested structures. */
struct Action
{
enum Type { POP, PUSH, READ };
Type type;
capnp::StructSchema::Field field = {};
size_t column = 0;
enum Type { POP, PUSH, READ };
Type type;
capnp::StructSchema::Field field = {};
size_t column = 0;
};
// Wrapper for classes that could throw in destructor
@ -54,10 +54,10 @@ private:
template <typename T>
struct DestructorCatcher
{
T impl;
template <typename ... Arg>
DestructorCatcher(Arg && ... args) : impl(kj::fwd<Arg>(args)...) {}
~DestructorCatcher() noexcept try { } catch (...) { }
T impl;
template <typename ... Arg>
DestructorCatcher(Arg && ... args) : impl(kj::fwd<Arg>(args)...) {}
~DestructorCatcher() noexcept try { } catch (...) { return; }
};
using SchemaParser = DestructorCatcher<capnp::SchemaParser>;

View File

@ -37,6 +37,8 @@ BlockInputStreamPtr FormatFactory::getInput(const String & name, ReadBuffer & bu
FormatSettings format_settings;
format_settings.csv.delimiter = settings.format_csv_delimiter;
format_settings.csv.allow_single_quotes = settings.format_csv_allow_single_quotes;
format_settings.csv.allow_double_quotes = settings.format_csv_allow_double_quotes;
format_settings.values.interpret_expressions = settings.input_format_values_interpret_expressions;
format_settings.skip_unknown_fields = settings.input_format_skip_unknown_fields;
format_settings.date_time_input_format = settings.date_time_input_format;
@ -59,6 +61,8 @@ BlockOutputStreamPtr FormatFactory::getOutput(const String & name, WriteBuffer &
format_settings.json.quote_64bit_integers = settings.output_format_json_quote_64bit_integers;
format_settings.json.quote_denormals = settings.output_format_json_quote_denormals;
format_settings.csv.delimiter = settings.format_csv_delimiter;
format_settings.csv.allow_single_quotes = settings.format_csv_allow_single_quotes;
format_settings.csv.allow_double_quotes = settings.format_csv_allow_double_quotes;
format_settings.pretty.max_rows = settings.output_format_pretty_max_rows;
format_settings.pretty.color = settings.output_format_pretty_color;
format_settings.write_statistics = settings.output_format_write_statistics;

View File

@ -58,6 +58,11 @@ public:
void registerInputFormat(const String & name, InputCreator input_creator);
void registerOutputFormat(const String & name, OutputCreator output_creator);
const FormatsDictionary & getAllFormats() const
{
return dict;
}
private:
FormatsDictionary dict;

View File

@ -24,6 +24,8 @@ struct FormatSettings
struct CSV
{
char delimiter = ',';
bool allow_single_quotes = true;
bool allow_double_quotes = true;
};
CSV csv;

View File

@ -1,5 +1,6 @@
#include <IO/ReadHelpers.h>
#include <Interpreters/evaluateConstantExpression.h>
#include <Interpreters/Context.h>
#include <Interpreters/convertFieldToType.h>
#include <Parsers/TokenIterator.h>
#include <Parsers/ExpressionListParsers.h>
@ -29,7 +30,7 @@ namespace ErrorCodes
ValuesRowInputStream::ValuesRowInputStream(ReadBuffer & istr_, const Block & header_, const Context & context_, const FormatSettings & format_settings)
: istr(istr_), header(header_), context(context_), format_settings(format_settings)
: istr(istr_), header(header_), context(std::make_unique<Context>(context_)), format_settings(format_settings)
{
/// In this format, BOM at beginning of stream cannot be confused with value, so it is safe to skip it.
skipBOMIfExists(istr);
@ -112,7 +113,7 @@ bool ValuesRowInputStream::read(MutableColumns & columns)
istr.position() = const_cast<char *>(token_iterator->begin);
std::pair<Field, DataTypePtr> value_raw = evaluateConstantExpression(ast, context);
std::pair<Field, DataTypePtr> value_raw = evaluateConstantExpression(ast, *context);
Field value = convertFieldToType(value_raw.first, type, value_raw.second.get());
if (value.isNull())

View File

@ -28,7 +28,7 @@ public:
private:
ReadBuffer & istr;
Block header;
const Context & context;
std::unique_ptr<Context> context; /// pimpl
const FormatSettings format_settings;
};

View File

@ -41,6 +41,7 @@ generate_function_register(Array
FunctionArrayEnumerate
FunctionArrayEnumerateUniq
FunctionArrayUniq
FunctionArrayDistinct
FunctionEmptyArrayUInt8
FunctionEmptyArrayUInt16
FunctionEmptyArrayUInt32

View File

@ -6,7 +6,6 @@
#include <Poco/String.h>
namespace DB
{
@ -26,8 +25,13 @@ void FunctionFactory::registerFunction(const
throw Exception("FunctionFactory: the function name '" + name + "' is not unique",
ErrorCodes::LOGICAL_ERROR);
String function_name_lowercase = Poco::toLower(name);
if (isAlias(name) || isAlias(function_name_lowercase))
throw Exception("FunctionFactory: the function name '" + name + "' is already registered as alias",
ErrorCodes::LOGICAL_ERROR);
if (case_sensitiveness == CaseInsensitive
&& !case_insensitive_functions.emplace(Poco::toLower(name), creator).second)
&& !case_insensitive_functions.emplace(function_name_lowercase, creator).second)
throw Exception("FunctionFactory: the case insensitive function name '" + name + "' is not unique",
ErrorCodes::LOGICAL_ERROR);
}
@ -45,9 +49,11 @@ FunctionBuilderPtr FunctionFactory::get(
FunctionBuilderPtr FunctionFactory::tryGet(
const std::string & name,
const std::string & name_param,
const Context & context) const
{
String name = getAliasToOrName(name_param);
auto it = functions.find(name);
if (functions.end() != it)
return it->second(context);

View File

@ -1,6 +1,7 @@
#pragma once
#include <Functions/IFunction.h>
#include <Common/IFactoryWithAliases.h>
#include <ext/singleton.h>
@ -20,19 +21,9 @@ class Context;
* Function could use for initialization (take ownership of shared_ptr, for example)
* some dictionaries from Context.
*/
class FunctionFactory : public ext::singleton<FunctionFactory>
class FunctionFactory : public ext::singleton<FunctionFactory>, public IFactoryWithAliases<std::function<FunctionBuilderPtr(const Context &)>>
{
friend class StorageSystemFunctions;
public:
using Creator = std::function<FunctionBuilderPtr(const Context &)>;
/// For compatibility with SQL, it's possible to specify that certain function name is case insensitive.
enum CaseSensitiveness
{
CaseSensitive,
CaseInsensitive
};
template <typename Function>
void registerFunction(CaseSensitiveness case_sensitiveness = CaseSensitive)
@ -67,6 +58,12 @@ private:
return std::make_shared<DefaultFunctionBuilder>(Function::create(context));
}
const Functions & getCreatorMap() const override { return functions; }
const Functions & getCaseInsensitiveCreatorMap() const override { return case_insensitive_functions; }
String getFactoryName() const override { return "FunctionFactory"; }
/// Register a function by its name.
/// No locking, you must register all functions before usage of get.
void registerFunction(

View File

@ -1062,9 +1062,7 @@ void FunctionArrayUniq::executeImpl(Block & block, const ColumnNumbers & argumen
|| executeNumber<Float32>(first_array, first_null_map, res_values)
|| executeNumber<Float64>(first_array, first_null_map, res_values)
|| executeString(first_array, first_null_map, res_values)))
throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName()
+ " of first argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
executeHashed(*offsets, original_data_columns, res_values);
}
else
{
@ -1272,6 +1270,213 @@ void FunctionArrayUniq::executeHashed(
}
}
/// Implementation of FunctionArrayDistinct.
FunctionPtr FunctionArrayDistinct::create(const Context &)
{
return std::make_shared<FunctionArrayDistinct>();
}
String FunctionArrayDistinct::getName() const
{
return name;
}
DataTypePtr FunctionArrayDistinct::getReturnTypeImpl(const DataTypes & arguments) const
{
const DataTypeArray * array_type = checkAndGetDataType<DataTypeArray>(arguments[0].get());
if (!array_type)
throw Exception("Argument for function " + getName() + " must be array but it "
" has type " + arguments[0]->getName() + ".",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
auto nested_type = removeNullable(array_type->getNestedType());
return std::make_shared<DataTypeArray>(nested_type);
}
void FunctionArrayDistinct::executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/)
{
ColumnPtr array_ptr = block.getByPosition(arguments[0]).column;
const ColumnArray * array = checkAndGetColumn<ColumnArray>(array_ptr.get());
const auto & return_type = block.getByPosition(result).type;
auto res_ptr = return_type->createColumn();
ColumnArray & res = static_cast<ColumnArray &>(*res_ptr);
const IColumn & src_data = array->getData();
const ColumnArray::Offsets & offsets = array->getOffsets();
ColumnRawPtrs original_data_columns;
original_data_columns.push_back(&src_data);
IColumn & res_data = res.getData();
ColumnArray::Offsets & res_offsets = res.getOffsets();
const ColumnNullable * nullable_col = nullptr;
const IColumn * inner_col;
if (src_data.isColumnNullable())
{
nullable_col = static_cast<const ColumnNullable *>(&src_data);
inner_col = &nullable_col->getNestedColumn();
}
else
{
inner_col = &src_data;
}
if (!(executeNumber<UInt8>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<UInt16>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<UInt32>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<UInt64>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int8>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int16>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int32>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int64>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Float32>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Float64>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeString(*inner_col, offsets, res_data, res_offsets, nullable_col)))
executeHashed(offsets, original_data_columns, res_data, res_offsets);
block.getByPosition(result).column = std::move(res_ptr);
}
template <typename T>
bool FunctionArrayDistinct::executeNumber(const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col)
{
const ColumnVector<T> * src_data_concrete = checkAndGetColumn<ColumnVector<T>>(&src_data);
if (!src_data_concrete)
{
return false;
}
const PaddedPODArray<T> & values = src_data_concrete->getData();
PaddedPODArray<T> & res_data = typeid_cast<ColumnVector<T> &>(res_data_col).getData();
const PaddedPODArray<UInt8> * src_null_map = nullptr;
if (nullable_col)
{
src_null_map = &static_cast<const ColumnUInt8 *>(&nullable_col->getNullMapColumn())->getData();
}
using Set = ClearableHashSet<T,
DefaultHash<T>,
HashTableGrower<INITIAL_SIZE_DEGREE>,
HashTableAllocatorWithStackMemory<(1ULL << INITIAL_SIZE_DEGREE) * sizeof(T)>>;
Set set;
size_t prev_off = 0;
for (size_t i = 0; i < src_offsets.size(); ++i)
{
set.clear();
size_t off = src_offsets[i];
for (size_t j = prev_off; j < off; ++j)
{
if ((set.find(values[j]) == set.end()) && (!nullable_col || (*src_null_map)[j] == 0))
{
res_data.emplace_back(values[j]);
set.insert(values[j]);
}
}
res_offsets.emplace_back(set.size() + prev_off);
prev_off = off;
}
return true;
}
bool FunctionArrayDistinct::executeString(
const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col)
{
const ColumnString * src_data_concrete = checkAndGetColumn<ColumnString>(&src_data);
if (!src_data_concrete)
{
return false;
}
ColumnString & res_data_column_string = typeid_cast<ColumnString &>(res_data_col);
using Set = ClearableHashSet<StringRef,
StringRefHash,
HashTableGrower<INITIAL_SIZE_DEGREE>,
HashTableAllocatorWithStackMemory<(1ULL << INITIAL_SIZE_DEGREE) * sizeof(StringRef)>>;
const PaddedPODArray<UInt8> * src_null_map = nullptr;
if (nullable_col)
{
src_null_map = &static_cast<const ColumnUInt8 *>(&nullable_col->getNullMapColumn())->getData();
}
Set set;
size_t prev_off = 0;
for (size_t i = 0; i < src_offsets.size(); ++i)
{
set.clear();
size_t off = src_offsets[i];
for (size_t j = prev_off; j < off; ++j)
{
StringRef str_ref = src_data_concrete->getDataAt(j);
if (set.find(str_ref) == set.end() && (!nullable_col || (*src_null_map)[j] == 0))
{
set.insert(str_ref);
res_data_column_string.insertData(str_ref.data, str_ref.size);
}
}
res_offsets.emplace_back(set.size() + prev_off);
prev_off = off;
}
return true;
}
void FunctionArrayDistinct::executeHashed(
const ColumnArray::Offsets & offsets,
const ColumnRawPtrs & columns,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets)
{
size_t count = columns.size();
using Set = ClearableHashSet<UInt128, UInt128TrivialHash, HashTableGrower<INITIAL_SIZE_DEGREE>,
HashTableAllocatorWithStackMemory<(1ULL << INITIAL_SIZE_DEGREE) * sizeof(UInt128)>>;
Set set;
size_t prev_off = 0;
for (size_t i = 0; i < offsets.size(); ++i)
{
set.clear();
size_t off = offsets[i];
for (size_t j = prev_off; j < off; ++j)
{
auto hash = hash128(j, count, columns);
if (set.find(hash) == set.end())
{
set.insert(hash);
res_data_col.insertFrom(*columns[0], j);
}
}
res_offsets.emplace_back(set.size() + prev_off);
prev_off = off;
}
}
/// Implementation of FunctionArrayEnumerateUniq.
FunctionPtr FunctionArrayEnumerateUniq::create(const Context &)
@ -1334,13 +1539,7 @@ void FunctionArrayEnumerateUniq::executeImpl(Block & block, const ColumnNumbers
ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH);
auto * array_data = &array->getData();
if (auto * tuple_column = checkAndGetColumn<ColumnTuple>(array_data))
{
for (const auto & element : tuple_column->getColumns())
data_columns.push_back(element.get());
}
else
data_columns.push_back(array_data);
data_columns.push_back(array_data);
}
size_t num_columns = data_columns.size();
@ -1383,9 +1582,7 @@ void FunctionArrayEnumerateUniq::executeImpl(Block & block, const ColumnNumbers
|| executeNumber<Float32>(first_array, first_null_map, res_values)
|| executeNumber<Float64>(first_array, first_null_map, res_values)
|| executeString (first_array, first_null_map, res_values)))
throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName()
+ " of first argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
executeHashed(*offsets, original_data_columns, res_values);
}
else
{
@ -2427,8 +2624,6 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum
std::vector<const IColumn *> aggregate_arguments_vec(num_arguments_columns);
const ColumnArray::Offsets * offsets = nullptr;
bool is_const = true;
for (size_t i = 0; i < num_arguments_columns; ++i)
{
const IColumn * col = block.getByPosition(arguments[i + 1]).column.get();
@ -2437,7 +2632,6 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum
{
aggregate_arguments_vec[i] = &arr->getData();
offsets_i = &arr->getOffsets();
is_const = false;
}
else if (const ColumnConst * const_arr = checkAndGetColumnConst<ColumnArray>(col))
{
@ -2493,14 +2687,7 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum
current_offset = next_offset;
}
if (!is_const)
{
block.getByPosition(result).column = std::move(result_holder);
}
else
{
block.getByPosition(result).column = block.getByPosition(result).type->createColumnConst(rows, res_col[0]);
}
block.getByPosition(result).column = std::move(result_holder);
}
/// Implementation of FunctionArrayConcat.

View File

@ -46,6 +46,8 @@ namespace ErrorCodes
* arrayUniq(arr) - counts the number of different elements in the array,
* arrayUniq(arr1, arr2, ...) - counts the number of different tuples from the elements in the corresponding positions in several arrays.
*
* arrayDistinct(arr) - retrun different elements in an array
*
* arrayEnumerateUniq(arr)
* - outputs an array parallel (having same size) to this, where for each element specified
* how many times this element was encountered before (including this element) among elements with the same value.
@ -204,7 +206,7 @@ private:
static bool hasNull(const PaddedPODArray<UInt8> & null_map, size_t i)
{
return null_map[i] == 1;
return null_map[i];
}
/// Both function arguments are ordinary.
@ -287,7 +289,7 @@ private:
for (size_t j = 0; j < array_size; ++j)
{
if (null_map_data[current_offset + j] == 1)
if (null_map_data[current_offset + j])
{
}
else if (compare(data[current_offset + j], value, i))
@ -324,7 +326,7 @@ private:
for (size_t j = 0; j < array_size; ++j)
{
bool hit = false;
if (null_map_data[current_offset + j] == 1)
if (null_map_data[current_offset + j])
{
if (hasNull(null_map_item, i))
hit = true;
@ -394,11 +396,6 @@ struct ArrayIndexNumNullImpl
size_t size = offsets.size();
result.resize(size);
if (!null_map_data)
return;
const auto & null_map_ref = *null_map_data;
ColumnArray::Offset current_offset = 0;
for (size_t i = 0; i < size; ++i)
{
@ -407,7 +404,7 @@ struct ArrayIndexNumNullImpl
for (size_t j = 0; j < array_size; ++j)
{
if (null_map_ref[current_offset + j] == 1)
if (null_map_data && (*null_map_data)[current_offset + j])
{
if (!IndexConv::apply(j, current))
break;
@ -433,11 +430,6 @@ struct ArrayIndexStringNullImpl
const auto size = offsets.size();
result.resize(size);
if (!null_map_data)
return;
const auto & null_map_ref = *null_map_data;
ColumnArray::Offset current_offset = 0;
for (size_t i = 0; i < size; ++i)
{
@ -446,8 +438,7 @@ struct ArrayIndexStringNullImpl
for (size_t j = 0; j < array_size; ++j)
{
size_t k = (current_offset == 0 && j == 0) ? 0 : current_offset + j - 1;
if (null_map_ref[k] == 1)
if (null_map_data && (*null_map_data)[current_offset + j])
{
if (!IndexConv::apply(j, current))
break;
@ -487,8 +478,7 @@ struct ArrayIndexStringImpl
ColumnArray::Offset string_size = string_offsets[current_offset + j] - string_pos;
size_t k = (current_offset == 0 && j == 0) ? 0 : current_offset + j - 1;
if (null_map_data && ((*null_map_data)[k] == 1))
if (null_map_data && (*null_map_data)[current_offset + j])
{
}
else if (string_size == value_size + 1 && 0 == memcmp(value.data(), &data[string_pos], value_size))
@ -524,21 +514,20 @@ struct ArrayIndexStringImpl
for (size_t j = 0; j < array_size; ++j)
{
ColumnArray::Offset string_pos = current_offset == 0 && j == 0
? 0
: string_offsets[current_offset + j - 1];
? 0
: string_offsets[current_offset + j - 1];
ColumnArray::Offset string_size = string_offsets[current_offset + j] - string_pos;
bool hit = false;
size_t k = (current_offset == 0 && j == 0) ? 0 : current_offset + j - 1;
if (null_map_data && ((*null_map_data)[k] == 1))
if (null_map_data && (*null_map_data)[current_offset + j])
{
if (null_map_item && ((*null_map_item)[i] == 1))
if (null_map_item && (*null_map_item)[i])
hit = true;
}
else if (string_size == value_size && 0 == memcmp(&item_values[value_pos], &data[string_pos], value_size))
hit = true;
hit = true;
if (hit)
{
@ -638,7 +627,7 @@ private:
for (size_t j = 0; j < array_size; ++j)
{
if (null_map_data[current_offset + j] == 1)
if (null_map_data[current_offset + j])
{
}
else if (0 == data.compareAt(current_offset + j, is_value_has_single_element_to_compare ? 0 : i, value, 1))
@ -674,9 +663,9 @@ private:
for (size_t j = 0; j < array_size; ++j)
{
bool hit = false;
if (null_map_data[current_offset + j] == 1)
if (null_map_data[current_offset + j])
{
if (null_map_item[i] == 1)
if (null_map_item[i])
hit = true;
}
else if (0 == data.compareAt(current_offset + j, is_value_has_single_element_to_compare ? 0 : i, value, 1))
@ -724,11 +713,6 @@ struct ArrayIndexGenericNullImpl
size_t size = offsets.size();
result.resize(size);
if (!null_map_data)
return;
const auto & null_map_ref = *null_map_data;
ColumnArray::Offset current_offset = 0;
for (size_t i = 0; i < size; ++i)
{
@ -737,7 +721,7 @@ struct ArrayIndexGenericNullImpl
for (size_t j = 0; j < array_size; ++j)
{
if (null_map_ref[current_offset + j] == 1)
if (null_map_data && (*null_map_data)[current_offset + j])
{
if (!IndexConv::apply(j, current))
break;
@ -931,7 +915,7 @@ private:
if (arr[i].isNull())
{
if (null_map && ((*null_map)[row] == 1))
if (null_map && (*null_map)[row])
hit = true;
}
else if (applyVisitor(FieldVisitorAccurateEquals(), arr[i], value))
@ -1027,10 +1011,11 @@ public:
DataTypePtr observed_type0 = removeNullable(array_type->getNestedType());
DataTypePtr observed_type1 = removeNullable(arguments[1]);
if (!(observed_type0->isNumber() && observed_type1->isNumber())
/// We also support arrays of Enum type (that are represented by number) to search numeric values.
if (!(observed_type0->isValueRepresentedByNumber() && observed_type1->isNumber())
&& !observed_type0->equals(*observed_type1))
throw Exception("Types of array and 2nd argument of function "
+ getName() + " must be identical up to nullability. Passed: "
+ getName() + " must be identical up to nullability or numeric types or Enum and numeric type. Passed: "
+ arguments[0]->getName() + " and " + arguments[1]->getName() + ".",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
@ -1228,6 +1213,52 @@ private:
};
/// Find different elements in an array.
class FunctionArrayDistinct : public IFunction
{
public:
static constexpr auto name = "arrayDistinct";
static FunctionPtr create(const Context & context);
String getName() const override;
bool isVariadic() const override { return false; }
size_t getNumberOfArguments() const override { return 1; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override;
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override;
private:
/// Initially allocate a piece of memory for 512 elements. NOTE: This is just a guess.
static constexpr size_t INITIAL_SIZE_DEGREE = 9;
template <typename T>
bool executeNumber(
const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col);
bool executeString(
const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col);
void executeHashed(
const ColumnArray::Offsets & offsets,
const ColumnRawPtrs & columns,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets);
};
class FunctionArrayEnumerateUniq : public IFunction
{
public:
@ -1402,6 +1433,9 @@ public:
bool isVariadic() const override { return true; }
size_t getNumberOfArguments() const override { return 0; }
bool useDefaultImplementationForConstants() const override { return true; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override;
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override;

View File

@ -15,6 +15,10 @@
#include <DataTypes/DataTypeFixedString.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeEnum.h>
#include <DataTypes/getLeastSupertype.h>
#include <DataTypes/getLeastSupertype.h>
#include <Interpreters/castColumn.h>
#include <Functions/FunctionsLogical.h>
#include <Functions/IFunction.h>
@ -617,9 +621,12 @@ class FunctionComparison : public IFunction
{
public:
static constexpr auto name = Name::name;
static FunctionPtr create(const Context &) { return std::make_shared<FunctionComparison>(); }
static FunctionPtr create(const Context & context) { return std::make_shared<FunctionComparison>(context); }
FunctionComparison(const Context & context) : context(context) {}
private:
const Context & context;
template <typename T0, typename T1>
bool executeNumRightType(Block & block, size_t result, const ColumnVector<T0> * col_left, const IColumn * col_right_untyped)
{
@ -798,7 +805,7 @@ private:
}
}
void executeDateOrDateTimeOrEnumWithConstString(
bool executeDateOrDateTimeOrEnumOrUUIDWithConstString(
Block & block, size_t result, const IColumn * col_left_untyped, const IColumn * col_right_untyped,
const DataTypePtr & left_type, const DataTypePtr & right_type, bool left_is_num, size_t input_rows_count)
{
@ -821,8 +828,7 @@ private:
const auto column_string = checkAndGetColumnConst<ColumnString>(column_string_untyped);
if (!column_string || !legal_types)
throw Exception{"Illegal columns " + col_left_untyped->getName() + " and " + col_right_untyped->getName()
+ " of arguments of function " + getName(), ErrorCodes::ILLEGAL_COLUMN};
return false;
StringRef string_value = column_string->getDataAt(0);
@ -875,6 +881,8 @@ private:
else if (is_enum16)
executeEnumWithConstString<DataTypeEnum16>(block, result, column_number, column_string,
number_type, left_is_num, input_rows_count);
return true;
}
/// Comparison between DataTypeEnum<T> and string constant containing the name of an enum element
@ -954,7 +962,7 @@ private:
void executeTupleEqualityImpl(Block & block, size_t result, const ColumnsWithTypeAndName & x, const ColumnsWithTypeAndName & y,
size_t tuple_size, size_t input_rows_count)
{
ComparisonFunction func_compare;
ComparisonFunction func_compare(context);
ConvolutionFunction func_convolution;
Block tmp_block;
@ -983,11 +991,11 @@ private:
void executeTupleLessGreaterImpl(Block & block, size_t result, const ColumnsWithTypeAndName & x,
const ColumnsWithTypeAndName & y, size_t tuple_size, size_t input_rows_count)
{
HeadComparisonFunction func_compare_head;
TailComparisonFunction func_compare_tail;
HeadComparisonFunction func_compare_head(context);
TailComparisonFunction func_compare_tail(context);
FunctionAnd func_and;
FunctionOr func_or;
FunctionComparison<EqualsOp, NameEquals> func_equals;
FunctionComparison<EqualsOp, NameEquals> func_equals(context);
Block tmp_block;
@ -1025,7 +1033,7 @@ private:
block.getByPosition(result).column = tmp_block.getByPosition(tmp_block.columns() - 1).column;
}
void executeGeneric(Block & block, size_t result, const IColumn * c0, const IColumn * c1)
void executeGenericIdenticalTypes(Block & block, size_t result, const IColumn * c0, const IColumn * c1)
{
bool c0_const = c0->isColumnConst();
bool c1_const = c1->isColumnConst();
@ -1053,6 +1061,16 @@ private:
}
}
void executeGeneric(Block & block, size_t result, const ColumnWithTypeAndName & c0, const ColumnWithTypeAndName & c1)
{
DataTypePtr common_type = getLeastSupertype({c0.type, c1.type});
ColumnPtr c0_converted = castColumn(c0, common_type, context);
ColumnPtr c1_converted = castColumn(c1, common_type, context);
executeGenericIdenticalTypes(block, result, c0_converted.get(), c1_converted.get());
}
public:
String getName() const override
{
@ -1122,8 +1140,17 @@ public:
|| (left_is_string && right_is_enum)
|| (left_tuple && right_tuple && left_tuple->getElements().size() == right_tuple->getElements().size())
|| (arguments[0]->equals(*arguments[1]))))
throw Exception("Illegal types of arguments (" + arguments[0]->getName() + ", " + arguments[1]->getName() + ")"
" of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
{
try
{
getLeastSupertype(arguments);
}
catch (const Exception &)
{
throw Exception("Illegal types of arguments (" + arguments[0]->getName() + ", " + arguments[1]->getName() + ")"
" of function " + getName(), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
}
}
if (left_tuple && right_tuple)
{
@ -1167,16 +1194,26 @@ public:
ErrorCodes::ILLEGAL_COLUMN);
}
else if (checkAndGetDataType<DataTypeTuple>(col_with_type_and_name_left.type.get()))
{
executeTuple(block, result, col_with_type_and_name_left, col_with_type_and_name_right, input_rows_count);
}
else if (!left_is_num && !right_is_num && executeString(block, result, col_left_untyped, col_right_untyped))
;
{
}
else if (col_with_type_and_name_left.type->equals(*col_with_type_and_name_right.type))
executeGeneric(block, result, col_left_untyped, col_right_untyped);
else
executeDateOrDateTimeOrEnumWithConstString(
{
executeGenericIdenticalTypes(block, result, col_left_untyped, col_right_untyped);
}
else if (executeDateOrDateTimeOrEnumOrUUIDWithConstString(
block, result, col_left_untyped, col_right_untyped,
col_with_type_and_name_left.type, col_with_type_and_name_right.type,
left_is_num, input_rows_count);
left_is_num, input_rows_count))
{
}
else
{
executeGeneric(block, result, col_with_type_and_name_left, col_with_type_and_name_right);
}
}
#if USE_EMBEDDED_COMPILER

Some files were not shown because too many files have changed in this diff Show More