Merge remote-tracking branch 'upstream/master'

This commit is contained in:
alesapin 2018-07-20 13:13:50 +03:00
commit 2915a58c8f
341 changed files with 5267 additions and 4333 deletions

3
.gitignore vendored
View File

@ -11,6 +11,7 @@
/build
/docs/build
/docs/edit
/docs/tools/venv/
/docs/en/development/build/
/docs/ru/development/build/
@ -238,3 +239,5 @@ node_modules
public
website/docs
website/presentations
.DS_Store
*/.DS_Store

View File

@ -1,15 +1,15 @@
# en:
## en:
## Improvements:
### Improvements:
* Added Nullable support for runningDifference function. [#2590](https://github.com/yandex/ClickHouse/issues/2590)
## Bug fiexs:
* Fixed switching to default databses in case of client reconection. [#2580](https://github.com/yandex/ClickHouse/issues/2580)
### Bug fiexs:
* Fixed switching to default databases in case of client reconnection. [#2580](https://github.com/yandex/ClickHouse/issues/2580)
# ru:
## ru:
## Улучшения:
### Улучшения:
* Добавлена поддержка Nullable для функции runningDifference. [#2590](https://github.com/yandex/ClickHouse/issues/2590)
## Исправление ошибок:
### Исправление ошибок:
* Исправлено переключение на дефолтную базу данных при переподключении клиента. [#2580](https://github.com/yandex/ClickHouse/issues/2580)

View File

@ -1,6 +1,6 @@
# ClickHouse release 1.1.54388, 2018-06-28
## ClickHouse release 1.1.54388, 2018-06-28
## New features:
### New features:
* Support for the `ALTER TABLE t DELETE WHERE` query for replicated tables. Added the `system.mutations` table to track progress of this type of queries.
* Support for the `ALTER TABLE t [REPLACE|ATTACH] PARTITION` query for MergeTree tables.
@ -18,12 +18,12 @@
* Added the `date_time_input_format` setting. If you switch this setting to `'best_effort'`, DateTime values will be read in a wide range of formats.
* Added the `clickhouse-obfuscator` utility for data obfuscation. Usage example: publishing data used in performance tests.
## Experimental features:
### Experimental features:
* Added the ability to calculate `and` arguments only where they are needed ([Anastasia Tsarkova](https://github.com/yandex/ClickHouse/pull/2272)).
* JIT compilation to native code is now available for some expressions ([pyos](https://github.com/yandex/ClickHouse/pull/2277)).
## Bug fixes:
### Bug fixes:
* Duplicates no longer appear for a query with `DISTINCT` and `ORDER BY`.
* Queries with `ARRAY JOIN` and `arrayFilter` no longer return an incorrect result.
@ -45,7 +45,7 @@
* Fixed SSRF in the `remote()` table function.
* Fixed exit behavior of `clickhouse-client` in multiline mode ([#2510](https://github.com/yandex/ClickHouse/issues/2510)).
## Improvements:
### Improvements:
* Background tasks in replicated tables are now performed in a thread pool instead of in separate threads ([Silviu Caragea](https://github.com/yandex/ClickHouse/pull/1722)).
* Improved LZ4 compression performance.
@ -58,7 +58,7 @@
* When calculating the number of available CPU cores, limits on cgroups are now taken into account ([Atri Sharma](https://github.com/yandex/ClickHouse/pull/2325)).
* Added chown for config directories in the systemd config file ([Mikhail Shiryaev](https://github.com/yandex/ClickHouse/pull/2421)).
## Build changes:
### Build changes:
* The gcc8 compiler can be used for builds.
* Added the ability to build llvm from a submodule.
@ -69,36 +69,36 @@
* Added the ability to use the libtinfo library instead of libtermcap ([Georgy Kondratiev](https://github.com/yandex/ClickHouse/pull/2519)).
* Fixed a header file conflict in Fedora Rawhide ([#2520](https://github.com/yandex/ClickHouse/issues/2520)).
## Backward incompatible changes:
### Backward incompatible changes:
* Removed escaping in `Vertical` and `Pretty*` formats and deleted the `VerticalRaw` format.
* If servers with version 1.1.54388 (or newer) and servers with older version are used simultaneously in distributed query and the query has `cast(x, 'Type')` expression in the form without `AS` keyword and with `cast` not in uppercase, then the exception with message like `Not found column cast(0, 'UInt8') in block` will be thrown. Solution: update server on all cluster nodes.
# ClickHouse release 1.1.54385, 2018-06-01
## ClickHouse release 1.1.54385, 2018-06-01
## Bug fixes:
### Bug fixes:
* Fixed an error that in some cases caused ZooKeeper operations to block.
# ClickHouse release 1.1.54383, 2018-05-22
## ClickHouse release 1.1.54383, 2018-05-22
## Bug fixes:
### Bug fixes:
* Fixed a slowdown of replication queue if a table has many replicas.
# ClickHouse release 1.1.54381, 2018-05-14
## ClickHouse release 1.1.54381, 2018-05-14
## Bug fixes:
### Bug fixes:
* Fixed a nodes leak in ZooKeeper when ClickHouse loses connection to ZooKeeper server.
# ClickHouse release 1.1.54380, 2018-04-21
## ClickHouse release 1.1.54380, 2018-04-21
## New features:
### New features:
* Added table function `file(path, format, structure)`. An example reading bytes from `/dev/urandom`: `ln -s /dev/urandom /var/lib/clickhouse/user_files/random` `clickhouse-client -q "SELECT * FROM file('random', 'RowBinary', 'd UInt8') LIMIT 10"`.
## Improvements:
### Improvements:
* Subqueries could be wrapped by `()` braces (to enhance queries readability). For example, `(SELECT 1) UNION ALL (SELECT 1)`.
* Simple `SELECT` queries from table `system.processes` are not counted in `max_concurrent_queries` limit.
## Bug fixes:
### Bug fixes:
* Fixed incorrect behaviour of `IN` operator when select from `MATERIALIZED VIEW`.
* Fixed incorrect filtering by partition index in expressions like `WHERE partition_key_column IN (...)`
* Fixed inability to execute `OPTIMIZE` query on non-leader replica if the table was `REANAME`d.
@ -106,11 +106,11 @@
* Fixed freezing of `KILL QUERY` queries.
* Fixed an error in ZooKeeper client library which led to watches loses, freezing of distributed DDL queue and slowing replication queue if non-empty `chroot` prefix is used in ZooKeeper configuration.
## Backward incompatible changes:
### Backward incompatible changes:
* Removed support of expressions like `(a, b) IN (SELECT (a, b))` (instead of them you can use their equivalent `(a, b) IN (SELECT a, b)`). In previous releases, these expressions led to undetermined data filtering or caused errors.
# ClickHouse release 1.1.54378, 2018-04-16
## New features:
## ClickHouse release 1.1.54378, 2018-04-16
### New features:
* Logging level can be changed without restarting the server.
* Added the `SHOW CREATE DATABASE` query.
@ -124,7 +124,7 @@
* Multiple comma-separated `topics` can be specified for the `Kafka` engine (Tobias Adamson).
* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was cancelled` exception instead of an incomplete response.
## Improvements:
### Improvements:
* `ALTER TABLE ... DROP/DETACH PARTITION` queries are run at the front of the replication queue.
* `SELECT ... FINAL` and `OPTIMIZE ... FINAL` can be used even when the table has a single data part.
@ -135,7 +135,7 @@
* More robust crash recovery for asynchronous insertion into `Distributed` tables.
* The return type of the `countEqual` function changed from `UInt32` to `UInt64` (谢磊).
## Bug fixes:
### Bug fixes:
* Fixed an error with `IN` when the left side of the expression is `Nullable`.
* Correct results are now returned when using tuples with `IN` when some of the tuple components are in the table index.
@ -151,31 +151,31 @@
* `SummingMergeTree` now works correctly for summation of nested data structures with a composite key.
* Fixed the possibility of a race condition when choosing the leader for `ReplicatedMergeTree` tables.
## Build changes:
### Build changes:
* The build supports `ninja` instead of `make` and uses it by default for building releases.
* Renamed packages: `clickhouse-server-base` is now `clickhouse-common-static`; `clickhouse-server-common` is now `clickhouse-server`; `clickhouse-common-dbg` is now `clickhouse-common-static-dbg`. To install, use `clickhouse-server clickhouse-client`. Packages with the old names will still load in the repositories for backward compatibility.
## Backward-incompatible changes:
### Backward-incompatible changes:
* Removed the special interpretation of an IN expression if an array is specified on the left side. Previously, the expression `arr IN (set)` was interpreted as "at least one `arr` element belongs to the `set`". To get the same behavior in the new version, write `arrayExists(x -> x IN (set), arr)`.
* Disabled the incorrect use of the socket option `SO_REUSEPORT`, which was incorrectly enabled by default in the Poco library. Note that on Linux there is no longer any reason to simultaneously specify the addresses `::` and `0.0.0.0` for listen use just `::`, which allows listening to the connection both over IPv4 and IPv6 (with the default kernel config settings). You can also revert to the behavior from previous versions by specifying `<listen_reuse_port>1</listen_reuse_port>` in the config.
# ClickHouse release 1.1.54370, 2018-03-16
## ClickHouse release 1.1.54370, 2018-03-16
## New features:
### New features:
* Added the `system.macros` table and auto updating of macros when the config file is changed.
* Added the `SYSTEM RELOAD CONFIG` query.
* Added the `maxIntersections(left_col, right_col)` aggregate function, which returns the maximum number of simultaneously intersecting intervals `[left; right]`. The `maxIntersectionsPosition(left, right)` function returns the beginning of the "maximum" interval. ([Michael Furmur](https://github.com/yandex/ClickHouse/pull/2012)).
## Improvements:
### Improvements:
* When inserting data in a `Replicated` table, fewer requests are made to `ZooKeeper` (and most of the user-level errors have disappeared from the `ZooKeeper` log).
* Added the ability to create aliases for sets. Example: `WITH (1, 2, 3) AS set SELECT number IN set FROM system.numbers LIMIT 10`.
## Bug fixes:
### Bug fixes:
* Fixed the `Illegal PREWHERE` error when reading from `Merge` tables over `Distributed` tables.
* Added fixes that allow you to run `clickhouse-server` in IPv4-only Docker containers.
@ -189,9 +189,9 @@
* Restored the behavior for queries like `SELECT * FROM remote('server2', default.table) WHERE col IN (SELECT col2 FROM default.table)` when the right side argument of the `IN` should use a remote `default.table` instead of a local one. This behavior was broken in version 1.1.54358.
* Removed extraneous error-level logging of `Not found column ... in block`.
# ClickHouse release 1.1.54356, 2018-03-06
## ClickHouse release 1.1.54356, 2018-03-06
## New features:
### New features:
* Aggregation without `GROUP BY` for an empty set (such as `SELECT count(*) FROM table WHERE 0`) now returns a result with one row with null values for aggregate functions, in compliance with the SQL standard. To restore the old behavior (return an empty result), set `empty_result_for_aggregation_by_empty_set` to 1.
* Added type conversion for `UNION ALL`. Different alias names are allowed in `SELECT` positions in `UNION ALL`, in compliance with the SQL standard.
@ -226,7 +226,7 @@
* `RENAME TABLE` can be performed for `VIEW`.
* Added the `odbc_default_field_size` option, which allows you to extend the maximum size of the value loaded from an ODBC source (by default, it is 1024).
## Improvements:
### Improvements:
* Limits and quotas on the result are no longer applied to intermediate data for `INSERT SELECT` queries or for `SELECT` subqueries.
* Fewer false triggers of `force_restore_data` when checking the status of `Replicated` tables when the server starts.
@ -242,7 +242,7 @@
* `Enum` values can be used in `min`, `max`, `sum` and some other functions. In these cases, it uses the corresponding numeric values. This feature was previously available but was lost in the release 1.1.54337.
* Added `max_expanded_ast_elements` to restrict the size of the AST after recursively expanding aliases.
## Bug fixes:
### Bug fixes:
* Fixed cases when unnecessary columns were removed from subqueries in error, or not removed from subqueries containing `UNION ALL`.
* Fixed a bug in merges for `ReplacingMergeTree` tables.
@ -268,18 +268,18 @@
* Fixed a crash when passing arrays of different sizes to an `arrayReduce` function when using aggregate functions from multiple arguments.
* Prohibited the use of queries with `UNION ALL` in a `MATERIALIZED VIEW`.
## Backward incompatible changes:
### Backward incompatible changes:
* Removed the `distributed_ddl_allow_replicated_alter` option. This behavior is enabled by default.
* Removed the `UnsortedMergeTree` engine.
# ClickHouse release 1.1.54343, 2018-02-05
## ClickHouse release 1.1.54343, 2018-02-05
* Added macros support for defining cluster names in distributed DDL queries and constructors of Distributed tables: `CREATE TABLE distr ON CLUSTER '{cluster}' (...) ENGINE = Distributed('{cluster}', 'db', 'table')`.
* Now the table index is used for conditions like `expr IN (subquery)`.
* Improved processing of duplicates when inserting to Replicated tables, so they no longer slow down execution of the replication queue.
# ClickHouse release 1.1.54342, 2018-01-22
## ClickHouse release 1.1.54342, 2018-01-22
This release contains bug fixes for the previous release 1.1.54337:
* Fixed a regression in 1.1.54337: if the default user has readonly access, then the server refuses to start up with the message `Cannot create database in readonly mode`.
@ -290,9 +290,9 @@ This release contains bug fixes for the previous release 1.1.54337:
* Buffer tables now work correctly when MATERIALIZED columns are present in the destination table (by zhang2014).
* Fixed a bug in implementation of NULL.
# ClickHouse release 1.1.54337, 2018-01-18
## ClickHouse release 1.1.54337, 2018-01-18
## New features:
### New features:
* Added support for storage of multidimensional arrays and tuples (`Tuple` data type) in tables.
* Added support for table functions in `DESCRIBE` and `INSERT` queries. Added support for subqueries in `DESCRIBE`. Examples: `DESC TABLE remote('host', default.hits)`; `DESC TABLE (SELECT 1)`; `INSERT INTO TABLE FUNCTION remote('host', default.hits)`. Support for `INSERT INTO TABLE` syntax in addition to `INSERT INTO`.
@ -323,7 +323,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Added the `--silent` option for the `clickhouse-local` tool. It suppresses printing query execution info in stderr.
* Added support for reading values of type `Date` from text in a format where the month and/or day of the month is specified using a single digit instead of two digits (Amos Bird).
## Performance optimizations:
### Performance optimizations:
* Improved performance of `min`, `max`, `any`, `anyLast`, `anyHeavy`, `argMin`, `argMax` aggregate functions for String arguments.
* Improved performance of `isInfinite`, `isFinite`, `isNaN`, `roundToExp2` functions.
@ -332,7 +332,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Lowered memory usage for `JOIN` in the case when the left and right parts have columns with identical names that are not contained in `USING`.
* Improved performance of `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, and `corr` aggregate functions by reducing computational stability. The old functions are available under the names: `varSampStable`, `varPopStable`, `stddevSampStable`, `stddevPopStable`, `covarSampStable`, `covarPopStable`, `corrStable`.
## Bug fixes:
### Bug fixes:
* Fixed data deduplication after running a `DROP PARTITION` query. In the previous version, dropping a partition and INSERTing the same data again was not working because INSERTed blocks were considered duplicates.
* Fixed a bug that could lead to incorrect interpretation of the `WHERE` clause for `CREATE MATERIALIZED VIEW` queries with `POPULATE`.
@ -371,7 +371,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Fixed the `SYSTEM DROP DNS CACHE` query: the cache was flushed but addresses of cluster nodes were not updated.
* Fixed the behavior of `MATERIALIZED VIEW` after executing `DETACH TABLE` for the table under the view (Marek Vavruša).
## Build improvements:
### Build improvements:
* Builds use `pbuilder`. The build process is almost completely independent of the build host environment.
* A single build is used for different OS versions. Packages and binaries have been made compatible with a wide range of Linux systems.
@ -385,7 +385,7 @@ This release contains bug fixes for the previous release 1.1.54337:
* Removed usage of GNU extensions from the code. Enabled the `-Wextra` option. When building with `clang`, `libc++` is used instead of `libstdc++`.
* Extracted `clickhouse_parsers` and `clickhouse_common_io` libraries to speed up builds of various tools.
## Backward incompatible changes:
### Backward incompatible changes:
* The format for marks in `Log` type tables that contain `Nullable` columns was changed in a backward incompatible way. If you have these tables, you should convert them to the `TinyLog` type before starting up the new server version. To do this, replace `ENGINE = Log` with `ENGINE = TinyLog` in the corresponding `.sql` file in the `metadata` directory. If your table doesn't have `Nullable` columns or if the type of your table is not `Log`, then you don't need to do anything.
* Removed the `experimental_allow_extended_storage_definition_syntax` setting. Now this feature is enabled by default.
@ -396,16 +396,16 @@ This release contains bug fixes for the previous release 1.1.54337:
* In previous server versions there was an undocumented feature: if an aggregate function depends on parameters, you can still specify it without parameters in the AggregateFunction data type. Example: `AggregateFunction(quantiles, UInt64)` instead of `AggregateFunction(quantiles(0.5, 0.9), UInt64)`. This feature was lost. Although it was undocumented, we plan to support it again in future releases.
* Enum data types cannot be used in min/max aggregate functions. The possibility will be returned back in future release.
## Please note when upgrading:
### Please note when upgrading:
* When doing a rolling update on a cluster, at the point when some of the replicas are running the old version of ClickHouse and some are running the new version, replication is temporarily stopped and the message `unknown parameter 'shard'` appears in the log. Replication will continue after all replicas of the cluster are updated.
* If you have different ClickHouse versions on the cluster, you can get incorrect results for distributed queries with the aggregate functions `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, and `corr`. You should update all cluster nodes.
# ClickHouse release 1.1.54327, 2017-12-21
## ClickHouse release 1.1.54327, 2017-12-21
This release contains bug fixes for the previous release 1.1.54318:
* Fixed bug with possible race condition in replication that could lead to data loss. This issue affects versions 1.1.54310 and 1.1.54318. If you use one of these versions with Replicated tables, the update is strongly recommended. This issue shows in logs in Warning messages like `Part ... from own log doesn't exist.` The issue is relevant even if you don't see these messages in logs.
# ClickHouse release 1.1.54318, 2017-11-30
## ClickHouse release 1.1.54318, 2017-11-30
This release contains bug fixes for the previous release 1.1.54310:
* Fixed incorrect row deletions during merges in the SummingMergeTree engine
@ -414,9 +414,9 @@ This release contains bug fixes for the previous release 1.1.54310:
* Fixed an issue that was causing the replication queue to stop running
* Fixed rotation and archiving of server logs
# ClickHouse release 1.1.54310, 2017-11-01
## ClickHouse release 1.1.54310, 2017-11-01
## New features:
### New features:
* Custom partitioning key for the MergeTree family of table engines.
* [Kafka](https://clickhouse.yandex/docs/en/single/index.html#document-table_engines/kafka) table engine.
* Added support for loading [CatBoost](https://catboost.yandex/) models and applying them to data stored in ClickHouse.
@ -432,12 +432,12 @@ This release contains bug fixes for the previous release 1.1.54310:
* Added support for the Cap'n Proto input format.
* You can now customize compression level when using the zstd algorithm.
## Backward incompatible changes:
### Backward incompatible changes:
* Creation of temporary tables with an engine other than Memory is forbidden.
* Explicit creation of tables with the View or MaterializedView engine is forbidden.
* During table creation, a new check verifies that the sampling key expression is included in the primary key.
## Bug fixes:
### Bug fixes:
* Fixed hangups when synchronously inserting into a Distributed table.
* Fixed nonatomic adding and removing of parts in Replicated tables.
* Data inserted into a materialized view is not subjected to unnecessary deduplication.
@ -447,15 +447,15 @@ This release contains bug fixes for the previous release 1.1.54310:
* Fixed hangups when the disk volume containing server logs is full.
* Fixed an overflow in the `toRelativeWeekNum` function for the first week of the Unix epoch.
## Build improvements:
### Build improvements:
* Several third-party libraries (notably Poco) were updated and converted to git submodules.
# ClickHouse release 1.1.54304, 2017-10-19
## ClickHouse release 1.1.54304, 2017-10-19
## New features:
### New features:
* TLS support in the native protocol (to enable, set `tcp_ssl_port` in `config.xml`)
## Bug fixes:
### Bug fixes:
* `ALTER` for replicated tables now tries to start running as soon as possible
* Fixed crashing when reading data with the setting `preferred_block_size_bytes=0`
* Fixed crashes of `clickhouse-client` when `Page Down` is pressed
@ -468,16 +468,16 @@ This release contains bug fixes for the previous release 1.1.54310:
* Users are updated correctly when `users.xml` is invalid
* Correct handling when an executable dictionary returns a non-zero response code
# ClickHouse release 1.1.54292, 2017-09-20
## ClickHouse release 1.1.54292, 2017-09-20
## New features:
### New features:
* Added the `pointInPolygon` function for working with coordinates on a coordinate plane.
* Added the `sumMap` aggregate function for calculating the sum of arrays, similar to `SummingMergeTree`.
* Added the `trunc` function. Improved performance of the rounding functions (`round`, `floor`, `ceil`, `roundToExp2`) and corrected the logic of how they work. Changed the logic of the `roundToExp2` function for fractions and negative numbers.
* The ClickHouse executable file is now less dependent on the libc version. The same ClickHouse executable file can run on a wide variety of Linux systems. Note: There is still a dependency when using compiled queries (with the setting `compile = 1`, which is not used by default).
* Reduced the time needed for dynamic compilation of queries.
## Bug fixes:
### Bug fixes:
* Fixed an error that sometimes produced `part ... intersects previous part` messages and weakened replica consistency.
* Fixed an error that caused the server to lock up if ZooKeeper was unavailable during shutdown.
* Removed excessive logging when restoring replicas.
@ -485,9 +485,9 @@ This release contains bug fixes for the previous release 1.1.54310:
* Fixed an error in the concat function that occurred if the first column in a block has the Array type.
* Progress is now displayed correctly in the system.merges table.
# ClickHouse release 1.1.54289, 2017-09-13
## ClickHouse release 1.1.54289, 2017-09-13
## New features:
### New features:
* `SYSTEM` queries for server administration: `SYSTEM RELOAD DICTIONARY`, `SYSTEM RELOAD DICTIONARIES`, `SYSTEM DROP DNS CACHE`, `SYSTEM SHUTDOWN`, `SYSTEM KILL`.
* Added functions for working with arrays: `concat`, `arraySlice`, `arrayPushBack`, `arrayPushFront`, `arrayPopBack`, `arrayPopFront`.
* Added the `root` and `identity` parameters for the ZooKeeper configuration. This allows you to isolate individual users on the same ZooKeeper cluster.
@ -502,7 +502,7 @@ This release contains bug fixes for the previous release 1.1.54310:
* Option to set `umask` in the config file.
* Improved performance for queries with `DISTINCT`.
## Bug fixes:
### Bug fixes:
* Improved the process for deleting old nodes in ZooKeeper. Previously, old nodes sometimes didn't get deleted if there were very frequent inserts, which caused the server to be slow to shut down, among other things.
* Fixed randomization when choosing hosts for the connection to ZooKeeper.
* Fixed the exclusion of lagging replicas in distributed queries if the replica is localhost.
@ -515,28 +515,28 @@ This release contains bug fixes for the previous release 1.1.54310:
* Resolved the appearance of zombie processes when using a dictionary with an `executable` source.
* Fixed segfault for the HEAD query.
## Improvements to development workflow and ClickHouse build:
### Improvements to development workflow and ClickHouse build:
* You can use `pbuilder` to build ClickHouse.
* You can use `libc++` instead of `libstdc++` for builds on Linux.
* Added instructions for using static code analysis tools: `Coverity`, `clang-tidy`, and `cppcheck`.
## Please note when upgrading:
### Please note when upgrading:
* There is now a higher default value for the MergeTree setting `max_bytes_to_merge_at_max_space_in_pool` (the maximum total size of data parts to merge, in bytes): it has increased from 100 GiB to 150 GiB. This might result in large merges running after the server upgrade, which could cause an increased load on the disk subsystem. If the free space available on the server is less than twice the total amount of the merges that are running, this will cause all other merges to stop running, including merges of small data parts. As a result, INSERT requests will fail with the message "Merges are processing significantly slower than inserts." Use the `SELECT * FROM system.merges` request to monitor the situation. You can also check the `DiskSpaceReservedForMerge` metric in the `system.metrics` table, or in Graphite. You don't need to do anything to fix this, since the issue will resolve itself once the large merges finish. If you find this unacceptable, you can restore the previous value for the `max_bytes_to_merge_at_max_space_in_pool` setting (to do this, go to the `<merge_tree>` section in config.xml, set `<max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool>` and restart the server).
# ClickHouse release 1.1.54284, 2017-08-29
## ClickHouse release 1.1.54284, 2017-08-29
* This is bugfix release for previous 1.1.54282 release. It fixes ZooKeeper nodes leak in `parts/` directory.
# ClickHouse release 1.1.54282, 2017-08-23
## ClickHouse release 1.1.54282, 2017-08-23
This is a bugfix release. The following bugs were fixed:
* `DB::Exception: Assertion violation: !_path.empty()` error when inserting into a Distributed table.
* Error when parsing inserted data in RowBinary format if the data begins with ';' character.
* Errors during runtime compilation of certain aggregate functions (e.g. `groupArray()`).
# ClickHouse release 1.1.54276, 2017-08-16
## ClickHouse release 1.1.54276, 2017-08-16
## New features:
### New features:
* You can use an optional WITH clause in a SELECT query. Example query: `WITH 1+1 AS a SELECT a, a*a`
* INSERT can be performed synchronously in a Distributed table: OK is returned only after all the data is saved on all the shards. This is activated by the setting insert_distributed_sync=1.
@ -547,7 +547,7 @@ This is a bugfix release. The following bugs were fixed:
* Added support for non-constant arguments and negative offsets in the function `substring(str, pos, len).`
* Added the max_size parameter for the `groupArray(max_size)(column)` aggregate function, and optimized its performance.
## Major changes:
### Major changes:
* Improved security: all server files are created with 0640 permissions (can be changed via <umask> config parameter).
* Improved error messages for queries with invalid syntax.
@ -555,11 +555,11 @@ This is a bugfix release. The following bugs were fixed:
* Significantly increased the performance of data merges for the ReplacingMergeTree engine.
* Improved performance for asynchronous inserts from a Distributed table by batching multiple source inserts. To enable this functionality, use the setting distributed_directory_monitor_batch_inserts=1.
## Backward incompatible changes:
### Backward incompatible changes:
* Changed the binary format of aggregate states of `groupArray(array_column)` functions for arrays.
## Complete list of changes:
### Complete list of changes:
* Added the `output_format_json_quote_denormals` setting, which enables outputting nan and inf values in JSON format.
* Optimized thread allocation when reading from a Distributed table.
@ -578,7 +578,7 @@ This is a bugfix release. The following bugs were fixed:
* It is possible to connect to MySQL through a socket in the file system.
* The `system.parts` table has a new column with information about the size of marks, in bytes.
## Bug fixes:
### Bug fixes:
* Distributed tables using a Merge table now work correctly for a SELECT query with a condition on the _table field.
* Fixed a rare race condition in ReplicatedMergeTree when checking data parts.
@ -602,15 +602,15 @@ This is a bugfix release. The following bugs were fixed:
* Fixed the "Cannot mremap" error when using arrays in IN and JOIN clauses with more than 2 billion elements.
* Fixed the failover for dictionaries with MySQL as the source.
## Improved workflow for developing and assembling ClickHouse:
### Improved workflow for developing and assembling ClickHouse:
* Builds can be assembled in Arcadia.
* You can use gcc 7 to compile ClickHouse.
* Parallel builds using ccache+distcc are faster now.
# ClickHouse release 1.1.54245, 2017-07-04
## ClickHouse release 1.1.54245, 2017-07-04
## New features:
### New features:
* Distributed DDL (for example, `CREATE TABLE ON CLUSTER`).
* The replicated request `ALTER TABLE CLEAR COLUMN IN PARTITION.`
@ -622,16 +622,16 @@ This is a bugfix release. The following bugs were fixed:
* Sessions in the HTTP interface.
* The OPTIMIZE query for a Replicated table can can run not only on the leader.
## Backward incompatible changes:
### Backward incompatible changes:
* Removed SET GLOBAL.
## Minor changes:
### Minor changes:
* If an alert is triggered, the full stack trace is printed into the log.
* Relaxed the verification of the number of damaged or extra data parts at startup (there were too many false positives).
## Bug fixes:
### Bug fixes:
* Fixed a bad connection "sticking" when inserting into a Distributed table.
* GLOBAL IN now works for a query from a Merge table that looks at a Distributed table.

View File

@ -1,6 +1,50 @@
# ClickHouse release 1.1.54388, 2018-06-28
## ClickHouse release 1.1.54394, 2018-07-12
## Новые возможности:
### Новые возможности:
* Добавлена агрегатная функция `histogram` ([Михаил Сурин](https://github.com/yandex/ClickHouse/pull/2521)).
* Возможность использования `OPTIMIZE TABLE ... FINAL` без указания партиции для `ReplicatedMergeTree` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2600)).
### Исправление ошибок:
* Исправлена ошибка - выставление слишком маленького таймаута у сокетов (одна секунда) для чтения и записи при отправке и скачивании реплицируемых данных, что приводило к невозможности скачать куски достаточно большого размера при наличии некоторой нагрузки на сеть или диск (попытки скачивания кусков циклически повторяются). Ошибка возникла в версии 1.1.54388.
* Исправлена работа при использовании chroot в ZooKeeper, в случае вставки дублирующихся блоков данных в таблицу.
* Исправлена работа функции `has` для случая массива с Nullable элементами ([#2115](https://github.com/yandex/ClickHouse/issues/2521)).
* Исправлена работа таблицы `system.tables` при её использовании в распределённых запросах; столбцы `metadata_modification_time` и `engine_full` сделаны невиртуальными; исправлена ошибка в случае, если из таблицы были запрошены только эти столбцы.
* Исправлена работа пустой таблицы типа `TinyLog` после вставки в неё пустого блока данных ([#2563](https://github.com/yandex/ClickHouse/issues/2563)).
* Таблица `system.zookeeper` работает в случае, если значение узла в ZooKeeper равно NULL.
## ClickHouse release 1.1.54390, 2018-07-06
### Новые возможности:
* Возможность отправки запроса в формате `multipart/form-data` (в поле `query`), что полезно, если при этом также отправляются внешние данные для обработки запроса ([Ольга Хвостикова](https://github.com/yandex/ClickHouse/pull/2490)).
* Добавлена возможность включить или отключить обработку одинарных или двойных кавычек при чтении данных в формате CSV. Это задаётся настройками `format_csv_allow_single_quotes` и `format_csv_allow_double_quotes` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2574))
* Возможность использования `OPTIMIZE TABLE ... FINAL` без указания партиции для не реплицированных вариантов`MergeTree` ([Amos Bird](https://github.com/yandex/ClickHouse/pull/2599)).
### Улучшения:
* Увеличена производительность, уменьшено потребление памяти, добавлен корректный учёт потребления памяти, при использовании оператора IN в случае, когда для его работы может использоваться индекс таблицы ([#2584](https://github.com/yandex/ClickHouse/pull/2584)).
* Убраны избыточные проверки чексумм при добавлении куска. Это важно в случае большого количества реплик, так как в этом случае суммарное количество проверок было равно N^2.
* Добавлена поддержка аргументов типа `Array(Tuple(...))` для функции `arrayEnumerateUniq` ([#2573](https://github.com/yandex/ClickHouse/pull/2573)).
* Добавлена поддержка `Nullable` для функции `runningDifference`. ([#2594](https://github.com/yandex/ClickHouse/pull/2594))
* Увеличена производительность анализа запроса в случае очень большого количества выражений ([#2572](https://github.com/yandex/ClickHouse/pull/2572)).
* Более быстрый выбор кусков для слияния в таблицах типа `ReplicatedMergeTree`. Более быстрое восстановление сессии с ZooKeeper. ([#2597](https://github.com/yandex/ClickHouse/pull/2597)).
* Файл `format_version.txt` для таблиц семейства `MergeTree` создаётся заново при его отсутствии, что имеет смысл в случае запуска ClickHouse после копирования структуры директорий без файлов ([Ciprian Hacman](https://github.com/yandex/ClickHouse/pull/2593)).
### Исправление ошибок:
* Исправлена ошибка при работе с ZooKeeper, которая могла приводить к невозможности восстановления сессии и readonly состояниям таблиц до перезапуска сервера.
* Исправлена ошибка при работе с ZooKeeper, которая могла приводить к неудалению старых узлов при разрыве сессии.
* Исправлена ошибка в функции `quantileTDigest` для Float аргументов (ошибка появилась в версии 1.1.54388) ([Михаил Сурин](https://github.com/yandex/ClickHouse/pull/2553)).
* Исправлена ошибка работы индекса таблиц типа MergeTree, если в условии, столбец первичного ключа расположен внутри функции преобразования типов между знаковым и беззнаковым целым одного размера ([#2603](https://github.com/yandex/ClickHouse/pull/2603)).
* Исправлен segfault, если в конфигурационном файле нет `macros`, но они используются ([#2570](https://github.com/yandex/ClickHouse/pull/2570)).
* Исправлено переключение на базу данных по-умолчанию при переподключении клиента ([#2583](https://github.com/yandex/ClickHouse/pull/2583)).
* Исправлена ошибка в случае отключенной настройки `use_index_for_in_with_subqueries`.
### Исправления безопасности:
* При соединениях с MySQL удалена возможность отправки файлов (`LOAD DATA LOCAL INFILE`).
## ClickHouse release 1.1.54388, 2018-06-28
### Новые возможности:
* Добавлена поддержка запроса `ALTER TABLE t DELETE WHERE` для реплицированных таблиц и таблица `system.mutations`.
* Добавлена поддержка запроса `ALTER TABLE t [REPLACE|ATTACH] PARTITION` для *MergeTree-таблиц.
* Добавлена поддержка запроса `TRUNCATE TABLE` ([Winter Zhang](https://github.com/yandex/ClickHouse/pull/2260))
@ -17,11 +61,11 @@
* Добавлена настройка `date_time_input_format`. Если переключить эту настройку в значение `'best_effort'`, значения DateTime будут читаться в широком диапазоне форматов.
* Добавлена утилита `clickhouse-obfuscator` для обфускации данных. Пример использования: публикация данных, используемых в тестах производительности.
## Экспериментальные возможности:
### Экспериментальные возможности:
* Добавлена возможность вычислять аргументы функции `and` только там, где они нужны ([Анастасия Царькова](https://github.com/yandex/ClickHouse/pull/2272))
* Добавлена возможность JIT-компиляции в нативный код некоторых выражений ([pyos](https://github.com/yandex/ClickHouse/pull/2277)).
## Исправление ошибок:
### Исправление ошибок:
* Исправлено появление дублей в запросе с `DISTINCT` и `ORDER BY`.
* Запросы с `ARRAY JOIN` и `arrayFilter` раньше возвращали некорректный результат.
* Исправлена ошибка при чтении столбца-массива из Nested-структуры ([#2066](https://github.com/yandex/ClickHouse/issues/2066)).
@ -42,7 +86,7 @@
* Исправлена SSRF в табличной функции remote().
* Исправлен выход из `clickhouse-client` в multiline-режиме ([#2510](https://github.com/yandex/ClickHouse/issues/2510)).
## Улучшения:
### Улучшения:
* Фоновые задачи в реплицированных таблицах теперь выполняются не в отдельных потоках, а в пуле потоков ([Silviu Caragea](https://github.com/yandex/ClickHouse/pull/1722))
* Улучшена производительность разжатия LZ4.
* Ускорен анализ запроса с большим числом JOIN-ов и подзапросов.
@ -54,7 +98,7 @@
* При расчёте количества доступных ядер CPU теперь учитываются ограничения cgroups ([Atri Sharma](https://github.com/yandex/ClickHouse/pull/2325)).
* Добавлен chown директорий конфигов в конфигурационном файле systemd ([Михаил Ширяев](https://github.com/yandex/ClickHouse/pull/2421)).
## Изменения сборки:
### Изменения сборки:
* Добавлена возможность сборки компилятором gcc8.
* Добавлена возможность сборки llvm из submodule.
* Используемая версия библиотеки librdkafka обновлена до v0.11.4.
@ -64,34 +108,34 @@
* Добавлена возможность использования библиотеки libtinfo вместо libtermcap ([Георгий Кондратьев](https://github.com/yandex/ClickHouse/pull/2519)).
* Исправлен конфликт заголовочных файлов в Fedora Rawhide ([#2520](https://github.com/yandex/ClickHouse/issues/2520)).
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Убран escaping в форматах `Vertical` и `Pretty*`, удалён формат `VerticalRaw`.
* Если в распределённых запросах одновременно участвуют серверы версии 1.1.54388 или новее и более старые, то при использовании выражения `cast(x, 'Type')`, записанного без указания `AS`, если слово `cast` указано не в верхнем регистре, возникает ошибка вида `Not found column cast(0, 'UInt8') in block`. Решение: обновить сервер на всём кластере.
# ClickHouse release 1.1.54385, 2018-06-01
## Исправление ошибок:
## ClickHouse release 1.1.54385, 2018-06-01
### Исправление ошибок:
* Исправлена ошибка, которая в некоторых случаях приводила к блокировке операций с ZooKeeper.
# ClickHouse release 1.1.54383, 2018-05-22
## Исправление ошибок:
## ClickHouse release 1.1.54383, 2018-05-22
### Исправление ошибок:
* Исправлена деградация скорости выполнения очереди репликации при большом количестве реплик
# ClickHouse release 1.1.54381, 2018-05-14
## ClickHouse release 1.1.54381, 2018-05-14
## Исправление ошибок:
### Исправление ошибок:
* Исправлена ошибка, приводящая к "утеканию" метаданных в ZooKeeper при потере соединения с сервером ZooKeeper.
# ClickHouse release 1.1.54380, 2018-04-21
## ClickHouse release 1.1.54380, 2018-04-21
## Новые возможности:
### Новые возможности:
* Добавлена табличная функция `file(path, format, structure)`. Пример, читающий байты из `/dev/urandom`: `ln -s /dev/urandom /var/lib/clickhouse/user_files/random` `clickhouse-client -q "SELECT * FROM file('random', 'RowBinary', 'd UInt8') LIMIT 10"`.
## Улучшения:
### Улучшения:
* Добавлена возможность оборачивать подзапросы скобками `()` для повышения читаемости запросов. Например: `(SELECT 1) UNION ALL (SELECT 1)`.
* Простые запросы `SELECT` из таблицы `system.processes` не учитываются в ограничении `max_concurrent_queries`.
## Исправление ошибок:
### Исправление ошибок:
* Исправлена неправильная работа оператора `IN` в `MATERIALIZED VIEW`.
* Исправлена неправильная работа индекса по ключу партиционирования в выражениях типа `partition_key_column IN (...)`.
* Исправлена невозможность выполнить `OPTIMIZE` запрос на лидирующей реплике после выполнения `RENAME` таблицы.
@ -99,13 +143,13 @@
* Исправлены зависания запросов `KILL QUERY`.
* Исправлена ошибка в клиентской библиотеке ZooKeeper, которая при использовании непустого префикса `chroot` в конфигурации приводила к потере watch'ей, остановке очереди distributed DDL запросов и замедлению репликации.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Убрана поддержка выражений типа `(a, b) IN (SELECT (a, b))` (можно использовать эквивалентные выражение `(a, b) IN (SELECT a, b)`). Раньше такие запросы могли приводить к недетерминированной фильтрации в `WHERE`.
# ClickHouse release 1.1.54378, 2018-04-16
## ClickHouse release 1.1.54378, 2018-04-16
## Новые возможности:
### Новые возможности:
* Возможность изменения уровня логгирования без перезагрузки сервера.
* Добавлен запрос `SHOW CREATE DATABASE`.
@ -119,7 +163,7 @@
* Возможность указания нескольких `topics` через запятую для движка `Kafka` (Tobias Adamson)
* При остановке запроса по причине `KILL QUERY` или `replace_running_query`, клиент получает исключение `Query was cancelled` вместо неполного результата.
## Улучшения:
### Улучшения:
* Запросы вида `ALTER TABLE ... DROP/DETACH PARTITION` выполняются впереди очереди репликации.
* Возможность использовать `SELECT ... FINAL` и `OPTIMIZE ... FINAL` даже в случае, если данные в таблице представлены одним куском.
@ -130,7 +174,7 @@
* Более надёжное восстановление после сбоев при асинхронной вставке в `Distributed` таблицы.
* Возвращаемый тип функции `countEqual` изменён с `UInt32` на `UInt64` (谢磊)
## Исправление ошибок:
### Исправление ошибок:
* Исправлена ошибка c `IN` где левая часть выражения `Nullable`.
* Исправлен неправильный результат при использовании кортежей с `IN` в случае, если часть компоненнтов кортежа есть в индексе таблицы.
@ -146,31 +190,31 @@
* Исправлена работа `SummingMergeTree` в случае суммирования вложенных структур данных с составным ключом.
* Исправлена возможность возникновения race condition при выборе лидера таблиц `ReplicatedMergeTree`.
## Изменения сборки:
### Изменения сборки:
* Поддержка `ninja` вместо `make` при сборке. `ninja` используется по-умолчанию при сборке релизов.
* Переименованы пакеты `clickhouse-server-base` в `clickhouse-common-static`; `clickhouse-server-common` в `clickhouse-server`; `clickhouse-common-dbg` в `clickhouse-common-static-dbg`. Для установки используйте `clickhouse-server clickhouse-client`. Для совместимости, пакеты со старыми именами продолжают загружаться в репозиторий.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Удалена специальная интерпретация выражения IN, если слева указан массив. Ранее выражение вида `arr IN (set)` воспринималось как "хотя бы один элемент `arr` принадлежит множеству `set`". Для получения такого же поведения в новой версии, напишите `arrayExists(x -> x IN (set), arr)`.
* Отключено ошибочное использование опции сокета `SO_REUSEPORT` (которая по ошибке включена по-умолчанию в библиотеке Poco). Стоит обратить внимание, что на Linux системах теперь не имеет смысла указывать одновременно адреса `::` и `0.0.0.0` для listen - следует использовать лишь адрес `::`, который (с настройками ядра по-умолчанию) позволяет слушать соединения как по IPv4 так и по IPv6. Также вы можете вернуть поведение старых версий, указав в конфиге `<listen_reuse_port>1</listen_reuse_port>`.
# ClickHouse release 1.1.54370, 2018-03-16
## ClickHouse release 1.1.54370, 2018-03-16
## Новые возможности:
### Новые возможности:
* Добавлена системная таблица `system.macros` и автоматическое обновление макросов при изменении конфигурационного файла.
* Добавлен запрос `SYSTEM RELOAD CONFIG`.
* Добавлена агрегатная функция `maxIntersections(left_col, right_col)`, возвращающая максимальное количество одновременно пересекающихся интервалов `[left; right]`. Функция `maxIntersectionsPosition(left, right)` возвращает начало такого "максимального" интервала. ([Michael Furmur](https://github.com/yandex/ClickHouse/pull/2012)).
## Улучшения:
### Улучшения:
* При вставке данных в `Replicated`-таблицу делается меньше обращений к `ZooKeeper` (также из лога `ZooKeeper` исчезло большинство user-level ошибок).
* Добавлена возможность создавать алиасы для множеств. Пример: `WITH (1, 2, 3) AS set SELECT number IN set FROM system.numbers LIMIT 10`.
## Исправление ошибок:
### Исправление ошибок:
* Исправлена ошибка `Illegal PREWHERE` при чтении из Merge-таблицы над `Distributed`-таблицами.
* Добавлены исправления, позволяющие запускать clickhouse-server в IPv4-only docker-контейнерах.
@ -185,9 +229,9 @@
* Устранено ненужное Error-level логирование `Not found column ... in block`.
# Релиз ClickHouse 1.1.54362, 2018-03-11
## Релиз ClickHouse 1.1.54362, 2018-03-11
## Новые возможности:
### Новые возможности:
* Агрегация без `GROUP BY` по пустому множеству (как например, `SELECT count(*) FROM table WHERE 0`) теперь возвращает результат из одной строки с нулевыми значениями агрегатных функций, в соответствии со стандартом SQL. Вы можете вернуть старое поведение (возвращать пустой результат), выставив настройку `empty_result_for_aggregation_by_empty_set` в значение 1.
* Добавлено приведение типов при `UNION ALL`. Допустимо использование столбцов с разными алиасами в соответствующих позициях `SELECT` в `UNION ALL`, что соответствует стандарту SQL.
@ -225,7 +269,7 @@
* Добавлена настройка `odbc_default_field_size`, позволяющая расширить максимальный размер значения, загружаемого из ODBC источника (по-умолчанию - 1024).
* В таблицу `system.processes` и в `SHOW PROCESSLIST` добавлены столбцы `is_cancelled` и `peak_memory_usage`.
## Улучшения:
### Улучшения:
* Ограничения на результат и квоты на результат теперь не применяются к промежуточным данным для запросов `INSERT SELECT` и для подзапросов в `SELECT`.
* Уменьшено количество ложных срабатываний при проверке состояния `Replicated` таблиц при запуске сервера, приводивших к необходимости выставления флага `force_restore_data`.
@ -241,7 +285,7 @@
* Значения типа `Enum` можно использовать в функциях `min`, `max`, `sum` и некоторых других - в этих случаях используются соответствующие числовые значения. Эта возможность присутствовала ранее, но была потеряна в релизе 1.1.54337.
* Добавлено ограничение `max_expanded_ast_elements` действующее на размер AST после рекурсивного раскрытия алиасов.
## Исправление ошибок:
### Исправление ошибок:
* Исправлены случаи ошибочного удаления ненужных столбцов из подзапросов, а также отсутствие удаления ненужных столбцов из подзапросов, содержащих `UNION ALL`.
* Исправлена ошибка в слияниях для таблиц типа `ReplacingMergeTree`.
@ -269,19 +313,19 @@
* Запрещено использование запросов с `UNION ALL` в `MATERIALIZED VIEW`.
* Исправлена ошибка, которая может возникать при инициализации системной таблицы `part_log` при старте сервера (по-умолчанию `part_log` выключен).
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Удалена настройка `distributed_ddl_allow_replicated_alter`. Соответствующее поведение включено по-умолчанию.
* Удалена настройка `strict_insert_defaults`. Если вы использовали эту функциональность, напишите на `clickhouse-feedback@yandex-team.com`.
* Удалён движок таблиц `UnsortedMergeTree`.
# Релиз ClickHouse 1.1.54343, 2018-02-05
## Релиз ClickHouse 1.1.54343, 2018-02-05
* Добавлена возможность использовать макросы при задании имени кластера в распределенных DLL запросах и создании Distributed-таблиц: `CREATE TABLE distr ON CLUSTER '{cluster}' (...) ENGINE = Distributed('{cluster}', 'db', 'table')`.
* Теперь при вычислении запросов вида `SELECT ... FROM table WHERE expr IN (subquery)` используется индекс таблицы `table`.
* Улучшена обработка дубликатов при вставке в Replicated-таблицы, теперь они не приводят к излишнему замедлению выполнения очереди репликации.
# Релиз ClickHouse 1.1.54342, 2018-01-22
## Релиз ClickHouse 1.1.54342, 2018-01-22
Релиз содержит исправление к предыдущему релизу 1.1.54337:
* Исправлена регрессия в версии 1.1.54337: если пользователь по-умолчанию имеет readonly доступ, то сервер отказывался стартовать с сообщением `Cannot create database in readonly mode`.
@ -292,9 +336,9 @@
* Таблицы типа Buffer теперь работают при наличии MATERIALIZED столбцов в таблице назначения (by zhang2014).
* Исправлена одна из ошибок в реализации NULL.
# Релиз ClickHouse 1.1.54337, 2018-01-18
## Релиз ClickHouse 1.1.54337, 2018-01-18
## Новые возможности:
### Новые возможности:
* Добавлена поддержка хранения многомерных массивов и кортежей (тип данных `Tuple`) в таблицах.
* Поддержка табличных функций для запросов `DESCRIBE` и `INSERT`. Поддержка подзапроса в запросе `DESCRIBE`. Примеры: `DESC TABLE remote('host', default.hits)`; `DESC TABLE (SELECT 1)`; `INSERT INTO TABLE FUNCTION remote('host', default.hits)`. Возможность писать `INSERT INTO TABLE` вместо `INSERT INTO`.
@ -323,9 +367,9 @@
* Добавлена поддержка `ALTER` для таблиц типа `Null` (Anastasiya Tsarkova).
* Функция `reinterpretAsString` расширена на все типы данных, значения которых хранятся в памяти непрерывно.
* Для программы `clickhouse-local` добавлена опция `--silent` для подавления вывода информации о выполнении запроса в stderr.
* Добавлена поддержка чтения `Date` в текстовом виде в формате, где месяц и день месяца могут быть указаны одной цифрой вместо двух (Amos Bird).
* Добавлена поддержка чтения `Date` в текстовом виде в формате, где месяц и день месяца могут быть указаны одной цифрой вместо двух (Amos Bird).
## Увеличение производительности:
### Увеличение производительности:
* Увеличена производительность агрегатных функций `min`, `max`, `any`, `anyLast`, `anyHeavy`, `argMin`, `argMax` от строковых аргументов.
* Увеличена производительность функций `isInfinite`, `isFinite`, `isNaN`, `roundToExp2`.
@ -334,7 +378,7 @@
* Уменьшено потребление памяти при `JOIN`, если левая и правая часть содержали столбцы с одинаковым именем, не входящие в `USING`.
* Увеличена производительность агрегатных функций `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr` за счёт уменьшения стойкости к вычислительной погрешности. Старые версии функций добавлены под именами `varSampStable`, `varPopStable`, `stddevSampStable`, `stddevPopStable`, `covarSampStable`, `covarPopStable`, `corrStable`.
## Исправления ошибок:
### Исправления ошибок:
* Исправлена работа дедупликации блоков после `DROP` или `DETATH PARTITION`. Раньше удаление партиции и вставка тех же самых данных заново не работала, так как вставленные заново блоки считались дубликатами.
* Исправлена ошибка, в связи с которой может неправильно обрабатываться `WHERE` для запросов на создание `MATERIALIZED VIEW` с указанием `POPULATE`.
@ -344,7 +388,7 @@
* Добавлена недостающая поддержка типа данных `UUID` для `DISTINCT`, `JOIN`, в агрегатных функциях `uniq` и во внешних словарях (Иванов Евгений). Поддержка `UUID` всё ещё остаётся не полной.
* Исправлено поведение `SummingMergeTree` для строк, в которых все значения после суммирования равны нулю.
* Многочисленные доработки для движка таблиц `Kafka` (Marek Vavruša).
* Исправлена некорректная работа движка таблиц `Join` (Amos Bird).
* Исправлена некорректная работа движка таблиц `Join` (Amos Bird).
* Исправлена работа аллокатора под FreeBSD и OS X.
* Функция `extractAll` теперь может доставать пустые вхождения.
* Исправлена ошибка, не позволяющая подключить при сборке `libressl` вместо `openssl`.
@ -368,12 +412,12 @@
* Исправлена работа `DISTINCT` при условии, что все столбцы константные.
* Исправлено форматирование запроса в случае наличия функции `tupleElement` со сложным константным выражением в качестве номера элемента.
* Исправлена работа `Dictionary` таблиц для словарей типа `range_hashed`.
* Исправлена ошибка, приводящая к появлению лишних строк при `FULL` и `RIGHT JOIN` (Amos Bird).
* Исправлена ошибка, приводящая к появлению лишних строк при `FULL` и `RIGHT JOIN` (Amos Bird).
* Исправлено падение сервера в случае создания и удаления временных файлов в `config.d` директориях в момент перечитывания конфигурации.
* Исправлена работа запроса `SYSTEM DROP DNS CACHE`: ранее сброс DNS кэша не приводил к повторному резолвингу имён хостов кластера.
* Исправлено поведение `MATERIALIZED VIEW` после `DETACH TABLE` таблицы, на которую он смотрит (Marek Vavruša).
## Улучшения сборки:
### Улучшения сборки:
* Для сборки используется `pbuilder`. Сборка максимально независима от окружения на сборочной машине.
* Для разных версий систем выкладывается один и тот же пакет, который совместим с широким диапазоном Linux систем.
@ -387,27 +431,27 @@
* Удалено использование расширений GNU из кода и включена опция `-Wextra`. При сборке с помощью `clang` по-умолчанию используется `libc++` вместо `libstdc++`.
* Выделены библиотеки `clickhouse_parsers` и `clickhouse_common_io` для более быстрой сборки утилит.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Формат засечек (marks) для таблиц типа `Log`, содержащих `Nullable` столбцы, изменён обратно-несовместимым образом. В случае наличия таких таблиц, вы можете преобразовать их в `TinyLog` до запуска новой версии сервера. Для этого в соответствующем таблице файле `.sql` в директории `metadata`, замените `ENGINE = Log` на `ENGINE = TinyLog`. Если в таблице нет `Nullable` столбцов или тип таблицы не `Log`, то ничего делать не нужно.
* Удалена настройка `experimental_allow_extended_storage_definition_syntax`. Соответствующая функциональность включена по-умолчанию.
* Функция `runningIncome` переименована в `runningDifferenceStartingWithFirstValue` во избежание путаницы.
* Удалена возможность написания `FROM ARRAY JOIN arr` без указания таблицы после FROM (Amos Bird).
* Удалена возможность написания `FROM ARRAY JOIN arr` без указания таблицы после FROM (Amos Bird).
* Удалён формат `BlockTabSeparated`, использовавшийся лишь для демонстрационных целей.
* Изменён формат состояния агрегатных функций `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr`. Если вы использовали эти состояния для хранения в таблицах (тип данных `AggregateFunction` от этих функций или материализованные представления, хранящие эти состояния), напишите на clickhouse-feedback@yandex-team.com.
* В предыдущих версиях существовала недокументированная возможность: в типе данных AggregateFunction можно было не указывать параметры для агрегатной функции, которая зависит от параметров. Пример: `AggregateFunction(quantiles, UInt64)` вместо `AggregateFunction(quantiles(0.5, 0.9), UInt64)`. Эта возможность потеряна. Не смотря на то, что возможность не документирована, мы собираемся вернуть её в ближайших релизах.
* Значения типа данных Enum не могут быть переданы в агрегатные функции min/max. Возможность будет возвращена обратно в следующем релизе.
## На что обратить внимание при обновлении:
### На что обратить внимание при обновлении:
* При обновлении кластера, на время, когда на одних репликах работает новая версия сервера, а на других - старая, репликация будет приостановлена и в логе появятся сообщения вида `unknown parameter 'shard'`. Репликация продолжится после обновления всех реплик кластера.
* Если на серверах кластера работают разные версии ClickHouse, то возможен неправильный результат распределённых запросов, использующих функции `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr`. Необходимо обновить все серверы кластера.
# Релиз ClickHouse 1.1.54327, 2017-12-21
## Релиз ClickHouse 1.1.54327, 2017-12-21
Релиз содержит исправление к предыдущему релизу 1.1.54318:
* Исправлена проблема с возможным race condition при репликации, которая может приводить к потере данных. Проблеме подвержены версии 1.1.54310 и 1.1.54318. Если вы их используете и у вас есть Replicated таблицы, то обновление обязательно. Понять, что эта проблема существует, можно по сообщениям в логе Warning вида `Part ... from own log doesn't exist.` Даже если таких сообщений нет, проблема всё-равно актуальна.
# Релиз ClickHouse 1.1.54318, 2017-11-30
## Релиз ClickHouse 1.1.54318, 2017-11-30
Релиз содержит изменения к предыдущему релизу 1.1.54310 с исправлением следующих багов:
* Исправлено некорректное удаление строк при слияниях в движке SummingMergeTree
@ -416,9 +460,9 @@
* Исправлена проблема, приводящая к остановке выполнения очереди репликации
* Исправлено ротирование и архивация логов сервера
# Релиз ClickHouse 1.1.54310, 2017-11-01
## Релиз ClickHouse 1.1.54310, 2017-11-01
## Новые возможности:
### Новые возможности:
* Произвольный ключ партиционирования для таблиц семейства MergeTree.
* Движок таблиц [Kafka](https://clickhouse.yandex/docs/en/single/index.html#document-table_engines/kafka).
* Возможность загружать модели [CatBoost](https://catboost.yandex/) и применять их к данным, хранящимся в ClickHouse.
@ -434,12 +478,12 @@
* Поддержка входного формата Capn Proto.
* Возможность задавать уровень сжатия при использовании алгоритма zstd.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Запрещено создание временных таблиц с движком, отличным от Memory.
* Запрещено явное создание таблиц с движком View и MaterializedView.
* При создании таблицы теперь проверяется, что ключ сэмплирования входит в первичный ключ.
## Исправления ошибок:
### Исправления ошибок:
* Исправлено зависание при синхронной вставке в Distributed таблицу.
* Исправлена неатомарность при добавлении/удалении кусков в реплицированных таблицах.
* Данные, вставляемые в материализованное представление, теперь не подвергаются излишней дедупликации.
@ -449,14 +493,14 @@
* Исправлено зависание при недостатке места на диске в разделе с логами.
* Исправлено переполнение в функции toRelativeWeekNum для первой недели Unix-эпохи.
## Улучшения сборки:
### Улучшения сборки:
* Несколько сторонних библиотек (в частности, Poco) обновлены и переведены на git submodules.
# Релиз ClickHouse 1.1.54304, 2017-10-19
## Новые возможности:
## Релиз ClickHouse 1.1.54304, 2017-10-19
### Новые возможности:
* Добавлена поддержка TLS в нативном протоколе (включается заданием `tcp_ssl_port` в `config.xml`)
## Исправления ошибок:
### Исправления ошибок:
* `ALTER` для реплицированных таблиц теперь пытается начать выполнение как можно быстрее
* Исправлены падения при чтении данных с настройкой `preferred_block_size_bytes=0`
* Исправлено падение `clickhouse-client` при нажатии `Page Down`
@ -469,16 +513,16 @@
* Корректное обновление пользователей при невалидном `users.xml`
* Корректная обработка случаев, когда executable-словарь возвращает ненулевой код ответа
# Релиз ClickHouse 1.1.54292, 2017-09-20
## Релиз ClickHouse 1.1.54292, 2017-09-20
## Новые возможности:
### Новые возможности:
* Добавлена функция `pointInPolygon` для работы с координатами на плоскости.
* Добавлена агрегатная функция `sumMap`, обеспечивающая суммирование массивов аналогично `SummingMergeTree`.
* Добавлена функция `trunc`. Увеличена производительность функций округления `round`, `floor`, `ceil`, `roundToExp2`. Исправлена логика работы функций округления. Изменена логика работы функции `roundToExp2` для дробных и отрицательных чисел.
* Ослаблена зависимость исполняемого файла ClickHouse от версии libc. Один и тот же исполняемый файл ClickHouse может запускаться и работать на широком множестве Linux систем. Замечание: зависимость всё ещё присутствует при использовании скомпилированных запросов (настройка `compile = 1`, по-умолчанию не используется).
* Уменьшено время динамической компиляции запросов.
## Исправления ошибок:
### Исправления ошибок:
* Исправлена ошибка, которая могла приводить к сообщениям `part ... intersects previous part` и нарушению консистентности реплик.
* Исправлена ошибка, приводящая к блокировке при завершении работы сервера, если в это время ZooKeeper недоступен.
* Удалено избыточное логгирование при восстановлении реплик.
@ -486,9 +530,9 @@
* Исправлена ошибка в функции concat, возникающая в случае, если первый столбец блока имеет тип Array.
* Исправлено отображение прогресса в таблице system.merges.
# Релиз ClickHouse 1.1.54289, 2017-09-13
## Релиз ClickHouse 1.1.54289, 2017-09-13
## Новые возможности:
### Новые возможности:
* Запросы `SYSTEM` для административных действий с сервером: `SYSTEM RELOAD DICTIONARY`, `SYSTEM RELOAD DICTIONARIES`, `SYSTEM DROP DNS CACHE`, `SYSTEM SHUTDOWN`, `SYSTEM KILL`.
* Добавлены функции для работы с массивами: `concat`, `arraySlice`, `arrayPushBack`, `arrayPushFront`, `arrayPopBack`, `arrayPopFront`.
* Добавлены параметры `root` и `identity` для конфигурации ZooKeeper. Это позволяет изолировать разных пользователей одного ZooKeeper кластера.
@ -503,7 +547,7 @@
* Возможность задать `umask` в конфигурационном файле.
* Увеличена производительность запросов с `DISTINCT`.
## Исправления ошибок:
### Исправления ошибок:
* Более оптимальная процедура удаления старых нод в ZooKeeper. Ранее в случае очень частых вставок, старые ноды могли не успевать удаляться, что приводило, в том числе, к очень долгому завершению сервера.
* Исправлена рандомизация при выборе хостов для соединения с ZooKeeper.
* Исправлено отключение отстающей реплики при распределённых запросах, если реплика является localhost.
@ -516,28 +560,28 @@
* Исправлено появление zombie процессов при работе со словарём с источником `executable`.
* Исправлен segfault при запросе HEAD.
## Улучшения процесса разработки и сборки ClickHouse:
### Улучшения процесса разработки и сборки ClickHouse:
* Возможность сборки с помощью `pbuilder`.
* Возможность сборки с использованием `libc++` вместо `libstdc++` под Linux.
* Добавлены инструкции для использования статических анализаторов кода `Coverity`, `clang-tidy`, `cppcheck`.
## На что обратить внимание при обновлении:
### На что обратить внимание при обновлении:
* Увеличено значение по-умолчанию для настройки MergeTree `max_bytes_to_merge_at_max_space_in_pool` (максимальный суммарный размер кусков в байтах для мержа) со 100 GiB до 150 GiB. Это может привести к запуску больших мержей после обновления сервера, что может вызвать повышенную нагрузку на дисковую подсистему. Если же на серверах, где это происходит, количество свободного места менее чем в два раза больше суммарного объёма выполняющихся мержей, то в связи с этим перестанут выполняться какие-либо другие мержи, включая мержи мелких кусков. Это приведёт к тому, что INSERT-ы будут отклоняться с сообщением "Merges are processing significantly slower than inserts". Для наблюдения, используйте запрос `SELECT * FROM system.merges`. Вы также можете смотреть на метрику `DiskSpaceReservedForMerge` в таблице `system.metrics` или в Graphite. Для исправления этой ситуации можно ничего не делать, так как она нормализуется сама после завершения больших мержей. Если же вас это не устраивает, вы можете вернуть настройку `max_bytes_to_merge_at_max_space_in_pool` в старое значение, прописав в config.xml в секции `<merge_tree>` `<max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool>` и перезапустить сервер.
# Релиз ClickHouse 1.1.54284, 2017-08-29
## Релиз ClickHouse 1.1.54284, 2017-08-29
* Релиз содержит изменения к предыдущему релизу 1.1.54282, которые исправляют утечку записей о кусках в ZooKeeper
# Релиз ClickHouse 1.1.54282, 2017-08-23
## Релиз ClickHouse 1.1.54282, 2017-08-23
Релиз содержит исправления к предыдущему релизу 1.1.54276:
* Исправлена ошибка `DB::Exception: Assertion violation: !_path.empty()` при вставке в Distributed таблицу.
* Исправлен парсинг при вставке в формате RowBinary, если входные данные начинаются с ';'.
* Исправлена ошибка при рантайм-компиляции некоторых агрегатных функций (например, `groupArray()`).
# Релиз ClickHouse 1.1.54276, 2017-08-16
## Релиз ClickHouse 1.1.54276, 2017-08-16
## Новые возможности:
### Новые возможности:
* Добавлена опциональная секция WITH запроса SELECT. Пример запроса: `WITH 1+1 AS a SELECT a, a*a`
* Добавлена возможность синхронной вставки в Distributed таблицу: выдается Ok только после того как все данные записались на все шарды. Активируется настройкой insert_distributed_sync=1
* Добавлен тип данных UUID для работы с 16-байтовыми идентификаторами
@ -547,17 +591,17 @@
* Добавлена поддержка неконстантных аргументов и отрицательных смещений в функции `substring(str, pos, len)`
* Добавлен параметр max_size для агрегатной функции `groupArray(max_size)(column)`, и оптимизирована её производительность
## Основные изменения:
### Основные изменения:
* Улучшение безопасности: все файлы сервера создаются с правами 0640 (можно поменять, через параметр <umask> в конфиге).
* Улучшены сообщения об ошибках в случае синтаксически неверных запросов
* Значительно уменьшен расход оперативной памяти и улучшена производительность слияний больших MergeTree-кусков данных
* Значительно увеличена производительность слияний данных для движка ReplacingMergeTree
* Улучшена производительность асинхронных вставок из Distributed таблицы за счет объединения нескольких исходных вставок. Функциональность включается настройкой distributed_directory_monitor_batch_inserts=1.
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Изменился бинарный формат агрегатных состояний функции `groupArray(array_column)` для массивов
## Полный список изменений:
### Полный список изменений:
* Добавлена настройка `output_format_json_quote_denormals`, включающая вывод nan и inf значений в формате JSON
* Более оптимальное выделение потоков при чтении из Distributed таблиц
* Разрешено задавать настройки в режиме readonly, если их значение не изменяется
@ -575,7 +619,7 @@
* Возможность подключения к MySQL через сокет на файловой системе
* В таблицу system.parts добавлен столбец с информацией о размере marks в байтах
## Исправления багов:
### Исправления багов:
* Исправлена некорректная работа Distributed таблиц, использующих Merge таблицы, при SELECT с условием на поле _table
* Исправлен редкий race condition в ReplicatedMergeTree при проверке кусков данных
* Исправлено возможное зависание процедуры leader election при старте сервера
@ -598,15 +642,15 @@
* Исправлена ошибка "Cannot mremap" при использовании множеств в секциях IN, JOIN, содержащих более 2 млрд. элементов
* Исправлен failover для словарей с источником MySQL
## Улучшения процесса разработки и сборки ClickHouse:
### Улучшения процесса разработки и сборки ClickHouse:
* Добавлена возмозможность сборки в Arcadia
* Добавлена возможность сборки с помощью gcc 7
* Ускорена параллельная сборка с помощью ccache+distcc
# Релиз ClickHouse 1.1.54245, 2017-07-04
## Релиз ClickHouse 1.1.54245, 2017-07-04
## Новые возможности:
### Новые возможности:
* Распределённые DDL (например, `CREATE TABLE ON CLUSTER`)
* Реплицируемый запрос `ALTER TABLE CLEAR COLUMN IN PARTITION`
* Движок таблиц Dictionary (доступ к данным словаря в виде таблицы)
@ -617,14 +661,14 @@
* Сессии в HTTP интерфейсе
* Запрос OPTIMIZE для Replicated таблицы теперь можно выполнять не только на лидере
## Обратно несовместимые изменения:
### Обратно несовместимые изменения:
* Убрана команда SET GLOBAL
## Мелкие изменения:
### Мелкие изменения:
* Теперь после получения сигнала в лог печатается полный стектрейс
* Ослаблена проверка на количество повреждённых/лишних кусков при старте (было слишком много ложных срабатываний)
## Исправления багов:
### Исправления багов:
* Исправлено залипание плохого соединения при вставке в Distributed таблицу
* GLOBAL IN теперь работает при запросе из таблицы Merge, смотрящей в Distributed
* Теперь правильно определяется количество ядер на виртуалках Google Compute Engine

View File

@ -1,39 +0,0 @@
## How to increase maxfiles on macOS
To increase maxfiles on macOS, create the following file:
(Note: you'll need to use sudo)
/Library/LaunchDaemons/limit.maxfiles.plist:
```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>limit.maxfiles</string>
<key>ProgramArguments</key>
<array>
<string>launchctl</string>
<string>limit</string>
<string>maxfiles</string>
<string>524288</string>
<string>524288</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>ServiceIPC</key>
<false/>
</dict>
</plist>
```
Execute the following command:
```
sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist
```
Reboot.
To check if it's working, you can use `ulimit -n` command.

View File

@ -1,11 +1,11 @@
# This strings autochanged from release_lib.sh:
set(VERSION_REVISION 54395 CACHE STRING "")
set(VERSION_MAJOR 1 CACHE STRING "")
set(VERSION_REVISION 54396 CACHE STRING "")
set(VERSION_MAJOR 18 CACHE STRING "")
set(VERSION_MINOR 1 CACHE STRING "")
set(VERSION_PATCH 54398 CACHE STRING "")
set(VERSION_GITHASH 4b31f389b743c69af688788c0d0cdb8973aefa77 CACHE STRING "")
set(VERSION_DESCRIBE v1.1.54398-testing CACHE STRING "")
set(VERSION_STRING 1.1.54398 CACHE STRING "")
set(VERSION_PATCH 0 CACHE STRING "")
set(VERSION_GITHASH 550f41bc65cb03201acad489e7b96ea346ed8259 CACHE STRING "")
set(VERSION_DESCRIBE v18.1.0-testing CACHE STRING "")
set(VERSION_STRING 18.1.0 CACHE STRING "")
# end of autochange
set(VERSION_EXTRA "" CACHE STRING "")
@ -18,7 +18,8 @@ if (VERSION_EXTRA)
string(CONCAT VERSION_STRING ${VERSION_STRING} "." ${VERSION_EXTRA})
endif ()
set (VERSION_FULL "${PROJECT_NAME} ${VERSION_STRING}")
set (VERSION_NAME "${PROJECT_NAME}")
set (VERSION_FULL "${VERSION_NAME} ${VERSION_STRING}")
if (APPLE)
# dirty hack: ld: malformed 64-bit a.b.c.d.e version number: 1.1.54160

View File

@ -1,6 +1,8 @@
add_library (clickhouse-client-lib Client.cpp)
target_link_libraries (clickhouse-client-lib clickhouse_functions clickhouse_aggregate_functions ${LINE_EDITING_LIBS} ${Boost_PROGRAM_OPTIONS_LIBRARY})
target_include_directories (clickhouse-client-lib SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
if (READLINE_INCLUDE_DIR)
target_include_directories (clickhouse-client-lib SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
endif ()
if (CLICKHOUSE_SPLIT_BINARY)
add_executable (clickhouse-client clickhouse-client.cpp)

View File

@ -28,6 +28,7 @@
#include <Common/StringUtils/StringUtils.h>
#include <Common/typeid_cast.h>
#include <Common/Config/ConfigProcessor.h>
#include <Common/config_version.h>
#include <Core/Types.h>
#include <Core/QueryProcessingStage.h>
#include <IO/ReadBufferFromFileDescriptor.h>
@ -1316,10 +1317,7 @@ private:
void showClientVersion()
{
std::cout << "ClickHouse client version " << DBMS_VERSION_MAJOR
<< "." << DBMS_VERSION_MINOR
<< "." << ClickHouseRevision::get()
<< "." << std::endl;
std::cout << DBMS_NAME << " client version " << VERSION_STRING << "." << std::endl;
}
public:

View File

@ -17,6 +17,7 @@
#include <Common/Config/ConfigProcessor.h>
#include <Common/escapeForFileName.h>
#include <Common/ClickHouseRevision.h>
#include <Common/config_version.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromString.h>
#include <IO/WriteBufferFromFileDescriptor.h>
@ -355,10 +356,7 @@ void LocalServer::setupUsers()
static void showClientVersion()
{
std::cout << "ClickHouse client version " << DBMS_VERSION_MAJOR
<< "." << DBMS_VERSION_MINOR
<< "." << ClickHouseRevision::get()
<< "." << std::endl;
std::cout << DBMS_NAME << " client version " << VERSION_STRING << "." << std::endl;
}
std::string LocalServer::getHelpHeader() const

View File

@ -1,36 +1,27 @@
#include "TCPHandler.h"
#include <iomanip>
#include <Poco/Net/NetException.h>
#include <Common/ClickHouseRevision.h>
#include <Common/Stopwatch.h>
#include <IO/Progress.h>
#include <IO/CompressedReadBuffer.h>
#include <IO/CompressedWriteBuffer.h>
#include <IO/ReadBufferFromPocoSocket.h>
#include <IO/WriteBufferFromPocoSocket.h>
#include <IO/CompressionSettings.h>
#include <IO/copyData.h>
#include <DataStreams/AsynchronousBlockInputStream.h>
#include <DataStreams/NativeBlockInputStream.h>
#include <DataStreams/NativeBlockOutputStream.h>
#include <Interpreters/executeQuery.h>
#include <Interpreters/Quota.h>
#include <Interpreters/TablesStatus.h>
#include <Storages/StorageMemory.h>
#include <Storages/StorageReplicatedMergeTree.h>
#include <Common/ClickHouseRevision.h>
#include <Common/Stopwatch.h>
#include <Common/ExternalTable.h>
#include "TCPHandler.h"
#include <Common/NetException.h>
#include <Common/config_version.h>
#include <ext/scope_guard.h>

View File

@ -19,6 +19,7 @@
#include <Common/CurrentMetrics.h>
#include <Common/DNSResolver.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/config_version.h>
#include <Interpreters/ClientInfo.h>
#include <Common/config.h>

View File

@ -13,7 +13,31 @@
#cmakedefine VERSION_REVISION @VERSION_REVISION@
#endif
#cmakedefine VERSION_NAME "@VERSION_NAME@"
#define DBMS_NAME VERSION_NAME
#cmakedefine VERSION_MAJOR @VERSION_MAJOR@
#cmakedefine VERSION_MINOR @VERSION_MINOR@
#cmakedefine VERSION_PATCH @VERSION_PATCH@
#cmakedefine VERSION_STRING "@VERSION_STRING@"
#cmakedefine VERSION_FULL "@VERSION_FULL@"
#cmakedefine VERSION_DESCRIBE "@VERSION_DESCRIBE@"
#cmakedefine VERSION_GITHASH "@VERSION_GITHASH@"
#if defined(VERSION_MAJOR)
#define DBMS_VERSION_MAJOR VERSION_MAJOR
#else
#define DBMS_VERSION_MAJOR 0
#endif
#if defined(VERSION_MINOR)
#define DBMS_VERSION_MINOR VERSION_MINOR
#else
#define DBMS_VERSION_MINOR 0
#endif
#if defined(VERSION_PATCH)
#define DBMS_VERSION_PATCH VERSION_PATCH
#else
#define DBMS_VERSION_PATCH 0
#endif

View File

@ -1,9 +1,5 @@
#pragma once
#define DBMS_NAME "ClickHouse"
#define DBMS_VERSION_MAJOR 1
#define DBMS_VERSION_MINOR 1
#define DBMS_DEFAULT_HOST "localhost"
#define DBMS_DEFAULT_PORT 9000
#define DBMS_DEFAULT_SECURE_PORT 9440

View File

@ -50,7 +50,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(
table{config.getString(config_prefix + ".table")},
where{config.getString(config_prefix + ".where", "")},
update_field{config.getString(config_prefix + ".update_field", "")},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
sample_block{sample_block}, context(context),
is_local{isLocalAddress({ host, port }, config.getInt("tcp_port", 0))},
pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)},
@ -67,7 +67,7 @@ ClickHouseDictionarySource::ClickHouseDictionarySource(const ClickHouseDictionar
db{other.db}, table{other.table},
where{other.where},
update_field{other.update_field},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
sample_block{other.sample_block}, context(other.context),
is_local{other.is_local},
pool{is_local ? nullptr : createPool(host, port, secure, db, user, password, context)},

View File

@ -22,7 +22,7 @@ ExternalQueryBuilder::ExternalQueryBuilder(
const std::string & db,
const std::string & table,
const std::string & where,
QuotingStyle quoting_style)
IdentifierQuotingStyle quoting_style)
: dict_struct(dict_struct), db(db), table(table), where(where), quoting_style(quoting_style)
{
}
@ -32,15 +32,15 @@ void ExternalQueryBuilder::writeQuoted(const std::string & s, WriteBuffer & out)
{
switch (quoting_style)
{
case None:
case IdentifierQuotingStyle::None:
writeString(s, out);
break;
case Backticks:
case IdentifierQuotingStyle::Backticks:
writeBackQuotedString(s, out);
break;
case DoubleQuotes:
case IdentifierQuotingStyle::DoubleQuotes:
writeDoubleQuotedString(s, out);
break;
}
@ -138,7 +138,7 @@ std::string ExternalQueryBuilder::composeLoadAllQuery() const
}
std::string ExternalQueryBuilder::composeUpdateQuery(const std::string &update_field, const std::string &time_point) const
std::string ExternalQueryBuilder::composeUpdateQuery(const std::string & update_field, const std::string & time_point) const
{
std::string out = composeLoadAllQuery();
std::string update_query;

View File

@ -3,6 +3,7 @@
#include <string>
#include <Formats/FormatSettings.h>
#include <Columns/IColumn.h>
#include <Parsers/IdentifierQuotingStyle.h>
namespace DB
@ -21,16 +22,7 @@ struct ExternalQueryBuilder
const std::string & table;
const std::string & where;
/// Method to quote identifiers.
/// NOTE There could be differences in escaping rules inside quotes. Escaping rules may not match that required by specific external DBMS.
enum QuotingStyle
{
None, /// Write as-is, without quotes.
Backticks, /// `mysql` style
DoubleQuotes /// "postgres" style
};
QuotingStyle quoting_style;
IdentifierQuotingStyle quoting_style;
ExternalQueryBuilder(
@ -38,7 +30,7 @@ struct ExternalQueryBuilder
const std::string & db,
const std::string & table,
const std::string & where,
QuotingStyle quoting_style);
IdentifierQuotingStyle quoting_style);
/** Generate a query to load all data. */
std::string composeLoadAllQuery() const;

View File

@ -35,7 +35,7 @@ MySQLDictionarySource::MySQLDictionarySource(const DictionaryStructure & dict_st
dont_check_update_time{config.getBool(config_prefix + ".dont_check_update_time", false)},
sample_block{sample_block},
pool{config, config_prefix},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
load_all_query{query_builder.composeLoadAllQuery()},
invalidate_query{config.getString(config_prefix + ".invalidate_query", "")}
{
@ -53,7 +53,7 @@ MySQLDictionarySource::MySQLDictionarySource(const MySQLDictionarySource & other
dont_check_update_time{other.dont_check_update_time},
sample_block{other.sample_block},
pool{other.pool},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::Backticks},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::Backticks},
load_all_query{other.load_all_query}, last_modification{other.last_modification},
invalidate_query{other.invalidate_query}, invalidate_query_response{other.invalidate_query_response}
{

View File

@ -29,7 +29,7 @@ ODBCDictionarySource::ODBCDictionarySource(const DictionaryStructure & dict_stru
where{config.getString(config_prefix + ".where", "")},
update_field{config.getString(config_prefix + ".update_field", "")},
sample_block{sample_block},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::None}, /// NOTE Better to obtain quoting style via ODBC interface.
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::None}, /// NOTE Better to obtain quoting style via ODBC interface.
load_all_query{query_builder.composeLoadAllQuery()},
invalidate_query{config.getString(config_prefix + ".invalidate_query", "")}
{
@ -58,7 +58,7 @@ ODBCDictionarySource::ODBCDictionarySource(const ODBCDictionarySource & other)
update_field{other.update_field},
sample_block{other.sample_block},
pool{other.pool},
query_builder{dict_struct, db, table, where, ExternalQueryBuilder::None},
query_builder{dict_struct, db, table, where, IdentifierQuotingStyle::None},
load_all_query{other.load_all_query},
invalidate_query{other.invalidate_query}, invalidate_query_response{other.invalidate_query_response}
{

View File

@ -1,6 +1,7 @@
#include <Common/Exception.h>
#include <IO/WriteHelpers.h>
#include <Formats/BlockInputStreamFromRowInputStream.h>
#include <common/logger_useful.h>
namespace DB
@ -128,4 +129,16 @@ Block BlockInputStreamFromRowInputStream::readImpl()
return sample.cloneWithColumns(std::move(columns));
}
void BlockInputStreamFromRowInputStream::readSuffix()
{
if (allow_errors_num > 0 || allow_errors_ratio > 0)
{
Logger * log = &Logger::get("BlockInputStreamFromRowInputStream");
LOG_TRACE(log, "Skipped " << num_errors << " rows with errors while reading the input stream");
}
row_input->readSuffix();
}
}

View File

@ -4,7 +4,6 @@
#include <DataStreams/IProfilingBlockInputStream.h>
#include <Formats/FormatSettings.h>
#include <Formats/IRowInputStream.h>
#include <common/logger_useful.h>
namespace DB
@ -26,13 +25,7 @@ public:
const FormatSettings & settings);
void readPrefix() override { row_input->readPrefix(); }
void readSuffix() override
{
Logger * log = &Logger::get("BlockInputStreamFromRowInputStream");
LOG_TRACE(log, "Skipped " << num_errors << " rows while reading the input stream");
row_input->readSuffix();
}
void readSuffix() override;
String getName() const override { return "BlockInputStreamFromRowInputStream"; }

View File

@ -43,10 +43,10 @@ private:
/* Action for state machine for traversing nested structures. */
struct Action
{
enum Type { POP, PUSH, READ };
Type type;
capnp::StructSchema::Field field = {};
size_t column = 0;
enum Type { POP, PUSH, READ };
Type type;
capnp::StructSchema::Field field = {};
size_t column = 0;
};
// Wrapper for classes that could throw in destructor
@ -54,10 +54,10 @@ private:
template <typename T>
struct DestructorCatcher
{
T impl;
template <typename ... Arg>
DestructorCatcher(Arg && ... args) : impl(kj::fwd<Arg>(args)...) {}
~DestructorCatcher() noexcept try { } catch (...) { }
T impl;
template <typename ... Arg>
DestructorCatcher(Arg && ... args) : impl(kj::fwd<Arg>(args)...) {}
~DestructorCatcher() noexcept try { } catch (...) { return; }
};
using SchemaParser = DestructorCatcher<capnp::SchemaParser>;

View File

@ -41,6 +41,7 @@ generate_function_register(Array
FunctionArrayEnumerate
FunctionArrayEnumerateUniq
FunctionArrayUniq
FunctionArrayDistinct
FunctionEmptyArrayUInt8
FunctionEmptyArrayUInt16
FunctionEmptyArrayUInt32

View File

@ -1062,9 +1062,7 @@ void FunctionArrayUniq::executeImpl(Block & block, const ColumnNumbers & argumen
|| executeNumber<Float32>(first_array, first_null_map, res_values)
|| executeNumber<Float64>(first_array, first_null_map, res_values)
|| executeString(first_array, first_null_map, res_values)))
throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName()
+ " of first argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
executeHashed(*offsets, original_data_columns, res_values);
}
else
{
@ -1272,6 +1270,213 @@ void FunctionArrayUniq::executeHashed(
}
}
/// Implementation of FunctionArrayDistinct.
FunctionPtr FunctionArrayDistinct::create(const Context &)
{
return std::make_shared<FunctionArrayDistinct>();
}
String FunctionArrayDistinct::getName() const
{
return name;
}
DataTypePtr FunctionArrayDistinct::getReturnTypeImpl(const DataTypes & arguments) const
{
const DataTypeArray * array_type = checkAndGetDataType<DataTypeArray>(arguments[0].get());
if (!array_type)
throw Exception("Argument for function " + getName() + " must be array but it "
" has type " + arguments[0]->getName() + ".",
ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
auto nested_type = removeNullable(array_type->getNestedType());
return std::make_shared<DataTypeArray>(nested_type);
}
void FunctionArrayDistinct::executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t /*input_rows_count*/)
{
ColumnPtr array_ptr = block.getByPosition(arguments[0]).column;
const ColumnArray * array = checkAndGetColumn<ColumnArray>(array_ptr.get());
const auto & return_type = block.getByPosition(result).type;
auto res_ptr = return_type->createColumn();
ColumnArray & res = static_cast<ColumnArray &>(*res_ptr);
const IColumn & src_data = array->getData();
const ColumnArray::Offsets & offsets = array->getOffsets();
ColumnRawPtrs original_data_columns;
original_data_columns.push_back(&src_data);
IColumn & res_data = res.getData();
ColumnArray::Offsets & res_offsets = res.getOffsets();
const ColumnNullable * nullable_col = nullptr;
const IColumn * inner_col;
if (src_data.isColumnNullable())
{
nullable_col = static_cast<const ColumnNullable *>(&src_data);
inner_col = &nullable_col->getNestedColumn();
}
else
{
inner_col = &src_data;
}
if (!(executeNumber<UInt8>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<UInt16>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<UInt32>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<UInt64>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int8>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int16>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int32>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Int64>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Float32>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeNumber<Float64>(*inner_col, offsets, res_data, res_offsets, nullable_col)
|| executeString(*inner_col, offsets, res_data, res_offsets, nullable_col)))
executeHashed(offsets, original_data_columns, res_data, res_offsets);
block.getByPosition(result).column = std::move(res_ptr);
}
template <typename T>
bool FunctionArrayDistinct::executeNumber(const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col)
{
const ColumnVector<T> * src_data_concrete = checkAndGetColumn<ColumnVector<T>>(&src_data);
if (!src_data_concrete)
{
return false;
}
const PaddedPODArray<T> & values = src_data_concrete->getData();
PaddedPODArray<T> & res_data = typeid_cast<ColumnVector<T> &>(res_data_col).getData();
const PaddedPODArray<UInt8> * src_null_map = nullptr;
if (nullable_col)
{
src_null_map = &static_cast<const ColumnUInt8 *>(&nullable_col->getNullMapColumn())->getData();
}
using Set = ClearableHashSet<T,
DefaultHash<T>,
HashTableGrower<INITIAL_SIZE_DEGREE>,
HashTableAllocatorWithStackMemory<(1ULL << INITIAL_SIZE_DEGREE) * sizeof(T)>>;
Set set;
size_t prev_off = 0;
for (size_t i = 0; i < src_offsets.size(); ++i)
{
set.clear();
size_t off = src_offsets[i];
for (size_t j = prev_off; j < off; ++j)
{
if ((set.find(values[j]) == set.end()) && (!nullable_col || (*src_null_map)[j] == 0))
{
res_data.emplace_back(values[j]);
set.insert(values[j]);
}
}
res_offsets.emplace_back(set.size() + prev_off);
prev_off = off;
}
return true;
}
bool FunctionArrayDistinct::executeString(
const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col)
{
const ColumnString * src_data_concrete = checkAndGetColumn<ColumnString>(&src_data);
if (!src_data_concrete)
{
return false;
}
ColumnString & res_data_column_string = typeid_cast<ColumnString &>(res_data_col);
using Set = ClearableHashSet<StringRef,
StringRefHash,
HashTableGrower<INITIAL_SIZE_DEGREE>,
HashTableAllocatorWithStackMemory<(1ULL << INITIAL_SIZE_DEGREE) * sizeof(StringRef)>>;
const PaddedPODArray<UInt8> * src_null_map = nullptr;
if (nullable_col)
{
src_null_map = &static_cast<const ColumnUInt8 *>(&nullable_col->getNullMapColumn())->getData();
}
Set set;
size_t prev_off = 0;
for (size_t i = 0; i < src_offsets.size(); ++i)
{
set.clear();
size_t off = src_offsets[i];
for (size_t j = prev_off; j < off; ++j)
{
StringRef str_ref = src_data_concrete->getDataAt(j);
if (set.find(str_ref) == set.end() && (!nullable_col || (*src_null_map)[j] == 0))
{
set.insert(str_ref);
res_data_column_string.insertData(str_ref.data, str_ref.size);
}
}
res_offsets.emplace_back(set.size() + prev_off);
prev_off = off;
}
return true;
}
void FunctionArrayDistinct::executeHashed(
const ColumnArray::Offsets & offsets,
const ColumnRawPtrs & columns,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets)
{
size_t count = columns.size();
using Set = ClearableHashSet<UInt128, UInt128TrivialHash, HashTableGrower<INITIAL_SIZE_DEGREE>,
HashTableAllocatorWithStackMemory<(1ULL << INITIAL_SIZE_DEGREE) * sizeof(UInt128)>>;
Set set;
size_t prev_off = 0;
for (size_t i = 0; i < offsets.size(); ++i)
{
set.clear();
size_t off = offsets[i];
for (size_t j = prev_off; j < off; ++j)
{
auto hash = hash128(j, count, columns);
if (set.find(hash) == set.end())
{
set.insert(hash);
res_data_col.insertFrom(*columns[0], j);
}
}
res_offsets.emplace_back(set.size() + prev_off);
prev_off = off;
}
}
/// Implementation of FunctionArrayEnumerateUniq.
FunctionPtr FunctionArrayEnumerateUniq::create(const Context &)
@ -1334,13 +1539,7 @@ void FunctionArrayEnumerateUniq::executeImpl(Block & block, const ColumnNumbers
ErrorCodes::SIZES_OF_ARRAYS_DOESNT_MATCH);
auto * array_data = &array->getData();
if (auto * tuple_column = checkAndGetColumn<ColumnTuple>(array_data))
{
for (const auto & element : tuple_column->getColumns())
data_columns.push_back(element.get());
}
else
data_columns.push_back(array_data);
data_columns.push_back(array_data);
}
size_t num_columns = data_columns.size();
@ -1383,9 +1582,7 @@ void FunctionArrayEnumerateUniq::executeImpl(Block & block, const ColumnNumbers
|| executeNumber<Float32>(first_array, first_null_map, res_values)
|| executeNumber<Float64>(first_array, first_null_map, res_values)
|| executeString (first_array, first_null_map, res_values)))
throw Exception("Illegal column " + block.getByPosition(arguments[0]).column->getName()
+ " of first argument of function " + getName(),
ErrorCodes::ILLEGAL_COLUMN);
executeHashed(*offsets, original_data_columns, res_values);
}
else
{
@ -2427,8 +2624,6 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum
std::vector<const IColumn *> aggregate_arguments_vec(num_arguments_columns);
const ColumnArray::Offsets * offsets = nullptr;
bool is_const = true;
for (size_t i = 0; i < num_arguments_columns; ++i)
{
const IColumn * col = block.getByPosition(arguments[i + 1]).column.get();
@ -2437,7 +2632,6 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum
{
aggregate_arguments_vec[i] = &arr->getData();
offsets_i = &arr->getOffsets();
is_const = false;
}
else if (const ColumnConst * const_arr = checkAndGetColumnConst<ColumnArray>(col))
{
@ -2493,14 +2687,7 @@ void FunctionArrayReduce::executeImpl(Block & block, const ColumnNumbers & argum
current_offset = next_offset;
}
if (!is_const)
{
block.getByPosition(result).column = std::move(result_holder);
}
else
{
block.getByPosition(result).column = block.getByPosition(result).type->createColumnConst(rows, res_col[0]);
}
block.getByPosition(result).column = std::move(result_holder);
}
/// Implementation of FunctionArrayConcat.

View File

@ -46,6 +46,8 @@ namespace ErrorCodes
* arrayUniq(arr) - counts the number of different elements in the array,
* arrayUniq(arr1, arr2, ...) - counts the number of different tuples from the elements in the corresponding positions in several arrays.
*
* arrayDistinct(arr) - retrun different elements in an array
*
* arrayEnumerateUniq(arr)
* - outputs an array parallel (having same size) to this, where for each element specified
* how many times this element was encountered before (including this element) among elements with the same value.
@ -1210,6 +1212,52 @@ private:
};
/// Find different elements in an array.
class FunctionArrayDistinct : public IFunction
{
public:
static constexpr auto name = "arrayDistinct";
static FunctionPtr create(const Context & context);
String getName() const override;
bool isVariadic() const override { return false; }
size_t getNumberOfArguments() const override { return 1; }
bool useDefaultImplementationForConstants() const override { return true; }
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override;
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override;
private:
/// Initially allocate a piece of memory for 512 elements. NOTE: This is just a guess.
static constexpr size_t INITIAL_SIZE_DEGREE = 9;
template <typename T>
bool executeNumber(
const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col);
bool executeString(
const IColumn & src_data,
const ColumnArray::Offsets & src_offsets,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets,
const ColumnNullable * nullable_col);
void executeHashed(
const ColumnArray::Offsets & offsets,
const ColumnRawPtrs & columns,
IColumn & res_data_col,
ColumnArray::Offsets & res_offsets);
};
class FunctionArrayEnumerateUniq : public IFunction
{
public:
@ -1384,6 +1432,9 @@ public:
bool isVariadic() const override { return true; }
size_t getNumberOfArguments() const override { return 0; }
bool useDefaultImplementationForConstants() const override { return true; }
ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0}; }
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override;
void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override;

View File

@ -15,6 +15,7 @@
#include <Common/UnicodeBar.h>
#include <Common/UTF8Helpers.h>
#include <Common/FieldVisitors.h>
#include <Common/config_version.h>
#include <DataTypes/DataTypeAggregateFunction.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeDate.h>
@ -1867,9 +1868,7 @@ public:
std::string FunctionVersion::getVersion() const
{
std::ostringstream os;
os << DBMS_VERSION_MAJOR << "." << DBMS_VERSION_MINOR << "." << ClickHouseRevision::get();
return os.str();
return VERSION_STRING;
}

View File

@ -640,7 +640,6 @@ inline void readBinary(String & x, ReadBuffer & buf) { readStringBinary(x, buf);
inline void readBinary(UInt128 & x, ReadBuffer & buf) { readPODBinary(x, buf); }
inline void readBinary(UInt256 & x, ReadBuffer & buf) { readPODBinary(x, buf); }
inline void readBinary(LocalDate & x, ReadBuffer & buf) { readPODBinary(x, buf); }
inline void readBinary(LocalDateTime & x, ReadBuffer & buf) { readPODBinary(x, buf); }
/// Generic methods to read value in text tab-separated format.

View File

@ -394,11 +394,13 @@ inline void writeBackQuotedString(const String & s, WriteBuffer & buf)
writeAnyQuotedString<'`'>(s, buf);
}
/// The same, but backquotes apply only if there are characters that do not match the identifier without backquotes.
inline void writeProbablyBackQuotedString(const String & s, WriteBuffer & buf)
/// The same, but quotes apply only if there are characters that do not match the identifier without quotes.
template <typename F>
inline void writeProbablyQuotedStringImpl(const String & s, WriteBuffer & buf, F && write_quoted_string)
{
if (s.empty() || !isValidIdentifierBegin(s[0]))
writeBackQuotedString(s, buf);
write_quoted_string(s, buf);
else
{
const char * pos = s.data() + 1;
@ -407,12 +409,22 @@ inline void writeProbablyBackQuotedString(const String & s, WriteBuffer & buf)
if (!isWordCharASCII(*pos))
break;
if (pos != end)
writeBackQuotedString(s, buf);
write_quoted_string(s, buf);
else
writeString(s, buf);
}
}
inline void writeProbablyBackQuotedString(const String & s, WriteBuffer & buf)
{
writeProbablyQuotedStringImpl(s, buf, [](const String & s, WriteBuffer & buf) { return writeBackQuotedString(s, buf); });
}
inline void writeProbablyDoubleQuotedString(const String & s, WriteBuffer & buf)
{
writeProbablyQuotedStringImpl(s, buf, [](const String & s, WriteBuffer & buf) { return writeDoubleQuotedString(s, buf); });
}
/** Outputs the string in for the CSV format.
* Rules:

View File

@ -6,6 +6,7 @@
#include <Core/Defines.h>
#include <Common/getFQDNOrHostName.h>
#include <Common/ClickHouseRevision.h>
#include <Common/config_version.h>
#include <port/unistd.h>

View File

@ -158,7 +158,7 @@ ExpressionAction ExpressionAction::ordinaryJoin(std::shared_ptr<const Join> join
void ExpressionAction::prepare(Block & sample_block)
{
// std::cerr << "preparing: " << toString() << std::endl;
// std::cerr << "preparing: " << toString() << std::endl;
/** Constant expressions should be evaluated, and put the result in sample_block.
*/
@ -322,8 +322,6 @@ size_t ExpressionAction::getInputRowsCount(Block & block, std::unordered_map<std
void ExpressionAction::execute(Block & block, std::unordered_map<std::string, size_t> & input_rows_counts) const
{
// std::cerr << "executing: " << toString() << std::endl;
size_t input_rows_count = getInputRowsCount(block, input_rows_counts);
if (type == REMOVE_COLUMN || type == COPY_COLUMN)

View File

@ -1477,18 +1477,7 @@ void ExpressionAnalyzer::tryMakeSetForIndexFromSubquery(const ASTPtr & subquery_
{
BlockIO res = interpretSubquery(subquery_or_table_name, context, subquery_depth + 1, {})->execute();
SizeLimits set_for_index_size_limits;
if (settings.use_index_for_in_with_subqueries_max_values && settings.use_index_for_in_with_subqueries_max_values < settings.max_rows_in_set)
{
/// Silently cancel creating the set for index if the specific limit has been reached.
set_for_index_size_limits = SizeLimits(settings.use_index_for_in_with_subqueries_max_values, settings.max_bytes_in_set, OverflowMode::BREAK);
}
else
{
/// If the limit specific for set for index is lower than general limits for set - use general limit.
set_for_index_size_limits = SizeLimits(settings.max_rows_in_set, settings.max_bytes_in_set, settings.set_overflow_mode);
}
SizeLimits set_for_index_size_limits = SizeLimits(settings.max_rows_in_set, settings.max_bytes_in_set, settings.set_overflow_mode);
SetPtr set = std::make_shared<Set>(set_for_index_size_limits, true);
set->setHeader(res.in->getHeader());
@ -2071,6 +2060,7 @@ void ExpressionAnalyzer::getActionsImpl(const ASTPtr & ast, bool no_subqueries,
ColumnWithTypeAndName fake_column;
fake_column.name = projection_manipulator->getColumnName(getColumnName());
fake_column.type = std::make_shared<DataTypeUInt8>();
fake_column.column = fake_column.type->createColumn();
actions_stack.addAction(ExpressionAction::addColumn(fake_column, projection_manipulator->getProjectionSourceColumn(), false));
getActionsImpl(node->arguments->children.at(0), no_subqueries, only_consts, actions_stack,
projection_manipulator);

View File

@ -81,21 +81,25 @@ InterpreterSelectQuery::InterpreterSelectQuery(
}
InterpreterSelectQuery::InterpreterSelectQuery(OnlyAnalyzeTag, const ASTPtr & query_ptr_, const Context & context_)
: query_ptr(query_ptr_->clone())
, query(typeid_cast<ASTSelectQuery &>(*query_ptr))
, context(context_)
, to_stage(QueryProcessingStage::Complete)
, subquery_depth(0)
, only_analyze(true)
, log(&Logger::get("InterpreterSelectQuery"))
{
init({});
}
InterpreterSelectQuery::~InterpreterSelectQuery() = default;
/** There are no limits on the maximum size of the result for the subquery.
* Since the result of the query is not the result of the entire query.
*/
static Context getSubqueryContext(const Context & context)
{
Context subquery_context = context;
Settings subquery_settings = context.getSettings();
subquery_settings.max_result_rows = 0;
subquery_settings.max_result_bytes = 0;
/// The calculation of extremes does not make sense and is not necessary (if you do it, then the extremes of the subquery can be taken for whole query).
subquery_settings.extremes = 0;
subquery_context.setSettings(subquery_settings);
return subquery_context;
}
void InterpreterSelectQuery::init(const Names & required_result_column_names)
{
if (!context.hasQueryContext())
@ -111,17 +115,19 @@ void InterpreterSelectQuery::init(const Names & required_result_column_names)
max_streams = settings.max_threads;
const auto & table_expression = query.table();
NamesAndTypesList source_columns;
if (input)
{
/// Read from prepared input.
source_columns = input->getHeader().getNamesAndTypesList();
source_header = input->getHeader();
}
else if (table_expression && typeid_cast<const ASTSelectWithUnionQuery *>(table_expression.get()))
{
/// Read from subquery.
source_columns = InterpreterSelectWithUnionQuery::getSampleBlock(table_expression, context).getNamesAndTypesList();
interpreter_subquery = std::make_unique<InterpreterSelectWithUnionQuery>(
table_expression, getSubqueryContext(context), required_columns, QueryProcessingStage::Complete, subquery_depth + 1, only_analyze);
source_header = interpreter_subquery->getSampleBlock();
}
else if (table_expression && typeid_cast<const ASTFunction *>(table_expression.get()))
{
@ -143,7 +149,7 @@ void InterpreterSelectQuery::init(const Names & required_result_column_names)
table_lock = storage->lockStructure(false, __PRETTY_FUNCTION__);
query_analyzer = std::make_unique<ExpressionAnalyzer>(
query_ptr, context, storage, source_columns, required_result_column_names, subquery_depth, !only_analyze);
query_ptr, context, storage, source_header.getNamesAndTypesList(), required_result_column_names, subquery_depth, !only_analyze);
if (!only_analyze)
{
@ -161,6 +167,25 @@ void InterpreterSelectQuery::init(const Names & required_result_column_names)
if (!context.tryGetExternalTable(it.first))
context.addExternalTable(it.first, it.second);
}
if (interpreter_subquery)
{
/// If there is an aggregation in the outer query, WITH TOTALS is ignored in the subquery.
if (query_analyzer->hasAggregation())
interpreter_subquery->ignoreWithTotals();
}
required_columns = query_analyzer->getRequiredSourceColumns();
if (storage)
source_header = storage->getSampleBlockForColumns(required_columns);
/// Calculate structure of the result.
{
Pipeline pipeline;
executeImpl(pipeline, input, true);
result_header = pipeline.firstStream()->getHeader();
}
}
@ -194,23 +219,14 @@ void InterpreterSelectQuery::getDatabaseAndTableNames(String & database_name, St
Block InterpreterSelectQuery::getSampleBlock()
{
Pipeline pipeline;
executeImpl(pipeline, input, true);
auto res = pipeline.firstStream()->getHeader();
return res;
}
Block InterpreterSelectQuery::getSampleBlock(const ASTPtr & query_ptr_, const Context & context_)
{
return InterpreterSelectQuery(OnlyAnalyzeTag(), query_ptr_, context_).getSampleBlock();
return result_header;
}
BlockIO InterpreterSelectQuery::execute()
{
Pipeline pipeline;
executeImpl(pipeline, input, false);
executeImpl(pipeline, input, only_analyze);
executeUnion(pipeline);
BlockIO res;
@ -221,12 +237,12 @@ BlockIO InterpreterSelectQuery::execute()
BlockInputStreams InterpreterSelectQuery::executeWithMultipleStreams()
{
Pipeline pipeline;
executeImpl(pipeline, input, false);
executeImpl(pipeline, input, only_analyze);
return pipeline.streams;
}
InterpreterSelectQuery::AnalysisResult InterpreterSelectQuery::analyzeExpressions(QueryProcessingStage::Enum from_stage)
InterpreterSelectQuery::AnalysisResult InterpreterSelectQuery::analyzeExpressions(QueryProcessingStage::Enum from_stage, bool dry_run)
{
AnalysisResult res;
@ -247,16 +263,16 @@ InterpreterSelectQuery::AnalysisResult InterpreterSelectQuery::analyzeExpression
res.need_aggregate = query_analyzer->hasAggregation();
query_analyzer->appendArrayJoin(chain, !res.first_stage);
query_analyzer->appendArrayJoin(chain, dry_run || !res.first_stage);
if (query_analyzer->appendJoin(chain, !res.first_stage))
if (query_analyzer->appendJoin(chain, dry_run || !res.first_stage))
{
res.has_join = true;
res.before_join = chain.getLastActions();
chain.addStep();
}
if (query_analyzer->appendWhere(chain, !res.first_stage))
if (query_analyzer->appendWhere(chain, dry_run || !res.first_stage))
{
res.has_where = true;
res.before_where = chain.getLastActions();
@ -265,14 +281,14 @@ InterpreterSelectQuery::AnalysisResult InterpreterSelectQuery::analyzeExpression
if (res.need_aggregate)
{
query_analyzer->appendGroupBy(chain, !res.first_stage);
query_analyzer->appendAggregateFunctionsArguments(chain, !res.first_stage);
query_analyzer->appendGroupBy(chain, dry_run || !res.first_stage);
query_analyzer->appendAggregateFunctionsArguments(chain, dry_run || !res.first_stage);
res.before_aggregation = chain.getLastActions();
chain.finalize();
chain.clear();
if (query_analyzer->appendHaving(chain, !res.second_stage))
if (query_analyzer->appendHaving(chain, dry_run || !res.second_stage))
{
res.has_having = true;
res.before_having = chain.getLastActions();
@ -281,13 +297,13 @@ InterpreterSelectQuery::AnalysisResult InterpreterSelectQuery::analyzeExpression
}
/// If there is aggregation, we execute expressions in SELECT and ORDER BY on the initiating server, otherwise on the source servers.
query_analyzer->appendSelect(chain, res.need_aggregate ? !res.second_stage : !res.first_stage);
query_analyzer->appendSelect(chain, dry_run || (res.need_aggregate ? !res.second_stage : !res.first_stage));
res.selected_columns = chain.getLastStep().required_output;
res.has_order_by = query_analyzer->appendOrderBy(chain, res.need_aggregate ? !res.second_stage : !res.first_stage);
res.has_order_by = query_analyzer->appendOrderBy(chain, dry_run || (res.need_aggregate ? !res.second_stage : !res.first_stage));
res.before_order_and_select = chain.getLastActions();
chain.addStep();
if (query_analyzer->appendLimitBy(chain, !res.second_stage))
if (query_analyzer->appendLimitBy(chain, dry_run || !res.second_stage))
{
res.has_limit_by = true;
res.before_limit_by = chain.getLastActions();
@ -328,16 +344,25 @@ void InterpreterSelectQuery::executeImpl(Pipeline & pipeline, const BlockInputSt
* then perform the remaining operations with one resulting stream.
*/
/** Read the data from Storage. from_stage - to what stage the request was completed in Storage. */
QueryProcessingStage::Enum from_stage = executeFetchColumns(pipeline, dry_run);
AnalysisResult expressions;
if (from_stage == QueryProcessingStage::WithMergeableState && to_stage == QueryProcessingStage::WithMergeableState)
throw Exception("Distributed on Distributed is not supported", ErrorCodes::NOT_IMPLEMENTED);
if (dry_run)
{
pipeline.streams.emplace_back(std::make_shared<NullBlockInputStream>(source_header));
expressions = analyzeExpressions(QueryProcessingStage::FetchColumns, true);
}
else
{
/** Read the data from Storage. from_stage - to what stage the request was completed in Storage. */
QueryProcessingStage::Enum from_stage = executeFetchColumns(pipeline);
if (from_stage == QueryProcessingStage::WithMergeableState && to_stage == QueryProcessingStage::WithMergeableState)
throw Exception("Distributed on Distributed is not supported", ErrorCodes::NOT_IMPLEMENTED);
if (!dry_run)
LOG_TRACE(log, QueryProcessingStage::toString(from_stage) << " -> " << QueryProcessingStage::toString(to_stage));
AnalysisResult expressions = analyzeExpressions(from_stage);
expressions = analyzeExpressions(from_stage, false);
}
const Settings & settings = context.getSettingsRef();
@ -502,11 +527,8 @@ static void getLimitLengthAndOffset(ASTSelectQuery & query, size_t & length, siz
}
}
QueryProcessingStage::Enum InterpreterSelectQuery::executeFetchColumns(Pipeline & pipeline, bool dry_run)
QueryProcessingStage::Enum InterpreterSelectQuery::executeFetchColumns(Pipeline & pipeline)
{
/// List of columns to read to execute the query.
Names required_columns = query_analyzer->getRequiredSourceColumns();
/// Actions to calculate ALIAS if required.
ExpressionActionsPtr alias_actions;
/// Are ALIAS columns required for query execution?
@ -546,36 +568,11 @@ QueryProcessingStage::Enum InterpreterSelectQuery::executeFetchColumns(Pipeline
}
}
/// The subquery interpreter, if the subquery
std::unique_ptr<InterpreterSelectWithUnionQuery> interpreter_subquery;
auto query_table = query.table();
if (query_table && typeid_cast<ASTSelectWithUnionQuery *>(query_table.get()))
{
/** There are no limits on the maximum size of the result for the subquery.
* Since the result of the query is not the result of the entire query.
*/
Context subquery_context = context;
Settings subquery_settings = context.getSettings();
subquery_settings.max_result_rows = 0;
subquery_settings.max_result_bytes = 0;
/// The calculation of extremes does not make sense and is not necessary (if you do it, then the extremes of the subquery can be taken for whole query).
subquery_settings.extremes = 0;
subquery_context.setSettings(subquery_settings);
interpreter_subquery = std::make_unique<InterpreterSelectWithUnionQuery>(
query_table, subquery_context, required_columns, QueryProcessingStage::Complete, subquery_depth + 1);
/// If there is an aggregation in the outer query, WITH TOTALS is ignored in the subquery.
if (query_analyzer->hasAggregation())
interpreter_subquery->ignoreWithTotals();
}
const Settings & settings = context.getSettingsRef();
/// Limitation on the number of columns to read.
/// It's not applied in 'dry_run' mode, because the query could be analyzed without removal of unnecessary columns.
if (!dry_run && settings.max_columns_to_read && required_columns.size() > settings.max_columns_to_read)
/// It's not applied in 'only_analyze' mode, because the query could be analyzed without removal of unnecessary columns.
if (!only_analyze && settings.max_columns_to_read && required_columns.size() > settings.max_columns_to_read)
throw Exception("Limit for number of columns to read exceeded. "
"Requested: " + toString(required_columns.size())
+ ", maximum: " + settings.max_columns_to_read.toString(),
@ -631,10 +628,17 @@ QueryProcessingStage::Enum InterpreterSelectQuery::executeFetchColumns(Pipeline
{
/// Subquery.
if (!dry_run)
pipeline.streams = interpreter_subquery->executeWithMultipleStreams();
else
pipeline.streams.emplace_back(std::make_shared<NullBlockInputStream>(interpreter_subquery->getSampleBlock()));
/// If we need less number of columns that subquery have - update the interpreter.
if (required_columns.size() < source_header.columns())
{
interpreter_subquery = std::make_unique<InterpreterSelectWithUnionQuery>(
query.table(), getSubqueryContext(context), required_columns, QueryProcessingStage::Complete, subquery_depth + 1, only_analyze);
if (query_analyzer->hasAggregation())
interpreter_subquery->ignoreWithTotals();
}
pipeline.streams = interpreter_subquery->executeWithMultipleStreams();
}
else if (storage)
{
@ -668,8 +672,7 @@ QueryProcessingStage::Enum InterpreterSelectQuery::executeFetchColumns(Pipeline
optimize_prewhere(*merge_tree);
}
if (!dry_run)
pipeline.streams = storage->read(required_columns, query_info, context, from_stage, max_block_size, max_streams);
pipeline.streams = storage->read(required_columns, query_info, context, from_stage, max_block_size, max_streams);
if (pipeline.streams.empty())
pipeline.streams.emplace_back(std::make_shared<NullBlockInputStream>(storage->getSampleBlockForColumns(required_columns)));

View File

@ -1,5 +1,7 @@
#pragma once
#include <memory>
#include <Core/QueryProcessingStage.h>
#include <Interpreters/Context.h>
#include <Interpreters/IInterpreter.h>
@ -16,6 +18,7 @@ namespace DB
class ExpressionAnalyzer;
class ASTSelectQuery;
struct SubqueryForSet;
class InterpreterSelectWithUnionQuery;
/** Interprets the SELECT query. Returns the stream of blocks with the results of the query before `to_stage` stage.
@ -63,10 +66,6 @@ public:
Block getSampleBlock();
static Block getSampleBlock(
const ASTPtr & query_ptr_,
const Context & context_);
void ignoreWithTotals();
private:
@ -103,12 +102,6 @@ private:
}
};
struct OnlyAnalyzeTag {};
InterpreterSelectQuery(
OnlyAnalyzeTag,
const ASTPtr & query_ptr_,
const Context & context_);
void init(const Names & required_result_column_names);
void executeImpl(Pipeline & pipeline, const BlockInputStreamPtr & input, bool dry_run);
@ -142,7 +135,7 @@ private:
SubqueriesForSets subqueries_for_sets;
};
AnalysisResult analyzeExpressions(QueryProcessingStage::Enum from_stage);
AnalysisResult analyzeExpressions(QueryProcessingStage::Enum from_stage, bool dry_run);
/** From which table to read. With JOIN, the "left" table is returned.
@ -155,7 +148,7 @@ private:
void executeWithMultipleStreamsImpl(Pipeline & pipeline, const BlockInputStreamPtr & input, bool dry_run);
/// Fetch data from the table. Returns the stage to which the query was processed in Storage.
QueryProcessingStage::Enum executeFetchColumns(Pipeline & pipeline, bool dry_run);
QueryProcessingStage::Enum executeFetchColumns(Pipeline & pipeline);
void executeWhere(Pipeline & pipeline, const ExpressionActionsPtr & expression);
void executeAggregation(Pipeline & pipeline, const ExpressionActionsPtr & expression, bool overflow_row, bool final);
@ -195,6 +188,16 @@ private:
/// The object was created only for query analysis.
bool only_analyze = false;
/// List of columns to read to execute the query.
Names required_columns;
/// Structure of query source (table, subquery, etc).
Block source_header;
/// Structure of query result.
Block result_header;
/// The subquery interpreter, if the subquery
std::unique_ptr<InterpreterSelectWithUnionQuery> interpreter_subquery;
/// Table from where to read data, if not subquery.
StoragePtr storage;
TableStructureReadLockPtr table_lock;

View File

@ -26,7 +26,8 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
const Context & context_,
const Names & required_result_column_names,
QueryProcessingStage::Enum to_stage_,
size_t subquery_depth_)
size_t subquery_depth_,
bool only_analyze)
: query_ptr(query_ptr_),
context(context_),
to_stage(to_stage_),
@ -56,7 +57,7 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
/// We use it to determine positions of 'required_result_column_names' in SELECT clause.
Block full_result_header = InterpreterSelectQuery(
ast.list_of_selects->children.at(0), context, Names(), to_stage, subquery_depth).getSampleBlock();
ast.list_of_selects->children.at(0), context, Names(), to_stage, subquery_depth, nullptr, true).getSampleBlock();
std::vector<size_t> positions_of_required_result_columns(required_result_column_names.size());
for (size_t required_result_num = 0, size = required_result_column_names.size(); required_result_num < size; ++required_result_num)
@ -65,7 +66,7 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
for (size_t query_num = 1; query_num < num_selects; ++query_num)
{
Block full_result_header_for_current_select = InterpreterSelectQuery(
ast.list_of_selects->children.at(query_num), context, Names(), to_stage, subquery_depth).getSampleBlock();
ast.list_of_selects->children.at(query_num), context, Names(), to_stage, subquery_depth, nullptr, true).getSampleBlock();
if (full_result_header_for_current_select.columns() != full_result_header.columns())
throw Exception("Different number of columns in UNION ALL elements", ErrorCodes::UNION_ALL_RESULT_STRUCTURES_MISMATCH);
@ -83,7 +84,7 @@ InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(
: required_result_column_names_for_other_selects[query_num];
nested_interpreters.emplace_back(std::make_unique<InterpreterSelectQuery>(
ast.list_of_selects->children.at(query_num), context, current_required_result_column_names, to_stage, subquery_depth));
ast.list_of_selects->children.at(query_num), context, current_required_result_column_names, to_stage, subquery_depth, nullptr, only_analyze));
}
/// Determine structure of result.
@ -165,7 +166,7 @@ Block InterpreterSelectWithUnionQuery::getSampleBlock(
return cache[key];
}
return cache[key] = InterpreterSelectWithUnionQuery(query_ptr, context).getSampleBlock();
return cache[key] = InterpreterSelectWithUnionQuery(query_ptr, context, {}, QueryProcessingStage::Complete, 0, true).getSampleBlock();
}

View File

@ -21,7 +21,8 @@ public:
const Context & context_,
const Names & required_result_column_names = Names{},
QueryProcessingStage::Enum to_stage_ = QueryProcessingStage::Complete,
size_t subquery_depth_ = 0);
size_t subquery_depth_ = 0,
bool only_analyze = false);
~InterpreterSelectWithUnionQuery();

View File

@ -20,10 +20,10 @@ Block QueryLogElement::createBlock()
{
return
{
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "type"},
{ColumnUInt16::create(), std::make_shared<DataTypeDate>(), "event_date"},
{ColumnUInt32::create(), std::make_shared<DataTypeDateTime>(), "event_time"},
{ColumnUInt32::create(), std::make_shared<DataTypeDateTime>(), "query_start_time"},
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "type"},
{ColumnUInt16::create(), std::make_shared<DataTypeDate>(), "event_date"},
{ColumnUInt32::create(), std::make_shared<DataTypeDateTime>(), "event_time"},
{ColumnUInt32::create(), std::make_shared<DataTypeDateTime>(), "query_start_time"},
{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "query_duration_ms"},
{ColumnUInt64::create(), std::make_shared<DataTypeUInt64>(), "read_rows"},
@ -41,7 +41,7 @@ Block QueryLogElement::createBlock()
{ColumnString::create(), std::make_shared<DataTypeString>(), "exception"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "stack_trace"},
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "is_initial_query"},
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "is_initial_query"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "user"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "query_id"},
@ -53,14 +53,14 @@ Block QueryLogElement::createBlock()
{ColumnFixedString::create(16), std::make_shared<DataTypeFixedString>(16), "initial_address"},
{ColumnUInt16::create(), std::make_shared<DataTypeUInt16>(), "initial_port"},
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "interface"},
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "interface"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "os_user"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "client_hostname"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "client_name"},
{ColumnUInt32::create(), std::make_shared<DataTypeUInt32>(), "client_revision"},
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "http_method"},
{ColumnUInt8::create(), std::make_shared<DataTypeUInt8>(), "http_method"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "http_user_agent"},
{ColumnString::create(), std::make_shared<DataTypeString>(), "quota_key"},

View File

@ -187,8 +187,6 @@ struct Settings
M(SettingSeconds, http_receive_timeout, DEFAULT_HTTP_READ_BUFFER_TIMEOUT, "HTTP receive timeout") \
M(SettingBool, optimize_throw_if_noop, false, "If setting is enabled and OPTIMIZE query didn't actually assign a merge then an explanatory exception is thrown") \
M(SettingBool, use_index_for_in_with_subqueries, true, "Try using an index if there is a subquery or a table expression on the right side of the IN operator.") \
M(SettingUInt64, use_index_for_in_with_subqueries_max_values, 100000, "Don't use index of a table for filtering by right hand size of the IN operator if the size of set is larger than specified threshold. This allows to avoid performance degradation and higher memory usage due to preparation of additional data structures.") \
\
M(SettingBool, empty_result_for_aggregation_by_empty_set, false, "Return empty result when aggregating without keys on empty set.") \
M(SettingBool, allow_distributed_ddl, true, "If it is set to true, then a user is allowed to executed distributed DDL queries.") \
M(SettingUInt64, odbc_max_field_size, 1024, "Max size of filed can be read from ODBC dictionary. Long strings are truncated.") \

View File

@ -13,7 +13,7 @@ void ASTIdentifier::formatImplWithoutAlias(const FormatSettings & settings, Form
settings.ostr << (settings.hilite ? hilite_identifier : "");
WriteBufferFromOStream wb(settings.ostr, 32);
writeProbablyBackQuotedString(name, wb);
settings.writeIdentifier(name, wb);
wb.next();
settings.ostr << (settings.hilite ? hilite_none : "");

View File

@ -6,6 +6,18 @@
namespace DB
{
void ASTWithAlias::writeAlias(const String & name, const FormatSettings & settings) const
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " AS " << (settings.hilite ? hilite_alias : "");
WriteBufferFromOStream wb(settings.ostr, 32);
settings.writeIdentifier(name, wb);
wb.next();
settings.ostr << (settings.hilite ? hilite_none : "");
}
void ASTWithAlias::formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
{
if (!alias.empty())
@ -14,7 +26,7 @@ void ASTWithAlias::formatImpl(const FormatSettings & settings, FormatState & sta
if (!state.printed_asts_with_alias.emplace(frame.current_select, alias).second)
{
WriteBufferFromOStream wb(settings.ostr, 32);
writeProbablyBackQuotedString(alias, wb);
settings.writeIdentifier(alias, wb);
return;
}
}
@ -27,7 +39,7 @@ void ASTWithAlias::formatImpl(const FormatSettings & settings, FormatState & sta
if (!alias.empty())
{
writeAlias(alias, settings.ostr, settings.hilite);
writeAlias(alias, settings);
if (frame.need_parens)
settings.ostr <<')';
}

View File

@ -32,6 +32,8 @@ public:
protected:
virtual void appendColumnNameImpl(WriteBuffer & ostr) const = 0;
void writeAlias(const String & name, const FormatSettings & settings) const;
};
/// helper for setting aliases and chaining result to other functions

View File

@ -1,6 +1,8 @@
#include <errno.h>
#include <cstdlib>
#include <Poco/String.h>
#include <IO/ReadHelpers.h>
#include <IO/ReadBufferFromMemory.h>
@ -236,6 +238,17 @@ bool ParserFunction::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
, ErrorCodes::SYNTAX_ERROR);
}
/// Temporary compatibility fix for Yandex.Metrika.
/// When we have a query with
/// cast(x, 'Type')
/// when cast is not in uppercase and when expression is written as a function, not as operator like cast(x AS Type)
/// and newer ClickHouse server (1.1.54388) interacts with older ClickHouse server (1.1.54381) in distributed query,
/// then exception was thrown.
auto & identifier_concrete = typeid_cast<ASTIdentifier &>(*identifier);
if (Poco::toLower(identifier_concrete.name) == "cast")
identifier_concrete.name = "CAST";
/// The parametric aggregate function has two lists (parameters and arguments) in parentheses. Example: quantile(0.9)(x).
if (pos->type == TokenType::OpeningRoundBracket)
{

View File

@ -13,6 +13,7 @@ namespace ErrorCodes
{
extern const int TOO_BIG_AST;
extern const int TOO_DEEP_AST;
extern const int BAD_ARGUMENTS;
}
@ -36,18 +37,6 @@ String backQuoteIfNeed(const String & x)
}
void IAST::writeAlias(const String & name, std::ostream & s, bool hilite) const
{
s << (hilite ? hilite_keyword : "") << " AS " << (hilite ? hilite_alias : "");
WriteBufferFromOStream wb(s, 32);
writeProbablyBackQuotedString(name, wb);
wb.next();
s << (hilite ? hilite_none : "");
}
size_t IAST::checkSize(size_t max_size) const
{
size_t res = 1;
@ -109,4 +98,36 @@ String IAST::getColumnName() const
return write_buffer.str();
}
void IAST::FormatSettings::writeIdentifier(const String & name, WriteBuffer & out) const
{
switch (identifier_quoting_style)
{
case IdentifierQuotingStyle::None:
{
if (always_quote_identifiers)
throw Exception("Incompatible arguments: always_quote_identifiers = true && identifier_quoting_style == IdentifierQuotingStyle::None",
ErrorCodes::BAD_ARGUMENTS);
writeString(name, out);
break;
}
case IdentifierQuotingStyle::Backticks:
{
if (always_quote_identifiers)
writeBackQuotedString(name, out);
else
writeProbablyBackQuotedString(name, out);
break;
}
case IdentifierQuotingStyle::DoubleQuotes:
{
if (always_quote_identifiers)
writeDoubleQuotedString(name, out);
else
writeProbablyDoubleQuotedString(name, out);
break;
}
}
}
}

View File

@ -7,6 +7,7 @@
#include <Core/Types.h>
#include <Common/Exception.h>
#include <Parsers/StringRange.h>
#include <Parsers/IdentifierQuotingStyle.h>
class SipHash;
@ -150,16 +151,20 @@ public:
struct FormatSettings
{
std::ostream & ostr;
bool hilite;
bool hilite = false;
bool one_line;
bool always_quote_identifiers = false;
IdentifierQuotingStyle identifier_quoting_style = IdentifierQuotingStyle::Backticks;
char nl_or_ws;
FormatSettings(std::ostream & ostr_, bool hilite_, bool one_line_)
: ostr(ostr_), hilite(hilite_), one_line(one_line_)
FormatSettings(std::ostream & ostr_, bool one_line_)
: ostr(ostr_), one_line(one_line_)
{
nl_or_ws = one_line ? ' ' : '\n';
}
void writeIdentifier(const String & name, WriteBuffer & out) const;
};
/// State. For example, a set of nodes can be remembered, which we already walk through.
@ -194,8 +199,6 @@ public:
ErrorCodes::UNKNOWN_ELEMENT_IN_AST);
}
void writeAlias(const String & name, std::ostream & s, bool hilite) const;
void cloneChildren();
public:

View File

@ -0,0 +1,16 @@
#pragma once
namespace DB
{
/// Method to quote identifiers.
/// NOTE There could be differences in escaping rules inside quotes. Escaping rules may not match that required by specific external DBMS.
enum class IdentifierQuotingStyle
{
None, /// Write as-is, without quotes.
Backticks, /// `mysql` style
DoubleQuotes /// "postgres" style
};
}

View File

@ -7,7 +7,9 @@ namespace DB
void formatAST(const IAST & ast, std::ostream & s, bool hilite, bool one_line)
{
IAST::FormatSettings settings(s, hilite, one_line);
IAST::FormatSettings settings(s, one_line);
settings.hilite = hilite;
ast.format(settings);
}

View File

@ -12,6 +12,7 @@ namespace DB
*/
void formatAST(const IAST & ast, std::ostream & s, bool hilite = true, bool one_line = false);
inline std::ostream & operator<<(std::ostream & os, const IAST & ast)
{
formatAST(ast, os, false, true);

View File

@ -69,10 +69,24 @@ Block ITableDeclaration::getSampleBlockForColumns(const Names & column_names) co
{
Block res;
NamesAndTypesList all_columns = getColumns().getAll();
std::unordered_map<String, DataTypePtr> columns_map;
for (const auto & elem : all_columns)
columns_map.emplace(elem.name, elem.type);
for (const auto & name : column_names)
{
auto col = getColumn(name);
res.insert({ col.type->createColumn(), col.type, name });
auto it = columns_map.find(name);
if (it != columns_map.end())
{
res.insert({ it->second->createColumn(), it->second, it->first });
}
else
{
/// Virtual columns.
NameAndTypePair elem = getColumn(name);
res.insert({ elem.type->createColumn(), elem.type, elem.name });
}
}
return res;

View File

@ -22,6 +22,8 @@ public:
Block getSampleBlock() const;
Block getSampleBlockNonMaterialized() const;
/// Including virtual and alias columns.
Block getSampleBlockForColumns(const Names & column_names) const;
/** Verify that all the requested names are in the table and are set correctly.

View File

@ -57,7 +57,8 @@ BlockInputStreams StorageMySQL::read(
{
check(column_names);
processed_stage = QueryProcessingStage::FetchColumns;
String query = transformQueryForExternalDatabase(*query_info.query, getColumns().ordinary, remote_database_name, remote_table_name, context);
String query = transformQueryForExternalDatabase(
*query_info.query, getColumns().ordinary, IdentifierQuotingStyle::Backticks, remote_database_name, remote_table_name, context);
Block sample_block;
for (const String & name : column_names)

View File

@ -44,7 +44,7 @@ BlockInputStreams StorageODBC::read(
check(column_names);
processed_stage = QueryProcessingStage::FetchColumns;
String query = transformQueryForExternalDatabase(
*query_info.query, getColumns().ordinary, remote_database_name, remote_table_name, context);
*query_info.query, getColumns().ordinary, IdentifierQuotingStyle::DoubleQuotes, remote_database_name, remote_table_name, context);
Block sample_block;
for (const String & name : column_names)

View File

@ -1,3 +1,4 @@
#include <sstream>
#include <Common/typeid_cast.h>
#include <Parsers/IAST.h>
#include <Parsers/ASTFunction.h>
@ -5,7 +6,6 @@
#include <Parsers/ASTLiteral.h>
#include <Parsers/ASTSelectQuery.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/queryToString.h>
#include <Interpreters/ExpressionAnalyzer.h>
#include <Storages/transformQueryForExternalDatabase.h>
@ -57,6 +57,7 @@ static bool isCompatible(const IAST & node)
String transformQueryForExternalDatabase(
const IAST & query,
const NamesAndTypesList & available_columns,
IdentifierQuotingStyle identifier_quoting_style,
const String & database,
const String & table,
const Context & context)
@ -105,7 +106,13 @@ String transformQueryForExternalDatabase(
}
}
return queryToString(select);
std::stringstream out;
IAST::FormatSettings settings(out, true);
settings.always_quote_identifiers = true;
settings.identifier_quoting_style = identifier_quoting_style;
select->format(settings);
return out.str();
}
}

View File

@ -2,6 +2,7 @@
#include <Core/Types.h>
#include <Core/NamesAndTypes.h>
#include <Parsers/IdentifierQuotingStyle.h>
namespace DB
@ -27,6 +28,7 @@ class Context;
String transformQueryForExternalDatabase(
const IAST & query,
const NamesAndTypesList & available_columns,
IdentifierQuotingStyle identifier_quoting_style,
const String & database,
const String & table,
const Context & context);

View File

@ -1 +1,3 @@
6
[1,1,1,2]
[1,1,2]

View File

@ -1,2 +1,5 @@
SELECT max(arrayJoin(arr)) FROM (SELECT arrayEnumerateUniq(groupArray(intDiv(number, 54321)) AS nums, groupArray(toString(intDiv(number, 98765)))) AS arr FROM (SELECT number FROM system.numbers LIMIT 1000000) GROUP BY intHash32(number) % 100000)
SELECT max(arrayJoin(arr)) FROM (SELECT arrayEnumerateUniq(groupArray(intDiv(number, 54321)) AS nums, groupArray(toString(intDiv(number, 98765)))) AS arr FROM (SELECT number FROM system.numbers LIMIT 1000000) GROUP BY intHash32(number) % 100000);
SELECT arrayEnumerateUniq([[1], [2], [34], [1]]);
SELECT arrayEnumerateUniq([(1, 2), (3, 4), (1, 2)]);

View File

@ -0,0 +1,12 @@
[1,2,3]
[1,2,3]
[1,2,5]
['1212','sef','343r4']
['1212','sef','343r4']
['1212','sef','343r4','232']
[1,2,3]
[21]
['a','b','c']
['123']
[['1212'],['sef'],['343r4']]
[(1,2),(1,3),(1,5)]

View File

@ -0,0 +1,21 @@
USE test;
SELECT arrayDistinct([1, 2, 3]);
SELECT arrayDistinct([1, 2, 3, 2, 2]);
SELECT arrayDistinct([1, 2, NULL, 5, 2, NULL]);
SELECT arrayDistinct(['1212', 'sef', '343r4']);
SELECT arrayDistinct(['1212', 'sef', '343r4', '1212']);
SELECT arrayDistinct(['1212', 'sef', '343r4', NULL, NULL, '232']);
DROP TABLE IF EXISTS arrayDistinct_test;
CREATE TABLE arrayDistinct_test(arr_int Array(UInt8), arr_string Array(String)) ENGINE=Memory;
INSERT INTO arrayDistinct_test values ([1, 2, 3], ['a', 'b', 'c']), ([21, 21, 21, 21], ['123', '123', '123']);
SELECT arrayDistinct(arr_int) FROM arrayDistinct_test;
SELECT arrayDistinct(arr_string) FROM arrayDistinct_test;
DROP TABLE arrayDistinct_test;
SELECT arrayDistinct([['1212'], ['sef'], ['343r4'], ['1212']]);
SELECT arrayDistinct([(1, 2), (1, 3), (1, 2), (1, 2), (1, 2), (1, 5)]);

View File

@ -0,0 +1,12 @@
DROP TABLE IF EXISTS test.mergetree;
CREATE TABLE test.mergetree (x UInt64) ENGINE = MergeTree ORDER BY x;
INSERT INTO test.mergetree VALUES (1);
SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM test.mergetree WHERE x IN (SELECT * FROM numbers(10000000))))))))))));
SET force_primary_key = 1;
SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM test.mergetree WHERE x IN (SELECT * FROM numbers(10000000))))))))))));
DROP TABLE test.mergetree;

4
debian/changelog vendored
View File

@ -1,5 +1,5 @@
clickhouse (1.1.54398) unstable; urgency=low
clickhouse (18.1.0) unstable; urgency=low
* Modified source code
-- <robot-metrika-test@yandex-team.ru> Tue, 17 Jul 2018 20:04:13 +0300
-- <robot-metrika-test@yandex-team.ru> Fri, 20 Jul 2018 04:00:20 +0300

View File

@ -2,7 +2,7 @@
Basically ClickHouse uses "documentation as code" approach, so you can edit Markdown files in this folder from GitHub web interface or fork ClickHouse repository, edit, commit, push and open pull request.
At the moment documentation is bilingual in English and Russian, so it's better to try keeping languages in sync if you can, but it's not strictly required as there are people watching over this. If you add new article, you should also add it to `mkdocs_{en,ru}.yaml` file with pages index.
At the moment documentation is bilingual in English and Russian, so it's better to try keeping languages in sync if you can, but it's not strictly required as there are people watching over this. If you add new article, you should also add it to `toc_{en,ru}.yaml` file with pages index.
Master branch is then asynchronously published to ClickHouse official website:

1
docs/en/changelog.md Symbolic link
View File

@ -0,0 +1 @@
../../CHANGELOG.md

View File

@ -66,5 +66,5 @@ SELECT 0 / 0
└──────────────┘
```
See the rules for ` NaN` sorting in the section [ORDER BY clause](../query_language/queries.md#query_language-queries-order_by).
See the rules for ` NaN` sorting in the section [ORDER BY clause](../query_language/select.md#query_language-queries-order_by).

View File

@ -41,5 +41,45 @@ cd ..
## Caveats
If you intend to run clickhouse-server, make sure to increase the system's maxfiles variable. See [MacOS.md](https://github.com/yandex/ClickHouse/blob/master/MacOS.md) for more details.
If you intend to run clickhouse-server, make sure to increase the system's maxfiles variable.
<div class="admonition info">
Note: you'll need to use sudo.
</div>
To do so, create the following file:
/Library/LaunchDaemons/limit.maxfiles.plist:
``` xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>limit.maxfiles</string>
<key>ProgramArguments</key>
<array>
<string>launchctl</string>
<string>limit</string>
<string>maxfiles</string>
<string>524288</string>
<string>524288</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>ServiceIPC</key>
<false/>
</dict>
</plist>
```
Execute the following command:
``` bash
$ sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist
```
Reboot.
To check if it's working, you can use `ulimit -n` command.

View File

@ -1,32 +1,253 @@
# How to run ClickHouse tests
# ClickHouse Testing
The `clickhouse-test` utility that is used for functional testing is written using Python 2.x.It also requires you to have some third-party packages:
```bash
$ pip install lxml termcolor
## Functional Tests
Functional tests are the most simple and convenient to use. Most of ClickHouse features can be tested with functional tests and they are mandatory to use for every change in ClickHouse code that can be tested that way.
Each functional test sends one or multiple queries to the running ClickHouse server and compares the result with reference.
Tests are located in `dbms/src/tests/queries` directory. There are two subdirectories: `stateless` and `stateful`. Stateless tests run queries without any preloaded test data - they often create small synthetic datasets on the fly, within the test itself. Stateful tests require preloaded test data from Yandex.Metrica and not available to general public. We tend to use only `stateless` tests and avoid adding new `stateful` tests.
Each test can be one of two types: `.sql` and `.sh`. `.sql` test is the simple SQL script that is piped to `clickhouse-client --multiquery`. `.sh` test is a script that is run by itself.
To run all tests, use `dbms/tests/clickhouse-test` tool. Look `--help` for the list of possible options. You can simply run all tests or run subset of tests filtered by substring in test name: `./clickhouse-test substring`.
The most simple way to invoke functional tests is to copy `clickhouse-client` to `/usr/bin/`, run `clickhouse-server` and then run `./clickhouse-test` from its own directory.
To add new test, create a `.sql` or `.sh` file in `dbms/src/tests/queries/0_stateless` directory, check it manually and then generate `.reference` file in the following way: `clickhouse-client -n < 00000_test.sql > 00000_test.reference` or `./00000_test.sh > ./00000_test.reference`.
Tests should use (create, drop, etc) only tables in `test` database that is assumed to be created beforehand; also tests can use temporary tables.
If you want to use distributed queries in functional tests, you can leverage `remote` table function with `127.0.0.{1..2}` addresses for the server to query itself; or you can use predefined test clusters in server configuration file like `test_shard_localhost`.
Some tests are marked with `zookeeper`, `shard` or `long` in their names. `zookeeper` is for tests that are using ZooKeeper; `shard` is for tests that requires server to listen `127.0.0.*`; `long` is for tests that run slightly longer that one second.
## Integration Tests
Integration tests allow to test ClickHouse in clustered configuration and ClickHouse interaction with other servers like MySQL, Postgres, MongoDB. They are useful to emulate network splits, packet drops, etc. These tests are run under Docker and create multiple containers with various software.
See `dbms/tests/integration/README.md` on how to run these tests.
Note that integration of ClickHouse with third-party drivers is not tested. Also we currently don't have integration tests with our JDBC and ODBC drivers.
We don't have integration tests for `Kafka` table engine that is developed by community - this is one of the most anticipated tests (otherwise there is almost no way to be confident with `Kafka` tables).
## Unit Tests
Unit tests are useful when you want to test not the ClickHouse as a whole, but a single isolated library or class. You can enable or disable build of tests with `ENABLE_TESTS` CMake option. Unit tests (and other test programs) are located in `tests` subdirectories across the code. To run unit tests, type `ninja test`. Some tests use `gtest`, but some are just programs that return non-zero exit code on test failure.
It's not necessarily to have unit tests if the code is already covered by functional tests (and functional tests are usually much more simple to use).
## Performance Tests
Performance tests allow to measure and compare performance of some isolated part of ClickHouse on synthetic queries. Tests are located at `dbms/tests/performance`. Each test is represented by `.xml` file with description of test case. Tests are run with `clickhouse performance-test` tool (that is embedded in `clickhouse` binary). See `--help` for invocation.
Each test run one or miltiple queries (possibly with combinations of parameters) in a loop with some conditions for stop (like "maximum execution speed is not changing in three seconds") and measure some metrics about query performance (like "maximum execution speed"). Some tests can contain preconditions on preloaded test dataset.
If you want to improve performance of ClickHouse in some scenario, and if improvements can be observed on simple queries, it is highly recommended to write a performance test. It always makes sense to use `perf top` or other perf tools during your tests.
Performance tests are not run on per-commit basis. Results of performance tests are not collected and we compare them manually.
## Test Tools And Scripts
Some programs in `tests` directory are not prepared tests, but are test tools. For example, for `Lexer` there is a tool `dbms/src/Parsers/tests/lexer` that just do tokenization of stdin and writes colorized result to stdout. You can use these kind of tools as a code examples and for exploration and manual testing.
You can also place pair of files `.sh` and `.reference` along with the tool to run it on some predefined input - then script result can be compared to `.reference` file. There kind of tests are not automated.
## Miscellanous Tests
There are tests for external dictionaries located at `dbms/tests/external_dictionaries` and for machine learned models in `dbms/tests/external_models`. These tests are not updated and must be transferred to integration tests.
There is separate test for quorum inserts. This test run ClickHouse cluster on separate servers and emulate various failure cases: network split, packet drop (between ClickHouse nodes, between ClickHouse and ZooKeeper, between ClickHouse server and client, etc.), `kill -9`, `kill -STOP` and `kill -CONT` , like [Jepsen](https://aphyr.com/tags/Jepsen). Then the test checks that all acknowledged inserts was written and all rejected inserts was not.
Quorum test was written by separate team before ClickHouse was open-sourced. This team no longer work with ClickHouse. Test was accidentially written in Java. For these reasons, quorum test must be rewritten and moved to integration tests.
## Manual Testing
When you develop a new feature, it is reasonable to also test it manually. You can do it with the following steps:
Build ClickHouse. Run ClickHouse from the terminal: change directory to `dbms/src/programs/clickhouse-server` and run it with `./clickhouse-server`. It will use configuration (`config.xml`, `users.xml` and files within `config.d` and `users.d` directories) from the current directory by default. To connect to ClickHouse server, run `dbms/src/programs/clickhouse-client/clickhouse-client`.
Note that all clickhouse tools (server, client, etc) are just symlinks to a single binary named `clickhouse`. You can find this binary at `dbms/src/programs/clickhouse`. All tools can also be invoked as `clickhouse tool` instead of `clickhouse-tool`.
Alternatively you can install ClickHouse package: either stable release from Yandex repository or you can build package for yourself with `./release` in ClickHouse sources root. Then start the server with `sudo service clickhouse-server start` (or stop to stop the server). Look for logs at `/etc/clickhouse-server/clickhouse-server.log`.
When ClickHouse is already installed on your system, you can build a new `clickhouse` binary and replace the existing binary:
```
sudo service clickhouse-server stop
sudo cp ./clickhouse /usr/bin/
sudo service clickhouse-server start
```
In a nutshell:
- Put the `clickhouse` program to `/usr/bin` (or `PATH`)
- Create a `clickhouse-client` symlink in `/usr/bin` pointing to `clickhouse`
- Start the `clickhouse` server
- `cd dbms/tests/`
- Run `./clickhouse-test`
## Example usage
Run `./clickhouse-test --help` to see available options.
To run tests without having to create a symlink or mess with `PATH`:
```bash
./clickhouse-test -c "../../build/dbms/programs/clickhouse --client"
Also you can stop system clickhouse-server and run your own with the same configuration but with logging to terminal:
```
sudo service clickhouse-server stop
sudo -u clickhouse /usr/bin/clickhouse server --config-file /etc/clickhouse-server/config.xml
```
To run a single test, i.e. `00395_nullable`:
```bash
./clickhouse-test 00395
Example with gdb:
```
sudo -u clickhouse gdb --args /usr/bin/clickhouse server --config-file /etc/clickhouse-server/config.xml
```
If the system clickhouse-server is already running and you don't want to stop it, you can change port numbers in your `config.xml` (or override them in a file in `config.d` directory), provide appropriate data path, and run it.
`clickhouse` binary has almost no dependencies and works across wide range of Linux distributions. To quick and dirty test your changes on a server, you can simply `scp` your fresh built `clickhouse` binary to your server and then run it as in examples above.
## Testing Environment
Before publishing release as stable we deploy it on testing environment. Testing environment is a cluster that process 1/39 part of [Yandex.Metrica](https://metrica.yandex.com/) data. We share our testing environment with Yandex.Metrica team. ClickHouse is upgraded without downtime on top of existing data. We look at first that data is processed successfully without lagging from realtime, the replication continue to work and there is no issues visible to Yandex.Metrica team. First check can be done in the following way:
```
SELECT hostName() AS h, any(version()), any(uptime()), max(UTCEventTime), count() FROM remote('example01-01-{1..3}t', merge, hits) WHERE EventDate >= today() - 2 GROUP BY h ORDER BY h;
```
In some cases we also deploy to testing environment of our friend teams in Yandex: Market, Cloud, etc. Also we have some hardware servers that are used for development purposes.
## Load Testing
After deploying to testing environment we run load testing with queries from production cluster. This is done manually.
Make sure you have enabled `query_log` on your production cluster.
Collect query log for a day or more:
```
clickhouse-client --query="SELECT DISTINCT query FROM system.query_log WHERE event_date = today() AND query LIKE '%ym:%' AND query NOT LIKE '%system.query_log%' AND type = 2 AND is_initial_query" > queries.tsv
```
This is a way complicated example. `type = 2` will filter queries that are executed successfully. `query LIKE '%ym:%'` is to select relevant queries from Yandex.Metrica. `is_initial_query` is to select only queries that are initiated by client, not by ClickHouse itself (as parts of distributed query processing).
`scp` this log to your testing cluster and run it as following:
```
clickhouse benchmark --concurrency 16 < queries.tsv
```
(probably you also want to specify a `--user`)
Then leave it for a night or weekend and go take a rest.
You should check that `clickhouse-server` doesn't crash, memory footprint is bounded and performance not degrading over time.
Precise query execution timings are not recorded and not compared due to high variability of queries and environment.
## Build Tests
Build tests allow to check that build is not broken on various alternative configurations and on some foreign systems. Tests are located at `ci` directory. They run build from source inside Docker, Vagrant, and sometimes with `qemu-user-static` inside Docker. These tests are under development and test runs are not automated.
Motivation:
Normally we release and run all tests on a single variant of ClickHouse build. But there are alternative build variants that are not thoroughly tested. Examples:
- build on FreeBSD;
- build on Debian with libraries from system packages;
- build with shared linking of libraries;
- build on AArch64 platform.
For example, build with system packages is bad practice, because we cannot guarantee what exact version of packages a system will have. But this is really needed by Debian maintainers. For this reason we at least have to support this variant of build. Another example: shared linking is a common source of trouble, but it is needed for some enthusiasts.
Though we cannot run all tests on all variant of builds, we want to check at least that various build variants are not broken. For this purpose we use build tests.
## Testing For Protocol Compatibility
When we extend ClickHouse network protocol, we test manually that old clickhouse-client works with new clickhouse-server and new clickhouse-client works with old clickhouse-server (simply by running binaries from corresponding packages).
## Help From The Compiler
Main ClickHouse code (that is located in `dbms` directory) is built with `-Wall -Wextra -Werror` and with some additional enabled warnings. Although these options are not enabled for third-party libraries.
Clang has even more useful warnings - you can look for them with `-Weverything` and pick something to default build.
For production builds, gcc is used (it still generates slightly more efficient code than clang). For development, clang is usually more convenient to use. You can build on your own machine with debug mode (to save battery of your laptop), but please note that compiler is able to generate more warnings with `-O3` due to better control flow and inter-procedure analysis. When building with clang, `libc++` is used instead of `libstdc++` and when building with debug mode, debug version of `libc++` is used that allows to catch more errors at runtime.
## Sanitizers
**Address sanitizer**.
We run functional tests under ASan on per-commit basis.
**Valgrind (Memcheck)**.
We run functional tests under Valgrind overnight. It takes multiple hours. Currently there is one known false positive in `re2` library, see [this article](https://research.swtch.com/sparse).
**Thread sanitizer**.
We run functional tests under TSan. ClickHouse must pass all tests. Run under TSan is not automated and performed only occasionally.
**Memory sanitizer**.
Currently we still don't use MSan.
**Undefined behaviour sanitizer.**
We still don't use UBSan. The only thing to fix is unaligned placement of structs in Arena during aggregation. This is totally fine, we only have to force alignment under UBSan.
**Debug allocator.**
You can enable debug version of `tcmalloc` with `DEBUG_TCMALLOC` CMake option. We run tests with debug allocator on per-commit basis.
You will find some additional details in `dbms/tests/instructions/sanitizers.txt`.
## Fuzzing
As of July 2018 we don't use fuzzing.
## Security Audit
People from Yandex Cloud department do some basic overview of ClickHouse capabilities from the security standpoint.
## Static Analyzers
We use static analyzers only occasionally. We have evaluated `clang-tidy`, `Coverity`, `cppcheck`, `PVS-Studio`, `tscancode`. You will find instructions for usage in `dbms/tests/instructions/` directory. Also you can read [the article in russian](https://habr.com/company/yandex/blog/342018/).
If you use `CLion` as an IDE, you can leverage some `clang-tidy` checks out of the box.
## Hardening
`FORTIFY_SOURCE` is used by default. It is almost useless, but still makes sense in rare cases and we don't disable it.
## Code Style
Code style rules are described [here](https://clickhouse.yandex/docs/en/development/style/).
To check for some common style violations, you can use `utils/check-style` script.
To force proper style of your code, you can use `clang-format`. File `.clang-format` is located at the sources root. It mostly corresponding with our actual code style. But it's not recommended to apply `clang-format` to existing files because it makes formatting worse. You can use `clang-format-diff` tool that you can find in clang source repository.
Alternatively you can try `uncrustify` tool to reformat your code. Configuration is in `uncrustify.cfg` in the sources root. It is less tested than `clang-format`.
`CLion` has its own code formatter that has to be tuned for our code style.
## Metrica B2B Tests
Each ClickHouse release is tested with Yandex Metrica and AppMetrica engines. Testing and stable versions of ClickHouse are deployed on VMs and run with a small copy of Metrica engine that is processing fixed sample of input data. Then results of two instances of Metrica engine are compared together.
These tests are automated by separate team. Due to high number of moving parts, tests are fail most of the time by completely unrelated reasons, that are very difficult to figure out. Most likely these tests have negative value for us. Nevertheless these tests was proved to be useful in about one or two times out of hundreds.
## Test Coverage
As of July 2018 we don't track test coverage.
## Test Automation
We run tests with Travis CI (available for general public) and Jenkins (available inside Yandex).
In Travis CI due to limit on time and computational power we can afford only subset of functional tests that are run with limited build of ClickHouse (debug version with cut off most of libraries). In about half of runs it still fails to finish in 50 minutes timeout. The only advantage - test results are visible for all external contributors.
In Jenkins we run functional tests for each commit and for each pull request from trusted users; the same under ASan; we also run quorum tests, dictionary tests, Metrica B2B tests. We use Jenkins to prepare and publish releases. Worth to note that we are not happy with Jenkins at all.
One of our goals is to provide reliable testing infrastructure that will be available to community.

View File

@ -1,26 +0,0 @@
<a name="format_capnproto"></a>
# CapnProto
Cap'n Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.
Cap'n Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
```sql
SELECT SearchPhrase, count() AS c FROM test.hits
GROUP BY SearchPhrase FORMAT CapnProto SETTINGS schema = 'schema:Message'
```
Where `schema.capnp` looks like this:
```
struct Message {
SearchPhrase @0 :Text;
c @1 :Uint64;
}
```
Schema files are in the file that is located in the directory specified in [ format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) in the server configuration.
Deserialization is effective and usually doesn't increase the system load.

View File

@ -1,12 +0,0 @@
# CSV
Comma Separated Values format ([RFC](https://tools.ietf.org/html/rfc4180)).
When formatting, rows are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Date and date-time are enclosed in double quotes. Numbers are output without quotes. Values are separated by a delimiter&ast;. Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first the array is serialized to a string as in TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
&ast;By default — `,`. See a [format_csv_delimiter](/docs/en/operations/settings/settings/#format_csv_delimiter) setting for additional info.
When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Rows can also be arranged without quotes. In this case, they are parsed up to a delimiter or line feed (CR or LF). In violation of the RFC, when parsing rows without quotes, the leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) are all supported.
The CSV format supports the output of totals and extremes the same way as `TabSeparated`.

View File

@ -1,4 +0,0 @@
# CSVWithNames
Also prints the header row, similar to `TabSeparatedWithNames`.

View File

@ -1,6 +0,0 @@
<a name="formats"></a>
# Formats
The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).

View File

@ -1,85 +0,0 @@
# JSON
Outputs data in JSON format. Besides data tables, it also outputs column names and types, along with some additional information: the total number of output rows, and the number of rows that could have been output if there weren't a LIMIT. Example:
```sql
SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase WITH TOTALS ORDER BY c DESC LIMIT 5 FORMAT JSON
```
```json
{
"meta":
[
{
"name": "SearchPhrase",
"type": "String"
},
{
"name": "c",
"type": "UInt64"
}
],
"data":
[
{
"SearchPhrase": "",
"c": "8267016"
},
{
"SearchPhrase": "bathroom interior design",
"c": "2166"
},
{
"SearchPhrase": "yandex",
"c": "1655"
},
{
"SearchPhrase": "spring 2014 fashion",
"c": "1549"
},
{
"SearchPhrase": "freeform photos",
"c": "1480"
}
],
"totals":
{
"SearchPhrase": "",
"c": "8873898"
},
"extremes":
{
"min":
{
"SearchPhrase": "",
"c": "1480"
},
"max":
{
"SearchPhrase": "",
"c": "8267016"
}
},
"rows": 5,
"rows_before_limit_at_least": 141137
}
```
The JSON is compatible with JavaScript. To ensure this, some characters are additionally escaped: the slash ` /` is escaped as ` \/`; alternative line breaks ` U+2028` and ` U+2029`, which break some browsers, are escaped as ` \uXXXX`. ASCII control characters are escaped: backspace, form feed, line feed, carriage return, and horizontal tab are replaced with `\b`, `\f`, `\n`, `\r`, `\t` , as well as the remaining bytes in the 00-1F range using `\uXXXX` sequences. Invalid UTF-8 sequences are changed to the replacement character <20> so the output text will consist of valid UTF-8 sequences. For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, you can set the configuration parameter output_format_json_quote_64bit_integers to 0.
`rows` The total number of output rows.
`rows_before_limit_at_least` The minimal number of rows there would have been without LIMIT. Output only if the query contains LIMIT.
If the query contains GROUP BY, rows_before_limit_at_least is the exact number of rows there would have been without a LIMIT.
`totals` Total values (when using WITH TOTALS).
`extremes` Extreme values (when extremes is set to 1).
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
See also the JSONEachRow format.

View File

@ -1,46 +0,0 @@
# JSONCompact
Differs from JSON only in that data rows are output in arrays, not in objects.
Example:
```json
{
"meta":
[
{
"name": "SearchPhrase",
"type": "String"
},
{
"name": "c",
"type": "UInt64"
}
],
"data":
[
["", "8267016"],
["bathroom interior design", "2166"],
["yandex", "1655"],
["spring 2014 fashion", "1549"],
["freeform photos", "1480"]
],
"totals": ["","8873898"],
"extremes":
{
"min": ["","1480"],
"max": ["","8267016"]
},
"rows": 5,
"rows_before_limit_at_least": 141137
}
```
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
See also the `JSONEachRow` format.

View File

@ -1,21 +0,0 @@
# JSONEachRow
Outputs data as separate JSON objects for each row (newline delimited JSON).
```json
{"SearchPhrase":"","count()":"8267016"}
{"SearchPhrase":"bathroom interior design","count()":"2166"}
{"SearchPhrase":"yandex","count()":"1655"}
{"SearchPhrase":"spring 2014 fashion","count()":"1549"}
{"SearchPhrase":"freeform photo","count()":"1480"}
{"SearchPhrase":"angelina jolie","count()":"1245"}
{"SearchPhrase":"omsk","count()":"1112"}
{"SearchPhrase":"photos of dog breeds","count()":"1091"}
{"SearchPhrase":"curtain design","count()":"1064"}
{"SearchPhrase":"baku","count()":"1000"}
```
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.

View File

@ -1,6 +0,0 @@
# Native
The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is "columnar" it doesn't convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn't make sense to work with this format yourself.

View File

@ -1,5 +0,0 @@
# Null
Nothing is output. However, the query is processed, and when using the command-line client, data is transmitted to the client. This is used for tests, including productivity testing.
Obviously, this format is only appropriate for output, not for parsing.

View File

@ -1,37 +0,0 @@
# Pretty
Outputs data as Unicode-art tables, also using ANSI-escape sequences for setting colors in the terminal.
A full grid of the table is drawn, and each row occupies two lines in the terminal.
Each result block is output as a separate table. This is necessary so that blocks can be output without buffering results (buffering would be necessary in order to pre-calculate the visible width of all the values).
To avoid dumping too much data to the terminal, only the first 10,000 rows are printed. If the number of rows is greater than or equal to 10,000, the message "Showed first 10 000" is printed.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
The Pretty format supports outputting total values (when using WITH TOTALS) and extremes (when 'extremes' is set to 1). In these cases, total values and extreme values are output after the main data, in separate tables. Example (shown for the PrettyCompact format):
```sql
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT PrettyCompact
```
```text
┌──EventDate─┬───────c─┐
│ 2014-03-17 │ 1406958 │
│ 2014-03-18 │ 1383658 │
│ 2014-03-19 │ 1405797 │
│ 2014-03-20 │ 1353623 │
│ 2014-03-21 │ 1245779 │
│ 2014-03-22 │ 1031592 │
│ 2014-03-23 │ 1046491 │
└────────────┴─────────┘
Totals:
┌──EventDate─┬───────c─┐
│ 0000-00-00 │ 8873898 │
└────────────┴─────────┘
Extremes:
┌──EventDate─┬───────c─┐
│ 2014-03-17 │ 1031592 │
│ 2014-03-23 │ 1406958 │
└────────────┴─────────┘
```

View File

@ -1,5 +0,0 @@
# PrettyCompact
Differs from `Pretty` in that the grid is drawn between rows and the result is more compact.
This format is used by default in the command-line client in interactive mode.

View File

@ -1,4 +0,0 @@
# PrettyCompactMonoBlock
Differs from `PrettyCompact` in that up to 10,000 rows are buffered, then output as a single table, not by blocks.

View File

@ -1,20 +0,0 @@
# PrettyNoEscapes
Differs from Pretty in that ANSI-escape sequences aren't used. This is necessary for displaying this format in a browser, as well as for using the 'watch' command-line utility.
Example:
```bash
watch -n1 "clickhouse-client --query='SELECT * FROM system.events FORMAT PrettyCompactNoEscapes'"
```
You can use the HTTP interface for displaying in the browser.
## PrettyCompactNoEscapes
The same as the previous setting.
## PrettySpaceNoEscapes
The same as the previous setting.

View File

@ -1,4 +0,0 @@
# PrettySpace
Differs from `PrettyCompact` in that whitespace (space characters) is used instead of the grid.

View File

@ -1,13 +0,0 @@
# RowBinary
Formats and parses data by row in binary format. Rows and values are listed consecutively, without separators.
This format is less efficient than the Native format, since it is row-based.
Integers use fixed-length little endian representation. For example, UInt64 uses 8 bytes.
DateTime is represented as UInt32 containing the Unix timestamp as the value.
Date is represented as a UInt16 object that contains the number of days since 1970-01-01 as the value.
String is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string.
FixedString is represented simply as a sequence of bytes.
Array is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by successive elements of the array.

View File

@ -1,59 +0,0 @@
# TabSeparated
In TabSeparated format, data is written by row. Each row contains values separated by tabs. Each value is follow by a tab, except the last value in the row, which is followed by a line feed. Strictly Unix line feeds are assumed everywhere. The last row also must contain a line feed at the end. Values are written in text format, without enclosing quotation marks, and with special characters escaped.
Integer numbers are written in decimal form. Numbers can contain an extra "+" character at the beginning (ignored when parsing, and not recorded when formatting). Non-negative numbers can't contain the negative sign. When reading, it is allowed to parse an empty string as a zero, or (for signed types) a string consisting of just a minus sign as a zero. Numbers that do not fit into the corresponding data type may be parsed as a different number, without an error message.
Floating-point numbers are written in decimal form. The dot is used as the decimal separator. Exponential entries are supported, as are 'inf', '+inf', '-inf', and 'nan'. An entry of floating-point numbers may begin or end with a decimal point.
During formatting, accuracy may be lost on floating-point numbers.
During parsing, it is not strictly required to read the nearest machine-representable number.
Dates are written in YYYY-MM-DD format and parsed in the same format, but with any characters as separators.
Dates with times are written in the format YYYY-MM-DD hh:mm:ss and parsed in the same format, but with any characters as separators.
This all occurs in the system time zone at the time the client or server starts (depending on which one formats data). For dates with times, daylight saving time is not specified. So if a dump has times during daylight saving time, the dump does not unequivocally match the data, and parsing will select one of the two times.
During a read operation, incorrect dates and dates with times can be parsed with natural overflow or as null dates and times, without an error message.
As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats YYYY-MM-DD hh:mm:ss and NNNNNNNNNN are differentiated automatically.
Strings are output with backslash-escaped special characters. The following escape sequences are used for output: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\'`, `\\`. Parsing also supports the sequences `\a`, `\v`, and `\xHH` (hex escape sequences) and any `\c` sequences, where `c` is any character (these sequences are converted to `c`). Thus, reading data supports formats where a line feed can be written as `\n` or `\`, or as a line feed. For example, the string `Hello world` with a line feed between the words instead of a space can be parsed in any of the following variations:
```text
Hello\nworld
Hello\
world
```
The second variant is supported because MySQL uses it when writing tab-separated dumps.
The minimum set of characters that you need to escape when passing data in TabSeparated format: tab, line feed (LF) and backslash.
Only a small set of symbols are escaped. You can easily stumble onto a string value that your terminal will ruin in output.
Arrays are written as a list of comma-separated values in square brackets. Number items in the array are fomratted as normally, but dates, dates with times, and strings are written in single quotes with the same escaping rules as above.
The TabSeparated format is convenient for processing data using custom programs and scripts. It is used by default in the HTTP interface, and in the command-line client's batch mode. This format also allows transferring data between different DBMSs. For example, you can get a dump from MySQL and upload it to ClickHouse, or vice versa.
The TabSeparated format supports outputting total values (when using WITH TOTALS) and extreme values (when 'extremes' is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
```sql
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated``
```
```text
2014-03-17 1406958
2014-03-18 1383658
2014-03-19 1405797
2014-03-20 1353623
2014-03-21 1245779
2014-03-22 1031592
2014-03-23 1046491
0000-00-00 8873898
2014-03-17 1031592
2014-03-23 1406958
```
This format is also available under the name `TSV`.

View File

@ -1,7 +0,0 @@
# TabSeparatedRaw
Differs from `TabSeparated` format in that the rows are written without escaping.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
This format is also available under the name ` TSVRaw`.

View File

@ -1,8 +0,0 @@
# TabSeparatedWithNames
Differs from the `TabSeparated` format in that the column names are written in the first row.
During parsing, the first row is completely ignored. You can't use column names to determine their position or to check their correctness.
(Support for parsing the header row may be added in the future.)
This format is also available under the name ` TSVWithNames`.

View File

@ -1,7 +0,0 @@
# TabSeparatedWithNamesAndTypes
Differs from the `TabSeparated` format in that the column names are written to the first row, while the column types are in the second row.
During parsing, the first and second rows are completely ignored.
This format is also available under the name ` TSVWithNamesAndTypes`.

View File

@ -1,23 +0,0 @@
# TSKV
Similar to TabSeparated, but outputs a value in name=value format. Names are escaped the same way as in TabSeparated format, and the = symbol is also escaped.
```text
SearchPhrase= count()=8267016
SearchPhrase=bathroom interior design count()=2166
SearchPhrase=yandex count()=1655
SearchPhrase=spring 2014 fashion count()=1549
SearchPhrase=freeform photos count()=1480
SearchPhrase=angelina jolia count()=1245
SearchPhrase=omsk count()=1112
SearchPhrase=photos of dog breeds count()=1091
SearchPhrase=curtain design count()=1064
SearchPhrase=baku count()=1000
```
When there is a large number of small columns, this format is ineffective, and there is generally no reason to use it. It is used in some departments of Yandex.
Both data output and parsing are supported in this format. For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults.
Parsing allows the presence of the additional field `tskv` without the equal sign or a value. This field is ignored.

View File

@ -1,8 +0,0 @@
# Values
Prints every row in brackets. Rows are separated by commas. There is no comma after the last row. The values inside the brackets are also comma-separated. Numbers are output in decimal format without quotes. Arrays are output in square brackets. Strings, dates, and dates with times are output in quotes. Escaping rules and parsing are similar to the TabSeparated format. During formatting, extra spaces aren't inserted, but during parsing, they are allowed and skipped (except for spaces inside array values, which are not allowed).
The minimum set of characters that you need to escape when passing data in Values format: single quotes and backslashes.
This is the format that is used in `INSERT INTO t VALUES ...`, but you can also use it for formatting query results.

View File

@ -1,5 +0,0 @@
# Vertical
Prints each value on a separate line with the column name specified. This format is convenient for printing just one or a few rows, if each row consists of a large number of columns.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).

View File

@ -1,28 +0,0 @@
# VerticalRaw
Differs from `Vertical` format in that the rows are not escaped.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
Examples:
```
:) SHOW CREATE TABLE geonames FORMAT VerticalRaw;
Row 1:
──────
statement: CREATE TABLE default.geonames ( geonameid UInt32, date Date DEFAULT CAST('2017-12-08' AS Date)) ENGINE = MergeTree(date, geonameid, 8192)
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT VerticalRaw;
Row 1:
──────
test: string with 'quotes' and with some special
characters
```
Compare with the Vertical format:
```
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT Vertical;
Row 1:
──────
test: string with \'quotes\' and \t with some special \n characters
```

View File

@ -1,74 +0,0 @@
# XML
XML format is suitable only for output, not for parsing. Example:
```xml
<?xml version='1.0' encoding='UTF-8' ?>
<result>
<meta>
<columns>
<column>
<name>SearchPhrase</name>
<type>String</type>
</column>
<column>
<name>count()</name>
<type>UInt64</type>
</column>
</columns>
</meta>
<data>
<row>
<SearchPhrase></SearchPhrase>
<field>8267016</field>
</row>
<row>
<SearchPhrase>bathroom interior design</SearchPhrase>
<field>2166</field>
</row>
<row>
<SearchPhrase>yandex</SearchPhrase>
<field>1655</field>
</row>
<row>
<SearchPhrase>spring 2014 fashion</SearchPhrase>
<field>1549</field>
</row>
<row>
<SearchPhrase>freeform photos</SearchPhrase>
<field>1480</field>
</row>
<row>
<SearchPhrase>angelina jolie</SearchPhrase>
<field>1245</field>
</row>
<row>
<SearchPhrase>omsk</SearchPhrase>
<field>1112</field>
</row>
<row>
<SearchPhrase>photos of dog breeds</SearchPhrase>
<field>1091</field>
</row>
<row>
<SearchPhrase>curtain design</SearchPhrase>
<field>1064</field>
</row>
<row>
<SearchPhrase>baku</SearchPhrase>
<field>1000</field>
</row>
</data>
<rows>10</rows>
<rows_before_limit_at_least>141137</rows_before_limit_at_least>
</result>
```
If the column name does not have an acceptable format, just 'field' is used as the element name. In general, the XML structure follows the JSON structure.
Just as for JSON, invalid UTF-8 sequences are changed to the replacement character <20> so the output text will consist of valid UTF-8 sequences.
In string values, the characters `<` and `&` are escaped as `<` and `&`.
Arrays are output as `<array><elem>Hello</elem><elem>World</elem>...</array>`,
and tuples as `<tuple><elem>Hello</elem><elem>World</elem>...</tuple>`.

View File

@ -0,0 +1,541 @@
<a name="formats"></a>
# Input and output formats
The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).
See the table below for the list of supported formats for either kinds of queries.
Format | INSERT | SELECT
-------|--------|--------
[TabSeparated](formats.md#tabseparated) | ✔ | ✔ |
[TabSeparatedRaw](formats.md#tabseparatedraw) | ✗ | ✔ |
[TabSeparatedWithNames](formats.md#tabseparatedwithnames) | ✔ | ✔ |
[TabSeparatedWithNamesAndTypes](formats.md#tabseparatedwithnamesandtypes) | ✔ | ✔ |
[CSV](formats.md#csv) | ✔ | ✔ |
[CSVWithNames](formats.md#csvwithnames) | ✔ | ✔ |
[Values](formats.md#values) | ✔ | ✔ |
[Vertical](formats.md#vertical) | ✗ | ✔ |
[VerticalRaw](formats.md#verticalraw) | ✗ | ✔ |
[JSON](formats.md#json) | ✗ | ✔ |
[JSONCompact](formats.md#jsoncompact) | ✗ | ✔ |
[JSONEachRow](formats.md#jsoneachrow) | ✔ | ✔ |
[TSKV](formats.md#tskv) | ✔ | ✔ |
[Pretty](formats.md#pretty) | ✗ | ✔ |
[PrettyCompact](formats.md#prettycompact) | ✗ | ✔ |
[PrettyCompactMonoBlock](formats.md#prettycompactmonoblock) | ✗ | ✔ |
[PrettyNoEscapes](formats.md#prettynoescapes) | ✗ | ✔ |
[PrettySpace](formats.md#prettyspace) | ✗ | ✔ |
[RowBinary](formats.md#rowbinary) | ✔ | ✔ |
[Native](formats.md#native) | ✔ | ✔ |
[Null](formats.md#null) | ✗ | ✔ |
[XML](formats.md#xml) | ✗ | ✔ |
[CapnProto](formats.md#capnproto) | ✔ | ✔ |
<a name="format_capnproto"></a>
## CapnProto
Cap'n Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.
Cap'n Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
```sql
SELECT SearchPhrase, count() AS c FROM test.hits
GROUP BY SearchPhrase FORMAT CapnProto SETTINGS schema = 'schema:Message'
```
Where `schema.capnp` looks like this:
```
struct Message {
SearchPhrase @0 :Text;
c @1 :Uint64;
}
```
Schema files are in the file that is located in the directory specified in [ format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) in the server configuration.
Deserialization is effective and usually doesn't increase the system load.
## CSV
Comma Separated Values format ([RFC](https://tools.ietf.org/html/rfc4180)).
When formatting, rows are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Date and date-time are enclosed in double quotes. Numbers are output without quotes. Values are separated by a delimiter&ast;. Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first the array is serialized to a string as in TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
&ast;By default — `,`. See a [format_csv_delimiter](/docs/en/operations/settings/settings/#format_csv_delimiter) setting for additional info.
When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Rows can also be arranged without quotes. In this case, they are parsed up to a delimiter or line feed (CR or LF). In violation of the RFC, when parsing rows without quotes, the leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) are all supported.
The CSV format supports the output of totals and extremes the same way as `TabSeparated`.
## CSVWithNames
Also prints the header row, similar to `TabSeparatedWithNames`.
## JSON
Outputs data in JSON format. Besides data tables, it also outputs column names and types, along with some additional information: the total number of output rows, and the number of rows that could have been output if there weren't a LIMIT. Example:
```sql
SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase WITH TOTALS ORDER BY c DESC LIMIT 5 FORMAT JSON
```
```json
{
"meta":
[
{
"name": "SearchPhrase",
"type": "String"
},
{
"name": "c",
"type": "UInt64"
}
],
"data":
[
{
"SearchPhrase": "",
"c": "8267016"
},
{
"SearchPhrase": "bathroom interior design",
"c": "2166"
},
{
"SearchPhrase": "yandex",
"c": "1655"
},
{
"SearchPhrase": "spring 2014 fashion",
"c": "1549"
},
{
"SearchPhrase": "freeform photos",
"c": "1480"
}
],
"totals":
{
"SearchPhrase": "",
"c": "8873898"
},
"extremes":
{
"min":
{
"SearchPhrase": "",
"c": "1480"
},
"max":
{
"SearchPhrase": "",
"c": "8267016"
}
},
"rows": 5,
"rows_before_limit_at_least": 141137
}
```
The JSON is compatible with JavaScript. To ensure this, some characters are additionally escaped: the slash ` /` is escaped as ` \/`; alternative line breaks ` U+2028` and ` U+2029`, which break some browsers, are escaped as ` \uXXXX`. ASCII control characters are escaped: backspace, form feed, line feed, carriage return, and horizontal tab are replaced with `\b`, `\f`, `\n`, `\r`, `\t` , as well as the remaining bytes in the 00-1F range using `\uXXXX` sequences. Invalid UTF-8 sequences are changed to the replacement character <20> so the output text will consist of valid UTF-8 sequences. For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, you can set the configuration parameter output_format_json_quote_64bit_integers to 0.
`rows` The total number of output rows.
`rows_before_limit_at_least` The minimal number of rows there would have been without LIMIT. Output only if the query contains LIMIT.
If the query contains GROUP BY, rows_before_limit_at_least is the exact number of rows there would have been without a LIMIT.
`totals` Total values (when using WITH TOTALS).
`extremes` Extreme values (when extremes is set to 1).
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
See also the JSONEachRow format.
## JSONCompact
Differs from JSON only in that data rows are output in arrays, not in objects.
Example:
```json
{
"meta":
[
{
"name": "SearchPhrase",
"type": "String"
},
{
"name": "c",
"type": "UInt64"
}
],
"data":
[
["", "8267016"],
["bathroom interior design", "2166"],
["yandex", "1655"],
["spring 2014 fashion", "1549"],
["freeform photos", "1480"]
],
"totals": ["","8873898"],
"extremes":
{
"min": ["","1480"],
"max": ["","8267016"]
},
"rows": 5,
"rows_before_limit_at_least": 141137
}
```
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
See also the `JSONEachRow` format.
## JSONEachRow
Outputs data as separate JSON objects for each row (newline delimited JSON).
```json
{"SearchPhrase":"","count()":"8267016"}
{"SearchPhrase":"bathroom interior design","count()":"2166"}
{"SearchPhrase":"yandex","count()":"1655"}
{"SearchPhrase":"spring 2014 fashion","count()":"1549"}
{"SearchPhrase":"freeform photo","count()":"1480"}
{"SearchPhrase":"angelina jolie","count()":"1245"}
{"SearchPhrase":"omsk","count()":"1112"}
{"SearchPhrase":"photos of dog breeds","count()":"1091"}
{"SearchPhrase":"curtain design","count()":"1064"}
{"SearchPhrase":"baku","count()":"1000"}
```
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
## Native
The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is "columnar" it doesn't convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn't make sense to work with this format yourself.
## Null
Nothing is output. However, the query is processed, and when using the command-line client, data is transmitted to the client. This is used for tests, including productivity testing.
Obviously, this format is only appropriate for output, not for parsing.
## Pretty
Outputs data as Unicode-art tables, also using ANSI-escape sequences for setting colors in the terminal.
A full grid of the table is drawn, and each row occupies two lines in the terminal.
Each result block is output as a separate table. This is necessary so that blocks can be output without buffering results (buffering would be necessary in order to pre-calculate the visible width of all the values).
To avoid dumping too much data to the terminal, only the first 10,000 rows are printed. If the number of rows is greater than or equal to 10,000, the message "Showed first 10 000" is printed.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
The Pretty format supports outputting total values (when using WITH TOTALS) and extremes (when 'extremes' is set to 1). In these cases, total values and extreme values are output after the main data, in separate tables. Example (shown for the PrettyCompact format):
```sql
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT PrettyCompact
```
```text
┌──EventDate─┬───────c─┐
│ 2014-03-17 │ 1406958 │
│ 2014-03-18 │ 1383658 │
│ 2014-03-19 │ 1405797 │
│ 2014-03-20 │ 1353623 │
│ 2014-03-21 │ 1245779 │
│ 2014-03-22 │ 1031592 │
│ 2014-03-23 │ 1046491 │
└────────────┴─────────┘
Totals:
┌──EventDate─┬───────c─┐
│ 0000-00-00 │ 8873898 │
└────────────┴─────────┘
Extremes:
┌──EventDate─┬───────c─┐
│ 2014-03-17 │ 1031592 │
│ 2014-03-23 │ 1406958 │
└────────────┴─────────┘
```
## PrettyCompact
Differs from `Pretty` in that the grid is drawn between rows and the result is more compact.
This format is used by default in the command-line client in interactive mode.
## PrettyCompactMonoBlock
Differs from `PrettyCompact` in that up to 10,000 rows are buffered, then output as a single table, not by blocks.
## PrettyNoEscapes
Differs from Pretty in that ANSI-escape sequences aren't used. This is necessary for displaying this format in a browser, as well as for using the 'watch' command-line utility.
Example:
```bash
watch -n1 "clickhouse-client --query='SELECT * FROM system.events FORMAT PrettyCompactNoEscapes'"
```
You can use the HTTP interface for displaying in the browser.
### PrettyCompactNoEscapes
The same as the previous setting.
### PrettySpaceNoEscapes
The same as the previous setting.
## PrettySpace
Differs from `PrettyCompact` in that whitespace (space characters) is used instead of the grid.
## RowBinary
Formats and parses data by row in binary format. Rows and values are listed consecutively, without separators.
This format is less efficient than the Native format, since it is row-based.
Integers use fixed-length little endian representation. For example, UInt64 uses 8 bytes.
DateTime is represented as UInt32 containing the Unix timestamp as the value.
Date is represented as a UInt16 object that contains the number of days since 1970-01-01 as the value.
String is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string.
FixedString is represented simply as a sequence of bytes.
Array is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by successive elements of the array.
## TabSeparated
In TabSeparated format, data is written by row. Each row contains values separated by tabs. Each value is follow by a tab, except the last value in the row, which is followed by a line feed. Strictly Unix line feeds are assumed everywhere. The last row also must contain a line feed at the end. Values are written in text format, without enclosing quotation marks, and with special characters escaped.
Integer numbers are written in decimal form. Numbers can contain an extra "+" character at the beginning (ignored when parsing, and not recorded when formatting). Non-negative numbers can't contain the negative sign. When reading, it is allowed to parse an empty string as a zero, or (for signed types) a string consisting of just a minus sign as a zero. Numbers that do not fit into the corresponding data type may be parsed as a different number, without an error message.
Floating-point numbers are written in decimal form. The dot is used as the decimal separator. Exponential entries are supported, as are 'inf', '+inf', '-inf', and 'nan'. An entry of floating-point numbers may begin or end with a decimal point.
During formatting, accuracy may be lost on floating-point numbers.
During parsing, it is not strictly required to read the nearest machine-representable number.
Dates are written in YYYY-MM-DD format and parsed in the same format, but with any characters as separators.
Dates with times are written in the format YYYY-MM-DD hh:mm:ss and parsed in the same format, but with any characters as separators.
This all occurs in the system time zone at the time the client or server starts (depending on which one formats data). For dates with times, daylight saving time is not specified. So if a dump has times during daylight saving time, the dump does not unequivocally match the data, and parsing will select one of the two times.
During a read operation, incorrect dates and dates with times can be parsed with natural overflow or as null dates and times, without an error message.
As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats YYYY-MM-DD hh:mm:ss and NNNNNNNNNN are differentiated automatically.
Strings are output with backslash-escaped special characters. The following escape sequences are used for output: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\'`, `\\`. Parsing also supports the sequences `\a`, `\v`, and `\xHH` (hex escape sequences) and any `\c` sequences, where `c` is any character (these sequences are converted to `c`). Thus, reading data supports formats where a line feed can be written as `\n` or `\`, or as a line feed. For example, the string `Hello world` with a line feed between the words instead of a space can be parsed in any of the following variations:
```text
Hello\nworld
Hello\
world
```
The second variant is supported because MySQL uses it when writing tab-separated dumps.
The minimum set of characters that you need to escape when passing data in TabSeparated format: tab, line feed (LF) and backslash.
Only a small set of symbols are escaped. You can easily stumble onto a string value that your terminal will ruin in output.
Arrays are written as a list of comma-separated values in square brackets. Number items in the array are fomratted as normally, but dates, dates with times, and strings are written in single quotes with the same escaping rules as above.
The TabSeparated format is convenient for processing data using custom programs and scripts. It is used by default in the HTTP interface, and in the command-line client's batch mode. This format also allows transferring data between different DBMSs. For example, you can get a dump from MySQL and upload it to ClickHouse, or vice versa.
The TabSeparated format supports outputting total values (when using WITH TOTALS) and extreme values (when 'extremes' is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
```sql
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated``
```
```text
2014-03-17 1406958
2014-03-18 1383658
2014-03-19 1405797
2014-03-20 1353623
2014-03-21 1245779
2014-03-22 1031592
2014-03-23 1046491
0000-00-00 8873898
2014-03-17 1031592
2014-03-23 1406958
```
This format is also available under the name `TSV`.
## TabSeparatedRaw
Differs from `TabSeparated` format in that the rows are written without escaping.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
This format is also available under the name ` TSVRaw`.
## TabSeparatedWithNames
Differs from the `TabSeparated` format in that the column names are written in the first row.
During parsing, the first row is completely ignored. You can't use column names to determine their position or to check their correctness.
(Support for parsing the header row may be added in the future.)
This format is also available under the name ` TSVWithNames`.
## TabSeparatedWithNamesAndTypes
Differs from the `TabSeparated` format in that the column names are written to the first row, while the column types are in the second row.
During parsing, the first and second rows are completely ignored.
This format is also available under the name ` TSVWithNamesAndTypes`.
## TSKV
Similar to TabSeparated, but outputs a value in name=value format. Names are escaped the same way as in TabSeparated format, and the = symbol is also escaped.
```text
SearchPhrase= count()=8267016
SearchPhrase=bathroom interior design count()=2166
SearchPhrase=yandex count()=1655
SearchPhrase=spring 2014 fashion count()=1549
SearchPhrase=freeform photos count()=1480
SearchPhrase=angelina jolia count()=1245
SearchPhrase=omsk count()=1112
SearchPhrase=photos of dog breeds count()=1091
SearchPhrase=curtain design count()=1064
SearchPhrase=baku count()=1000
```
When there is a large number of small columns, this format is ineffective, and there is generally no reason to use it. It is used in some departments of Yandex.
Both data output and parsing are supported in this format. For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted they are treated as equal to their default values. In this case, zeros and blank rows are used as default values. Complex values that could be specified in the table are not supported as defaults.
Parsing allows the presence of the additional field `tskv` without the equal sign or a value. This field is ignored.
## Values
Prints every row in brackets. Rows are separated by commas. There is no comma after the last row. The values inside the brackets are also comma-separated. Numbers are output in decimal format without quotes. Arrays are output in square brackets. Strings, dates, and dates with times are output in quotes. Escaping rules and parsing are similar to the TabSeparated format. During formatting, extra spaces aren't inserted, but during parsing, they are allowed and skipped (except for spaces inside array values, which are not allowed).
The minimum set of characters that you need to escape when passing data in Values format: single quotes and backslashes.
This is the format that is used in `INSERT INTO t VALUES ...`, but you can also use it for formatting query results.
## Vertical
Prints each value on a separate line with the column name specified. This format is convenient for printing just one or a few rows, if each row consists of a large number of columns.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
## VerticalRaw
Differs from `Vertical` format in that the rows are not escaped.
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
Examples:
```
:) SHOW CREATE TABLE geonames FORMAT VerticalRaw;
Row 1:
──────
statement: CREATE TABLE default.geonames ( geonameid UInt32, date Date DEFAULT CAST('2017-12-08' AS Date)) ENGINE = MergeTree(date, geonameid, 8192)
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT VerticalRaw;
Row 1:
──────
test: string with 'quotes' and with some special
characters
```
Compare with the Vertical format:
```
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT Vertical;
Row 1:
──────
test: string with \'quotes\' and \t with some special \n characters
```
## XML
XML format is suitable only for output, not for parsing. Example:
```xml
<?xml version='1.0' encoding='UTF-8' ?>
<result>
<meta>
<columns>
<column>
<name>SearchPhrase</name>
<type>String</type>
</column>
<column>
<name>count()</name>
<type>UInt64</type>
</column>
</columns>
</meta>
<data>
<row>
<SearchPhrase></SearchPhrase>
<field>8267016</field>
</row>
<row>
<SearchPhrase>bathroom interior design</SearchPhrase>
<field>2166</field>
</row>
<row>
<SearchPhrase>yandex</SearchPhrase>
<field>1655</field>
</row>
<row>
<SearchPhrase>spring 2014 fashion</SearchPhrase>
<field>1549</field>
</row>
<row>
<SearchPhrase>freeform photos</SearchPhrase>
<field>1480</field>
</row>
<row>
<SearchPhrase>angelina jolie</SearchPhrase>
<field>1245</field>
</row>
<row>
<SearchPhrase>omsk</SearchPhrase>
<field>1112</field>
</row>
<row>
<SearchPhrase>photos of dog breeds</SearchPhrase>
<field>1091</field>
</row>
<row>
<SearchPhrase>curtain design</SearchPhrase>
<field>1064</field>
</row>
<row>
<SearchPhrase>baku</SearchPhrase>
<field>1000</field>
</row>
</data>
<rows>10</rows>
<rows_before_limit_at_least>141137</rows_before_limit_at_least>
</result>
```
If the column name does not have an acceptable format, just 'field' is used as the element name. In general, the XML structure follows the JSON structure.
Just as for JSON, invalid UTF-8 sequences are changed to the replacement character <20> so the output text will consist of valid UTF-8 sequences.
In string values, the characters `<` and `&` are escaped as `<` and `&`.
Arrays are output as `<array><elem>Hello</elem><elem>World</elem>...</array>`,
and tuples as `<tuple><elem>Hello</elem><elem>World</elem>...</tuple>`.

View File

@ -58,5 +58,5 @@ This lets us use the system as the back-end for a web interface. Low latency mea
## Data replication and support for data integrity on replicas
Uses asynchronous multimaster replication. After being written to any available replica, data is distributed to all the remaining replicas. The system maintains identical data on different replicas. Data is restored automatically after a failure, or using a "button" for complex cases.
For more information, see the section [Data replication](../table_engines/replication.md#table_engines-replication).
For more information, see the section [Data replication](../operations/table_engines/replication.md#table_engines-replication).

View File

@ -61,7 +61,7 @@ Users are recorded in the `users` section. Here is a fragment of the `users.xml`
You can see a declaration from two users: `default` and `web`. We added the `web` user separately.
The `default` user is chosen in cases when the username is not passed. The `default` user is also used for distributed query processing, if the configuration of the server or cluster doesn't specify the `user` and `password` (see the section on the [Distributed](../table_engines/distributed.md#table_engines-distributed) engine).
The `default` user is chosen in cases when the username is not passed. The `default` user is also used for distributed query processing, if the configuration of the server or cluster doesn't specify the `user` and `password` (see the section on the [Distributed](../operations/table_engines/distributed.md#table_engines-distributed) engine).
The user that is used for exchanging information between servers combined in a cluster must not have substantial restrictions or quotas otherwise, distributed queries will fail.

View File

@ -67,7 +67,7 @@ ClickHouse checks ` min_part_size` and ` min_part_size_ratio` and processes th
The default database.
To get a list of databases, use the [SHOW DATABASES](../../query_language/queries.md#query_language_queries_show_databases).
To get a list of databases, use the [SHOW DATABASES](../../query_language/misc.md#query_language_queries_show_databases).
**Example**
@ -100,7 +100,7 @@ Path:
- Specify the absolute path or the path relative to the server config file.
- The path can contain wildcards \* and ?.
See also "[External dictionaries](../../dicts/external_dicts.md#dicts-external_dicts)".
See also "[External dictionaries](../../query_language/dicts/external_dicts.md#dicts-external_dicts)".
**Example**
@ -130,7 +130,7 @@ The default is ` true`.
## format_schema_path
The path to the directory with the schemes for the input data, such as schemas for the [CapnProto](../../formats/capnproto.md#format_capnproto) format.
The path to the directory with the schemes for the input data, such as schemas for the [CapnProto](../../interfaces/formats.md#format_capnproto) format.
**Example**
@ -179,7 +179,7 @@ You can configure multiple `<graphite>` clauses. For instance, you can use this
Settings for thinning data for Graphite.
For more information, see [GraphiteMergeTree](../../table_engines/graphitemergetree.md#table_engines-graphitemergetree).
For more information, see [GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#table_engines-graphitemergetree).
**Example**
@ -358,7 +358,7 @@ Parameter substitutions for replicated tables.
Can be omitted if replicated tables are not used.
For more information, see the section "[Creating replicated tables](../../table_engines/replication.md#table_engines-replication-creation_of_rep_tables)".
For more information, see the section "[Creating replicated tables](../../operations/table_engines/replication.md#table_engines-replication-creation_of_rep_tables)".
**Example**
@ -370,7 +370,7 @@ For more information, see the section "[Creating replicated tables](../../table_
## mark_cache_size
Approximate size (in bytes) of the cache of "marks" used by [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) engines.
Approximate size (in bytes) of the cache of "marks" used by [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) engines.
The cache is shared for the server and memory is allocated as needed. The cache size must be at least 5368709120.
@ -426,7 +426,7 @@ We recommend using this option in Mac OS X, since the ` getrlimit()` function re
Restriction on deleting tables.
If the size of a [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) type table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
If the size of a [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) type table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
If you still need to delete the table without restarting the ClickHouse server, create the ` <clickhouse-path>/flags/force_drop_table` file and run the DROP query.
@ -444,7 +444,7 @@ The value 0 means that you can delete all tables without any restrictions.
## merge_tree
Fine tuning for tables in the [ MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family.
Fine tuning for tables in the [ MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
For more information, see the MergeTreeSettings.h header file.
@ -521,7 +521,7 @@ Keys for server/client settings:
## part_log
Logging events that are associated with [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
Logging events that are associated with [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
Queries are logged in the ClickHouse table, not in a separate file.
@ -541,7 +541,7 @@ Use the following parameters to configure logging:
- database Name of the database.
- table Name of the table.
- partition_by Sets a [custom partitioning key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
- partition_by Sets a [custom partitioning key](../../operations/table_engines/custom_partitioning_key.md#custom-partitioning-key).
- flush_interval_milliseconds Interval for flushing data from memory to the disk.
**Example**
@ -585,7 +585,7 @@ Use the following parameters to configure logging:
- database Name of the database.
- table Name of the table.
- partition_by Sets a [custom partitioning key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
- partition_by Sets a [custom partitioning key](../../operations/table_engines/custom_partitioning_key.md#custom-partitioning-key).
- flush_interval_milliseconds Interval for flushing data from memory to the disk.
If the table doesn't exist, ClickHouse will create it. If the structure of the query log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
@ -607,7 +607,7 @@ If the table doesn't exist, ClickHouse will create it. If the structure of the q
Configuration of clusters used by the Distributed table engine.
For more information, see the section "[Table engines/Distributed](../../table_engines/distributed.md#table_engines-distributed)".
For more information, see the section "[Table engines/Distributed](../../operations/table_engines/distributed.md#table_engines-distributed)".
**Example**
@ -667,7 +667,7 @@ The end slash is mandatory.
## uncompressed_cache_size
Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family.
Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
There is one shared cache for the server. Memory is allocated on demand. The cache is used if the option [use_uncompressed_cache](../settings/settings.md#settings-use_uncompressed_cache) is enabled.
@ -681,7 +681,7 @@ The uncompressed cache is advantageous for very short queries in individual case
## user_files_path
A catalog with user files. Used in a [file()](../../table_functions/file.md#table_functions-file) table function.
A catalog with user files. Used in a [file()](../../query_language/table_functions/file.md#table_functions-file) table function.
**Example**
@ -716,7 +716,7 @@ ClickHouse uses ZooKeeper for storing replica metadata when using replicated tab
This parameter can be omitted if replicated tables are not used.
For more information, see the section "[Replication](../../table_engines/replication.md#table_engines-replication)".
For more information, see the section "[Replication](../../operations/table_engines/replication.md#table_engines-replication)".
**Example**

View File

@ -4,7 +4,7 @@
## distributed_product_mode
Changes the behavior of [distributed subqueries](../../query_language/queries.md#queries-distributed-subrequests), i.e. in cases when the query contains the product of distributed tables.
Changes the behavior of [distributed subqueries](../../query_language/select.md#queries-distributed-subrequests), i.e. in cases when the query contains the product of distributed tables.
ClickHouse applies the configuration if the subqueries on any level have a distributed table that exists on the local server and has more than one shard.
@ -12,7 +12,7 @@ Restrictions:
- Only applied for IN and JOIN subqueries.
- Used only if a distributed table is used in the FROM clause.
- Not used for a table-valued [ remote](../../table_functions/remote.md#table_functions-remote) function.
- Not used for a table-valued [ remote](../../query_language/table_functions/remote.md#table_functions-remote) function.
The possible values are:
@ -20,7 +20,7 @@ The possible values are:
## fallback_to_stale_replicas_for_distributed_queries
Forces a query to an out-of-date replica if updated data is not available. See "[Replication](../../table_engines/replication.md#table_engines-replication)".
Forces a query to an out-of-date replica if updated data is not available. See "[Replication](../../operations/table_engines/replication.md#table_engines-replication)".
ClickHouse selects the most relevant from the outdated replicas of the table.
@ -36,7 +36,7 @@ Disables query execution if the index can't be used by date.
Works with tables in the MergeTree family.
If `force_index_by_date=1`, ClickHouse checks whether the query has a date key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For example, the condition `Date != ' 2000-01-01 '` is acceptable even when it matches all the data in the table (i.e., running the query requires a full scan). For more information about ranges of data in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
If `force_index_by_date=1`, ClickHouse checks whether the query has a date key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For example, the condition `Date != ' 2000-01-01 '` is acceptable even when it matches all the data in the table (i.e., running the query requires a full scan). For more information about ranges of data in MergeTree tables, see "[MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree)".
<a name="settings-settings-force_primary_key"></a>
@ -46,7 +46,7 @@ Disables query execution if indexing by the primary key is not possible.
Works with tables in the MergeTree family.
If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For more information about data ranges in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For more information about data ranges in MergeTree tables, see "[MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree)".
<a name="settings_settings_fsync_metadata"></a>
@ -125,7 +125,7 @@ This is slightly more than `max_block_size`. The reason for this is because cert
## max_replica_delay_for_distributed_queries
Disables lagging replicas for distributed queries. See "[Replication](../../table_engines/replication.md#table_engines-replication)".
Disables lagging replicas for distributed queries. See "[Replication](../../operations/table_engines/replication.md#table_engines-replication)".
Sets the time in seconds. If a replica lags more than the set value, this replica is not used.
@ -158,7 +158,7 @@ Don't confuse blocks for compression (a chunk of memory consisting of bytes) and
## min_compress_block_size
For [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)" tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
For [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree)" tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
The actual size of the block, if the uncompressed data is less than 'max_compress_block_size', is no less than this value and no less than the volume of data for one mark.

View File

@ -0,0 +1,414 @@
# System tables
System tables are used for implementing part of the system's functionality, and for providing access to information about how the system is working.
You can't delete a system table (but you can perform DETACH).
System tables don't have files with data on the disk or files with metadata. The server creates all the system tables when it starts.
System tables are read-only.
They are located in the 'system' database.
<a name="system_tables-system.asynchronous_metrics"></a>
## system.asynchronous_metrics
Contain metrics used for profiling and monitoring.
They usually reflect the number of events currently in the system, or the total resources consumed by the system.
Example: The number of SELECT queries currently running; the amount of memory in use.`system.asynchronous_metrics`and`system.metrics` differ in their sets of metrics and how they are calculated.
## system.clusters
Contains information about clusters available in the config file and the servers in them.
Columns:
```text
cluster String Cluster name.
shard_num UInt32 Number of a shard in the cluster, starting from 1.
shard_weight UInt32 Relative weight of a shard when writing data.
replica_num UInt32 Number of a replica in the shard, starting from 1.
host_name String Host name as specified in the config.
host_address String Host's IP address obtained from DNS.
port UInt16 The port used to access the server.
user String The username to use for connecting to the server.
```
## system.columns
Contains information about the columns in all tables.
You can use this table to get information similar to `DESCRIBE TABLE`, but for multiple tables at once.
```text
database String - Name of the database the table is located in.
table String - Table name.
name String - Column name.
type String - Column type.
default_type String - Expression type (DEFAULT, MATERIALIZED, ALIAS) for the default value, or an empty string if it is not defined.
default_expression String - Expression for the default value, or an empty string if it is not defined.
```
## system.databases
This table contains a single String column called 'name' the name of a database.
Each database that the server knows about has a corresponding entry in the table.
This system table is used for implementing the `SHOW DATABASES` query.
## system.dictionaries
Contains information about external dictionaries.
Columns:
- `name String` Dictionary name.
- `type String` Dictionary type: Flat, Hashed, Cache.
- `origin String` Path to the config file where the dictionary is described.
- `attribute.names Array(String)` Array of attribute names provided by the dictionary.
- `attribute.types Array(String)` Corresponding array of attribute types provided by the dictionary.
- `has_hierarchy UInt8` Whether the dictionary is hierarchical.
- `bytes_allocated UInt64` The amount of RAM used by the dictionary.
- `hit_rate Float64` For cache dictionaries, the percent of usage for which the value was in the cache.
- `element_count UInt64` The number of items stored in the dictionary.
- `load_factor Float64` The filled percentage of the dictionary (for a hashed dictionary, it is the filled percentage of the hash table).
- `creation_time DateTime` Time spent for the creation or last successful reload of the dictionary.
- `last_exception String` Text of an error that occurred when creating or reloading the dictionary, if the dictionary couldn't be created.
- `source String` Text describing the data source for the dictionary.
Note that the amount of memory used by the dictionary is not proportional to the number of items stored in it. So for flat and cached dictionaries, all the memory cells are pre-assigned, regardless of how full the dictionary actually is.
<a name="system_tables-system.events"></a>
## system.events
Contains information about the number of events that have occurred in the system. This is used for profiling and monitoring purposes.
Example: The number of processed SELECT queries.
Columns: 'event String' the event name, and 'value UInt64' the quantity.
## system.functions
Contains information about normal and aggregate functions.
Columns:
- `name` (`String`) Function name.
- `is_aggregate` (`UInt8`) Whether it is an aggregate function.
## system.merges
Contains information about merges currently in process for tables in the MergeTree family.
Columns:
- `database String` — Name of the database the table is located in.
- `table String` — Name of the table.
- `elapsed Float64` — Time in seconds since the merge started.
- `progress Float64` — Percent of progress made, from 0 to 1.
- `num_parts UInt64` — Number of parts to merge.
- `result_part_name String` — Name of the part that will be formed as the result of the merge.
- `total_size_bytes_compressed UInt64` — Total size of compressed data in the parts being merged.
- `total_size_marks UInt64` — Total number of marks in the parts being merged.
- `bytes_read_uncompressed UInt64` — Amount of bytes read, decompressed.
- `rows_read UInt64` — Number of rows read.
- `bytes_written_uncompressed UInt64` — Amount of bytes written, uncompressed.
- `rows_written UInt64` — Number of rows written.
<a name="system_tables-system.metrics"></a>
## system.metrics
## system.numbers
This table contains a single UInt64 column named 'number' that contains almost all the natural numbers starting from zero.
You can use this table for tests, or if you need to do a brute force search.
Reads from this table are not parallelized.
## system.numbers_mt
The same as 'system.numbers' but reads are parallelized. The numbers can be returned in any order.
Used for tests.
## system.one
This table contains a single row with a single 'dummy' UInt8 column containing the value 0.
This table is used if a SELECT query doesn't specify the FROM clause.
This is similar to the DUAL table found in other DBMSs.
## system.parts
Contains information about parts of a table in the [MergeTree](../operations/table_engines/mergetree.md#table_engines-mergetree) family.
Each row describes one part of the data.
Columns:
- partition (String) The partition name. It's in YYYYMM format in case of old-style partitioning and is arbitary serialized value in case of custom partitioning. To learn what a partition is, see the description of the [ALTER](../query_language/alter.md#query_language_queries_alter) query.
- name (String) Name of the data part.
- active (UInt8) Indicates whether the part is active. If a part is active, it is used in a table; otherwise, it will be deleted. Inactive data parts remain after merging.
- marks (UInt64) The number of marks. To get the approximate number of rows in a data part, multiply ``marks`` by the index granularity (usually 8192).
- marks_size (UInt64) The size of the file with marks.
- rows (UInt64) The number of rows.
- bytes (UInt64) The number of bytes when compressed.
- modification_time (DateTime) The modification time of the directory with the data part. This usually corresponds to the time of data part creation.|
- remove_time (DateTime) The time when the data part became inactive.
- refcount (UInt32) The number of places where the data part is used. A value greater than 2 indicates that the data part is used in queries or merges.
- min_date (Date) The minimum value of the date key in the data part.
- max_date (Date) The maximum value of the date key in the data part.
- min_block_number (UInt64) The minimum number of data parts that make up the current part after merging.
- max_block_number (UInt64) The maximum number of data parts that make up the current part after merging.
- level (UInt32) Depth of the merge tree. If a merge was not performed, ``level=0``.
- primary_key_bytes_in_memory (UInt64) The amount of memory (in bytes) used by primary key values.
- primary_key_bytes_in_memory_allocated (UInt64) The amount of memory (in bytes) reserved for primary key values.
- database (String) Name of the database.
- table (String) Name of the table.
- engine (String) Name of the table engine without parameters.
## system.processes
This system table is used for implementing the `SHOW PROCESSLIST` query.
Columns:
```text
user String Name of the user who made the request. For distributed query processing, this is the user who helped the requestor server send the query to this server, not the user who made the distributed request on the requestor server.
address String The IP address that the query was made from. The same is true for distributed query processing.
elapsed Float64 The time in seconds since request execution started.
rows_read UInt64 The number of rows read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
bytes_read UInt64 The number of uncompressed bytes read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
UInt64 total_rows_approx The approximate total number of rows that must be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
memory_usage UInt64 Memory consumption by the query. It might not include some types of dedicated memory.
query String The query text. For INSERT, it doesn't include the data to insert.
query_id Query ID, if defined.
```
## system.replicas
Contains information and status for replicated tables residing on the local server.
This table can be used for monitoring. The table contains a row for every Replicated\* table.
Example:
```sql
SELECT *
FROM system.replicas
WHERE table = 'visits'
FORMAT Vertical
```
```text
Row 1:
──────
database: merge
table: visits
engine: ReplicatedCollapsingMergeTree
is_leader: 1
is_readonly: 0
is_session_expired: 0
future_parts: 1
parts_to_check: 0
zookeeper_path: /clickhouse/tables/01-06/visits
replica_name: example01-06-1.yandex.ru
replica_path: /clickhouse/tables/01-06/visits/replicas/example01-06-1.yandex.ru
columns_version: 9
queue_size: 1
inserts_in_queue: 0
merges_in_queue: 1
log_max_index: 596273
log_pointer: 596274
total_replicas: 2
active_replicas: 2
```
Columns:
```text
database: database name
table: table name
engine: table engine name
is_leader: whether the replica is the leader
Only one replica at a time can be the leader. The leader is responsible for selecting background merges to perform.
Note that writes can be performed to any replica that is available and has a session in ZK, regardless of whether it is a leader.
is_readonly: Whether the replica is in read-only mode.
This mode is turned on if the config doesn't have sections with ZK, if an unknown error occurred when reinitializing sessions in ZK, and during session reinitialization in ZK.
is_session_expired: Whether the ZK session expired.
Basically, the same thing as is_readonly.
future_parts: The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
parts_to_check: The number of data parts in the queue for verification.
A part is put in the verification queue if there is suspicion that it might be damaged.
zookeeper_path: The path to the table data in ZK.
replica_name: Name of the replica in ZK. Different replicas of the same table have different names.
replica_path: The path to the replica data in ZK. The same as concatenating zookeeper_path/replicas/replica_path.
columns_version: Version number of the table structure.
Indicates how many times ALTER was performed. If replicas have different versions, it means some replicas haven't made all of the ALTERs yet.
queue_size: Size of the queue for operations waiting to be performed.
Operations include inserting blocks of data, merges, and certain other actions.
Normally coincides with future_parts.
inserts_in_queue: Number of inserts of blocks of data that need to be made.
Insertions are usually replicated fairly quickly. If the number is high, something is wrong.
merges_in_queue: The number of merges waiting to be made.
Sometimes merges are lengthy, so this value may be greater than zero for a long time.
The next 4 columns have a non-null value only if the ZK session is active.
log_max_index: Maximum entry number in the log of general activity.
log_pointer: Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one.
If log_pointer is much smaller than log_max_index, something is wrong.
total_replicas: Total number of known replicas of this table.
active_replicas: Number of replicas of this table that have a ZK session (the number of active replicas).
```
If you request all the columns, the table may work a bit slowly, since several reads from ZK are made for each row.
If you don't request the last 4 columns (log_max_index, log_pointer, total_replicas, active_replicas), the table works quickly.
For example, you can check that everything is working correctly like this:
```sql
SELECT
database,
table,
is_leader,
is_readonly,
is_session_expired,
future_parts,
parts_to_check,
columns_version,
queue_size,
inserts_in_queue,
merges_in_queue,
log_max_index,
log_pointer,
total_replicas,
active_replicas
FROM system.replicas
WHERE
is_readonly
OR is_session_expired
OR future_parts > 20
OR parts_to_check > 10
OR queue_size > 20
OR inserts_in_queue > 10
OR log_max_index - log_pointer > 10
OR total_replicas < 2
OR active_replicas < total_replicas
```
If this query doesn't return anything, it means that everything is fine.
## system.settings
Contains information about settings that are currently in use.
I.e. used for executing the query you are using to read from the system.settings table).
Columns:
```text
name String Setting name.
value String Setting value.
changed UInt8 - Whether the setting was explicitly defined in the config or explicitly changed.
```
Example:
```sql
SELECT *
FROM system.settings
WHERE changed
```
```text
┌─name───────────────────┬─value───────┬─changed─┐
│ max_threads │ 8 │ 1 │
│ use_uncompressed_cache │ 0 │ 1 │
│ load_balancing │ random │ 1 │
│ max_memory_usage │ 10000000000 │ 1 │
└────────────────────────┴─────────────┴─────────┘
```
## system.tables
This table contains the String columns 'database', 'name', and 'engine'.
The table also contains three virtual columns: metadata_modification_time (DateTime type), create_table_query, and engine_full (String type).
Each table that the server knows about is entered in the 'system.tables' table.
This system table is used for implementing SHOW TABLES queries.
## system.zookeeper
This table presents when ZooKeeper is configured. It allows reading data from the ZooKeeper cluster defined in the config.
The query must have a 'path' equality condition in the WHERE clause. This is the path in ZooKeeper for the children that you want to get data for.
The query `SELECT * FROM system.zookeeper WHERE path = '/clickhouse'` outputs data for all children on the `/clickhouse` node.
To output data for all root nodes, write path = '/'.
If the path specified in 'path' doesn't exist, an exception will be thrown.
Columns:
- `name String` — Name of the node.
- `path String` — Path to the node.
- `value String` — Value of the node.
- `dataLength Int32` — Size of the value.
- `numChildren Int32` — Number of children.
- `czxid Int64` — ID of the transaction that created the node.
- `mzxid Int64` — ID of the transaction that last changed the node.
- `pzxid Int64` — ID of the transaction that last added or removed children.
- `ctime DateTime` — Time of node creation.
- `mtime DateTime` — Time of the last node modification.
- `version Int32` — Node version - the number of times the node was changed.
- `cversion Int32` — Number of added or removed children.
- `aversion Int32` — Number of changes to ACL.
- `ephemeralOwner Int64` — For ephemeral nodes, the ID of the session that owns this node.
Example:
```sql
SELECT *
FROM system.zookeeper
WHERE path = '/clickhouse/tables/01-08/visits/replicas'
FORMAT Vertical
```
```text
Row 1:
──────
name: example01-08-1.yandex.ru
value:
czxid: 932998691229
mzxid: 932998691229
ctime: 2015-03-27 16:49:51
mtime: 2015-03-27 16:49:51
version: 0
cversion: 47
aversion: 0
ephemeralOwner: 0
dataLength: 0
numChildren: 7
pzxid: 987021031383
path: /clickhouse/tables/01-08/visits/replicas
Row 2:
──────
name: example01-08-2.yandex.ru
value:
czxid: 933002738135
mzxid: 933002738135
ctime: 2015-03-27 16:57:01
mtime: 2015-03-27 16:57:01
version: 0
cversion: 37
aversion: 0
ephemeralOwner: 0
dataLength: 0
numChildren: 7
pzxid: 987021252247
path: /clickhouse/tables/01-08/visits/replicas
```

View File

@ -61,7 +61,7 @@ WHERE name = 'products'
└──────────┴──────┴────────┴─────────────────┴─────────────────┴─────────────────┴───────────────┴─────────────────┘
```
You can use the [dictGet*](../functions/ext_dict_functions.md#ext_dict_functions) function to get the dictionary data in this format.
You can use the [dictGet*](../../query_language/functions/ext_dict_functions.md#ext_dict_functions) function to get the dictionary data in this format.
This view isn't helpful when you need to get raw data, or when performing a `JOIN` operation. For these cases, you can use the `Dictionary` engine, which displays the dictionary data in a table.

Some files were not shown because too many files have changed in this diff Show More