mirror of
https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-21 15:12:02 +00:00
Some WIP on documentation refactoring (#2659)
* Additional .gitignore entries
* Merge a bunch of small articles about system tables into single one
* Merge a bunch of small articles about formats into single one
* Adapt table with formats to English docs too
* Add SPb meetup link to main page
* Move Utilities out of top level of docs (the location is probably not yet final) + translate couple articles
* Merge MacOS.md into build_osx.md
* Move Data types higher in ToC
* Publish changelog on website alongside documentation
* Few fixes for en/table_engines/file.md
* Use smaller header sizes in changelogs
* Group up table engines inside ToC
* Move table engines out of top level too
* Specify in ToC that query language is SQL based. That's a bit excessive, but catches eye.
* Move stuff that is part of query language into respective folder
* Move table functions lower in ToC
* Lost redirects.txt update
* Do not rely on comments in yaml + fix few ru titles
* Extract major parts of queries.md into separate articles
* queries.md has been supposed to be removed
* Fix weird translation
* Fix a bunch of links
* There is only table of contents left
* "Query language" is actually part of SQL abbreviation
* Change filename in README.md too
* fix mistype
parent
51cdec0bec
commit
0a4a5b36cc
3
.gitignore
vendored
@@ -11,6 +11,7 @@

/build
/docs/build
/docs/edit
/docs/tools/venv/
/docs/en/development/build/
/docs/ru/development/build/
@@ -238,3 +239,5 @@ node_modules
public
website/docs
website/presentations
.DS_Store
*/.DS_Store
@@ -1,15 +1,15 @@
# en:
## en:

## Improvements:
### Improvements:
* Added Nullable support for runningDifference function. [#2590](https://github.com/yandex/ClickHouse/issues/2590)

## Bug fixes:
### Bug fixes:
* Fixed switching to the default database when the client reconnects. [#2580](https://github.com/yandex/ClickHouse/issues/2580)

# ru:
|
||||
## ru:
|
||||
|
||||
## Улучшения:
|
||||
### Улучшения:
|
||||
* Добавлена поддержка Nullable для функции runningDifference. [#2590](https://github.com/yandex/ClickHouse/issues/2590)
|
||||
|
||||
## Исправление ошибок:
|
||||
### Исправление ошибок:
|
||||
* Исправлено переключение на дефолтную базу данных при переподключении клиента. [#2580](https://github.com/yandex/ClickHouse/issues/2580)
|
||||
|
148
CHANGELOG.md
@ -1,6 +1,6 @@
|
||||
# ClickHouse release 1.1.54388, 2018-06-28
|
||||
## ClickHouse release 1.1.54388, 2018-06-28
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Support for the `ALTER TABLE t DELETE WHERE` query for replicated tables. Added the `system.mutations` table to track progress of this type of queries.
|
||||
* Support for the `ALTER TABLE t [REPLACE|ATTACH] PARTITION` query for MergeTree tables.
|
||||
@ -18,12 +18,12 @@
|
||||
* Added the `date_time_input_format` setting. If you switch this setting to `'best_effort'`, DateTime values will be read in a wide range of formats.
|
||||
* Added the `clickhouse-obfuscator` utility for data obfuscation. Usage example: publishing data used in performance tests.
|
||||
|
||||
## Experimental features:
|
||||
### Experimental features:
|
||||
|
||||
* Added the ability to calculate `and` arguments only where they are needed ([Anastasia Tsarkova](https://github.com/yandex/ClickHouse/pull/2272)).
|
||||
* JIT compilation to native code is now available for some expressions ([pyos](https://github.com/yandex/ClickHouse/pull/2277)).
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Duplicates no longer appear for a query with `DISTINCT` and `ORDER BY`.
|
||||
* Queries with `ARRAY JOIN` and `arrayFilter` no longer return an incorrect result.
|
||||
@ -45,7 +45,7 @@
|
||||
* Fixed SSRF in the `remote()` table function.
|
||||
* Fixed exit behavior of `clickhouse-client` in multiline mode ([#2510](https://github.com/yandex/ClickHouse/issues/2510)).
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
|
||||
* Background tasks in replicated tables are now performed in a thread pool instead of in separate threads ([Silviu Caragea](https://github.com/yandex/ClickHouse/pull/1722)).
|
||||
* Improved LZ4 compression performance.
|
||||
@ -58,7 +58,7 @@
|
||||
* When calculating the number of available CPU cores, limits on cgroups are now taken into account ([Atri Sharma](https://github.com/yandex/ClickHouse/pull/2325)).
|
||||
* Added chown for config directories in the systemd config file ([Mikhail Shiryaev](https://github.com/yandex/ClickHouse/pull/2421)).
|
||||
|
||||
## Build changes:
|
||||
### Build changes:
|
||||
|
||||
* The gcc8 compiler can be used for builds.
|
||||
* Added the ability to build llvm from a submodule.
|
||||
@ -69,36 +69,36 @@
|
||||
* Added the ability to use the libtinfo library instead of libtermcap ([Georgy Kondratiev](https://github.com/yandex/ClickHouse/pull/2519)).
|
||||
* Fixed a header file conflict in Fedora Rawhide ([#2520](https://github.com/yandex/ClickHouse/issues/2520)).
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* Removed escaping in `Vertical` and `Pretty*` formats and deleted the `VerticalRaw` format.
|
||||
* If servers running version 1.1.54388 (or newer) and servers running an older version are used simultaneously in a distributed query, and the query contains a `cast(x, 'Type')` expression written without the `AS` keyword and with `cast` not in uppercase, an exception with a message like `Not found column cast(0, 'UInt8') in block` will be thrown. Solution: update the server on all cluster nodes.
|
||||
|
||||
# ClickHouse release 1.1.54385, 2018-06-01
|
||||
## ClickHouse release 1.1.54385, 2018-06-01
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed an error that in some cases caused ZooKeeper operations to block.
|
||||
|
||||
# ClickHouse release 1.1.54383, 2018-05-22
|
||||
## ClickHouse release 1.1.54383, 2018-05-22
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed a slowdown of the replication queue when a table has many replicas.
|
||||
|
||||
# ClickHouse release 1.1.54381, 2018-05-14
|
||||
## ClickHouse release 1.1.54381, 2018-05-14
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed a leak of nodes in ZooKeeper when ClickHouse loses its connection to the ZooKeeper server.
|
||||
|
||||
# ClickHouse release 1.1.54380, 2018-04-21
|
||||
## ClickHouse release 1.1.54380, 2018-04-21
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* Added table function `file(path, format, structure)`. An example reading bytes from `/dev/urandom`: `ln -s /dev/urandom /var/lib/clickhouse/user_files/random` `clickhouse-client -q "SELECT * FROM file('random', 'RowBinary', 'd UInt8') LIMIT 10"`.
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
* Subqueries can be wrapped in `()` parentheses to improve query readability. For example, `(SELECT 1) UNION ALL (SELECT 1)`.
|
||||
* Simple `SELECT` queries from the `system.processes` table are not counted toward the `max_concurrent_queries` limit.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed incorrect behavior of the `IN` operator when selecting from a `MATERIALIZED VIEW`.
|
||||
* Fixed incorrect filtering by partition index in expressions like `WHERE partition_key_column IN (...)`
|
||||
* Fixed the inability to execute an `OPTIMIZE` query on a non-leader replica if the table was `RENAME`d.
|
||||
@ -106,11 +106,11 @@
|
||||
* Fixed freezing of `KILL QUERY` queries.
|
||||
* Fixed an error in the ZooKeeper client library that led to lost watches, freezing of the distributed DDL queue, and slowdowns in the replication queue when a non-empty `chroot` prefix is used in the ZooKeeper configuration.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
* Removed support for expressions like `(a, b) IN (SELECT (a, b))` (use the equivalent `(a, b) IN (SELECT a, b)` instead). In previous releases, these expressions led to undefined data filtering or caused errors.
|
||||
|
||||
# ClickHouse release 1.1.54378, 2018-04-16
|
||||
## New features:
|
||||
## ClickHouse release 1.1.54378, 2018-04-16
|
||||
### New features:
|
||||
|
||||
* Logging level can be changed without restarting the server.
|
||||
* Added the `SHOW CREATE DATABASE` query.
|
||||
@ -124,7 +124,7 @@
|
||||
* Multiple comma-separated `topics` can be specified for the `Kafka` engine (Tobias Adamson).
|
||||
* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was cancelled` exception instead of an incomplete response.
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
|
||||
* `ALTER TABLE ... DROP/DETACH PARTITION` queries are run at the front of the replication queue.
|
||||
* `SELECT ... FINAL` and `OPTIMIZE ... FINAL` can be used even when the table has a single data part.
|
||||
@ -135,7 +135,7 @@
|
||||
* More robust crash recovery for asynchronous insertion into `Distributed` tables.
|
||||
* The return type of the `countEqual` function changed from `UInt32` to `UInt64` (谢磊).
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed an error with `IN` when the left side of the expression is `Nullable`.
|
||||
* Correct results are now returned when using tuples with `IN` when some of the tuple components are in the table index.
|
||||
@ -151,31 +151,31 @@
|
||||
* `SummingMergeTree` now works correctly for summation of nested data structures with a composite key.
|
||||
* Fixed the possibility of a race condition when choosing the leader for `ReplicatedMergeTree` tables.
|
||||
|
||||
## Build changes:
|
||||
### Build changes:
|
||||
|
||||
* The build supports `ninja` instead of `make` and uses it by default for building releases.
|
||||
* Renamed packages: `clickhouse-server-base` is now `clickhouse-common-static`; `clickhouse-server-common` is now `clickhouse-server`; `clickhouse-common-dbg` is now `clickhouse-common-static-dbg`. To install, use `clickhouse-server clickhouse-client`. Packages with the old names will still load in the repositories for backward compatibility.
|
||||
|
||||
## Backward-incompatible changes:
|
||||
### Backward-incompatible changes:
|
||||
|
||||
* Removed the special interpretation of an IN expression if an array is specified on the left side. Previously, the expression `arr IN (set)` was interpreted as "at least one `arr` element belongs to the `set`". To get the same behavior in the new version, write `arrayExists(x -> x IN (set), arr)`.
|
||||
* Disabled the incorrect use of the socket option `SO_REUSEPORT`, which was incorrectly enabled by default in the Poco library. Note that on Linux there is no longer any reason to simultaneously specify the addresses `::` and `0.0.0.0` for listen – use just `::`, which allows listening to the connection both over IPv4 and IPv6 (with the default kernel config settings). You can also revert to the behavior from previous versions by specifying `<listen_reuse_port>1</listen_reuse_port>` in the config.
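
The removed interpretation of `arr IN (set)` described in the first item above can be modelled in a few lines. This is only a sketch of the semantics quoted in the changelog ("at least one `arr` element belongs to the `set`"), written in Python rather than ClickHouse SQL; the function name `array_in_set_legacy` is hypothetical.

```python
def array_in_set_legacy(arr, values):
    """Model of the old `arr IN (set)` reading: true if any element is in the set.

    This mirrors what the changelog says `arrayExists(x -> x IN (set), arr)`
    expresses explicitly. Hypothetical helper, not ClickHouse code.
    """
    return any(x in values for x in arr)


if __name__ == "__main__":
    print(array_in_set_legacy([1, 2, 3], {3, 4}))
    print(array_in_set_legacy([1, 2], {3}))
```

An empty array never matches under this reading, since there is no element that could belong to the set.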
|
||||
|
||||
|
||||
# ClickHouse release 1.1.54370, 2018-03-16
|
||||
## ClickHouse release 1.1.54370, 2018-03-16
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Added the `system.macros` table and auto updating of macros when the config file is changed.
|
||||
* Added the `SYSTEM RELOAD CONFIG` query.
|
||||
* Added the `maxIntersections(left_col, right_col)` aggregate function, which returns the maximum number of simultaneously intersecting intervals `[left; right]`. The `maxIntersectionsPosition(left, right)` function returns the beginning of the "maximum" interval. ([Michael Furmur](https://github.com/yandex/ClickHouse/pull/2012)).
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
|
||||
* When inserting data in a `Replicated` table, fewer requests are made to `ZooKeeper` (and most of the user-level errors have disappeared from the `ZooKeeper` log).
|
||||
* Added the ability to create aliases for sets. Example: `WITH (1, 2, 3) AS set SELECT number IN set FROM system.numbers LIMIT 10`.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed the `Illegal PREWHERE` error when reading from `Merge` tables over `Distributed` tables.
|
||||
* Added fixes that allow you to run `clickhouse-server` in IPv4-only Docker containers.
|
||||
@ -189,9 +189,9 @@
|
||||
* Restored the behavior for queries like `SELECT * FROM remote('server2', default.table) WHERE col IN (SELECT col2 FROM default.table)` when the right side argument of the `IN` should use a remote `default.table` instead of a local one. This behavior was broken in version 1.1.54358.
|
||||
* Removed extraneous error-level logging of `Not found column ... in block`.
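
To make the `maxIntersections` entry in release 1.1.54370 more concrete, here is a minimal sweep-line sketch in Python. It assumes closed intervals `[left; right]`, as the changelog notation suggests, so two intervals that only touch at an endpoint count as intersecting; ClickHouse's own tie-breaking may differ. The function name `max_intersections` is hypothetical and this is not the server's implementation.

```python
def max_intersections(intervals):
    """Return (max_overlap, position) for a list of (left, right) pairs.

    position is the left endpoint at which the maximum overlap is first
    reached, which is one plausible reading of "the beginning of the
    'maximum' interval". Returns (0, None) for an empty input.
    """
    events = []
    for left, right in intervals:
        events.append((left, 0))   # 0 = interval opens
        events.append((right, 1))  # 1 = interval closes
    # Opens sort before closes at the same point, so closed intervals
    # that touch at an endpoint are counted as overlapping.
    events.sort()

    best, current, best_pos = 0, 0, None
    for point, kind in events:
        if kind == 0:
            current += 1
            if current > best:
                best, best_pos = current, point
        else:
            current -= 1
    return best, best_pos


if __name__ == "__main__":
    print(max_intersections([(1, 3), (2, 5), (4, 6)]))
```

Sorting the endpoints dominates the cost, so the sketch runs in O(n log n) for n intervals.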
|
||||
|
||||
# ClickHouse release 1.1.54356, 2018-03-06
|
||||
## ClickHouse release 1.1.54356, 2018-03-06
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Aggregation without `GROUP BY` for an empty set (such as `SELECT count(*) FROM table WHERE 0`) now returns a result with one row with null values for aggregate functions, in compliance with the SQL standard. To restore the old behavior (return an empty result), set `empty_result_for_aggregation_by_empty_set` to 1.
|
||||
* Added type conversion for `UNION ALL`. Different alias names are allowed in `SELECT` positions in `UNION ALL`, in compliance with the SQL standard.
|
||||
@ -226,7 +226,7 @@
|
||||
* `RENAME TABLE` can be performed for `VIEW`.
|
||||
* Added the `odbc_default_field_size` option, which allows you to extend the maximum size of the value loaded from an ODBC source (by default, it is 1024).
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
|
||||
* Limits and quotas on the result are no longer applied to intermediate data for `INSERT SELECT` queries or for `SELECT` subqueries.
|
||||
* Fewer false triggers of `force_restore_data` when checking the status of `Replicated` tables when the server starts.
|
||||
@ -242,7 +242,7 @@
|
||||
* `Enum` values can be used in `min`, `max`, `sum` and some other functions. In these cases, it uses the corresponding numeric values. This feature was previously available but was lost in the release 1.1.54337.
|
||||
* Added `max_expanded_ast_elements` to restrict the size of the AST after recursively expanding aliases.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed cases when unnecessary columns were removed from subqueries in error, or not removed from subqueries containing `UNION ALL`.
|
||||
* Fixed a bug in merges for `ReplacingMergeTree` tables.
|
||||
@ -268,18 +268,18 @@
|
||||
* Fixed a crash when passing arrays of different sizes to an `arrayReduce` function when using aggregate functions from multiple arguments.
|
||||
* Prohibited the use of queries with `UNION ALL` in a `MATERIALIZED VIEW`.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* Removed the `distributed_ddl_allow_replicated_alter` option. This behavior is enabled by default.
|
||||
* Removed the `UnsortedMergeTree` engine.
|
||||
|
||||
# ClickHouse release 1.1.54343, 2018-02-05
|
||||
## ClickHouse release 1.1.54343, 2018-02-05
|
||||
|
||||
* Added macros support for defining cluster names in distributed DDL queries and constructors of Distributed tables: `CREATE TABLE distr ON CLUSTER '{cluster}' (...) ENGINE = Distributed('{cluster}', 'db', 'table')`.
|
||||
* Now the table index is used for conditions like `expr IN (subquery)`.
|
||||
* Improved processing of duplicates when inserting to Replicated tables, so they no longer slow down execution of the replication queue.
|
||||
|
||||
# ClickHouse release 1.1.54342, 2018-01-22
|
||||
## ClickHouse release 1.1.54342, 2018-01-22
|
||||
|
||||
This release contains bug fixes for the previous release 1.1.54337:
|
||||
* Fixed a regression in 1.1.54337: if the default user has readonly access, then the server refuses to start up with the message `Cannot create database in readonly mode`.
|
||||
@ -290,9 +290,9 @@ This release contains bug fixes for the previous release 1.1.54337:
|
||||
* Buffer tables now work correctly when MATERIALIZED columns are present in the destination table (by zhang2014).
|
||||
* Fixed a bug in implementation of NULL.
|
||||
|
||||
# ClickHouse release 1.1.54337, 2018-01-18
|
||||
## ClickHouse release 1.1.54337, 2018-01-18
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Added support for storage of multidimensional arrays and tuples (`Tuple` data type) in tables.
|
||||
* Added support for table functions in `DESCRIBE` and `INSERT` queries. Added support for subqueries in `DESCRIBE`. Examples: `DESC TABLE remote('host', default.hits)`; `DESC TABLE (SELECT 1)`; `INSERT INTO TABLE FUNCTION remote('host', default.hits)`. Support for `INSERT INTO TABLE` syntax in addition to `INSERT INTO`.
|
||||
@ -323,7 +323,7 @@ This release contains bug fixes for the previous release 1.1.54337:
|
||||
* Added the `--silent` option for the `clickhouse-local` tool. It suppresses printing query execution info in stderr.
|
||||
* Added support for reading values of type `Date` from text in a format where the month and/or day of the month is specified using a single digit instead of two digits (Amos Bird).
|
||||
|
||||
## Performance optimizations:
|
||||
### Performance optimizations:
|
||||
|
||||
* Improved performance of `min`, `max`, `any`, `anyLast`, `anyHeavy`, `argMin`, `argMax` aggregate functions for String arguments.
|
||||
* Improved performance of `isInfinite`, `isFinite`, `isNaN`, `roundToExp2` functions.
|
||||
@ -332,7 +332,7 @@ This release contains bug fixes for the previous release 1.1.54337:
|
||||
* Lowered memory usage for `JOIN` in the case when the left and right parts have columns with identical names that are not contained in `USING`.
|
||||
* Improved performance of `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, and `corr` aggregate functions by reducing computational stability. The old functions are available under the names: `varSampStable`, `varPopStable`, `stddevSampStable`, `stddevPopStable`, `covarSampStable`, `covarPopStable`, `corrStable`.
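
The trade-off in the last item (faster variance functions versus the `*Stable` variants) can be illustrated with two textbook algorithms. The sketch below is an assumption-laden illustration in Python: it contrasts a single-pass sum-of-squares formula with Welford's update, and it does not claim that either is what ClickHouse actually uses. The function names are hypothetical.

```python
def var_samp_naive(values):
    """Sample variance via sum of squares: cheap, but prone to cancellation."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum(v * v for v in values)
    return (sum_sq - n * mean * mean) / (n - 1)


def var_samp_welford(values):
    """Sample variance via Welford's running update: more work per value, stable."""
    mean = 0.0
    m2 = 0.0
    for count, v in enumerate(values, start=1):
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    return m2 / (len(values) - 1)


if __name__ == "__main__":
    # Small spread on top of a large offset: the classic failure case
    # for the sum-of-squares formula in double precision.
    data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]
    print("naive:  ", var_samp_naive(data))
    print("welford:", var_samp_welford(data))
```

The exact sample variance of this data is 30. The naive formula subtracts two numbers near 4e18, where adjacent doubles are hundreds apart, so it cannot reproduce that value.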
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed data deduplication after running a `DROP PARTITION` query. In the previous version, dropping a partition and INSERTing the same data again was not working because INSERTed blocks were considered duplicates.
|
||||
* Fixed a bug that could lead to incorrect interpretation of the `WHERE` clause for `CREATE MATERIALIZED VIEW` queries with `POPULATE`.
|
||||
@ -371,7 +371,7 @@ This release contains bug fixes for the previous release 1.1.54337:
|
||||
* Fixed the `SYSTEM DROP DNS CACHE` query: the cache was flushed but addresses of cluster nodes were not updated.
|
||||
* Fixed the behavior of `MATERIALIZED VIEW` after executing `DETACH TABLE` for the table under the view (Marek Vavruša).
|
||||
|
||||
## Build improvements:
|
||||
### Build improvements:
|
||||
|
||||
* Builds use `pbuilder`. The build process is almost completely independent of the build host environment.
|
||||
* A single build is used for different OS versions. Packages and binaries have been made compatible with a wide range of Linux systems.
|
||||
@ -385,7 +385,7 @@ This release contains bug fixes for the previous release 1.1.54337:
|
||||
* Removed usage of GNU extensions from the code. Enabled the `-Wextra` option. When building with `clang`, `libc++` is used instead of `libstdc++`.
|
||||
* Extracted `clickhouse_parsers` and `clickhouse_common_io` libraries to speed up builds of various tools.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* The format for marks in `Log` type tables that contain `Nullable` columns was changed in a backward incompatible way. If you have these tables, you should convert them to the `TinyLog` type before starting up the new server version. To do this, replace `ENGINE = Log` with `ENGINE = TinyLog` in the corresponding `.sql` file in the `metadata` directory. If your table doesn't have `Nullable` columns or if the type of your table is not `Log`, then you don't need to do anything.
|
||||
* Removed the `experimental_allow_extended_storage_definition_syntax` setting. Now this feature is enabled by default.
|
||||
@ -396,16 +396,16 @@ This release contains bug fixes for the previous release 1.1.54337:
|
||||
* In previous server versions there was an undocumented feature: if an aggregate function depends on parameters, you can still specify it without parameters in the AggregateFunction data type. Example: `AggregateFunction(quantiles, UInt64)` instead of `AggregateFunction(quantiles(0.5, 0.9), UInt64)`. This feature was lost. Although it was undocumented, we plan to support it again in future releases.
|
||||
* Enum data types cannot be used in `min`/`max` aggregate functions. This ability will be restored in a future release.
|
||||
|
||||
## Please note when upgrading:
|
||||
### Please note when upgrading:
|
||||
* When doing a rolling update on a cluster, at the point when some of the replicas are running the old version of ClickHouse and some are running the new version, replication is temporarily stopped and the message `unknown parameter 'shard'` appears in the log. Replication will continue after all replicas of the cluster are updated.
|
||||
* If you have different ClickHouse versions on the cluster, you can get incorrect results for distributed queries with the aggregate functions `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, and `corr`. You should update all cluster nodes.
|
||||
|
||||
# ClickHouse release 1.1.54327, 2017-12-21
|
||||
## ClickHouse release 1.1.54327, 2017-12-21
|
||||
|
||||
This release contains bug fixes for the previous release 1.1.54318:
|
||||
* Fixed a bug with a possible race condition in replication that could lead to data loss. This issue affects versions 1.1.54310 and 1.1.54318. If you use one of these versions with Replicated tables, the update is strongly recommended. This issue shows up in logs as Warning messages like `Part ... from own log doesn't exist.` The issue is relevant even if you don't see these messages in logs.
|
||||
|
||||
# ClickHouse release 1.1.54318, 2017-11-30
|
||||
## ClickHouse release 1.1.54318, 2017-11-30
|
||||
|
||||
This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Fixed incorrect row deletions during merges in the SummingMergeTree engine
|
||||
@ -414,9 +414,9 @@ This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Fixed an issue that was causing the replication queue to stop running
|
||||
* Fixed rotation and archiving of server logs
|
||||
|
||||
# ClickHouse release 1.1.54310, 2017-11-01
|
||||
## ClickHouse release 1.1.54310, 2017-11-01
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* Custom partitioning key for the MergeTree family of table engines.
|
||||
* [Kafka](https://clickhouse.yandex/docs/en/single/index.html#document-table_engines/kafka) table engine.
|
||||
* Added support for loading [CatBoost](https://catboost.yandex/) models and applying them to data stored in ClickHouse.
|
||||
@ -432,12 +432,12 @@ This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Added support for the Cap'n Proto input format.
|
||||
* You can now customize compression level when using the zstd algorithm.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
* Creation of temporary tables with an engine other than Memory is forbidden.
|
||||
* Explicit creation of tables with the View or MaterializedView engine is forbidden.
|
||||
* During table creation, a new check verifies that the sampling key expression is included in the primary key.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed hangups when synchronously inserting into a Distributed table.
|
||||
* Fixed nonatomic adding and removing of parts in Replicated tables.
|
||||
* Data inserted into a materialized view is not subjected to unnecessary deduplication.
|
||||
@ -447,15 +447,15 @@ This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Fixed hangups when the disk volume containing server logs is full.
|
||||
* Fixed an overflow in the `toRelativeWeekNum` function for the first week of the Unix epoch.
|
||||
|
||||
## Build improvements:
|
||||
### Build improvements:
|
||||
* Several third-party libraries (notably Poco) were updated and converted to git submodules.
|
||||
|
||||
# ClickHouse release 1.1.54304, 2017-10-19
|
||||
## ClickHouse release 1.1.54304, 2017-10-19
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* TLS support in the native protocol (to enable, set `tcp_ssl_port` in `config.xml`)
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* `ALTER` for replicated tables now tries to start running as soon as possible
|
||||
* Fixed crashing when reading data with the setting `preferred_block_size_bytes=0`
|
||||
* Fixed crashes of `clickhouse-client` when `Page Down` is pressed
|
||||
@ -468,16 +468,16 @@ This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Users are updated correctly when `users.xml` is invalid
|
||||
* Correct handling when an executable dictionary returns a non-zero response code
|
||||
|
||||
# ClickHouse release 1.1.54292, 2017-09-20
|
||||
## ClickHouse release 1.1.54292, 2017-09-20
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* Added the `pointInPolygon` function for working with coordinates on a coordinate plane.
|
||||
* Added the `sumMap` aggregate function for calculating the sum of arrays, similar to `SummingMergeTree`.
|
||||
* Added the `trunc` function. Improved performance of the rounding functions (`round`, `floor`, `ceil`, `roundToExp2`) and corrected the logic of how they work. Changed the logic of the `roundToExp2` function for fractions and negative numbers.
|
||||
* The ClickHouse executable file is now less dependent on the libc version. The same ClickHouse executable file can run on a wide variety of Linux systems. Note: There is still a dependency when using compiled queries (with the setting `compile = 1`, which is not used by default).
|
||||
* Reduced the time needed for dynamic compilation of queries.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed an error that sometimes produced `part ... intersects previous part` messages and weakened replica consistency.
|
||||
* Fixed an error that caused the server to lock up if ZooKeeper was unavailable during shutdown.
|
||||
* Removed excessive logging when restoring replicas.
|
||||
@ -485,9 +485,9 @@ This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Fixed an error in the concat function that occurred if the first column in a block has the Array type.
|
||||
* Progress is now displayed correctly in the system.merges table.
|
||||
|
||||
# ClickHouse release 1.1.54289, 2017-09-13
|
||||
## ClickHouse release 1.1.54289, 2017-09-13
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* `SYSTEM` queries for server administration: `SYSTEM RELOAD DICTIONARY`, `SYSTEM RELOAD DICTIONARIES`, `SYSTEM DROP DNS CACHE`, `SYSTEM SHUTDOWN`, `SYSTEM KILL`.
|
||||
* Added functions for working with arrays: `concat`, `arraySlice`, `arrayPushBack`, `arrayPushFront`, `arrayPopBack`, `arrayPopFront`.
|
||||
* Added the `root` and `identity` parameters for the ZooKeeper configuration. This allows you to isolate individual users on the same ZooKeeper cluster.
|
||||
@ -502,7 +502,7 @@ This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Option to set `umask` in the config file.
|
||||
* Improved performance for queries with `DISTINCT`.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Improved the process for deleting old nodes in ZooKeeper. Previously, old nodes sometimes didn't get deleted if there were very frequent inserts, which caused the server to be slow to shut down, among other things.
|
||||
* Fixed randomization when choosing hosts for the connection to ZooKeeper.
|
||||
* Fixed the exclusion of lagging replicas in distributed queries if the replica is localhost.
|
||||
@ -515,28 +515,28 @@ This release contains bug fixes for the previous release 1.1.54310:
|
||||
* Resolved the appearance of zombie processes when using a dictionary with an `executable` source.
|
||||
* Fixed segfault for the HEAD query.
|
||||
|
||||
## Improvements to development workflow and ClickHouse build:
|
||||
### Improvements to development workflow and ClickHouse build:
|
||||
* You can use `pbuilder` to build ClickHouse.
|
||||
* You can use `libc++` instead of `libstdc++` for builds on Linux.
|
||||
* Added instructions for using static code analysis tools: `Coverity`, `clang-tidy`, and `cppcheck`.
|
||||
|
||||
## Please note when upgrading:
|
||||
### Please note when upgrading:
|
||||
* There is now a higher default value for the MergeTree setting `max_bytes_to_merge_at_max_space_in_pool` (the maximum total size of data parts to merge, in bytes): it has increased from 100 GiB to 150 GiB. This might result in large merges running after the server upgrade, which could cause an increased load on the disk subsystem. If the free space available on the server is less than twice the total size of the merges that are running, this will cause all other merges to stop running, including merges of small data parts. As a result, `INSERT` queries will fail with the message "Merges are processing significantly slower than inserts." Use the `SELECT * FROM system.merges` query to monitor the situation. You can also check the `DiskSpaceReservedForMerge` metric in the `system.metrics` table, or in Graphite. You don't need to do anything to fix this, since the issue will resolve itself once the large merges finish. If you find this unacceptable, you can restore the previous value for the `max_bytes_to_merge_at_max_space_in_pool` setting (to do this, go to the `<merge_tree>` section in `config.xml`, set `<max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool>` and restart the server).
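
The byte value in the note above follows directly from the GiB figures. The short Python check below reproduces that arithmetic and models the stated free-space rule; the helper name `other_merges_blocked` is hypothetical and only restates the sentence "less than twice the total amount of the merges that are running", not the server's actual scheduling code.

```python
GIB = 1024 ** 3  # bytes in one gibibyte

old_default_bytes = 100 * GIB  # value to put in config.xml to restore the old limit
new_default_bytes = 150 * GIB  # new default described in the note


def other_merges_blocked(free_bytes, running_merge_bytes):
    """Model of the rule in the note: other merges stop when free space
    is less than twice the total size of the merges currently running."""
    return free_bytes < 2 * running_merge_bytes


if __name__ == "__main__":
    print(old_default_bytes)
    print(new_default_bytes)
    print(other_merges_blocked(free_bytes=200 * GIB, running_merge_bytes=150 * GIB))
```

With a single 150 GiB merge running, the rule as stated needs at least 300 GiB free before other merges can proceed.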
|
||||
|
||||
# ClickHouse release 1.1.54284, 2017-08-29
|
||||
## ClickHouse release 1.1.54284, 2017-08-29
|
||||
|
||||
* This is a bugfix release for the previous 1.1.54282 release. It fixes a leak of ZooKeeper nodes in the `parts/` directory.
|
||||
|
||||
# ClickHouse release 1.1.54282, 2017-08-23
|
||||
## ClickHouse release 1.1.54282, 2017-08-23
|
||||
|
||||
This is a bugfix release. The following bugs were fixed:
|
||||
* `DB::Exception: Assertion violation: !_path.empty()` error when inserting into a Distributed table.
|
||||
* Error when parsing inserted data in RowBinary format if the data begins with the ';' character.
|
||||
* Errors during runtime compilation of certain aggregate functions (e.g. `groupArray()`).
|
||||
|
||||
# ClickHouse release 1.1.54276, 2017-08-16
|
||||
## ClickHouse release 1.1.54276, 2017-08-16
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* You can use an optional WITH clause in a SELECT query. Example query: `WITH 1+1 AS a SELECT a, a*a`
|
||||
* `INSERT` can be performed synchronously in a Distributed table: OK is returned only after all the data is saved on all the shards. This is activated by the setting `insert_distributed_sync=1`.
|
||||
@ -547,7 +547,7 @@ This is a bugfix release. The following bugs were fixed:
|
||||
* Added support for non-constant arguments and negative offsets in the function `substring(str, pos, len)`.
|
||||
* Added the `max_size` parameter for the `groupArray(max_size)(column)` aggregate function, and optimized its performance.
|
||||
|
||||
## Major changes:
|
||||
### Major changes:
|
||||
|
||||
* Improved security: all server files are created with 0640 permissions (can be changed via the `<umask>` config parameter).
|
||||
* Improved error messages for queries with invalid syntax.
|
||||
@ -555,11 +555,11 @@ This is a bugfix release. The following bugs were fixed:
|
||||
* Significantly increased the performance of data merges for the ReplacingMergeTree engine.
|
||||
* Improved performance for asynchronous inserts from a Distributed table by batching multiple source inserts. To enable this functionality, use the setting `distributed_directory_monitor_batch_inserts=1`.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* Changed the binary format of aggregate states of `groupArray(array_column)` functions for arrays.
|
||||
|
||||
## Complete list of changes:
|
||||
### Complete list of changes:
|
||||
|
||||
* Added the `output_format_json_quote_denormals` setting, which enables outputting nan and inf values in JSON format.
|
||||
* Optimized thread allocation when reading from a Distributed table.
|
||||
@ -578,7 +578,7 @@ This is a bugfix release. The following bugs were fixed:
|
||||
* It is possible to connect to MySQL through a socket in the file system.
|
||||
* The `system.parts` table has a new column with information about the size of marks, in bytes.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Distributed tables using a Merge table now work correctly for a SELECT query with a condition on the _table field.
|
||||
* Fixed a rare race condition in ReplicatedMergeTree when checking data parts.
|
||||
@ -602,15 +602,15 @@ This is a bugfix release. The following bugs were fixed:
|
||||
* Fixed the "Cannot mremap" error when using arrays in IN and JOIN clauses with more than 2 billion elements.
|
||||
* Fixed the failover for dictionaries with MySQL as the source.
|
||||
|
||||
## Improved workflow for developing and assembling ClickHouse:
|
||||
### Improved workflow for developing and assembling ClickHouse:
|
||||
|
||||
* Builds can be assembled in Arcadia.
|
||||
* You can use gcc 7 to compile ClickHouse.
|
||||
* Parallel builds using ccache+distcc are faster now.
|
||||
|
||||
# ClickHouse release 1.1.54245, 2017-07-04
|
||||
## ClickHouse release 1.1.54245, 2017-07-04
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Distributed DDL (for example, `CREATE TABLE ON CLUSTER`).
|
||||
* The replicated query `ALTER TABLE CLEAR COLUMN IN PARTITION`.
|
||||
@ -622,16 +622,16 @@ This is a bugfix release. The following bugs were fixed:
|
||||
* Sessions in the HTTP interface.
|
||||
* The OPTIMIZE query for a Replicated table can now run on any replica, not only on the leader.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* Removed SET GLOBAL.
|
||||
|
||||
## Minor changes:
|
||||
### Minor changes:
|
||||
|
||||
* If an alert is triggered, the full stack trace is printed into the log.
|
||||
* Relaxed the verification of the number of damaged or extra data parts at startup (there were too many false positives).
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed a bad connection "sticking" when inserting into a Distributed table.
|
||||
* GLOBAL IN now works for a query from a Merge table that looks at a Distributed table.
|
||||
|
148
CHANGELOG_RU.md
@ -1,6 +1,6 @@
|
||||
# ClickHouse release 1.1.54388, 2018-06-28
|
||||
## ClickHouse release 1.1.54388, 2018-06-28
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* Added support for the `ALTER TABLE t DELETE WHERE` query for replicated tables, along with the `system.mutations` table.
|
||||
* Added support for the `ALTER TABLE t [REPLACE|ATTACH] PARTITION` query for *MergeTree tables.
|
||||
* Added support for the `TRUNCATE TABLE` query ([Winter Zhang](https://github.com/yandex/ClickHouse/pull/2260))
|
||||
@ -17,11 +17,11 @@
|
||||
* Added the `date_time_input_format` setting. If you switch this setting to `'best_effort'`, DateTime values will be read in a wide range of formats.
|
||||
* Added the `clickhouse-obfuscator` utility for data obfuscation. Usage example: publishing data used in performance tests.
|
||||
|
||||
## Experimental features:
|
||||
### Experimental features:
|
||||
* Added the ability to evaluate the arguments of the `and` function only where they are needed ([Анастасия Царькова](https://github.com/yandex/ClickHouse/pull/2272))
|
||||
* Added the ability to JIT-compile some expressions to native code ([pyos](https://github.com/yandex/ClickHouse/pull/2277)).
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed duplicates appearing in queries with `DISTINCT` and `ORDER BY`.
|
||||
* Queries with `ARRAY JOIN` and `arrayFilter` previously returned an incorrect result; this is fixed.
|
||||
* Fixed an error when reading an array column from a Nested structure ([#2066](https://github.com/yandex/ClickHouse/issues/2066)).
|
||||
@ -42,7 +42,7 @@
|
||||
* Fixed an SSRF in the remote() table function.
|
||||
* Fixed exiting from `clickhouse-client` in multiline mode ([#2510](https://github.com/yandex/ClickHouse/issues/2510)).
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
* Background tasks in replicated tables are now performed in a thread pool instead of in separate threads ([Silviu Caragea](https://github.com/yandex/ClickHouse/pull/1722))
|
||||
* Improved LZ4 decompression performance.
|
||||
* Faster analysis of queries with a large number of JOINs and subqueries.
|
||||
@ -54,7 +54,7 @@
|
||||
* When calculating the number of available CPU cores, cgroups limits are now taken into account ([Atri Sharma](https://github.com/yandex/ClickHouse/pull/2325)).
|
||||
* Added chown of the config directories in the systemd config file ([Михаил Ширяев](https://github.com/yandex/ClickHouse/pull/2421)).
|
||||
|
||||
## Build changes:
|
||||
### Build changes:
|
||||
* Added the ability to build with the gcc8 compiler.
|
||||
* Added the ability to build llvm from a submodule.
|
||||
* The version of the librdkafka library was updated to v0.11.4.
|
||||
@ -64,34 +64,34 @@
|
||||
* Added the ability to use the libtinfo library instead of libtermcap ([Георгий Кондратьев](https://github.com/yandex/ClickHouse/pull/2519)).
|
||||
* Fixed a header file conflict in Fedora Rawhide ([#2520](https://github.com/yandex/ClickHouse/issues/2520)).
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
* Removed escaping in the `Vertical` and `Pretty*` formats and deleted the `VerticalRaw` format.
|
||||
* If servers of version 1.1.54388 or newer and older servers take part in a distributed query at the same time, and the query uses the `cast(x, 'Type')` expression written without `AS` and with the word `cast` not in uppercase, an error like `Not found column cast(0, 'UInt8') in block` occurs. Solution: update the server on the entire cluster.
|
||||
|
||||
|
||||
# ClickHouse release 1.1.54385, 2018-06-01
|
||||
## Bug fixes:
|
||||
## ClickHouse release 1.1.54385, 2018-06-01
|
||||
### Bug fixes:
|
||||
* Fixed an error that in some cases caused ZooKeeper operations to block.
|
||||
|
||||
# ClickHouse release 1.1.54383, 2018-05-22
|
||||
## Bug fixes:
|
||||
## ClickHouse release 1.1.54383, 2018-05-22
|
||||
### Bug fixes:
|
||||
* Fixed a slowdown of the replication queue when there is a large number of replicas.
|
||||
|
||||
# ClickHouse release 1.1.54381, 2018-05-14
|
||||
## ClickHouse release 1.1.54381, 2018-05-14
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed an error that caused metadata to leak in ZooKeeper when the connection to the ZooKeeper server was lost.
|
||||
|
||||
# ClickHouse release 1.1.54380, 2018-04-21
|
||||
## ClickHouse release 1.1.54380, 2018-04-21
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* Added the table function `file(path, format, structure)`. An example that reads bytes from `/dev/urandom`: `ln -s /dev/urandom /var/lib/clickhouse/user_files/random` `clickhouse-client -q "SELECT * FROM file('random', 'RowBinary', 'd UInt8') LIMIT 10"`.
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
* Subqueries can be wrapped in `()` brackets to improve query readability. For example: `(SELECT 1) UNION ALL (SELECT 1)`.
|
||||
* Simple `SELECT` queries from the `system.processes` table are not counted toward the `max_concurrent_queries` limit.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed incorrect behavior of the `IN` operator in `MATERIALIZED VIEW`.
|
||||
* Fixed incorrect behavior of the partition key index in expressions like `partition_key_column IN (...)`.
|
||||
* Fixed the inability to execute an `OPTIMIZE` query on the leader replica after `RENAME` was performed on the table.
|
||||
@ -99,13 +99,13 @@
|
||||
* Fixed hangs of `KILL QUERY` queries.
|
||||
* Fixed an error in the ZooKeeper client library which, when a non-empty `chroot` prefix was used in the configuration, led to loss of watches, stalling of the distributed DDL query queue, and slower replication.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
* Removed support for expressions like `(a, b) IN (SELECT (a, b))` (you can use the equivalent expression `(a, b) IN (SELECT a, b)`). Previously, such queries could lead to nondeterministic filtering in `WHERE`.
|
||||
|
||||
|
||||
# ClickHouse release 1.1.54378, 2018-04-16
|
||||
## ClickHouse release 1.1.54378, 2018-04-16
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* The logging level can be changed without restarting the server.
|
||||
* Added the `SHOW CREATE DATABASE` query.
|
||||
@ -119,7 +119,7 @@
|
||||
* Multiple comma-separated `topics` can be specified for the `Kafka` engine (Tobias Adamson)
|
||||
* When a query is stopped by `KILL QUERY` or `replace_running_query`, the client receives the `Query was cancelled` exception instead of an incomplete result.
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
|
||||
* `ALTER TABLE ... DROP/DETACH PARTITION` queries are run at the front of the replication queue.
|
||||
* `SELECT ... FINAL` and `OPTIMIZE ... FINAL` can be used even when the table data consists of a single part.
|
||||
@ -130,7 +130,7 @@
|
||||
* More reliable crash recovery for asynchronous inserts into `Distributed` tables.
|
||||
* The return type of the `countEqual` function changed from `UInt32` to `UInt64` (谢磊)
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed an error with `IN` when the left side of the expression is `Nullable`.
|
||||
* Fixed an incorrect result when using tuples with `IN` when some of the tuple components are in the table index.
|
||||
@ -146,31 +146,31 @@
|
||||
* Fixed the behavior of `SummingMergeTree` when summing nested data structures with a composite key.
|
||||
* Fixed a possible race condition when choosing the leader for `ReplicatedMergeTree` tables.
|
||||
|
||||
## Build changes:
|
||||
### Build changes:
|
||||
|
||||
* The build supports `ninja` instead of `make`. `ninja` is used by default for building releases.
|
||||
* Renamed packages: `clickhouse-server-base` to `clickhouse-common-static`; `clickhouse-server-common` to `clickhouse-server`; `clickhouse-common-dbg` to `clickhouse-common-static-dbg`. To install, use `clickhouse-server clickhouse-client`. For compatibility, packages with the old names are still uploaded to the repository.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* Removed the special interpretation of an IN expression when an array is specified on the left side. Previously, the expression `arr IN (set)` was interpreted as "at least one element of `arr` belongs to `set`". To get the same behavior in the new version, write `arrayExists(x -> x IN (set), arr)`.
|
||||
* Disabled the incorrect use of the socket option `SO_REUSEPORT` (which was enabled by default in the Poco library by mistake). Note that on Linux there is no longer any reason to specify the addresses `::` and `0.0.0.0` for listen at the same time: use just `::`, which (with the default kernel settings) allows listening for connections over both IPv4 and IPv6. You can also restore the behavior of previous versions by specifying `<listen_reuse_port>1</listen_reuse_port>` in the config.
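The `IN` change in the first item of this list can be pictured with a small Python model. It is only a sketch of the semantics described in that item; the function names are hypothetical and nothing here is ClickHouse code.

```python
def old_array_in(arr, values):
    """Models the removed special case: `arr IN (set)` was true when
    at least one element of arr belonged to the set."""
    return any(x in values for x in arr)


def array_exists_in(arr, values):
    """Models the suggested replacement `arrayExists(x -> x IN (set), arr)`:
    apply the membership test to each element and check whether any passes."""
    predicate = lambda x: x in values
    return any(predicate(x) for x in arr)


values = {1, 2, 3}
print(old_array_in([5, 3], values))     # True, because 3 is in the set
print(array_exists_in([5, 3], values))  # True, same answer
print(array_exists_in([7, 8], values))  # False, no element is in the set
```

Both helpers return the same result for any input, which is the point of the rewrite suggested in the changelog item.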
|
||||
|
||||
|
||||
# ClickHouse release 1.1.54370, 2018-03-16
|
||||
## ClickHouse release 1.1.54370, 2018-03-16
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Added the `system.macros` system table and automatic updating of macros when the config file is changed.
|
||||
* Added the `SYSTEM RELOAD CONFIG` query.
|
||||
* Added the `maxIntersections(left_col, right_col)` aggregate function, which returns the maximum number of simultaneously intersecting intervals `[left; right]`. The `maxIntersectionsPosition(left, right)` function returns the beginning of such a "maximum" interval. ([Michael Furmur](https://github.com/yandex/ClickHouse/pull/2012)).
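The idea behind `maxIntersections` and `maxIntersectionsPosition` can be sketched with a sweep over interval endpoints. This is an illustration of the concept only, not ClickHouse's implementation; the function name is hypothetical, and treating intervals that merely touch at an endpoint as intersecting is an assumption based on the closed `[left; right]` notation.

```python
def max_intersections(intervals):
    """Return (max_count, position) for a list of closed intervals (left, right).

    position is the point where the maximum overlap is first reached,
    or None if the list is empty.
    """
    events = []
    for left, right in intervals:
        events.append((left, 0))   # 0 = start; sorts before an end at the same point
        events.append((right, 1))  # 1 = end
    events.sort()

    best = current = 0
    best_pos = None
    for point, kind in events:
        if kind == 0:
            current += 1
            if current > best:
                best, best_pos = current, point
        else:
            current -= 1
    return best, best_pos


print(max_intersections([(1, 3), (2, 5), (4, 6)]))  # (2, 2)
print(max_intersections([(1, 2), (2, 3)]))          # (2, 2), endpoints touch
```

Sorting starts before ends at the same coordinate is what makes touching intervals count as overlapping; flipping that order would model half-open intervals instead.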
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
|
||||
* When inserting data into a `Replicated` table, fewer requests are made to `ZooKeeper` (and most of the user-level errors have disappeared from the `ZooKeeper` log).
|
||||
* Added the ability to create aliases for sets. Example: `WITH (1, 2, 3) AS set SELECT number IN set FROM system.numbers LIMIT 10`.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed the `Illegal PREWHERE` error when reading from a Merge table over `Distributed` tables.
|
||||
* Added fixes that allow running clickhouse-server in IPv4-only Docker containers.
|
||||
@ -185,9 +185,9 @@
|
||||
* Removed unnecessary Error-level logging of `Not found column ... in block`.
|
||||
|
||||
|
||||
# ClickHouse release 1.1.54362, 2018-03-11
|
||||
## ClickHouse release 1.1.54362, 2018-03-11
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Aggregation without `GROUP BY` over an empty set (for example, `SELECT count(*) FROM table WHERE 0`) now returns a result of one row with zero values of the aggregate functions, in compliance with the SQL standard. To restore the old behavior (return an empty result), set `empty_result_for_aggregation_by_empty_set` to 1.
|
||||
* Added type conversion for `UNION ALL`. Columns with different aliases are allowed in the corresponding `SELECT` positions in `UNION ALL`, in compliance with the SQL standard.
|
||||
@ -225,7 +225,7 @@
|
||||
* Added the `odbc_default_field_size` setting, which allows you to extend the maximum size of a value loaded from an ODBC source (the default is 1024).
|
||||
* The `is_cancelled` and `peak_memory_usage` columns were added to the `system.processes` table and to `SHOW PROCESSLIST`.
|
||||
|
||||
## Improvements:
|
||||
### Improvements:
|
||||
|
||||
* Limits and quotas on the result are no longer applied to intermediate data for `INSERT SELECT` queries or for subqueries in `SELECT`.
|
||||
* Reduced the number of false positives when checking the state of `Replicated` tables at server startup, which used to require setting the `force_restore_data` flag.
|
||||
@ -241,7 +241,7 @@
|
||||
* Values of the `Enum` type can be used in `min`, `max`, `sum`, and some other functions; in these cases the corresponding numeric values are used. This feature was available earlier but was lost in release 1.1.54337.
|
||||
* Added the `max_expanded_ast_elements` limit, which restricts the size of the AST after recursively expanding aliases.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed cases where unnecessary columns were removed from subqueries in error, as well as cases where unnecessary columns were not removed from subqueries containing `UNION ALL`.
|
||||
* Fixed a bug in merges for `ReplacingMergeTree` tables.
|
||||
@ -269,19 +269,19 @@
|
||||
* Queries with `UNION ALL` are prohibited in a `MATERIALIZED VIEW`.
|
||||
* Fixed an error that could occur during initialization of the `part_log` system table at server startup (by default, `part_log` is disabled).
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* Removed the `distributed_ddl_allow_replicated_alter` setting. The corresponding behavior is enabled by default.
|
||||
* Removed the `strict_insert_defaults` setting. If you were using this functionality, write to `clickhouse-feedback@yandex-team.com`.
|
||||
* Removed the `UnsortedMergeTree` table engine.
|
||||
|
||||
# ClickHouse release 1.1.54343, 2018-02-05
|
||||
## ClickHouse release 1.1.54343, 2018-02-05
|
||||
|
||||
* Added the ability to use macros for defining cluster names in distributed DDL queries and when creating Distributed tables: `CREATE TABLE distr ON CLUSTER '{cluster}' (...) ENGINE = Distributed('{cluster}', 'db', 'table')`.
|
||||
* Queries like `SELECT ... FROM table WHERE expr IN (subquery)` are now processed using the index of the table `table`.
|
||||
* Improved processing of duplicates when inserting into Replicated tables, so they no longer slow down execution of the replication queue.
|
||||
|
||||
# ClickHouse release 1.1.54342, 2018-01-22
|
||||
## ClickHouse release 1.1.54342, 2018-01-22
|
||||
|
||||
This release contains a fix for the previous release 1.1.54337:
|
||||
* Fixed a regression in 1.1.54337: if the default user has readonly access, the server refused to start with the message `Cannot create database in readonly mode`.
|
||||
@ -292,9 +292,9 @@
|
||||
* Buffer tables now work when MATERIALIZED columns are present in the destination table (by zhang2014).
|
||||
* Fixed a bug in the implementation of NULL.
|
||||
|
||||
# ClickHouse release 1.1.54337, 2018-01-18
|
||||
## ClickHouse release 1.1.54337, 2018-01-18
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
|
||||
* Added support for storing multidimensional arrays and tuples (the `Tuple` data type) in tables.
|
||||
* Support for table functions in `DESCRIBE` and `INSERT` queries. Support for a subquery in the `DESCRIBE` query. Examples: `DESC TABLE remote('host', default.hits)`; `DESC TABLE (SELECT 1)`; `INSERT INTO TABLE FUNCTION remote('host', default.hits)`. `INSERT INTO TABLE` can be written instead of `INSERT INTO`.
|
||||
@ -325,7 +325,7 @@
|
||||
* Added the `--silent` option for the `clickhouse-local` tool. It suppresses printing query execution information to stderr.
|
||||
* Added support for reading `Date` values from text in a format where the month and the day of the month may be specified with one digit instead of two (Amos Bird).
|
||||
|
||||
## Performance improvements:
|
||||
### Performance improvements:
|
||||
|
||||
* Improved performance of the aggregate functions `min`, `max`, `any`, `anyLast`, `anyHeavy`, `argMin`, `argMax` for string arguments.
|
||||
* Improved performance of the functions `isInfinite`, `isFinite`, `isNaN`, `roundToExp2`.
|
||||
@ -334,7 +334,7 @@
|
||||
* Reduced memory consumption for `JOIN` when the left and right parts contain columns with identical names that are not included in `USING`.
|
||||
* Improved performance of the aggregate functions `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr` at the cost of reduced resistance to computational error. The old versions of the functions are available under the names `varSampStable`, `varPopStable`, `stddevSampStable`, `stddevPopStable`, `covarSampStable`, `covarPopStable`, `corrStable`.
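The trade-off between speed and numerical stability mentioned in the last item can be illustrated with two textbook ways of computing sample variance. Which formulas ClickHouse actually uses for `varSamp` and `varSampStable` is an assumption here; the sketch only shows why a faster one-pass sum-of-squares formula can be less accurate than Welford's update. Both functions require at least two values.

```python
def var_samp_naive(xs):
    """One-pass sum and sum-of-squares formula: cheap, but it subtracts
    two large nearly equal numbers when the mean is far from zero."""
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s2 - s * s / n) / (n - 1)


def var_samp_welford(xs):
    """Welford's algorithm: a little more work per element, but it keeps
    a running mean and avoids the catastrophic cancellation above."""
    mean = 0.0
    m2 = 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


small = [1.0, 2.0, 3.0, 4.0]
shifted = [1e9 + x for x in small]  # same spread, large offset

print(var_samp_naive(small), var_samp_welford(small))      # both close to 1.6667
print(var_samp_naive(shifted), var_samp_welford(shifted))  # the naive value may drift
```

On the shifted data the true variance is unchanged, so any difference between the two printed values on the last line comes purely from floating-point error in the naive formula.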
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
|
||||
* Fixed block deduplication after `DROP` or `DETACH PARTITION`. Previously, dropping a partition and inserting the same data again did not work, because the re-inserted blocks were considered duplicates.
|
||||
* Fixed a bug that could lead to incorrect handling of `WHERE` in queries that create a `MATERIALIZED VIEW` with `POPULATE`.
|
||||
@ -373,7 +373,7 @@
|
||||
* Fixed the `SYSTEM DROP DNS CACHE` query: previously, flushing the DNS cache did not cause the cluster host names to be resolved again.
|
||||
* Fixed the behavior of `MATERIALIZED VIEW` after `DETACH TABLE` is executed for the table it reads from (Marek Vavruša).
|
||||
|
||||
## Build improvements:
|
||||
### Build improvements:
|
||||
|
||||
* `pbuilder` is used for builds. The build is as independent as possible of the environment on the build machine.
|
||||
* A single package is published for different OS versions; it is compatible with a wide range of Linux systems.
|
||||
@ -387,7 +387,7 @@
|
||||
* Removed the use of GNU extensions from the code and enabled the `-Wextra` option. When building with `clang`, `libc++` is used by default instead of `libstdc++`.
|
||||
* Extracted the `clickhouse_parsers` and `clickhouse_common_io` libraries to speed up builds of utilities.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
|
||||
* The format of marks in `Log` tables that contain `Nullable` columns was changed in a backward incompatible way. If you have such tables, you can convert them to `TinyLog` before starting the new server version. To do this, replace `ENGINE = Log` with `ENGINE = TinyLog` in the table's `.sql` file in the `metadata` directory. If the table has no `Nullable` columns or its type is not `Log`, you don't need to do anything.
|
||||
* Removed the `experimental_allow_extended_storage_definition_syntax` setting. The corresponding functionality is enabled by default.
|
||||
@ -398,16 +398,16 @@
|
||||
* Previous versions had an undocumented feature: in the AggregateFunction data type, you could omit the parameters of an aggregate function that depends on parameters. Example: `AggregateFunction(quantiles, UInt64)` instead of `AggregateFunction(quantiles(0.5, 0.9), UInt64)`. This feature was lost. Although it was undocumented, we plan to restore it in upcoming releases.
|
||||
* Values of the Enum data type cannot be passed to the min/max aggregate functions. This ability will be restored in the next release.
|
||||
|
||||
## Please note when upgrading:
|
||||
### Please note when upgrading:
|
||||
* While a cluster is being upgraded, during the period when some replicas run the new server version and others run the old one, replication is paused and messages like `unknown parameter 'shard'` appear in the log. Replication will continue after all replicas of the cluster are updated.
|
||||
* If different versions of ClickHouse are running on the cluster servers, distributed queries that use the functions `varSamp`, `varPop`, `stddevSamp`, `stddevPop`, `covarSamp`, `covarPop`, `corr` may return incorrect results. You should update all cluster servers.
|
||||
|
||||
# ClickHouse release 1.1.54327, 2017-12-21
|
||||
## ClickHouse release 1.1.54327, 2017-12-21
|
||||
|
||||
This release contains a fix for the previous release 1.1.54318:
|
||||
* Fixed a possible race condition in replication that could lead to data loss. Versions 1.1.54310 and 1.1.54318 are affected. If you use one of these versions with Replicated tables, the update is mandatory. The problem shows up in the log as Warning messages like `Part ... from own log doesn't exist.` Even if there are no such messages, the problem is still relevant.
|
||||
|
||||
# ClickHouse release 1.1.54318, 2017-11-30
|
||||
## ClickHouse release 1.1.54318, 2017-11-30
|
||||
|
||||
This release contains changes relative to the previous release 1.1.54310, with the following bug fixes:
|
||||
* Fixed incorrect row deletions during merges in the SummingMergeTree engine
|
||||
@ -416,9 +416,9 @@
|
||||
* Fixed an issue that caused the replication queue to stop running
|
||||
* Fixed rotation and archiving of server logs
|
||||
|
||||
# ClickHouse release 1.1.54310, 2017-11-01
|
||||
## ClickHouse release 1.1.54310, 2017-11-01
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* Custom partitioning key for the MergeTree family of table engines.
|
||||
* [Kafka](https://clickhouse.yandex/docs/en/single/index.html#document-table_engines/kafka) table engine.
|
||||
* Added support for loading [CatBoost](https://catboost.yandex/) models and applying them to data stored in ClickHouse.
|
||||
@ -434,12 +434,12 @@
|
||||
* Support for the Cap’n Proto input format.
|
||||
* The compression level can now be set when using the zstd algorithm.
|
||||
|
||||
## Backward incompatible changes:
|
||||
### Backward incompatible changes:
|
||||
* Creation of temporary tables with an engine other than Memory is not allowed.
|
||||
* Explicit creation of tables with the View or MaterializedView engine is not allowed.
|
||||
* During table creation, a new check verifies that the sampling key is included in the primary key.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed a hang when synchronously inserting into a Distributed table.
|
||||
* Fixed nonatomic adding and removing of parts in replicated tables.
|
||||
* Data inserted into a materialized view is no longer subjected to unnecessary deduplication.
|
||||
@ -449,14 +449,14 @@
|
||||
* Fixed a hang when there is not enough disk space on the partition that holds the logs.
|
||||
* Fixed an overflow in the toRelativeWeekNum function for the first week of the Unix epoch.
|
||||
|
||||
## Build improvements:
|
||||
### Build improvements:
|
||||
* Several third-party libraries (notably Poco) were updated and converted to git submodules.
|
||||
|
||||
# ClickHouse release 1.1.54304, 2017-10-19
|
||||
## New features:
|
||||
## ClickHouse release 1.1.54304, 2017-10-19
|
||||
### New features:
|
||||
* Added TLS support in the native protocol (to enable, set `tcp_ssl_port` in `config.xml`)
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* `ALTER` for replicated tables now tries to start running as soon as possible
|
||||
* Fixed crashes when reading data with the setting `preferred_block_size_bytes=0`
|
||||
* Fixed a crash of `clickhouse-client` when `Page Down` is pressed
|
||||
@ -469,16 +469,16 @@
|
||||
* Users are updated correctly when `users.xml` is invalid
|
||||
* Correct handling of cases when an executable dictionary returns a non-zero response code
|
||||
|
||||
# ClickHouse release 1.1.54292, 2017-09-20
|
||||
## ClickHouse release 1.1.54292, 2017-09-20
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* Added the `pointInPolygon` function for working with coordinates on a plane.
|
||||
* Added the `sumMap` aggregate function for summing arrays, similar to `SummingMergeTree`.
|
||||
* Added the `trunc` function. Improved performance of the rounding functions `round`, `floor`, `ceil`, `roundToExp2`. Corrected the logic of the rounding functions. Changed the logic of the `roundToExp2` function for fractional and negative numbers.
|
||||
* The ClickHouse executable file is now less dependent on the libc version. The same ClickHouse executable file can run on a wide variety of Linux systems. Note: the dependency still exists when using compiled queries (the setting `compile = 1`, which is not used by default).
|
||||
* Reduced the time needed for dynamic compilation of queries.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Fixed an error that could produce `part ... intersects previous part` messages and break replica consistency.
|
||||
* Fixed an error that caused the server to lock up during shutdown if ZooKeeper was unavailable at that time.
|
||||
* Removed excessive logging when restoring replicas.
|
||||
@ -486,9 +486,9 @@
|
||||
* Fixed an error in the concat function that occurred if the first column in a block has the Array type.
|
||||
* Progress is now displayed correctly in the system.merges table.
|
||||
|
||||
# ClickHouse release 1.1.54289, 2017-09-13
|
||||
## ClickHouse release 1.1.54289, 2017-09-13
|
||||
|
||||
## New features:
|
||||
### New features:
|
||||
* `SYSTEM` queries for server administration: `SYSTEM RELOAD DICTIONARY`, `SYSTEM RELOAD DICTIONARIES`, `SYSTEM DROP DNS CACHE`, `SYSTEM SHUTDOWN`, `SYSTEM KILL`.
|
||||
* Added functions for working with arrays: `concat`, `arraySlice`, `arrayPushBack`, `arrayPushFront`, `arrayPopBack`, `arrayPopFront`.
|
||||
* Added the `root` and `identity` parameters for the ZooKeeper configuration. This allows you to isolate different users of the same ZooKeeper cluster.
|
||||
@ -503,7 +503,7 @@
|
||||
* `umask` can be set in the config file.
|
||||
* Improved performance of queries with `DISTINCT`.
|
||||
|
||||
## Bug fixes:
|
||||
### Bug fixes:
|
||||
* Improved the procedure for deleting old nodes in ZooKeeper. Previously, with very frequent inserts, old nodes sometimes were not deleted in time, which led to, among other things, a very slow server shutdown.
|
||||
* Fixed randomization when choosing hosts for the connection to ZooKeeper.
|
||||
* Fixed the exclusion of a lagging replica in distributed queries if the replica is localhost.
|
||||
@ -516,28 +516,28 @@
|
||||
* Fixed the appearance of zombie processes when working with a dictionary that has an `executable` source.
|
||||
* Fixed a segfault for the HEAD request.
|
||||
|
||||
## Improved workflow for developing and assembling ClickHouse:
|
||||
### Improved workflow for developing and assembling ClickHouse:
|
||||
* You can use `pbuilder` to build ClickHouse.
|
||||
* You can use `libc++` instead of `libstdc++` for builds on Linux.
|
||||
* Added instructions for using the static code analysis tools `Coverity`, `clang-tidy`, `cppcheck`.
|
||||
|
||||
## Please note when upgrading:
|
||||
### Please note when upgrading:
|
||||
* The default value of the MergeTree setting `max_bytes_to_merge_at_max_space_in_pool` (the maximum total size of data parts to merge, in bytes) has increased from 100 GiB to 150 GiB. This might result in large merges running after the server upgrade, which could increase the load on the disk subsystem. If the free space available on the affected servers is less than twice the total size of the merges that are running, all other merges will stop running, including merges of small data parts. As a result, INSERT queries will be rejected with the message "Merges are processing significantly slower than inserts". Use the `SELECT * FROM system.merges` query to monitor the situation. You can also check the `DiskSpaceReservedForMerge` metric in the `system.metrics` table or in Graphite. You don't need to do anything to fix this, since the situation will return to normal once the large merges finish. If you find this unacceptable, you can restore the previous value of the `max_bytes_to_merge_at_max_space_in_pool` setting by adding `<max_bytes_to_merge_at_max_space_in_pool>107374182400</max_bytes_to_merge_at_max_space_in_pool>` to the `<merge_tree>` section of config.xml and restarting the server.
|
||||
|
||||
# ClickHouse release 1.1.54284, 2017-08-29
|
||||
## ClickHouse release 1.1.54284, 2017-08-29
|
||||
|
||||
* This release contains changes relative to the previous release 1.1.54282 that fix a leak of part records in ZooKeeper
|
||||
|
||||
# ClickHouse release 1.1.54282, 2017-08-23
|
||||
## ClickHouse release 1.1.54282, 2017-08-23
|
||||
|
||||
This release contains fixes for the previous release 1.1.54276:
|
||||
* Fixed the `DB::Exception: Assertion violation: !_path.empty()` error when inserting into a Distributed table.
|
||||
* Fixed parsing when inserting in RowBinary format if the input data starts with ';'.
|
||||
* Fixed an error during runtime compilation of certain aggregate functions (for example, `groupArray()`).
|
||||
|
||||
# Релиз ClickHouse 1.1.54276, 2017-08-16
|
||||
## Релиз ClickHouse 1.1.54276, 2017-08-16
|
||||
|
||||
## Новые возможности:
|
||||
### Новые возможности:
|
||||
* Добавлена опциональная секция WITH запроса SELECT. Пример запроса: `WITH 1+1 AS a SELECT a, a*a`
|
||||
* Добавлена возможность синхронной вставки в Distributed таблицу: выдается Ok только после того как все данные записались на все шарды. Активируется настройкой insert_distributed_sync=1
|
||||
* Добавлен тип данных UUID для работы с 16-байтовыми идентификаторами
|
||||
@ -547,17 +547,17 @@
|
||||
* Добавлена поддержка неконстантных аргументов и отрицательных смещений в функции `substring(str, pos, len)`
|
||||
* Добавлен параметр max_size для агрегатной функции `groupArray(max_size)(column)`, и оптимизирована её производительность
|
||||
|
||||
## Основные изменения:
|
||||
### Основные изменения:
|
||||
* Улучшение безопасности: все файлы сервера создаются с правами 0640 (можно поменять, через параметр <umask> в конфиге).
|
||||
* Улучшены сообщения об ошибках в случае синтаксически неверных запросов
|
||||
* Значительно уменьшен расход оперативной памяти и улучшена производительность слияний больших MergeTree-кусков данных
|
||||
* Значительно увеличена производительность слияний данных для движка ReplacingMergeTree
|
||||
* Улучшена производительность асинхронных вставок из Distributed таблицы за счет объединения нескольких исходных вставок. Функциональность включается настройкой distributed_directory_monitor_batch_inserts=1.
|
||||
|
||||
## Обратно несовместимые изменения:
|
||||
### Обратно несовместимые изменения:
|
||||
* Изменился бинарный формат агрегатных состояний функции `groupArray(array_column)` для массивов
|
||||
|
||||
## Полный список изменений:
|
||||
### Полный список изменений:
|
||||
* Добавлена настройка `output_format_json_quote_denormals`, включающая вывод nan и inf значений в формате JSON
|
||||
* Более оптимальное выделение потоков при чтении из Distributed таблиц
|
||||
* Разрешено задавать настройки в режиме readonly, если их значение не изменяется
|
||||
@ -575,7 +575,7 @@
|
||||
* Возможность подключения к MySQL через сокет на файловой системе
|
||||
* В таблицу system.parts добавлен столбец с информацией о размере marks в байтах
|
||||
|
||||
## Исправления багов:
|
||||
### Исправления багов:
|
||||
* Исправлена некорректная работа Distributed таблиц, использующих Merge таблицы, при SELECT с условием на поле _table
|
||||
* Исправлен редкий race condition в ReplicatedMergeTree при проверке кусков данных
|
||||
* Исправлено возможное зависание процедуры leader election при старте сервера
|
||||
@ -598,15 +598,15 @@
|
||||
* Исправлена ошибка "Cannot mremap" при использовании множеств в секциях IN, JOIN, содержащих более 2 млрд. элементов
|
||||
* Исправлен failover для словарей с источником MySQL
|
||||
|
||||
## Улучшения процесса разработки и сборки ClickHouse:
|
||||
### Улучшения процесса разработки и сборки ClickHouse:
|
||||
* Добавлена возможность сборки в Arcadia
|
||||
* Добавлена возможность сборки с помощью gcc 7
|
||||
* Ускорена параллельная сборка с помощью ccache+distcc
|
||||
|
||||
|
||||
# Релиз ClickHouse 1.1.54245, 2017-07-04
|
||||
## Релиз ClickHouse 1.1.54245, 2017-07-04
|
||||
|
||||
## Новые возможности:
|
||||
### Новые возможности:
|
||||
* Распределённые DDL (например, `CREATE TABLE ON CLUSTER`)
|
||||
* Реплицируемый запрос `ALTER TABLE CLEAR COLUMN IN PARTITION`
|
||||
* Движок таблиц Dictionary (доступ к данным словаря в виде таблицы)
|
||||
@ -617,14 +617,14 @@
|
||||
* Сессии в HTTP интерфейсе
|
||||
* Запрос OPTIMIZE для Replicated таблицы теперь можно выполнять не только на лидере
|
||||
|
||||
## Обратно несовместимые изменения:
|
||||
### Обратно несовместимые изменения:
|
||||
* Убрана команда SET GLOBAL
|
||||
|
||||
## Мелкие изменения:
|
||||
### Мелкие изменения:
|
||||
* Теперь после получения сигнала в лог печатается полный стектрейс
|
||||
* Ослаблена проверка на количество повреждённых/лишних кусков при старте (было слишком много ложных срабатываний)
|
||||
|
||||
## Исправления багов:
|
||||
### Исправления багов:
|
||||
* Исправлено залипание плохого соединения при вставке в Distributed таблицу
|
||||
* GLOBAL IN теперь работает при запросе из таблицы Merge, смотрящей в Distributed
|
||||
* Теперь правильно определяется количество ядер на виртуалках Google Compute Engine
|
||||
|
39
MacOS.md
@ -1,39 +0,0 @@
|
||||
## How to increase maxfiles on macOS
|
||||
|
||||
To increase maxfiles on macOS, create the following file:
|
||||
|
||||
(Note: you'll need to use sudo)
|
||||
|
||||
/Library/LaunchDaemons/limit.maxfiles.plist:
|
||||
```
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
|
||||
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||
<plist version="1.0">
|
||||
<dict>
|
||||
<key>Label</key>
|
||||
<string>limit.maxfiles</string>
|
||||
<key>ProgramArguments</key>
|
||||
<array>
|
||||
<string>launchctl</string>
|
||||
<string>limit</string>
|
||||
<string>maxfiles</string>
|
||||
<string>524288</string>
|
||||
<string>524288</string>
|
||||
</array>
|
||||
<key>RunAtLoad</key>
|
||||
<true/>
|
||||
<key>ServiceIPC</key>
|
||||
<false/>
|
||||
</dict>
|
||||
</plist>
|
||||
```
|
||||
|
||||
Execute the following command:
|
||||
```
|
||||
sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist
|
||||
```
|
||||
|
||||
Reboot.
|
||||
|
||||
To check if it's working, you can use `ulimit -n` command.
|
@ -2,7 +2,7 @@
|
||||
|
||||
ClickHouse uses a "documentation as code" approach, so you can edit the Markdown files in this folder from the GitHub web interface, or fork the ClickHouse repository, edit, commit, push, and open a pull request.
|
||||
|
||||
At the moment documentation is bilingual in English and Russian, so it's better to try keeping languages in sync if you can, but it's not strictly required as there are people watching over this. If you add new article, you should also add it to `mkdocs_{en,ru}.yaml` file with pages index.
|
||||
At the moment documentation is bilingual in English and Russian, so it's better to try keeping languages in sync if you can, but it's not strictly required as there are people watching over this. If you add new article, you should also add it to `toc_{en,ru}.yaml` file with pages index.
|
||||
|
||||
Master branch is then asynchronously published to ClickHouse official website:
|
||||
|
||||
|
1
docs/en/changelog.md
Symbolic link
@ -0,0 +1 @@
|
||||
../../CHANGELOG.md
|
@ -66,5 +66,5 @@ SELECT 0 / 0
|
||||
└──────────────┘
|
||||
```
|
||||
|
||||
See the rules for ` NaN` sorting in the section [ORDER BY clause](../query_language/queries.md#query_language-queries-order_by).
|
||||
See the rules for ` NaN` sorting in the section [ORDER BY clause](../query_language/select.md#query_language-queries-order_by).
|
||||
|
||||
|
@ -41,5 +41,45 @@ cd ..
|
||||
|
||||
## Caveats
|
||||
|
||||
If you intend to run clickhouse-server, make sure to increase the system's maxfiles variable. See [MacOS.md](https://github.com/yandex/ClickHouse/blob/master/MacOS.md) for more details.
|
||||
If you intend to run clickhouse-server, make sure to increase the system's maxfiles variable.
|
||||
|
||||
<div class="admonition info">
|
||||
Note: you'll need to use sudo.
|
||||
</div>
|
||||
|
||||
To do so, create the following file:
|
||||
|
||||
/Library/LaunchDaemons/limit.maxfiles.plist:
|
||||
``` xml
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
|
||||
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||
<plist version="1.0">
|
||||
<dict>
|
||||
<key>Label</key>
|
||||
<string>limit.maxfiles</string>
|
||||
<key>ProgramArguments</key>
|
||||
<array>
|
||||
<string>launchctl</string>
|
||||
<string>limit</string>
|
||||
<string>maxfiles</string>
|
||||
<string>524288</string>
|
||||
<string>524288</string>
|
||||
</array>
|
||||
<key>RunAtLoad</key>
|
||||
<true/>
|
||||
<key>ServiceIPC</key>
|
||||
<false/>
|
||||
</dict>
|
||||
</plist>
|
||||
```
|
||||
|
||||
Execute the following command:
|
||||
``` bash
|
||||
$ sudo chown root:wheel /Library/LaunchDaemons/limit.maxfiles.plist
|
||||
```
|
||||
|
||||
Reboot.
|
||||
|
||||
To check if it's working, you can use `ulimit -n` command.
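As a complement to `ulimit -n`, the limit can also be read from inside a process. The following is a minimal sketch using Python's standard `resource` module; the helper name `nofile_limits` is made up for illustration, and whether the reported values match the `524288` configured above depends on how the process was launched.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # The soft limit is the value that `ulimit -n` reports in a shell.
    print(f"soft={soft} hard={hard}")
```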
|
||||
|
||||
|
@ -1,26 +0,0 @@
|
||||
<a name="format_capnproto"></a>
|
||||
|
||||
# CapnProto
|
||||
|
||||
Cap'n Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.
|
||||
|
||||
Cap'n Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
|
||||
|
||||
```sql
|
||||
SELECT SearchPhrase, count() AS c FROM test.hits
|
||||
GROUP BY SearchPhrase FORMAT CapnProto SETTINGS schema = 'schema:Message'
|
||||
```
|
||||
|
||||
Where `schema.capnp` looks like this:
|
||||
|
||||
```
|
||||
struct Message {
|
||||
SearchPhrase @0 :Text;
|
||||
c @1 :UInt64;
|
||||
}
|
||||
```
|
||||
|
||||
Schema files are located in the directory specified in [format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) in the server configuration.
|
||||
|
||||
Deserialization is efficient and usually doesn't increase the system load.
|
||||
|
@ -1,12 +0,0 @@
|
||||
# CSV
|
||||
|
||||
Comma Separated Values format ([RFC](https://tools.ietf.org/html/rfc4180)).
|
||||
|
||||
When formatting, strings are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Dates and date-times are enclosed in double quotes. Numbers are output without quotes. Values are separated by a delimiter*. Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first the array is serialized to a string as in the TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
|
||||
|
||||
*By default, the delimiter is `,`. See the [format_csv_delimiter](/docs/en/operations/settings/settings/#format_csv_delimiter) setting for additional info.
|
||||
|
||||
When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Strings can also be written without quotes. In this case, they are parsed up to a delimiter or line feed (CR or LF). In violation of the RFC, when parsing strings without quotes, the leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) are all supported.
|
||||
|
||||
The CSV format supports the output of totals and extremes the same way as `TabSeparated`.
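The formatting rules above (strings in double quotes, inner double quotes doubled, numbers unquoted) can be sketched as follows. This is an illustration only, not ClickHouse's implementation; the helper names `csv_field` and `csv_row` are hypothetical, and dates, arrays, and tuples are left out.

```python
def csv_field(value):
    """Format one value: strings are quoted, everything else is printed as-is."""
    if isinstance(value, str):
        # A double quote inside a string is output as two double quotes.
        return '"' + value.replace('"', '""') + '"'
    return str(value)


def csv_row(values, delimiter=","):
    """Join formatted values with the delimiter and end the row with LF."""
    return delimiter.join(csv_field(v) for v in values) + "\n"
```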
|
||||
|
@ -1,4 +0,0 @@
|
||||
# CSVWithNames
|
||||
|
||||
Also prints the header row, similar to `TabSeparatedWithNames`.
|
||||
|
@ -1,6 +0,0 @@
|
||||
<a name="formats"></a>
|
||||
|
||||
# Formats
|
||||
|
||||
The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).
|
||||
|
@ -1,85 +0,0 @@
|
||||
# JSON
|
||||
|
||||
Outputs data in JSON format. Besides data tables, it also outputs column names and types, along with some additional information: the total number of output rows, and the number of rows that could have been output if there weren't a LIMIT. Example:
|
||||
|
||||
```sql
|
||||
SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase WITH TOTALS ORDER BY c DESC LIMIT 5 FORMAT JSON
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"meta":
|
||||
[
|
||||
{
|
||||
"name": "SearchPhrase",
|
||||
"type": "String"
|
||||
},
|
||||
{
|
||||
"name": "c",
|
||||
"type": "UInt64"
|
||||
}
|
||||
],
|
||||
|
||||
"data":
|
||||
[
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "8267016"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "bathroom interior design",
|
||||
"c": "2166"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "yandex",
|
||||
"c": "1655"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "spring 2014 fashion",
|
||||
"c": "1549"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "freeform photos",
|
||||
"c": "1480"
|
||||
}
|
||||
],
|
||||
|
||||
"totals":
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "8873898"
|
||||
},
|
||||
|
||||
"extremes":
|
||||
{
|
||||
"min":
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "1480"
|
||||
},
|
||||
"max":
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "8267016"
|
||||
}
|
||||
},
|
||||
|
||||
"rows": 5,
|
||||
|
||||
"rows_before_limit_at_least": 141137
|
||||
}
|
||||
```
|
||||
|
||||
The JSON is compatible with JavaScript. To ensure this, some characters are additionally escaped: the slash `/` is escaped as `\/`; the alternative line breaks `U+2028` and `U+2029`, which break some browsers, are escaped as `\uXXXX`. ASCII control characters are escaped: backspace, form feed, line feed, carriage return, and horizontal tab are replaced with `\b`, `\f`, `\n`, `\r`, `\t`, and the remaining bytes in the 00-1F range are written as `\uXXXX` sequences. Invalid UTF-8 sequences are changed to the replacement character � (U+FFFD), so the output text consists of valid UTF-8 sequences. For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, set the configuration parameter `output_format_json_quote_64bit_integers` to 0.
|
||||
|
||||
`rows` – The total number of output rows.
|
||||
|
||||
`rows_before_limit_at_least` – The minimal number of rows there would have been without LIMIT. Output only if the query contains LIMIT.
|
||||
If the query contains GROUP BY, rows_before_limit_at_least is the exact number of rows there would have been without a LIMIT.
|
||||
|
||||
`totals` – Total values (when using WITH TOTALS).
|
||||
|
||||
`extremes` – Extreme values (when extremes is set to 1).
|
||||
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
See also the JSONEachRow format.
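Because Int64 and UInt64 values arrive as quoted strings by default, a client usually has to convert them back. The sketch below reads the `meta` section to find such columns; the function name `decode_result` is hypothetical, and the code assumes the response layout shown in the example above.

```python
import json


def decode_result(payload: str):
    """Parse a FORMAT JSON response and turn quoted 64-bit integers into ints."""
    doc = json.loads(payload)
    # Columns whose values are quoted by default for JavaScript compatibility.
    int_cols = {m["name"] for m in doc["meta"] if m["type"] in ("Int64", "UInt64")}
    return [
        {name: int(value) if name in int_cols else value for name, value in row.items()}
        for row in doc["data"]
    ]
```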
|
@ -1,46 +0,0 @@
|
||||
# JSONCompact
|
||||
|
||||
Differs from JSON only in that data rows are output in arrays, not in objects.
|
||||
|
||||
Example:
|
||||
|
||||
```json
|
||||
{
|
||||
"meta":
|
||||
[
|
||||
{
|
||||
"name": "SearchPhrase",
|
||||
"type": "String"
|
||||
},
|
||||
{
|
||||
"name": "c",
|
||||
"type": "UInt64"
|
||||
}
|
||||
],
|
||||
|
||||
"data":
|
||||
[
|
||||
["", "8267016"],
|
||||
["bathroom interior design", "2166"],
|
||||
["yandex", "1655"],
|
||||
["spring 2014 fashion", "1549"],
|
||||
["freeform photos", "1480"]
|
||||
],
|
||||
|
||||
"totals": ["","8873898"],
|
||||
|
||||
"extremes":
|
||||
{
|
||||
"min": ["","1480"],
|
||||
"max": ["","8267016"]
|
||||
},
|
||||
|
||||
"rows": 5,
|
||||
|
||||
"rows_before_limit_at_least": 141137
|
||||
}
|
||||
```
|
||||
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
See also the `JSONEachRow` format.
|
||||
|
@ -1,21 +0,0 @@
|
||||
# JSONEachRow
|
||||
|
||||
Outputs data as separate JSON objects for each row (newline delimited JSON).
|
||||
|
||||
```json
|
||||
{"SearchPhrase":"","count()":"8267016"}
|
||||
{"SearchPhrase":"bathroom interior design","count()":"2166"}
|
||||
{"SearchPhrase":"yandex","count()":"1655"}
|
||||
{"SearchPhrase":"spring 2014 fashion","count()":"1549"}
|
||||
{"SearchPhrase":"freeform photo","count()":"1480"}
|
||||
{"SearchPhrase":"angelina jolie","count()":"1245"}
|
||||
{"SearchPhrase":"omsk","count()":"1112"}
|
||||
{"SearchPhrase":"photos of dog breeds","count()":"1091"}
|
||||
{"SearchPhrase":"curtain design","count()":"1064"}
|
||||
{"SearchPhrase":"baku","count()":"1000"}
|
||||
```
|
||||
|
||||
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in strings. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
|
||||
|
||||
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and empty strings are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
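The parsing behaviour described above (any key order, omitted values replaced by defaults, trailing commas ignored) can be approximated on the client side. This is a simplified sketch: the name `parse_json_each_row` is hypothetical, and it assumes one object per line, which the format itself does not require.

```python
import json


def parse_json_each_row(text, defaults):
    """Parse newline-delimited JSON objects, filling omitted columns from defaults."""
    rows = []
    for line in text.splitlines():
        line = line.strip().rstrip(",")  # a comma after an object is ignored
        if not line:
            continue
        row = dict(defaults)  # start from zeros / empty strings
        row.update(json.loads(line))
        rows.append(row)
    return rows
```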
|
||||
|
@ -1,6 +0,0 @@
|
||||
# Native
|
||||
|
||||
The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is "columnar" – it doesn't convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
|
||||
|
||||
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn't make sense to work with this format yourself.
|
||||
|
@ -1,5 +0,0 @@
|
||||
# Null
|
||||
|
||||
Nothing is output. However, the query is processed, and when using the command-line client, data is transmitted to the client. This is used for tests, including performance testing.
|
||||
Obviously, this format is only appropriate for output, not for parsing.
|
||||
|
@ -1,37 +0,0 @@
|
||||
# Pretty
|
||||
|
||||
Outputs data as Unicode-art tables, also using ANSI-escape sequences for setting colors in the terminal.
|
||||
A full grid of the table is drawn, and each row occupies two lines in the terminal.
|
||||
Each result block is output as a separate table. This is necessary so that blocks can be output without buffering results (buffering would be necessary in order to pre-calculate the visible width of all the values).
|
||||
To avoid dumping too much data to the terminal, only the first 10,000 rows are printed. If the number of rows is greater than or equal to 10,000, the message "Showed first 10 000" is printed.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
||||
The Pretty format supports outputting total values (when using WITH TOTALS) and extremes (when 'extremes' is set to 1). In these cases, total values and extreme values are output after the main data, in separate tables. Example (shown for the PrettyCompact format):
|
||||
|
||||
```sql
|
||||
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT PrettyCompact
|
||||
```
|
||||
|
||||
```text
|
||||
┌──EventDate─┬───────c─┐
|
||||
│ 2014-03-17 │ 1406958 │
|
||||
│ 2014-03-18 │ 1383658 │
|
||||
│ 2014-03-19 │ 1405797 │
|
||||
│ 2014-03-20 │ 1353623 │
|
||||
│ 2014-03-21 │ 1245779 │
|
||||
│ 2014-03-22 │ 1031592 │
|
||||
│ 2014-03-23 │ 1046491 │
|
||||
└────────────┴─────────┘
|
||||
|
||||
Totals:
|
||||
┌──EventDate─┬───────c─┐
|
||||
│ 0000-00-00 │ 8873898 │
|
||||
└────────────┴─────────┘
|
||||
|
||||
Extremes:
|
||||
┌──EventDate─┬───────c─┐
|
||||
│ 2014-03-17 │ 1031592 │
|
||||
│ 2014-03-23 │ 1406958 │
|
||||
└────────────┴─────────┘
|
||||
```
|
||||
|
@ -1,5 +0,0 @@
|
||||
# PrettyCompact
|
||||
|
||||
Differs from `Pretty` in that the grid is not drawn between rows, so the result is more compact.
|
||||
This format is used by default in the command-line client in interactive mode.
|
||||
|
@ -1,4 +0,0 @@
|
||||
# PrettyCompactMonoBlock
|
||||
|
||||
Differs from `PrettyCompact` in that up to 10,000 rows are buffered, then output as a single table, not by blocks.
|
||||
|
@ -1,20 +0,0 @@
|
||||
# PrettyNoEscapes
|
||||
|
||||
Differs from Pretty in that ANSI-escape sequences aren't used. This is necessary for displaying this format in a browser, as well as for using the 'watch' command-line utility.
|
||||
|
||||
Example:
|
||||
|
||||
```bash
|
||||
watch -n1 "clickhouse-client --query='SELECT * FROM system.events FORMAT PrettyCompactNoEscapes'"
|
||||
```
|
||||
|
||||
You can use the HTTP interface for displaying in the browser.
|
||||
|
||||
## PrettyCompactNoEscapes
|
||||
|
||||
The same as `PrettyNoEscapes`, but for the `PrettyCompact` format.
|
||||
|
||||
## PrettySpaceNoEscapes
|
||||
|
||||
The same as `PrettyNoEscapes`, but for the `PrettySpace` format.
|
||||
|
@ -1,4 +0,0 @@
|
||||
# PrettySpace
|
||||
|
||||
Differs from `PrettyCompact` in that whitespace (space characters) is used instead of the grid.
|
||||
|
@ -1,13 +0,0 @@
|
||||
# RowBinary
|
||||
|
||||
Formats and parses data by row in binary format. Rows and values are listed consecutively, without separators.
|
||||
This format is less efficient than the Native format, since it is row-based.
|
||||
|
||||
Integers use fixed-length little endian representation. For example, UInt64 uses 8 bytes.
|
||||
DateTime is represented as UInt32 containing the Unix timestamp as the value.
|
||||
Date is represented as a UInt16 object that contains the number of days since 1970-01-01 as the value.
|
||||
String is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string.
|
||||
FixedString is represented simply as a sequence of bytes.
|
||||
|
||||
Array is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by successive elements of the array.
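The encodings listed above can be sketched in a few lines. This is an illustration of the byte layout as described, not ClickHouse code; all function names are hypothetical.

```python
import struct
from datetime import date


def leb128(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uint64(value: int) -> bytes:
    """UInt64: 8 bytes, little endian."""
    return struct.pack("<Q", value)


def encode_date(value: date) -> bytes:
    """Date: UInt16 with the number of days since 1970-01-01."""
    return struct.pack("<H", (value - date(1970, 1, 1)).days)


def encode_string(value: str) -> bytes:
    """String: varint length followed by the bytes of the string."""
    data = value.encode("utf-8")
    return leb128(len(data)) + data


def encode_array(items, encode_item) -> bytes:
    """Array: varint length followed by the encoded elements."""
    return leb128(len(items)) + b"".join(encode_item(x) for x in items)
```

A row is then the concatenation of its encoded values, with no separators between values or rows.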
|
||||
|
@ -1,59 +0,0 @@
|
||||
# TabSeparated
|
||||
|
||||
In TabSeparated format, data is written by row. Each row contains values separated by tabs. Each value is followed by a tab, except the last value in the row, which is followed by a line feed. Strictly Unix line feeds are assumed everywhere. The last row also must contain a line feed at the end. Values are written in text format, without enclosing quotation marks, and with special characters escaped.
|
||||
|
||||
Integer numbers are written in decimal form. Numbers can contain an extra "+" character at the beginning (ignored when parsing, and not recorded when formatting). Non-negative numbers can't contain the negative sign. When reading, it is allowed to parse an empty string as a zero, or (for signed types) a string consisting of just a minus sign as a zero. Numbers that do not fit into the corresponding data type may be parsed as a different number, without an error message.
|
||||
|
||||
Floating-point numbers are written in decimal form. The dot is used as the decimal separator. Exponential entries are supported, as are 'inf', '+inf', '-inf', and 'nan'. An entry of floating-point numbers may begin or end with a decimal point.
|
||||
During formatting, accuracy may be lost on floating-point numbers.
|
||||
During parsing, it is not strictly required to read the nearest machine-representable number.
|
||||
|
||||
Dates are written in YYYY-MM-DD format and parsed in the same format, but with any characters as separators.
|
||||
Dates with times are written in the format YYYY-MM-DD hh:mm:ss and parsed in the same format, but with any characters as separators.
|
||||
This all occurs in the system time zone at the time the client or server starts (depending on which one formats data). For dates with times, daylight saving time is not specified. So if a dump has times during daylight saving time, the dump does not unequivocally match the data, and parsing will select one of the two times.
|
||||
During a read operation, incorrect dates and dates with times can be parsed with natural overflow or as null dates and times, without an error message.
|
||||
|
||||
As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats YYYY-MM-DD hh:mm:ss and NNNNNNNNNN are differentiated automatically.
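The automatic distinction between the two forms can be sketched as below. This is an assumption-laden illustration: the function name `parse_datetime` is hypothetical, the timestamp is interpreted in UTC, and the textual form is parsed only with the standard `-`, space, and `:` separators, although the format itself accepts any separator characters.

```python
from datetime import datetime, timezone


def parse_datetime(text: str) -> datetime:
    """Parse either a 10-digit Unix timestamp or 'YYYY-MM-DD hh:mm:ss'."""
    if len(text) == 10 and text.isascii() and text.isdigit():
        # Exactly 10 decimal digits: a Unix timestamp, independent of time zone.
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    # Otherwise a textual date with time (returned without time zone information).
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
```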
|
||||
|
||||
Strings are output with backslash-escaped special characters. The following escape sequences are used for output: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\'`, `\\`. Parsing also supports the sequences `\a`, `\v`, and `\xHH` (hex escape sequences) and any `\c` sequences, where `c` is any character (these sequences are converted to `c`). Thus, reading data supports formats where a line feed can be written as `\n` or `\`, or as a line feed. For example, the string `Hello world` with a line feed between the words instead of a space can be parsed in any of the following variations:
|
||||
|
||||
```text
|
||||
Hello\nworld
|
||||
|
||||
Hello\
|
||||
world
|
||||
```
|
||||
|
||||
The second variant is supported because MySQL uses it when writing tab-separated dumps.
|
||||
|
||||
The minimum set of characters that you need to escape when passing data in TabSeparated format: tab, line feed (LF) and backslash.
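A minimal sketch of that escaping, for writing your own TabSeparated data. The helper names `tsv_escape` and `tsv_row` are hypothetical; only the three characters named above are handled.

```python
def tsv_escape(value: str) -> str:
    """Escape backslash, tab, and line feed for TabSeparated input."""
    # The backslash must be replaced first, or the escapes added below would be doubled.
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def tsv_row(values) -> str:
    """Join escaped values with tabs and terminate the row with a line feed."""
    return "\t".join(tsv_escape(str(v)) for v in values) + "\n"
```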
|
||||
|
||||
Only a small set of symbols are escaped. You can easily stumble onto a string value that your terminal will ruin in output.
|
||||
|
||||
Arrays are written as a list of comma-separated values in square brackets. Numeric items in the array are formatted as usual, but dates, dates with times, and strings are written in single quotes with the same escaping rules as above.
|
||||
|
||||
The TabSeparated format is convenient for processing data using custom programs and scripts. It is used by default in the HTTP interface, and in the command-line client's batch mode. This format also allows transferring data between different DBMSs. For example, you can get a dump from MySQL and upload it to ClickHouse, or vice versa.
|
||||
|
||||
The TabSeparated format supports outputting total values (when using WITH TOTALS) and extreme values (when 'extremes' is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
|
||||
|
||||
```sql
|
||||
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated
|
||||
```
|
||||
|
||||
```text
|
||||
2014-03-17 1406958
|
||||
2014-03-18 1383658
|
||||
2014-03-19 1405797
|
||||
2014-03-20 1353623
|
||||
2014-03-21 1245779
|
||||
2014-03-22 1031592
|
||||
2014-03-23 1046491
|
||||
|
||||
0000-00-00 8873898
|
||||
|
||||
2014-03-17 1031592
|
||||
2014-03-23 1406958
|
||||
```
|
||||
|
||||
This format is also available under the name `TSV`.
|
||||
|
@ -1,7 +0,0 @@
|
||||
# TabSeparatedRaw
|
||||
|
||||
Differs from `TabSeparated` format in that the rows are written without escaping.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
||||
This format is also available under the name `TSVRaw`.
|
||||
|
@ -1,8 +0,0 @@
|
||||
# TabSeparatedWithNames
|
||||
|
||||
Differs from the `TabSeparated` format in that the column names are written in the first row.
|
||||
During parsing, the first row is completely ignored. You can't use column names to determine their position or to check their correctness.
|
||||
(Support for parsing the header row may be added in the future.)
|
||||
|
||||
This format is also available under the name `TSVWithNames`.
|
||||
|
@ -1,7 +0,0 @@
|
||||
# TabSeparatedWithNamesAndTypes
|
||||
|
||||
Differs from the `TabSeparated` format in that the column names are written to the first row, while the column types are in the second row.
|
||||
During parsing, the first and second rows are completely ignored.
|
||||
|
||||
This format is also available under the name `TSVWithNamesAndTypes`.
|
||||
|
@ -1,23 +0,0 @@
|
||||
# TSKV
|
||||
|
||||
Similar to TabSeparated, but outputs a value in name=value format. Names are escaped the same way as in TabSeparated format, and the = symbol is also escaped.
|
||||
|
||||
```text
|
||||
SearchPhrase= count()=8267016
|
||||
SearchPhrase=bathroom interior design count()=2166
|
||||
SearchPhrase=yandex count()=1655
|
||||
SearchPhrase=spring 2014 fashion count()=1549
|
||||
SearchPhrase=freeform photos count()=1480
|
||||
SearchPhrase=angelina jolie count()=1245
|
||||
SearchPhrase=omsk count()=1112
|
||||
SearchPhrase=photos of dog breeds count()=1091
|
||||
SearchPhrase=curtain design count()=1064
|
||||
SearchPhrase=baku count()=1000
|
||||
```
|
||||
|
||||
When there is a large number of small columns, this format is ineffective, and there is generally no reason to use it. It is used in some departments of Yandex.
|
||||
|
||||
Both data output and parsing are supported in this format. For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and empty strings are used as default values. Complex values that could be specified in the table are not supported as defaults.
|
||||
|
||||
Parsing allows the presence of the additional field `tskv` without the equal sign or a value. This field is ignored.
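The parsing rules above can be illustrated with a simplified reader. The name `parse_tskv_line` is hypothetical, and the sketch deliberately ignores backslash escapes and escaped `=` symbols in names, which a real parser has to handle.

```python
def parse_tskv_line(line: str, defaults: dict) -> dict:
    """Parse one TSKV row into a dict, filling omitted columns from defaults."""
    row = dict(defaults)
    for field in line.rstrip("\n").split("\t"):
        if not field or field == "tskv":
            continue  # the bare `tskv` marker field is ignored
        name, _, value = field.partition("=")
        row[name] = value
    return row
```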
|
||||
|
@ -1,8 +0,0 @@
|
||||
# Values
|
||||
|
||||
Prints every row in parentheses. Rows are separated by commas. There is no comma after the last row. The values inside the parentheses are also comma-separated. Numbers are output in decimal format without quotes. Arrays are output in square brackets. Strings, dates, and dates with times are output in quotes. Escaping rules and parsing are similar to the TabSeparated format. During formatting, extra spaces aren't inserted, but during parsing, they are allowed and skipped (except for spaces inside array values, which are not allowed).
|
||||
|
||||
The minimum set of characters that you need to escape when passing data in Values format: single quotes and backslashes.
|
||||
|
||||
This is the format that is used in `INSERT INTO t VALUES ...`, but you can also use it for formatting query results.
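When building an `INSERT INTO t VALUES ...` statement by hand, the minimum escaping described above can be sketched as follows. The helper names are hypothetical, and only numbers, strings, and arrays are covered; dates and other types are left out.

```python
def values_literal(value) -> str:
    """Format one value for the Values format."""
    if isinstance(value, str):
        # Minimum escaping: backslashes first, then single quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
    if isinstance(value, (list, tuple)):
        # Arrays go in square brackets, without spaces between elements.
        return "[" + ",".join(values_literal(v) for v in value) + "]"
    return str(value)


def values_row(row) -> str:
    """Format one row as a parenthesized, comma-separated list."""
    return "(" + ",".join(values_literal(v) for v in row) + ")"
```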
|
||||
|
@ -1,5 +0,0 @@
|
||||
# Vertical
|
||||
|
||||
Prints each value on a separate line with the column name specified. This format is convenient for printing just one or a few rows, if each row consists of a large number of columns.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
@ -1,28 +0,0 @@
|
||||
# VerticalRaw
|
||||
|
||||
Differs from `Vertical` format in that the rows are not escaped.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
||||
Examples:
|
||||
|
||||
```
|
||||
:) SHOW CREATE TABLE geonames FORMAT VerticalRaw;
|
||||
Row 1:
|
||||
──────
|
||||
statement: CREATE TABLE default.geonames ( geonameid UInt32, date Date DEFAULT CAST('2017-12-08' AS Date)) ENGINE = MergeTree(date, geonameid, 8192)
|
||||
|
||||
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT VerticalRaw;
|
||||
Row 1:
|
||||
──────
|
||||
test: string with 'quotes' and with some special
|
||||
characters
|
||||
```
|
||||
|
||||
Compare with the Vertical format:
|
||||
|
||||
```
|
||||
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT Vertical;
|
||||
Row 1:
|
||||
──────
|
||||
test: string with \'quotes\' and \t with some special \n characters
|
||||
```
|
@ -1,74 +0,0 @@
|
||||
# XML
|
||||
|
||||
XML format is suitable only for output, not for parsing. Example:
|
||||
|
||||
```xml
|
||||
<?xml version='1.0' encoding='UTF-8' ?>
|
||||
<result>
|
||||
<meta>
|
||||
<columns>
|
||||
<column>
|
||||
<name>SearchPhrase</name>
|
||||
<type>String</type>
|
||||
</column>
|
||||
<column>
|
||||
<name>count()</name>
|
||||
<type>UInt64</type>
|
||||
</column>
|
||||
</columns>
|
||||
</meta>
|
||||
<data>
|
||||
<row>
|
||||
<SearchPhrase></SearchPhrase>
|
||||
<field>8267016</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>bathroom interior design</SearchPhrase>
|
||||
<field>2166</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>yandex</SearchPhrase>
|
||||
<field>1655</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>spring 2014 fashion</SearchPhrase>
|
||||
<field>1549</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>freeform photos</SearchPhrase>
|
||||
<field>1480</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>angelina jolie</SearchPhrase>
|
||||
<field>1245</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>omsk</SearchPhrase>
|
||||
<field>1112</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>photos of dog breeds</SearchPhrase>
|
||||
<field>1091</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>curtain design</SearchPhrase>
|
||||
<field>1064</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>baku</SearchPhrase>
|
||||
<field>1000</field>
|
||||
</row>
|
||||
</data>
|
||||
<rows>10</rows>
|
||||
<rows_before_limit_at_least>141137</rows_before_limit_at_least>
|
||||
</result>
|
||||
```
|
||||
|
||||
If the column name does not have an acceptable format, just 'field' is used as the element name. In general, the XML structure follows the JSON structure.
|
||||
Just as for JSON, invalid UTF-8 sequences are changed to the replacement character <20> so the output text will consist of valid UTF-8 sequences.
|
||||
|
||||
In string values, the characters `<` and `&` are escaped as `&lt;` and `&amp;`.
|
||||
|
||||
Arrays are output as `<array><elem>Hello</elem><elem>World</elem>...</array>`,
|
||||
and tuples as `<tuple><elem>Hello</elem><elem>World</elem>...</tuple>`.
|
||||
|
541
docs/en/interfaces/formats.md
Normal file
@ -0,0 +1,541 @@
|
||||
<a name="formats"></a>
|
||||
|
||||
# Input and output formats
|
||||
|
||||
The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).
|
||||
|
||||
See the table below for the list of supported formats for both kinds of queries.
|
||||
|
||||
Format | INSERT | SELECT
|
||||
-------|--------|--------
|
||||
[TabSeparated](formats.md#tabseparated) | ✔ | ✔ |
|
||||
[TabSeparatedRaw](formats.md#tabseparatedraw) | ✗ | ✔ |
|
||||
[TabSeparatedWithNames](formats.md#tabseparatedwithnames) | ✔ | ✔ |
|
||||
[TabSeparatedWithNamesAndTypes](formats.md#tabseparatedwithnamesandtypes) | ✔ | ✔ |
|
||||
[CSV](formats.md#csv) | ✔ | ✔ |
|
||||
[CSVWithNames](formats.md#csvwithnames) | ✔ | ✔ |
|
||||
[Values](formats.md#values) | ✔ | ✔ |
|
||||
[Vertical](formats.md#vertical) | ✗ | ✔ |
|
||||
[VerticalRaw](formats.md#verticalraw) | ✗ | ✔ |
|
||||
[JSON](formats.md#json) | ✗ | ✔ |
|
||||
[JSONCompact](formats.md#jsoncompact) | ✗ | ✔ |
|
||||
[JSONEachRow](formats.md#jsoneachrow) | ✔ | ✔ |
|
||||
[TSKV](formats.md#tskv) | ✔ | ✔ |
|
||||
[Pretty](formats.md#pretty) | ✗ | ✔ |
|
||||
[PrettyCompact](formats.md#prettycompact) | ✗ | ✔ |
|
||||
[PrettyCompactMonoBlock](formats.md#prettycompactmonoblock) | ✗ | ✔ |
|
||||
[PrettyNoEscapes](formats.md#prettynoescapes) | ✗ | ✔ |
|
||||
[PrettySpace](formats.md#prettyspace) | ✗ | ✔ |
|
||||
[RowBinary](formats.md#rowbinary) | ✔ | ✔ |
|
||||
[Native](formats.md#native) | ✔ | ✔ |
|
||||
[Null](formats.md#null) | ✗ | ✔ |
|
||||
[XML](formats.md#xml) | ✗ | ✔ |
|
||||
[CapnProto](formats.md#capnproto) | ✔ | ✔ |
|
||||
|
||||
<a name="format_capnproto"></a>
|
||||
|
||||
## CapnProto
|
||||
|
||||
Cap'n Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.
|
||||
|
||||
Cap'n Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.
|
||||
|
||||
```sql
|
||||
SELECT SearchPhrase, count() AS c FROM test.hits
|
||||
GROUP BY SearchPhrase FORMAT CapnProto SETTINGS schema = 'schema:Message'
|
||||
```
|
||||
|
||||
Where `schema.capnp` looks like this:
|
||||
|
||||
```
|
||||
struct Message {
|
||||
SearchPhrase @0 :Text;
|
||||
c @1 :UInt64;
|
||||
}
|
||||
```
|
||||
|
||||
Schema files are located in the directory specified by [format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) in the server configuration.
|
||||
|
||||
Deserialization is efficient and usually doesn't increase the system load.
|
||||
|
||||
## CSV
|
||||
|
||||
Comma Separated Values format ([RFC](https://tools.ietf.org/html/rfc4180)).
|
||||
|
||||
When formatting, strings are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Dates and date-times are enclosed in double quotes. Numbers are output without quotes. Values are separated by a delimiter*. Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first the array is serialized to a string as in TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
|
||||
|
||||
*The default delimiter is `,`. See the [format_csv_delimiter](/docs/en/operations/settings/settings/#format_csv_delimiter) setting for additional info.
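
The output rules above can be sketched in a few lines of Python. This is only an illustration of the quoting rules as described here, not the server's implementation; the function name `format_csv_row` and the choice to treat every non-numeric value as a string are assumptions made for the example.

```python
def format_csv_row(values, delimiter=","):
    """Format one row following the CSV output rules described above (a sketch)."""
    fields = []
    for value in values:
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            # Numbers are written without quotes.
            fields.append(str(value))
        else:
            # Strings, dates and date-times are enclosed in double quotes;
            # an embedded double quote is written as two double quotes.
            fields.append('"' + str(value).replace('"', '""') + '"')
    # Rows are separated by the Unix line feed (LF).
    return delimiter.join(fields) + "\n"
```

For instance, `format_csv_row(['say "hi"', 42, "2014-03-17"])` yields the line `"say ""hi""",42,"2014-03-17"` followed by LF.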
|
||||
|
||||
When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Strings can also be written without quotes. In this case, they are parsed up to a delimiter or line feed (CR or LF). In violation of the RFC, when parsing strings without quotes, the leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF) and Mac OS Classic (CR LF) are all supported.
|
||||
|
||||
The CSV format supports the output of totals and extremes the same way as `TabSeparated`.
|
||||
|
||||
## CSVWithNames
|
||||
|
||||
Also prints the header row, similar to `TabSeparatedWithNames`.
|
||||
|
||||
## JSON
|
||||
|
||||
Outputs data in JSON format. Besides data tables, it also outputs column names and types, along with some additional information: the total number of output rows, and the number of rows that could have been output if there weren't a LIMIT. Example:
|
||||
|
||||
```sql
|
||||
SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase WITH TOTALS ORDER BY c DESC LIMIT 5 FORMAT JSON
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"meta":
|
||||
[
|
||||
{
|
||||
"name": "SearchPhrase",
|
||||
"type": "String"
|
||||
},
|
||||
{
|
||||
"name": "c",
|
||||
"type": "UInt64"
|
||||
}
|
||||
],
|
||||
|
||||
"data":
|
||||
[
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "8267016"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "bathroom interior design",
|
||||
"c": "2166"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "yandex",
|
||||
"c": "1655"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "spring 2014 fashion",
|
||||
"c": "1549"
|
||||
},
|
||||
{
|
||||
"SearchPhrase": "freeform photos",
|
||||
"c": "1480"
|
||||
}
|
||||
],
|
||||
|
||||
"totals":
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "8873898"
|
||||
},
|
||||
|
||||
"extremes":
|
||||
{
|
||||
"min":
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "1480"
|
||||
},
|
||||
"max":
|
||||
{
|
||||
"SearchPhrase": "",
|
||||
"c": "8267016"
|
||||
}
|
||||
},
|
||||
|
||||
"rows": 5,
|
||||
|
||||
"rows_before_limit_at_least": 141137
|
||||
}
|
||||
```
|
||||
|
||||
The JSON is compatible with JavaScript. To ensure this, some characters are additionally escaped: the slash ` /` is escaped as ` \/`; alternative line breaks ` U+2028` and ` U+2029`, which break some browsers, are escaped as ` \uXXXX`. ASCII control characters are escaped: backspace, form feed, line feed, carriage return, and horizontal tab are replaced with `\b`, `\f`, `\n`, `\r`, `\t` , as well as the remaining bytes in the 00-1F range using `\uXXXX` sequences. Invalid UTF-8 sequences are changed to the replacement character <20> so the output text will consist of valid UTF-8 sequences. For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, you can set the configuration parameter output_format_json_quote_64bit_integers to 0.
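
As a rough illustration of these escaping rules, the following Python sketch escapes a string and formats a 64-bit integer. The function names are hypothetical, the lowercase hex digits in `\uXXXX` are an arbitrary choice, and the escaping of `"` and `\` is added because JSON itself requires it. Replacement of invalid UTF-8 sequences is not modelled, since a Python `str` cannot hold them.

```python
_SHORT_ESCAPES = {
    "\b": "\\b", "\f": "\\f", "\n": "\\n", "\r": "\\r", "\t": "\\t",
    '"': '\\"', "\\": "\\\\",  # required by JSON itself
    "/": "\\/",                 # extra escape for JavaScript compatibility
}


def escape_json_string(text):
    """Return `text` as a quoted JSON string using the escapes described above."""
    out = []
    for ch in text:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20 or ch in "\u2028\u2029":
            # Remaining control characters and the alternative line breaks.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def format_int64(value, quote_64bit_integers=True):
    """Int64/UInt64 values are quoted unless the setting is switched off."""
    return '"%d"' % value if quote_64bit_integers else "%d" % value
```

A standard JSON parser reads the escaped output back to the original string, because `\/` and `\uXXXX` are both valid JSON escapes.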
|
||||
|
||||
`rows` – The total number of output rows.
|
||||
|
||||
`rows_before_limit_at_least` – The minimal number of rows there would have been without LIMIT. Output only if the query contains LIMIT.
|
||||
If the query contains GROUP BY, rows_before_limit_at_least is the exact number of rows there would have been without a LIMIT.
|
||||
|
||||
`totals` – Total values (when using WITH TOTALS).
|
||||
|
||||
`extremes` – Extreme values (when extremes is set to 1).
|
||||
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
See also the JSONEachRow format.
|
||||
## JSONCompact
|
||||
|
||||
Differs from JSON only in that data rows are output in arrays, not in objects.
|
||||
|
||||
Example:
|
||||
|
||||
```json
|
||||
{
|
||||
"meta":
|
||||
[
|
||||
{
|
||||
"name": "SearchPhrase",
|
||||
"type": "String"
|
||||
},
|
||||
{
|
||||
"name": "c",
|
||||
"type": "UInt64"
|
||||
}
|
||||
],
|
||||
|
||||
"data":
|
||||
[
|
||||
["", "8267016"],
|
||||
["bathroom interior design", "2166"],
|
||||
["yandex", "1655"],
|
||||
["spring 2014 fashion", "1549"],
|
||||
["freeform photos", "1480"]
|
||||
],
|
||||
|
||||
"totals": ["","8873898"],
|
||||
|
||||
"extremes":
|
||||
{
|
||||
"min": ["","1480"],
|
||||
"max": ["","8267016"]
|
||||
},
|
||||
|
||||
"rows": 5,
|
||||
|
||||
"rows_before_limit_at_least": 141137
|
||||
}
|
||||
```
|
||||
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
See also the `JSONEachRow` format.
|
||||
|
||||
## JSONEachRow
|
||||
|
||||
Outputs data as separate JSON objects for each row (newline delimited JSON).
|
||||
|
||||
```json
|
||||
{"SearchPhrase":"","count()":"8267016"}
|
||||
{"SearchPhrase":"bathroom interior design","count()":"2166"}
|
||||
{"SearchPhrase":"yandex","count()":"1655"}
|
||||
{"SearchPhrase":"spring 2014 fashion","count()":"1549"}
|
||||
{"SearchPhrase":"freeform photos","count()":"1480"}
|
||||
{"SearchPhrase":"angelina jolie","count()":"1245"}
|
||||
{"SearchPhrase":"omsk","count()":"1112"}
|
||||
{"SearchPhrase":"photos of dog breeds","count()":"1091"}
|
||||
{"SearchPhrase":"curtain design","count()":"1064"}
|
||||
{"SearchPhrase":"baku","count()":"1000"}
|
||||
```
|
||||
|
||||
Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.
|
||||
|
||||
For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and empty strings are used as default values. Complex values that could be specified in the table are not supported as defaults. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
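
The parsing behaviour described above (any column order, omitted values, optional commas, no mandatory newlines) can be approximated with Python's standard `json` module. This is a minimal sketch, not ClickHouse's parser; `parse_json_each_row` and the `defaults` mapping are names invented for the example, and values are left as decoded by `json` rather than converted to column types.

```python
import json


def parse_json_each_row(text, defaults):
    """Parse a stream of JSON objects; omitted columns take their defaults.

    `defaults` maps each column name to its default (0 or an empty string).
    """
    decoder = json.JSONDecoder()
    rows, pos = [], 0
    while True:
        # Whitespace between objects is ignored, and so is a comma after an object.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        row = dict(defaults)  # start from the defaults ...
        row.update(obj)       # ... and overwrite with the values present
        rows.append(row)
    return rows
```

Objects may follow each other on one line, for example `{"c":"5","SearchPhrase":"a"}, {"SearchPhrase":"b"}{"c":"7"}`, and each resulting row still contains both columns.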
|
||||
|
||||
## Native
|
||||
|
||||
The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is "columnar" – it doesn't convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
|
||||
|
||||
You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn't make sense to work with this format yourself.
|
||||
|
||||
## Null
|
||||
|
||||
Nothing is output. However, the query is processed, and when using the command-line client, data is transmitted to the client. This is used for tests, including performance testing.
|
||||
Obviously, this format is only appropriate for output, not for parsing.
|
||||
|
||||
## Pretty
|
||||
|
||||
Outputs data as Unicode-art tables, also using ANSI-escape sequences for setting colors in the terminal.
|
||||
A full grid of the table is drawn, and each row occupies two lines in the terminal.
|
||||
Each result block is output as a separate table. This is necessary so that blocks can be output without buffering results (buffering would be necessary in order to pre-calculate the visible width of all the values).
|
||||
To avoid dumping too much data to the terminal, only the first 10,000 rows are printed. If the number of rows is greater than or equal to 10,000, the message "Showed first 10 000" is printed.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
||||
The Pretty format supports outputting total values (when using WITH TOTALS) and extremes (when 'extremes' is set to 1). In these cases, total values and extreme values are output after the main data, in separate tables. Example (shown for the PrettyCompact format):
|
||||
|
||||
```sql
|
||||
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT PrettyCompact
|
||||
```
|
||||
|
||||
```text
|
||||
┌──EventDate─┬───────c─┐
|
||||
│ 2014-03-17 │ 1406958 │
|
||||
│ 2014-03-18 │ 1383658 │
|
||||
│ 2014-03-19 │ 1405797 │
|
||||
│ 2014-03-20 │ 1353623 │
|
||||
│ 2014-03-21 │ 1245779 │
|
||||
│ 2014-03-22 │ 1031592 │
|
||||
│ 2014-03-23 │ 1046491 │
|
||||
└────────────┴─────────┘
|
||||
|
||||
Totals:
|
||||
┌──EventDate─┬───────c─┐
|
||||
│ 0000-00-00 │ 8873898 │
|
||||
└────────────┴─────────┘
|
||||
|
||||
Extremes:
|
||||
┌──EventDate─┬───────c─┐
|
||||
│ 2014-03-17 │ 1031592 │
|
||||
│ 2014-03-23 │ 1406958 │
|
||||
└────────────┴─────────┘
|
||||
```
|
||||
|
||||
## PrettyCompact
|
||||
|
||||
Differs from `Pretty` in that the grid is drawn between rows and the result is more compact.
|
||||
This format is used by default in the command-line client in interactive mode.
|
||||
|
||||
## PrettyCompactMonoBlock
|
||||
|
||||
Differs from `PrettyCompact` in that up to 10,000 rows are buffered, then output as a single table, not by blocks.
|
||||
|
||||
## PrettyNoEscapes
|
||||
|
||||
Differs from Pretty in that ANSI-escape sequences aren't used. This is necessary for displaying this format in a browser, as well as for using the 'watch' command-line utility.
|
||||
|
||||
Example:
|
||||
|
||||
```bash
|
||||
watch -n1 "clickhouse-client --query='SELECT * FROM system.events FORMAT PrettyCompactNoEscapes'"
|
||||
```
|
||||
|
||||
You can use the HTTP interface for displaying in the browser.
|
||||
|
||||
### PrettyCompactNoEscapes
|
||||
|
||||
Differs from `PrettyCompact` in that ANSI-escape sequences aren't used.
|
||||
|
||||
### PrettySpaceNoEscapes
|
||||
|
||||
Differs from `PrettySpace` in that ANSI-escape sequences aren't used.
|
||||
|
||||
## PrettySpace
|
||||
|
||||
Differs from `PrettyCompact` in that whitespace (space characters) is used instead of the grid.
|
||||
|
||||
## RowBinary
|
||||
|
||||
Formats and parses data by row in binary format. Rows and values are listed consecutively, without separators.
|
||||
This format is less efficient than the Native format, since it is row-based.
|
||||
|
||||
Integers use fixed-length little endian representation. For example, UInt64 uses 8 bytes.
|
||||
DateTime is represented as UInt32 containing the Unix timestamp as the value.
|
||||
Date is represented as a UInt16 object that contains the number of days since 1970-01-01 as the value.
|
||||
String is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string.
|
||||
FixedString is represented simply as a sequence of bytes.
|
||||
|
||||
Array is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by successive elements of the array.
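
To make the byte layout concrete, here is a small Python sketch that encodes individual values as described above. It is an illustration under stated assumptions: the helper names are hypothetical, strings are assumed to be UTF-8 text, and only the types listed in this section are covered.

```python
import struct
from datetime import date, datetime, timezone


def encode_varint(n):
    """Unsigned LEB128: 7 bits per byte, high bit set on every byte but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_uint64(n):
    # Fixed-length little-endian representation, 8 bytes.
    return struct.pack("<Q", n)


def encode_date(d):
    # UInt16 holding the number of days since 1970-01-01.
    return struct.pack("<H", (d - date(1970, 1, 1)).days)


def encode_datetime(dt):
    # UInt32 holding the Unix timestamp.
    return struct.pack("<I", int(dt.timestamp()))


def encode_string(s):
    data = s.encode("utf-8")
    return encode_varint(len(data)) + data


def encode_array(items, encode_item):
    # Varint element count followed by the elements themselves.
    return encode_varint(len(items)) + b"".join(encode_item(x) for x in items)
```

A row is then simply the concatenation of its encoded values, with no separators between values or rows.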
|
||||
|
||||
## TabSeparated
|
||||
|
||||
In TabSeparated format, data is written by row. Each row contains values separated by tabs. Each value is followed by a tab, except the last value in the row, which is followed by a line feed. Unix line feeds are assumed everywhere. The last row must also end with a line feed. Values are written in text format, without enclosing quotation marks, and with special characters escaped.
|
||||
|
||||
Integer numbers are written in decimal form. Numbers can contain an extra "+" character at the beginning (ignored when parsing, and not recorded when formatting). Non-negative numbers can't contain the negative sign. When reading, it is allowed to parse an empty string as a zero, or (for signed types) a string consisting of just a minus sign as a zero. Numbers that do not fit into the corresponding data type may be parsed as a different number, without an error message.
|
||||
|
||||
Floating-point numbers are written in decimal form. The dot is used as the decimal separator. Exponential entries are supported, as are 'inf', '+inf', '-inf', and 'nan'. An entry of floating-point numbers may begin or end with a decimal point.
|
||||
During formatting, accuracy may be lost on floating-point numbers.
|
||||
During parsing, it is not strictly required to read the nearest machine-representable number.
|
||||
|
||||
Dates are written in YYYY-MM-DD format and parsed in the same format, but with any characters as separators.
|
||||
Dates with times are written in the format YYYY-MM-DD hh:mm:ss and parsed in the same format, but with any characters as separators.
|
||||
This all occurs in the system time zone at the time the client or server starts (depending on which one formats data). For dates with times, daylight saving time is not specified. So if a dump has times during daylight saving time, the dump does not unequivocally match the data, and parsing will select one of the two times.
|
||||
During a read operation, incorrect dates and dates with times can be parsed with natural overflow or as null dates and times, without an error message.
|
||||
|
||||
As an exception, parsing dates with times is also supported in Unix timestamp format, if it consists of exactly 10 decimal digits. The result is not time zone-dependent. The formats YYYY-MM-DD hh:mm:ss and NNNNNNNNNN are differentiated automatically.
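
The way the two date-time notations are told apart can be sketched in Python as follows. This is a simplified illustration: `parse_datetime_field` is a made-up name, time zone handling for the textual form is left out (a naive `datetime` is returned), and invalid dates raise an error here instead of being parsed with overflow.

```python
import re
from datetime import datetime, timezone


def parse_datetime_field(text):
    """Accept either exactly 10 decimal digits or YYYY-MM-DD hh:mm:ss."""
    if re.fullmatch(r"[0-9]{10}", text):
        # Unix timestamp: the result does not depend on the time zone.
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    # Textual form: any single character is accepted as a separator.
    match = re.fullmatch(
        r"([0-9]{4}).([0-9]{2}).([0-9]{2}).([0-9]{2}).([0-9]{2}).([0-9]{2})", text
    )
    if match is None:
        raise ValueError("unrecognized date with time: %r" % text)
    return datetime(*map(int, match.groups()))
```

Because the textual form is always 19 characters long and the timestamp form is exactly 10 digits, the two can never be confused.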
|
||||
|
||||
Strings are output with backslash-escaped special characters. The following escape sequences are used for output: `\b`, `\f`, `\r`, `\n`, `\t`, `\0`, `\'`, `\\`. Parsing also supports the sequences `\a`, `\v`, and `\xHH` (hex escape sequences) and any `\c` sequences, where `c` is any character (these sequences are converted to `c`). Thus, reading data supports formats where a line feed can be written as `\n` or `\`, or as a line feed. For example, the string `Hello world` with a line feed between the words instead of a space can be parsed in any of the following variations:
|
||||
|
||||
```text
|
||||
Hello\nworld
|
||||
|
||||
Hello\
|
||||
world
|
||||
```
|
||||
|
||||
The second variant is supported because MySQL uses it when writing tab-separated dumps.
|
||||
|
||||
The minimum set of characters that you need to escape when passing data in TabSeparated format: tab, line feed (LF) and backslash.
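
The escaping rules for strings can be illustrated with a pair of Python helpers. This is a sketch, not the server's code: the names `escape_tsv` and `unescape_tsv` are hypothetical, and `\xHH` is decoded to a character with that code point, whereas the real format works on bytes.

```python
import string

# Escape sequences used for output.
_ESCAPES = {"\b": "\\b", "\f": "\\f", "\r": "\\r", "\n": "\\n",
            "\t": "\\t", "\0": "\\0", "'": "\\'", "\\": "\\\\"}
# Letters with a special meaning after a backslash when parsing.
_UNESCAPES = {"b": "\b", "f": "\f", "r": "\r", "n": "\n",
              "t": "\t", "0": "\0", "a": "\a", "v": "\v"}


def escape_tsv(value):
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_tsv(field):
    out, i = [], 0
    while i < len(field):
        ch = field[i]
        if ch != "\\" or i + 1 == len(field):
            out.append(ch)
            i += 1
            continue
        nxt = field[i + 1]
        hex_digits = field[i + 2:i + 4]
        if nxt == "x" and len(hex_digits) == 2 and all(c in string.hexdigits for c in hex_digits):
            out.append(chr(int(hex_digits, 16)))  # \xHH
            i += 4
        else:
            # Known letters map to control characters; any other \c becomes c,
            # which also covers a backslash followed by a literal line feed.
            out.append(_UNESCAPES.get(nxt, nxt))
            i += 2
    return "".join(out)
```

Both `Hello\nworld` and a backslash followed by a real line feed unescape to the same two-line string, which is why MySQL-style dumps can be read.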
|
||||
|
||||
Only a small set of symbols are escaped. You can easily stumble onto a string value that your terminal will ruin in output.
|
||||
|
||||
Arrays are written as a list of comma-separated values in square brackets. Numeric items in the array are formatted as usual, but dates, dates with times, and strings are written in single quotes with the same escaping rules as above.
|
||||
|
||||
The TabSeparated format is convenient for processing data using custom programs and scripts. It is used by default in the HTTP interface, and in the command-line client's batch mode. This format also allows transferring data between different DBMSs. For example, you can get a dump from MySQL and upload it to ClickHouse, or vice versa.
|
||||
|
||||
The TabSeparated format supports outputting total values (when using WITH TOTALS) and extreme values (when 'extremes' is set to 1). In these cases, the total values and extremes are output after the main data. The main result, total values, and extremes are separated from each other by an empty line. Example:
|
||||
|
||||
```sql
|
||||
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT TabSeparated
|
||||
```
|
||||
|
||||
```text
|
||||
2014-03-17 1406958
|
||||
2014-03-18 1383658
|
||||
2014-03-19 1405797
|
||||
2014-03-20 1353623
|
||||
2014-03-21 1245779
|
||||
2014-03-22 1031592
|
||||
2014-03-23 1046491
|
||||
|
||||
0000-00-00 8873898
|
||||
|
||||
2014-03-17 1031592
|
||||
2014-03-23 1406958
|
||||
```
|
||||
|
||||
This format is also available under the name `TSV`.
|
||||
|
||||
## TabSeparatedRaw
|
||||
|
||||
Differs from `TabSeparated` format in that the rows are written without escaping.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
||||
This format is also available under the name `TSVRaw`.
|
||||
|
||||
## TabSeparatedWithNames
|
||||
|
||||
Differs from the `TabSeparated` format in that the column names are written in the first row.
|
||||
During parsing, the first row is completely ignored. You can't use column names to determine their position or to check their correctness.
|
||||
(Support for parsing the header row may be added in the future.)
|
||||
|
||||
This format is also available under the name `TSVWithNames`.
|
||||
|
||||
## TabSeparatedWithNamesAndTypes
|
||||
|
||||
Differs from the `TabSeparated` format in that the column names are written to the first row, while the column types are in the second row.
|
||||
During parsing, the first and second rows are completely ignored.
|
||||
|
||||
This format is also available under the name `TSVWithNamesAndTypes`.
|
||||
|
||||
## TSKV
|
||||
|
||||
Similar to TabSeparated, but outputs each value in `name=value` format. Names are escaped the same way as in TabSeparated format, and the `=` symbol is also escaped.
|
||||
|
||||
```text
|
||||
SearchPhrase= count()=8267016
|
||||
SearchPhrase=bathroom interior design count()=2166
|
||||
SearchPhrase=yandex count()=1655
|
||||
SearchPhrase=spring 2014 fashion count()=1549
|
||||
SearchPhrase=freeform photos count()=1480
|
||||
SearchPhrase=angelina jolie	count()=1245
|
||||
SearchPhrase=omsk count()=1112
|
||||
SearchPhrase=photos of dog breeds count()=1091
|
||||
SearchPhrase=curtain design count()=1064
|
||||
SearchPhrase=baku count()=1000
|
||||
```
|
||||
|
||||
When there is a large number of small columns, this format is inefficient, and there is generally no reason to use it. It is used in some departments of Yandex.
|
||||
|
||||
Both data output and parsing are supported in this format. For parsing, any order is supported for the values of different columns. It is acceptable for some values to be omitted – they are treated as equal to their default values. In this case, zeros and empty strings are used as default values. Complex values that could be specified in the table are not supported as defaults.
|
||||
|
||||
Parsing allows the presence of the additional field `tskv` without the equal sign or a value. This field is ignored.
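
A simplified Python sketch of TSKV parsing, covering the rules above (arbitrary column order, omitted values, the optional `tskv` marker), might look like this. The function names and the `defaults` mapping are assumptions for illustration, values are kept as text, and unescaping of names and values (as in TabSeparated) is deliberately left out.

```python
def split_name_value(field):
    """Split at the first '=' that is not part of a backslash escape."""
    i = 0
    while i < len(field):
        if field[i] == "\\":
            i += 2  # skip the escaped character, e.g. an escaped '='
        elif field[i] == "=":
            return field[:i], field[i + 1:]
        else:
            i += 1
    raise ValueError("field has no '=': %r" % field)


def parse_tskv_line(line, defaults):
    """Parse one TSKV line; omitted columns keep the values from `defaults`."""
    row = dict(defaults)
    # Tabs inside values are escaped, so a raw tab always separates fields.
    for field in line.rstrip("\n").split("\t"):
        if field in ("", "tskv"):
            continue  # the bare `tskv` marker field is ignored
        name, value = split_name_value(field)
        row[name] = value
    return row
```

With defaults for `SearchPhrase` and `count()`, a line that carries only `count()=5` still produces a row with both columns.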
|
||||
|
||||
## Values
|
||||
|
||||
Prints every row in brackets. Rows are separated by commas. There is no comma after the last row. The values inside the brackets are also comma-separated. Numbers are output in decimal format without quotes. Arrays are output in square brackets. Strings, dates, and dates with times are output in quotes. Escaping rules and parsing are similar to the TabSeparated format. During formatting, extra spaces aren't inserted, but during parsing, they are allowed and skipped (except for spaces inside array values, which are not allowed).
|
||||
|
||||
The minimum set of characters that you need to escape when passing data in Values format: single quotes and backslashes.
|
||||
|
||||
This is the format that is used in `INSERT INTO t VALUES ...`, but you can also use it for formatting query results.
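
As an illustration of the output side of this format, the following Python sketch renders rows as a Values list. It is a minimal example under stated assumptions: `format_values` and `format_values_row` are invented names, only numbers, lists, and string-like values are handled, and only the minimum escaping (single quotes and backslashes) is applied.

```python
def _format_value(value):
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)  # numbers: decimal, no quotes
    if isinstance(value, list):
        # Arrays go in square brackets, without extra spaces.
        return "[" + ",".join(_format_value(v) for v in value) + "]"
    # Strings, dates and dates with times go in single quotes;
    # backslashes and single quotes are escaped.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def format_values_row(row):
    return "(" + ",".join(_format_value(v) for v in row) + ")"


def format_values(rows):
    # Rows are separated by commas, with no comma after the last row.
    return ",".join(format_values_row(row) for row in rows)
```

The result can be appended directly after `INSERT INTO t VALUES` when building a query by hand, assuming the values match the table's column types.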
|
||||
|
||||
## Vertical
|
||||
|
||||
Prints each value on a separate line with the column name specified. This format is convenient for printing just one or a few rows, if each row consists of a large number of columns.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
||||
## VerticalRaw
|
||||
|
||||
Differs from `Vertical` format in that the rows are not escaped.
|
||||
This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table).
|
||||
|
||||
Examples:
|
||||
|
||||
```
|
||||
:) SHOW CREATE TABLE geonames FORMAT VerticalRaw;
|
||||
Row 1:
|
||||
──────
|
||||
statement: CREATE TABLE default.geonames ( geonameid UInt32, date Date DEFAULT CAST('2017-12-08' AS Date)) ENGINE = MergeTree(date, geonameid, 8192)
|
||||
|
||||
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT VerticalRaw;
|
||||
Row 1:
|
||||
──────
|
||||
test: string with 'quotes' and with some special
|
||||
characters
|
||||
```
|
||||
|
||||
Compare with the Vertical format:
|
||||
|
||||
```
|
||||
:) SELECT 'string with \'quotes\' and \t with some special \n characters' AS test FORMAT Vertical;
|
||||
Row 1:
|
||||
──────
|
||||
test: string with \'quotes\' and \t with some special \n characters
|
||||
```
|
||||
## XML
|
||||
|
||||
XML format is suitable only for output, not for parsing. Example:
|
||||
|
||||
```xml
|
||||
<?xml version='1.0' encoding='UTF-8' ?>
|
||||
<result>
|
||||
<meta>
|
||||
<columns>
|
||||
<column>
|
||||
<name>SearchPhrase</name>
|
||||
<type>String</type>
|
||||
</column>
|
||||
<column>
|
||||
<name>count()</name>
|
||||
<type>UInt64</type>
|
||||
</column>
|
||||
</columns>
|
||||
</meta>
|
||||
<data>
|
||||
<row>
|
||||
<SearchPhrase></SearchPhrase>
|
||||
<field>8267016</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>bathroom interior design</SearchPhrase>
|
||||
<field>2166</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>yandex</SearchPhrase>
|
||||
<field>1655</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>spring 2014 fashion</SearchPhrase>
|
||||
<field>1549</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>freeform photos</SearchPhrase>
|
||||
<field>1480</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>angelina jolie</SearchPhrase>
|
||||
<field>1245</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>omsk</SearchPhrase>
|
||||
<field>1112</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>photos of dog breeds</SearchPhrase>
|
||||
<field>1091</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>curtain design</SearchPhrase>
|
||||
<field>1064</field>
|
||||
</row>
|
||||
<row>
|
||||
<SearchPhrase>baku</SearchPhrase>
|
||||
<field>1000</field>
|
||||
</row>
|
||||
</data>
|
||||
<rows>10</rows>
|
||||
<rows_before_limit_at_least>141137</rows_before_limit_at_least>
|
||||
</result>
|
||||
```
|
||||
|
||||
If the column name does not have an acceptable format, just 'field' is used as the element name. In general, the XML structure follows the JSON structure.
|
||||
Just as for JSON, invalid UTF-8 sequences are changed to the replacement character <20> so the output text will consist of valid UTF-8 sequences.
|
||||
|
||||
In string values, the characters `<` and `&` are escaped as `&lt;` and `&amp;`.
|
||||
|
||||
Arrays are output as `<array><elem>Hello</elem><elem>World</elem>...</array>`,
|
||||
and tuples as `<tuple><elem>Hello</elem><elem>World</elem>...</tuple>`.
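
The value-level rules (escaping `<` and `&` as `&lt;` and `&amp;`, and the `<array>` / `<tuple>` wrappers) can be sketched in Python as follows. The function names are hypothetical, and mapping Python lists to arrays and Python tuples to tuples is a convention chosen for the example.

```python
def xml_escape(text):
    # '&' is replaced first so that the '&' of '&lt;' is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;")


def xml_value(value):
    """Render one value: lists as <array>, tuples as <tuple>, the rest as text."""
    if isinstance(value, (list, tuple)):
        tag = "array" if isinstance(value, list) else "tuple"
        inner = "".join("<elem>%s</elem>" % xml_value(v) for v in value)
        return "<%s>%s</%s>" % (tag, inner, tag)
    return xml_escape(str(value))
```

For example, `xml_value(["Hello", "World"])` gives `<array><elem>Hello</elem><elem>World</elem></array>`.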
|
||||
|
@ -58,5 +58,5 @@ This lets us use the system as the back-end for a web interface. Low latency mea
|
||||
## Data replication and support for data integrity on replicas
|
||||
|
||||
Uses asynchronous multimaster replication. After being written to any available replica, data is distributed to all the remaining replicas. The system maintains identical data on different replicas. Data is restored automatically after a failure, or using a "button" for complex cases.
|
||||
For more information, see the section [Data replication](../table_engines/replication.md#table_engines-replication).
|
||||
For more information, see the section [Data replication](../operations/table_engines/replication.md#table_engines-replication).
|
||||
|
||||
|
@ -61,7 +61,7 @@ Users are recorded in the `users` section. Here is a fragment of the `users.xml`
|
||||
|
||||
You can see a declaration from two users: `default` and `web`. We added the `web` user separately.
|
||||
|
||||
The `default` user is chosen in cases when the username is not passed. The `default` user is also used for distributed query processing, if the configuration of the server or cluster doesn't specify the `user` and `password` (see the section on the [Distributed](../table_engines/distributed.md#table_engines-distributed) engine).
|
||||
The `default` user is chosen in cases when the username is not passed. The `default` user is also used for distributed query processing, if the configuration of the server or cluster doesn't specify the `user` and `password` (see the section on the [Distributed](../operations/table_engines/distributed.md#table_engines-distributed) engine).
|
||||
|
||||
The user that is used for exchanging information between servers combined in a cluster must not have substantial restrictions or quotas – otherwise, distributed queries will fail.
|
||||
|
||||
|
@ -67,7 +67,7 @@ ClickHouse checks ` min_part_size` and ` min_part_size_ratio` and processes th
|
||||
|
||||
The default database.
|
||||
|
||||
To get a list of databases, use the [SHOW DATABASES](../../query_language/queries.md#query_language_queries_show_databases).
|
||||
To get a list of databases, use the [SHOW DATABASES](../../query_language/misc.md#query_language_queries_show_databases).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -100,7 +100,7 @@ Path:
|
||||
- Specify the absolute path or the path relative to the server config file.
|
||||
- The path can contain wildcards \* and ?.
|
||||
|
||||
See also "[External dictionaries](../../dicts/external_dicts.md#dicts-external_dicts)".
|
||||
See also "[External dictionaries](../../query_language/dicts/external_dicts.md#dicts-external_dicts)".
|
||||
|
||||
**Example**
|
||||
|
||||
@ -130,7 +130,7 @@ The default is ` true`.
|
||||
|
||||
## format_schema_path
|
||||
|
||||
The path to the directory with the schemes for the input data, such as schemas for the [CapnProto](../../formats/capnproto.md#format_capnproto) format.
|
||||
The path to the directory with the schemes for the input data, such as schemas for the [CapnProto](../../interfaces/formats.md#format_capnproto) format.
|
||||
|
||||
**Example**
|
||||
|
||||
@ -179,7 +179,7 @@ You can configure multiple `<graphite>` clauses. For instance, you can use this
|
||||
|
||||
Settings for thinning data for Graphite.
|
||||
|
||||
For more information, see [GraphiteMergeTree](../../table_engines/graphitemergetree.md#table_engines-graphitemergetree).
|
||||
For more information, see [GraphiteMergeTree](../../operations/table_engines/graphitemergetree.md#table_engines-graphitemergetree).
|
||||
|
||||
**Example**
|
||||
|
||||
@ -358,7 +358,7 @@ Parameter substitutions for replicated tables.
|
||||
|
||||
Can be omitted if replicated tables are not used.
|
||||
|
||||
For more information, see the section "[Creating replicated tables](../../table_engines/replication.md#table_engines-replication-creation_of_rep_tables)".
|
||||
For more information, see the section "[Creating replicated tables](../../operations/table_engines/replication.md#table_engines-replication-creation_of_rep_tables)".
|
||||
|
||||
**Example**
|
||||
|
||||
@ -370,7 +370,7 @@ For more information, see the section "[Creating replicated tables](../../table_
|
||||
|
||||
## mark_cache_size
|
||||
|
||||
Approximate size (in bytes) of the cache of "marks" used by [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) engines.
|
||||
Approximate size (in bytes) of the cache of "marks" used by [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) engines.
|
||||
|
||||
The cache is shared for the server and memory is allocated as needed. The cache size must be at least 5368709120.
|
||||
|
||||
@ -426,7 +426,7 @@ We recommend using this option in Mac OS X, since the ` getrlimit()` function re
|
||||
|
||||
Restriction on deleting tables.
|
||||
|
||||
If the size of a [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) type table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
|
||||
If the size of a [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) type table exceeds `max_table_size_to_drop` (in bytes), you can't delete it using a DROP query.
|
||||
|
||||
If you still need to delete the table without restarting the ClickHouse server, create the `<clickhouse-path>/flags/force_drop_table` file and run the DROP query.
|
||||
|
||||
@ -444,7 +444,7 @@ The value 0 means that you can delete all tables without any restrictions.
|
||||
|
||||
## merge_tree
|
||||
|
||||
Fine tuning for tables in the [ MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family.
|
||||
Fine tuning for tables in the [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
|
||||
|
||||
For more information, see the MergeTreeSettings.h header file.
|
||||
|
||||
@ -521,7 +521,7 @@ Keys for server/client settings:
|
||||
|
||||
## part_log
|
||||
|
||||
Logging events that are associated with [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
|
||||
Logging events that are associated with [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) data. For instance, adding or merging data. You can use the log to simulate merge algorithms and compare their characteristics. You can visualize the merge process.
|
||||
|
||||
Queries are logged in the ClickHouse table, not in a separate file.
|
||||
|
||||
@ -541,7 +541,7 @@ Use the following parameters to configure logging:
|
||||
|
||||
- database – Name of the database.
|
||||
- table – Name of the table.
|
||||
- partition_by – Sets a [custom partitioning key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
|
||||
- partition_by – Sets a [custom partitioning key](../../operations/table_engines/custom_partitioning_key.md#custom-partitioning-key).
|
||||
- flush_interval_milliseconds – Interval for flushing data from memory to the disk.
|
||||
|
||||
**Example**
|
||||
@ -585,7 +585,7 @@ Use the following parameters to configure logging:
|
||||
|
||||
- database – Name of the database.
|
||||
- table – Name of the table.
|
||||
- partition_by – Sets a [custom partitioning key](../../table_engines/custom_partitioning_key.md#custom-partitioning-key).
|
||||
- partition_by – Sets a [custom partitioning key](../../operations/table_engines/custom_partitioning_key.md#custom-partitioning-key).
|
||||
- flush_interval_milliseconds – Interval for flushing data from memory to the disk.
|
||||
|
||||
If the table doesn't exist, ClickHouse will create it. If the structure of the query log changed when the ClickHouse server was updated, the table with the old structure is renamed, and a new table is created automatically.
|
||||
@ -607,7 +607,7 @@ If the table doesn't exist, ClickHouse will create it. If the structure of the q
|
||||
|
||||
Configuration of clusters used by the Distributed table engine.
|
||||
|
||||
For more information, see the section "[Table engines/Distributed](../../table_engines/distributed.md#table_engines-distributed)".
|
||||
For more information, see the section "[Table engines/Distributed](../../operations/table_engines/distributed.md#table_engines-distributed)".
|
||||
|
||||
**Example**
|
||||
|
||||
@ -667,7 +667,7 @@ The end slash is mandatory.
|
||||
|
||||
## uncompressed_cache_size
|
||||
|
||||
Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree) family.
|
||||
Cache size (in bytes) for uncompressed data used by table engines from the [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) family.
|
||||
|
||||
There is one shared cache for the server. Memory is allocated on demand. The cache is used if the option [use_uncompressed_cache](../settings/settings.md#settings-use_uncompressed_cache) is enabled.
|
||||
|
||||
@ -681,7 +681,7 @@ The uncompressed cache is advantageous for very short queries in individual case
|
||||
|
||||
## user_files_path
|
||||
|
||||
A catalog with user files. Used in a [file()](../../table_functions/file.md#table_functions-file) table function.
|
||||
A catalog with user files. Used in a [file()](../../query_language/table_functions/file.md#table_functions-file) table function.
|
||||
|
||||
**Example**
|
||||
|
||||
@ -716,7 +716,7 @@ ClickHouse uses ZooKeeper for storing replica metadata when using replicated tab
|
||||
|
||||
This parameter can be omitted if replicated tables are not used.
|
||||
|
||||
For more information, see the section "[Replication](../../table_engines/replication.md#table_engines-replication)".
|
||||
For more information, see the section "[Replication](../../operations/table_engines/replication.md#table_engines-replication)".
|
||||
|
||||
**Example**
|
||||
|
||||
|
@ -4,7 +4,7 @@
|
||||
|
||||
## distributed_product_mode
|
||||
|
||||
Changes the behavior of [distributed subqueries](../../query_language/queries.md#queries-distributed-subrequests), i.e. in cases when the query contains the product of distributed tables.
|
||||
Changes the behavior of [distributed subqueries](../../query_language/select.md#queries-distributed-subrequests), i.e. in cases when the query contains the product of distributed tables.
|
||||
|
||||
ClickHouse applies the configuration if the subqueries on any level have a distributed table that exists on the local server and has more than one shard.
|
||||
|
||||
@ -12,7 +12,7 @@ Restrictions:
|
||||
|
||||
- Only applied for IN and JOIN subqueries.
|
||||
- Used only if a distributed table is used in the FROM clause.
|
||||
- Not used for a table-valued [ remote](../../table_functions/remote.md#table_functions-remote) function.
|
||||
- Not used for a table-valued [remote](../../query_language/table_functions/remote.md#table_functions-remote) function.
|
||||
|
||||
The possible values are:
|
||||
|
||||
@ -20,7 +20,7 @@ The possible values are:
|
||||
|
||||
## fallback_to_stale_replicas_for_distributed_queries
|
||||
|
||||
Forces a query to an out-of-date replica if updated data is not available. See "[Replication](../../table_engines/replication.md#table_engines-replication)".
|
||||
Forces a query to an out-of-date replica if updated data is not available. See "[Replication](../../operations/table_engines/replication.md#table_engines-replication)".
|
||||
|
||||
ClickHouse selects the most relevant from the outdated replicas of the table.
|
||||
|
||||
@ -36,7 +36,7 @@ Disables query execution if the index can't be used by date.
|
||||
|
||||
Works with tables in the MergeTree family.
|
||||
|
||||
If `force_index_by_date=1`, ClickHouse checks whether the query has a date key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For example, the condition `Date != ' 2000-01-01 '` is acceptable even when it matches all the data in the table (i.e., running the query requires a full scan). For more information about ranges of data in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
|
||||
If `force_index_by_date=1`, ClickHouse checks whether the query has a date key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For example, the condition `Date != '2000-01-01'` is acceptable even when it matches all the data in the table (i.e., running the query requires a full scan). For more information about ranges of data in MergeTree tables, see "[MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree)".
|
||||
|
||||
<a name="settings-settings-force_primary_key"></a>
|
||||
|
||||
@ -46,7 +46,7 @@ Disables query execution if indexing by the primary key is not possible.
|
||||
|
||||
Works with tables in the MergeTree family.
|
||||
|
||||
If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For more information about data ranges in MergeTree tables, see "[MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)".
|
||||
If `force_primary_key=1`, ClickHouse checks to see if the query has a primary key condition that can be used for restricting data ranges. If there is no suitable condition, it throws an exception. However, it does not check whether the condition actually reduces the amount of data to read. For more information about data ranges in MergeTree tables, see "[MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree)".
|
||||
|
||||
<a name="settings_settings_fsync_metadata"></a>
|
||||
|
||||
@ -125,7 +125,7 @@ This is slightly more than `max_block_size`. The reason for this is because cert
|
||||
|
||||
## max_replica_delay_for_distributed_queries
|
||||
|
||||
Disables lagging replicas for distributed queries. See "[Replication](../../table_engines/replication.md#table_engines-replication)".
|
||||
Disables lagging replicas for distributed queries. See "[Replication](../../operations/table_engines/replication.md#table_engines-replication)".
|
||||
|
||||
Sets the time in seconds. If a replica lags more than the set value, this replica is not used.
|
||||
|
||||
@ -158,7 +158,7 @@ Don't confuse blocks for compression (a chunk of memory consisting of bytes) and
|
||||
|
||||
## min_compress_block_size
|
||||
|
||||
For [MergeTree](../../table_engines/mergetree.md#table_engines-mergetree)" tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
|
||||
For [MergeTree](../../operations/table_engines/mergetree.md#table_engines-mergetree) tables. In order to reduce latency when processing queries, a block is compressed when writing the next mark if its size is at least 'min_compress_block_size'. By default, 65,536.
|
||||
|
||||
The actual size of the block, if the uncompressed data is less than 'max_compress_block_size', is no less than this value and no less than the volume of data for one mark.
|
||||
|
||||
|
414
docs/en/operations/system_tables.md
Normal file
@ -0,0 +1,414 @@
|
||||
# System tables
|
||||
|
||||
System tables are used for implementing part of the system's functionality, and for providing access to information about how the system is working.
|
||||
You can't delete a system table (but you can perform DETACH).
|
||||
System tables don't have files with data on the disk or files with metadata. The server creates all the system tables when it starts.
|
||||
System tables are read-only.
|
||||
They are located in the 'system' database.
|
||||
|
||||
<a name="system_tables-system.asynchronous_metrics"></a>
|
||||
|
||||
## system.asynchronous_metrics
|
||||
|
||||
Contains metrics used for profiling and monitoring.
|
||||
They usually reflect the number of events currently in the system, or the total resources consumed by the system.
|
||||
Example: The number of SELECT queries currently running; the amount of memory in use. `system.asynchronous_metrics` and `system.metrics` differ in their sets of metrics and how they are calculated.
|
||||
|
||||
## system.clusters
|
||||
|
||||
Contains information about clusters available in the config file and the servers in them.
|
||||
Columns:
|
||||
|
||||
```text
|
||||
cluster String – Cluster name.
|
||||
shard_num UInt32 – Number of a shard in the cluster, starting from 1.
|
||||
shard_weight UInt32 – Relative weight of a shard when writing data.
|
||||
replica_num UInt32 – Number of a replica in the shard, starting from 1.
|
||||
host_name String – Host name as specified in the config.
|
||||
host_address String – Host's IP address obtained from DNS.
|
||||
port UInt16 – The port used to access the server.
|
||||
user String – The username to use for connecting to the server.
|
||||
```
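As a minimal sketch, the following query lists the topology of every configured cluster using only the columns described above. The ordering is an illustrative choice, not a requirement.

```sql
-- One row per replica, grouped visually by cluster and shard.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
ORDER BY cluster, shard_num, replica_num
```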
|
||||
## system.columns
|
||||
|
||||
Contains information about the columns in all tables.
|
||||
You can use this table to get information similar to `DESCRIBE TABLE`, but for multiple tables at once.
|
||||
|
||||
```text
|
||||
database String - Name of the database the table is located in.
|
||||
table String - Table name.
|
||||
name String - Column name.
|
||||
type String - Column type.
|
||||
default_type String - Expression type (DEFAULT, MATERIALIZED, ALIAS) for the default value, or an empty string if it is not defined.
|
||||
default_expression String - Expression for the default value, or an empty string if it is not defined.
|
||||
```
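For example, a query along these lines returns `DESCRIBE TABLE`-like output for several tables at once. The database name `default` and the table names `hits` and `visits` are hypothetical placeholders.

```sql
-- Column definitions for two tables in a single result set.
SELECT table, name, type, default_type, default_expression
FROM system.columns
WHERE database = 'default' AND table IN ('hits', 'visits')
ORDER BY table
```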
|
||||
|
||||
## system.databases
|
||||
|
||||
This table contains a single String column called 'name' – the name of a database.
|
||||
Each database that the server knows about has a corresponding entry in the table.
|
||||
This system table is used for implementing the `SHOW DATABASES` query.
|
||||
|
||||
## system.dictionaries
|
||||
|
||||
Contains information about external dictionaries.
|
||||
|
||||
Columns:
|
||||
|
||||
- `name String` – Dictionary name.
|
||||
- `type String` – Dictionary type: Flat, Hashed, Cache.
|
||||
- `origin String` – Path to the config file where the dictionary is described.
|
||||
- `attribute.names Array(String)` – Array of attribute names provided by the dictionary.
|
||||
- `attribute.types Array(String)` – Corresponding array of attribute types provided by the dictionary.
|
||||
- `has_hierarchy UInt8` – Whether the dictionary is hierarchical.
|
||||
- `bytes_allocated UInt64` – The amount of RAM used by the dictionary.
|
||||
- `hit_rate Float64` – For cache dictionaries, the percentage of lookups for which the value was found in the cache.
|
||||
- `element_count UInt64` – The number of items stored in the dictionary.
|
||||
- `load_factor Float64` – The filled percentage of the dictionary (for a hashed dictionary, it is the filled percentage of the hash table).
|
||||
- `creation_time DateTime` – Time of the creation or last successful reload of the dictionary.
|
||||
- `last_exception String` – Text of an error that occurred when creating or reloading the dictionary, if the dictionary couldn't be created.
|
||||
- `source String` – Text describing the data source for the dictionary.
|
||||
|
||||
Note that the amount of memory used by the dictionary is not proportional to the number of items stored in it. For flat and cache dictionaries, all the memory cells are preallocated, regardless of how full the dictionary actually is.
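A quick way to compare memory use against the number of stored items is a query like the sketch below, which relies only on the columns listed above.

```sql
-- Largest dictionaries first; a non-empty last_exception signals a failed load.
SELECT name, type, element_count, bytes_allocated, load_factor, last_exception
FROM system.dictionaries
ORDER BY bytes_allocated DESC
```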
|
||||
<a name="system_tables-system.events"></a>
|
||||
|
||||
## system.events
|
||||
|
||||
Contains information about the number of events that have occurred in the system. This is used for profiling and monitoring purposes.
|
||||
Example: The number of processed SELECT queries.
|
||||
Columns: 'event String' – the event name, and 'value UInt64' – the quantity.
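As an illustration, the most frequent events can be listed as follows. The limit of 10 is an arbitrary choice.

```sql
-- Top 10 events by counter value.
SELECT event, value
FROM system.events
ORDER BY value DESC
LIMIT 10
```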
|
||||
|
||||
## system.functions
|
||||
|
||||
Contains information about normal and aggregate functions.
|
||||
|
||||
Columns:
|
||||
|
||||
- `name` (`String`) – Function name.
|
||||
- `is_aggregate` (`UInt8`) – Whether it is an aggregate function.
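For instance, the following sketch lists only the aggregate functions known to the server.

```sql
-- is_aggregate is UInt8, so it can be used directly as a condition.
SELECT name
FROM system.functions
WHERE is_aggregate
ORDER BY name
```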
|
||||
## system.merges
|
||||
|
||||
Contains information about merges currently in process for tables in the MergeTree family.
|
||||
|
||||
Columns:
|
||||
|
||||
- `database String` — Name of the database the table is located in.
|
||||
- `table String` — Name of the table.
|
||||
- `elapsed Float64` — Time in seconds since the merge started.
|
||||
- `progress Float64` — Fraction of the work completed, from 0 to 1.
|
||||
- `num_parts UInt64` — Number of parts to merge.
|
||||
- `result_part_name String` — Name of the part that will be formed as the result of the merge.
|
||||
- `total_size_bytes_compressed UInt64` — Total size of compressed data in the parts being merged.
|
||||
- `total_size_marks UInt64` — Total number of marks in the parts being merged.
|
||||
- `bytes_read_uncompressed UInt64` — Number of bytes read, uncompressed.
|
||||
- `rows_read UInt64` — Number of rows read.
|
||||
- `bytes_written_uncompressed UInt64` — Number of bytes written, uncompressed.
|
||||
- `rows_written UInt64` — Number of rows written.
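A minimal monitoring sketch based on the columns above: it shows the running merges, longest-running first, with progress converted to a percentage. The aliases `elapsed_sec` and `progress_percent` are illustrative names.

```sql
-- Running merges, longest first.
SELECT
    database,
    table,
    result_part_name,
    num_parts,
    round(elapsed) AS elapsed_sec,
    round(progress * 100, 1) AS progress_percent
FROM system.merges
ORDER BY elapsed DESC
```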
|
||||
<a name="system_tables-system.metrics"></a>
|
||||
|
||||
## system.metrics
|
||||
|
||||
## system.numbers
|
||||
|
||||
This table contains a single UInt64 column named 'number' that contains almost all the natural numbers starting from zero.
|
||||
You can use this table for tests, or if you need to do a brute force search.
|
||||
Reads from this table are not parallelized.
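Because the table is effectively unbounded, queries against it normally need a `LIMIT` or another stopping condition. A small sketch (the alias `square` is illustrative):

```sql
-- The first five numbers and their squares.
SELECT number, number * number AS square
FROM system.numbers
LIMIT 5
```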
|
||||
|
||||
## system.numbers_mt
|
||||
|
||||
The same as 'system.numbers' but reads are parallelized. The numbers can be returned in any order.
|
||||
Used for tests.
|
||||
|
||||
## system.one
|
||||
|
||||
This table contains a single row with a single 'dummy' UInt8 column containing the value 0.
|
||||
This table is used if a SELECT query doesn't specify the FROM clause.
|
||||
This is similar to the DUAL table found in other DBMSs.
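As a sketch of the behavior described above, the two queries below should be equivalent, since the first one implicitly reads from `system.one`.

```sql
-- No FROM clause: ClickHouse reads from system.one.
SELECT 1 + 1;

-- The same query with the table written out.
SELECT 1 + 1 FROM system.one;
```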
|
||||
|
||||
## system.parts
|
||||
|
||||
Contains information about parts of a table in the [MergeTree](../operations/table_engines/mergetree.md#table_engines-mergetree) family.
|
||||
|
||||
Each row describes one part of the data.
|
||||
|
||||
Columns:
|
||||
|
||||
- partition (String) – The partition name. It is in YYYYMM format for old-style partitioning and an arbitrary serialized value for custom partitioning. To learn what a partition is, see the description of the [ALTER](../query_language/alter.md#query_language_queries_alter) query.
|
||||
- name (String) – Name of the data part.
|
||||
- active (UInt8) – Indicates whether the part is active. If a part is active, it is used in a table; otherwise, it will be deleted. Inactive data parts remain after merging.
|
||||
- marks (UInt64) – The number of marks. To get the approximate number of rows in a data part, multiply ``marks`` by the index granularity (usually 8192).
|
||||
- marks_size (UInt64) – The size of the file with marks.
|
||||
- rows (UInt64) – The number of rows.
|
||||
- bytes (UInt64) – The number of bytes when compressed.
|
||||
- modification_time (DateTime) – The modification time of the directory with the data part. This usually corresponds to the time of data part creation.
|
||||
- remove_time (DateTime) – The time when the data part became inactive.
|
||||
- refcount (UInt32) – The number of places where the data part is used. A value greater than 2 indicates that the data part is used in queries or merges.
|
||||
- min_date (Date) – The minimum value of the date key in the data part.
|
||||
- max_date (Date) – The maximum value of the date key in the data part.
|
||||
- min_block_number (UInt64) – The minimum number of data parts that make up the current part after merging.
|
||||
- max_block_number (UInt64) – The maximum number of data parts that make up the current part after merging.
|
||||
- level (UInt32) – Depth of the merge tree. If a merge was not performed, ``level=0``.
|
||||
- primary_key_bytes_in_memory (UInt64) – The amount of memory (in bytes) used by primary key values.
|
||||
- primary_key_bytes_in_memory_allocated (UInt64) – The amount of memory (in bytes) reserved for primary key values.
|
||||
- database (String) – Name of the database.
|
||||
- table (String) – Name of the table.
|
||||
- engine (String) – Name of the table engine without parameters.
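As an example, the sketch below summarizes the active parts of one table per partition. The database `default` and the table `visits` are hypothetical placeholders, and the aliases are illustrative.

```sql
-- Per-partition part count, row count, and compressed size for active parts only.
SELECT
    partition,
    count() AS part_count,
    sum(rows) AS total_rows,
    sum(bytes) AS compressed_bytes
FROM system.parts
WHERE active AND database = 'default' AND table = 'visits'
GROUP BY partition
ORDER BY partition
```

Filtering on `active` excludes parts that remain on disk after merging but are no longer used by the table.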
|
||||
|
||||
## system.processes
|
||||
|
||||
This system table is used for implementing the `SHOW PROCESSLIST` query.
|
||||
Columns:
|
||||
|
||||
```text
|
||||
user String – Name of the user who made the request. For distributed query processing, this is the user under whose name the requestor server sent the query to this server, not the user who made the distributed request on the requestor server.
|
||||
|
||||
address String – The IP address that the query was made from. The same is true for distributed query processing.
|
||||
|
||||
elapsed Float64 – The time in seconds since request execution started.
|
||||
|
||||
rows_read UInt64 – The number of rows read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
|
||||
|
||||
bytes_read UInt64 – The number of uncompressed bytes read from the table. For distributed processing, on the requestor server, this is the total for all remote servers.
|
||||
|
||||
total_rows_approx UInt64 – The approximate total number of rows that must be read. For distributed processing, on the requestor server, this is the total for all remote servers. It can be updated during request processing, when new sources to process become known.
|
||||
|
||||
memory_usage UInt64 – Memory consumption by the query. It might not include some types of dedicated memory.
|
||||
|
||||
query String – The query text. For INSERT, it doesn't include the data to insert.
|
||||
|
||||
query_id String – Query ID, if defined.
|
||||
```
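The table can also be queried directly, which allows filtering and sorting that `SHOW PROCESSLIST` does not offer. A minimal sketch using the columns above:

```sql
-- Currently running queries, longest-running first.
SELECT query_id, user, elapsed, rows_read, memory_usage, query
FROM system.processes
ORDER BY elapsed DESC
```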
|
||||
|
||||
## system.replicas
|
||||
|
||||
Contains information and status for replicated tables residing on the local server.
|
||||
This table can be used for monitoring. The table contains a row for every Replicated\* table.
|
||||
|
||||
Example:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM system.replicas
|
||||
WHERE table = 'visits'
|
||||
FORMAT Vertical
|
||||
```
|
||||
|
||||
```text
|
||||
Row 1:
|
||||
──────
|
||||
database: merge
|
||||
table: visits
|
||||
engine: ReplicatedCollapsingMergeTree
|
||||
is_leader: 1
|
||||
is_readonly: 0
|
||||
is_session_expired: 0
|
||||
future_parts: 1
|
||||
parts_to_check: 0
|
||||
zookeeper_path: /clickhouse/tables/01-06/visits
|
||||
replica_name: example01-06-1.yandex.ru
|
||||
replica_path: /clickhouse/tables/01-06/visits/replicas/example01-06-1.yandex.ru
|
||||
columns_version: 9
|
||||
queue_size: 1
|
||||
inserts_in_queue: 0
|
||||
merges_in_queue: 1
|
||||
log_max_index: 596273
|
||||
log_pointer: 596274
|
||||
total_replicas: 2
|
||||
active_replicas: 2
|
||||
```
|
||||
|
||||
Columns:
|
||||
|
||||
```text
|
||||
database: database name
|
||||
table: table name
|
||||
engine: table engine name
|
||||
|
||||
is_leader: whether the replica is the leader
|
||||
|
||||
Only one replica at a time can be the leader. The leader is responsible for selecting background merges to perform.
|
||||
Note that writes can be performed to any replica that is available and has a session in ZK, regardless of whether it is a leader.
|
||||
|
||||
is_readonly: Whether the replica is in read-only mode.
|
||||
This mode is turned on if the config doesn't have sections with ZK, if an unknown error occurred when reinitializing sessions in ZK, and during session reinitialization in ZK.
|
||||
|
||||
is_session_expired: Whether the ZK session expired.
|
||||
Basically, the same thing as is_readonly.
|
||||
|
||||
future_parts: The number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.
|
||||
|
||||
parts_to_check: The number of data parts in the queue for verification.
|
||||
A part is put in the verification queue if there is suspicion that it might be damaged.
|
||||
|
||||
zookeeper_path: The path to the table data in ZK.
|
||||
replica_name: Name of the replica in ZK. Different replicas of the same table have different names.
|
||||
replica_path: The path to the replica data in ZK. The same as concatenating zookeeper_path/replicas/replica_name.
|
||||
|
||||
columns_version: Version number of the table structure.
|
||||
Indicates how many times ALTER was performed. If replicas have different versions, it means some replicas haven't made all of the ALTERs yet.
|
||||
|
||||
queue_size: Size of the queue for operations waiting to be performed.
|
||||
Operations include inserting blocks of data, merges, and certain other actions.
|
||||
Normally coincides with future_parts.
|
||||
|
||||
inserts_in_queue: Number of inserts of blocks of data that need to be made.
|
||||
Insertions are usually replicated fairly quickly. If the number is high, something is wrong.
|
||||
|
||||
merges_in_queue: The number of merges waiting to be made.
|
||||
Sometimes merges are lengthy, so this value may be greater than zero for a long time.
|
||||
|
||||
The next 4 columns have a non-null value only if the ZK session is active.
|
||||
|
||||
log_max_index: Maximum entry number in the log of general activity.
|
||||
log_pointer: Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one.
|
||||
If log_pointer is much smaller than log_max_index, something is wrong.
|
||||
|
||||
total_replicas: Total number of known replicas of this table.
|
||||
active_replicas: Number of replicas of this table that have a ZK session (the number of active replicas).
|
||||
```
|
||||
|
||||
If you request all the columns, the table may work a bit slowly, since several reads from ZK are made for each row.
|
||||
If you don't request the last 4 columns (log_max_index, log_pointer, total_replicas, active_replicas), the table works quickly.
|
||||
|
||||
For example, you can check that everything is working correctly like this:
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
database,
|
||||
table,
|
||||
is_leader,
|
||||
is_readonly,
|
||||
is_session_expired,
|
||||
future_parts,
|
||||
parts_to_check,
|
||||
columns_version,
|
||||
queue_size,
|
||||
inserts_in_queue,
|
||||
merges_in_queue,
|
||||
log_max_index,
|
||||
log_pointer,
|
||||
total_replicas,
|
||||
active_replicas
|
||||
FROM system.replicas
|
||||
WHERE
|
||||
is_readonly
|
||||
OR is_session_expired
|
||||
OR future_parts > 20
|
||||
OR parts_to_check > 10
|
||||
OR queue_size > 20
|
||||
OR inserts_in_queue > 10
|
||||
OR log_max_index - log_pointer > 10
|
||||
OR total_replicas < 2
|
||||
OR active_replicas < total_replicas
|
||||
```
|
||||
|
||||
If this query doesn't return anything, it means that everything is fine.
|
||||
|
||||
## system.settings
|
||||
|
||||
Contains information about settings that are currently in use.
|
||||
That is, the settings used for executing the query that reads from the system.settings table.
|
||||
|
||||
Columns:
|
||||
|
||||
```text
|
||||
name String – Setting name.
|
||||
value String – Setting value.
|
||||
changed UInt8 – Whether the setting was explicitly defined in the config or explicitly changed.
|
||||
```
|
||||
|
||||
Example:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM system.settings
|
||||
WHERE changed
|
||||
```
|
||||
|
||||
```text
|
||||
┌─name───────────────────┬─value───────┬─changed─┐
|
||||
│ max_threads │ 8 │ 1 │
|
||||
│ use_uncompressed_cache │ 0 │ 1 │
|
||||
│ load_balancing │ random │ 1 │
|
||||
│ max_memory_usage │ 10000000000 │ 1 │
|
||||
└────────────────────────┴─────────────┴─────────┘
|
||||
```
|
||||
|
||||
## system.tables
|
||||
|
||||
This table contains the String columns 'database', 'name', and 'engine'.
|
||||
The table also contains three virtual columns: metadata_modification_time (DateTime type), create_table_query, and engine_full (String type).
|
||||
Each table that the server knows about is entered in the 'system.tables' table.
|
||||
This system table is used for implementing SHOW TABLES queries.
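For example, a query like the following returns roughly what `SHOW TABLES` shows, plus the engine of each table. The database name `default` is a placeholder.

```sql
-- Tables of one database together with their engines.
SELECT database, name, engine
FROM system.tables
WHERE database = 'default'
```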
|
||||
|
||||
## system.zookeeper
|
||||
|
||||
This table is present only if ZooKeeper is configured. It allows reading data from the ZooKeeper cluster defined in the config.
|
||||
The query must have a 'path' equality condition in the WHERE clause. This is the path in ZooKeeper for the children that you want to get data for.
|
||||
|
||||
The query `SELECT * FROM system.zookeeper WHERE path = '/clickhouse'` outputs data for all children of the `/clickhouse` node.
|
||||
To output data for all root nodes, write path = '/'.
|
||||
If the path specified in 'path' doesn't exist, an exception will be thrown.
|
||||
|
||||
Columns:
|
||||
|
||||
- `name String` — Name of the node.
|
||||
- `path String` — Path to the node.
|
||||
- `value String` — Value of the node.
|
||||
- `dataLength Int32` — Size of the value.
|
||||
- `numChildren Int32` — Number of children.
|
||||
- `czxid Int64` — ID of the transaction that created the node.
|
||||
- `mzxid Int64` — ID of the transaction that last changed the node.
|
||||
- `pzxid Int64` — ID of the transaction that last added or removed children.
|
||||
- `ctime DateTime` — Time of node creation.
|
||||
- `mtime DateTime` — Time of the last node modification.
|
||||
- `version Int32` — Node version - the number of times the node was changed.
|
||||
- `cversion Int32` — Number of added or removed children.
|
||||
- `aversion Int32` — Number of changes to ACL.
|
||||
- `ephemeralOwner Int64` — For ephemeral nodes, the ID of the session that owns this node.
|
||||
|
||||
|
||||
Example:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM system.zookeeper
|
||||
WHERE path = '/clickhouse/tables/01-08/visits/replicas'
|
||||
FORMAT Vertical
|
||||
```
|
||||
|
||||
```text
|
||||
Row 1:
|
||||
──────
|
||||
name: example01-08-1.yandex.ru
|
||||
value:
|
||||
czxid: 932998691229
|
||||
mzxid: 932998691229
|
||||
ctime: 2015-03-27 16:49:51
|
||||
mtime: 2015-03-27 16:49:51
|
||||
version: 0
|
||||
cversion: 47
|
||||
aversion: 0
|
||||
ephemeralOwner: 0
|
||||
dataLength: 0
|
||||
numChildren: 7
|
||||
pzxid: 987021031383
|
||||
path: /clickhouse/tables/01-08/visits/replicas
|
||||
|
||||
Row 2:
|
||||
──────
|
||||
name: example01-08-2.yandex.ru
|
||||
value:
|
||||
czxid: 933002738135
|
||||
mzxid: 933002738135
|
||||
ctime: 2015-03-27 16:57:01
|
||||
mtime: 2015-03-27 16:57:01
|
||||
version: 0
|
||||
cversion: 37
|
||||
aversion: 0
|
||||
ephemeralOwner: 0
|
||||
dataLength: 0
|
||||
numChildren: 7
|
||||
pzxid: 987021252247
|
||||
path: /clickhouse/tables/01-08/visits/replicas
|
||||
```
|
@ -61,7 +61,7 @@ WHERE name = 'products'
|
||||
└──────────┴──────┴────────┴─────────────────┴─────────────────┴─────────────────┴───────────────┴─────────────────┘
|
||||
```
|
||||
|
||||
You can use the [dictGet*](../functions/ext_dict_functions.md#ext_dict_functions) function to get the dictionary data in this format.
|
||||
You can use the [dictGet*](../../query_language/functions/ext_dict_functions.md#ext_dict_functions) function to get the dictionary data in this format.
|
||||
|
||||
This view isn't helpful when you need to get raw data, or when performing a `JOIN` operation. For these cases, you can use the `Dictionary` engine, which displays the dictionary data in a table.
|
||||
|
79
docs/en/operations/table_engines/file.md
Normal file
@ -0,0 +1,79 @@
|
||||
<a name="table_engines-file"></a>
|
||||
|
||||
# File(InputFormat)
|
||||
|
||||
The data source is a file that stores data in one of the supported input formats (TabSeparated, Native, etc.).
|
||||
|
||||
Usage examples:
|
||||
|
||||
- Data export from ClickHouse to file.
|
||||
- Convert data from one format to another.
|
||||
- Updating data in ClickHouse via editing a file on a disk.
|
||||
|
||||
## Usage in ClickHouse server
|
||||
|
||||
```
|
||||
File(Format)
|
||||
```
|
||||
|
||||
`Format` should be supported for `INSERT`, `SELECT`, or both, depending on how the table is used. For the full list of supported formats, see [Formats](../../interfaces/formats.md#formats).
|
||||
|
||||
ClickHouse does not allow specifying a filesystem path for `File`. It uses the folder defined by the [path](../server_settings/settings.md#server_settings-path) setting in the server configuration.
|
||||
|
||||
When you create a table using `File(Format)`, ClickHouse creates an empty subdirectory in that folder. When data is written to the table, it is put into the `data.Format` file in that subdirectory.
|
||||
|
||||
You may manually create this subfolder and file in the server filesystem and then [ATTACH](../../query_language/misc.md#queries-attach) it to table information with a matching name, so that you can query data from that file.
|
||||
|
||||
<div class="admonition warning">
|
||||
Be careful with this functionality, because ClickHouse does not keep track of external changes to such files. The result of simultaneous writes via ClickHouse and outside of ClickHouse is undefined.
|
||||
</div>
|
||||
|
||||
**Example:**
|
||||
|
||||
**1.** Set up the `file_engine_table` table:
|
||||
|
||||
```sql
|
||||
CREATE TABLE file_engine_table (name String, value UInt32) ENGINE=File(TabSeparated)
|
||||
```
|
||||
|
||||
By default ClickHouse will create folder `/var/lib/clickhouse/data/default/file_engine_table`.
|
||||
|
||||
**2.** Manually create `/var/lib/clickhouse/data/default/file_engine_table/data.TabSeparated` containing:
|
||||
|
||||
```bash
|
||||
$ cat data.TabSeparated
|
||||
one 1
|
||||
two 2
|
||||
```
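The rendered listing may not preserve the tab characters that `TabSeparated` requires between columns. A minimal sketch of creating the file with explicit tabs follows; it assumes a temporary directory as a stand-in for `/var/lib/clickhouse/data/default/file_engine_table`, and the variable names are illustrative only.

```bash
# Assumption: a temporary directory stands in for the table folder.
table_dir=$(mktemp -d)

# printf expands \t to a real tab, which separates the two columns.
printf 'one\t1\ntwo\t2\n' > "$table_dir/data.TabSeparated"

# cut splits on tabs by default, so this prints the `value` column.
second_column=$(cut -f2 "$table_dir/data.TabSeparated")
echo "$second_column"

rm -rf "$table_dir"
```

If `cut -f2` prints the whole line instead of the numbers, the columns are separated by spaces rather than tabs.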
|
||||
|
||||
**3.** Query the data:
|
||||
|
||||
```sql
|
||||
SELECT * FROM file_engine_table
|
||||
```
|
||||
|
||||
```text
|
||||
┌─name─┬─value─┐
|
||||
│ one │ 1 │
|
||||
│ two │ 2 │
|
||||
└──────┴───────┘
|
||||
```
|
||||
|
||||
## Usage in clickhouse-local
|
||||
|
||||
In [clickhouse-local](../utils/clickhouse-local.md#utils-clickhouse-local), the File engine accepts a file path in addition to `Format`. The default input/output streams can be specified using numeric or human-readable names such as `0` or `stdin`, `1` or `stdout`.
|
||||
|
||||
**Example:**
|
||||
|
||||
```bash
|
||||
$ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64) ENGINE = File(CSV, stdin); SELECT a, b FROM table; DROP TABLE table"
|
||||
```
|
||||
|
||||
## Details of implementation
|
||||
|
||||
- Reads can be parallel, but not writes
|
||||
- Not supported:
|
||||
- `ALTER`
|
||||
- `SELECT ... SAMPLE`
|
||||
- Indices
|
||||
- Replication
|
@ -14,7 +14,7 @@ Graphite stores full data in ClickHouse, and data can be retrieved in the follow
|
||||
|
||||
Using the `GraphiteMergeTree` engine.
|
||||
|
||||
The engine inherits properties from MergeTree. The settings for thinning data are defined by the [graphite_rollup](../operations/server_settings/settings.md#server_settings-graphite_rollup) parameter in the server configuration.
|
||||
The engine inherits properties from MergeTree. The settings for thinning data are defined by the [graphite_rollup](../server_settings/settings.md#server_settings-graphite_rollup) parameter in the server configuration.
|
||||
|
||||
## Using the engine
|
||||
|
@ -67,7 +67,7 @@ Example:
|
||||
SELECT level, sum(total) FROM daily GROUP BY level;
|
||||
```
|
||||
|
||||
To improve performance, received messages are grouped into blocks the size of [max_insert_block_size](../operations/settings/settings.md#settings-settings-max_insert_block_size). If the block wasn't formed within [stream_flush_interval_ms](../operations/settings/settings.md#settings-settings_stream_flush_interval_ms) milliseconds, the data will be flushed to the table regardless of the completeness of the block.
|
||||
To improve performance, received messages are grouped into blocks the size of [max_insert_block_size](../settings/settings.md#settings-settings-max_insert_block_size). If the block wasn't formed within [stream_flush_interval_ms](../settings/settings.md#settings-settings_stream_flush_interval_ms) milliseconds, the data will be flushed to the table regardless of the completeness of the block.
|
||||
|
||||
To stop receiving topic data or to change the conversion logic, detach the materialized view:
|
||||
|
4
docs/en/operations/table_engines/materializedview.md
Normal file
@ -0,0 +1,4 @@
|
||||
# MaterializedView
|
||||
|
||||
Used for implementing materialized views (for more information, see the [CREATE TABLE](../../query_language/create.md#query_language-queries-create_table) query). For storing data, it uses a different engine that was specified when creating the view. When reading from a table, it just uses this engine.
|
||||
|
@ -56,7 +56,7 @@ In this example, the index can't be used.
|
||||
SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
|
||||
```
|
||||
|
||||
To check whether ClickHouse can use the index when executing the query, use the settings [force_index_by_date](../operations/settings/settings.md#settings-settings-force_index_by_date)and[force_primary_key](../operations/settings/settings.md#settings-settings-force_primary_key).
|
||||
To check whether ClickHouse can use the index when executing the query, use the settings [force_index_by_date](../settings/settings.md#settings-settings-force_index_by_date) and [force_primary_key](../settings/settings.md#settings-settings-force_primary_key).
|
||||
|
||||
The index by date only allows reading those parts that contain dates from the desired range. However, a data part may contain data for many dates (up to an entire month), while within a single part the data is ordered by the primary key, which might not contain the date as the first column. Because of this, using a query with only a date condition that does not specify the primary key prefix will cause more data to be read than for a single date.
|
||||
|
@ -15,7 +15,7 @@ Replication works at the level of an individual table, not the entire server. A
|
||||
|
||||
Replication does not depend on sharding. Each shard has its own independent replication.
|
||||
|
||||
Compressed data is replicated for `INSERT` and `ALTER` queries (see the description of the [ALTER](../query_language/queries.md#query_language_queries_alter) query).
|
||||
Compressed data is replicated for `INSERT` and `ALTER` queries (see the description of the [ALTER](../../query_language/alter.md#query_language_queries_alter) query).
|
||||
|
||||
`CREATE`, `DROP`, `ATTACH`, `DETACH` and `RENAME` queries are executed on a single server and are not replicated:
|
||||
|
||||
@ -48,7 +48,7 @@ You can specify any existing ZooKeeper cluster and the system will use a directo
|
||||
|
||||
If ZooKeeper isn't set in the config file, you can't create replicated tables, and any existing replicated tables will be read-only.
|
||||
|
||||
ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](../operations/settings/settings.md#settings_settings_max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](../operations/settings/settings.md#settings-settings-fallback_to_stale_replicas_for_distributed_queries).
|
||||
ZooKeeper is not used in `SELECT` queries because replication does not affect the performance of `SELECT` and queries run just as fast as they do for non-replicated tables. When querying distributed replicated tables, ClickHouse behavior is controlled by the settings [max_replica_delay_for_distributed_queries](../settings/settings.md#settings_settings_max_replica_delay_for_distributed_queries) and [fallback_to_stale_replicas_for_distributed_queries](../settings/settings.md#settings-settings-fallback_to_stale_replicas_for_distributed_queries).
|
||||
|
||||
For each `INSERT` query, approximately ten entries are added to ZooKeeper through several transactions. (To be more precise, this is for each inserted block of data; an INSERT query contains one block or one block per `max_insert_block_size = 1048576` rows.) This leads to slightly longer latencies for `INSERT` compared to non-replicated tables. But if you follow the recommendations to insert data in batches of no more than one `INSERT` per second, it doesn't create any problems. The entire ClickHouse cluster that uses one ZooKeeper cluster for coordination has a total of several hundred `INSERTs` per second. The throughput on data inserts (the number of rows per second) is just as high as for non-replicated data.
|
||||
|
||||
@ -60,7 +60,7 @@ By default, an INSERT query waits for confirmation of writing the data from only
|
||||
|
||||
Each block of data is written atomically. The INSERT query is divided into blocks up to `max_insert_block_size = 1048576` rows. In other words, if the `INSERT` query has less than 1048576 rows, it is made atomically.
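As a rough illustration of the arithmetic, the number of blocks an `INSERT` is divided into can be estimated by rounding the row count up to a multiple of `max_insert_block_size`. This is a sketch under the assumption that blocks are always filled to the limit; the function name `blocks_for_rows` is hypothetical and not part of ClickHouse.

```bash
# Value taken from the text above.
max_insert_block_size=1048576

# Hypothetical helper: ceiling division of the row count by the block size.
blocks_for_rows() {
    echo $(( ($1 + max_insert_block_size - 1) / max_insert_block_size ))
}

blocks_for_rows 1000000   # fits in a single block, so it is written atomically
blocks_for_rows 3000000   # spans several blocks, each written atomically on its own
```

Under the same assumption, multiplying the result by ten gives an approximate count of the ZooKeeper entries added for a replicated table.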
|
||||
|
||||
Data blocks are deduplicated. For multiple writes of the same data block (data blocks of the same size containing the same rows in the same order), the block is only written once. The reason for this is in case of network failures when the client application doesn't know if the data was written to the DB, so the `INSERT` query can simply be repeated. It doesn't matter which replica INSERTs were sent to with identical data. `INSERTs` are idempotent. Deduplication parameters are controlled by [merge_tree](../operations/server_settings/settings.md#server_settings-merge_tree) server settings.
|
||||
Data blocks are deduplicated. For multiple writes of the same data block (data blocks of the same size containing the same rows in the same order), the block is only written once. The reason for this is in case of network failures when the client application doesn't know if the data was written to the DB, so the `INSERT` query can simply be repeated. It doesn't matter which replica INSERTs were sent to with identical data. `INSERTs` are idempotent. Deduplication parameters are controlled by [merge_tree](../server_settings/settings.md#server_settings-merge_tree) server settings.
|
||||
|
||||
During replication, only the source data to insert is transferred over the network. Further data transformation (merging) is coordinated and performed on all the replicas in the same way. This minimizes network usage, which means that replication works well when replicas reside in different datacenters. (Note that duplicating data in different datacenters is the main goal of replication.)
|
||||
|
74
docs/en/operations/utils/clickhouse-local.md
Normal file
@ -0,0 +1,74 @@
|
||||
<a name="utils-clickhouse-local"></a>
|
||||
|
||||
# clickhouse-local
|
||||
|
||||
The `clickhouse-local` program enables you to perform fast processing on local files, without having to deploy and configure the ClickHouse server.
|
||||
|
||||
Accepts data that represent tables and queries them using [ClickHouse SQL dialect](../../query_language/index.md#queries).
|
||||
|
||||
`clickhouse-local` uses the same core as ClickHouse server, so it supports most of the features and the same set of formats and table engines.
|
||||
|
||||
By default, `clickhouse-local` does not have access to data on the same host, but it supports loading the server configuration using the `--config-file` argument.
|
||||
|
||||
<div class="admonition warning">
|
||||
It is not recommended to load production server configuration into `clickhouse-local` because data can be damaged in case of human error.
|
||||
</div>
|
||||
|
||||
|
||||
## Usage
|
||||
|
||||
Basic usage:
|
||||
|
||||
``` bash
|
||||
clickhouse-local --structure "table_structure" --input-format "format_of_incoming_data" -q "query"
|
||||
```
|
||||
|
||||
Arguments:
|
||||
|
||||
- `-S`, `--structure` — table structure for input data.
|
||||
- `-if`, `--input-format` — input format, `TSV` by default.
|
||||
- `-f`, `--file` — path to data, `stdin` by default.
|
||||
- `-q`, `--query` — queries to execute, with `;` as the delimiter.
|
||||
- `-N`, `--table` — table name where to put output data, `table` by default.
|
||||
- `-of`, `--format`, `--output-format` — output format, `TSV` by default.
|
||||
- `--stacktrace` — whether to dump debug output in case of exception.
|
||||
- `--verbose` — more details on query execution.
|
||||
- `-s` — disables `stderr` logging.
|
||||
- `--config-file` — path to a configuration file in the same format as for the ClickHouse server; by default the configuration is empty.
|
||||
- `--help` — argument reference for `clickhouse-local`.
|
||||
|
||||
There are also arguments for each ClickHouse configuration variable, which are more commonly used than `--config-file`.
|
||||
|
||||
|
||||
## Examples
|
||||
|
||||
``` bash
|
||||
echo -e "1,2\n3,4" | clickhouse-local -S "a Int64, b Int64" -if "CSV" -q "SELECT * FROM table"
|
||||
Read 2 rows, 32.00 B in 0.000 sec., 5182 rows/sec., 80.97 KiB/sec.
|
||||
1 2
|
||||
3 4
|
||||
```
|
||||
|
||||
The previous example is equivalent to:
|
||||
|
||||
``` bash
|
||||
$ echo -e "1,2\n3,4" | clickhouse-local -q "CREATE TABLE table (a Int64, b Int64) ENGINE = File(CSV, stdin); SELECT a, b FROM table; DROP TABLE table"
|
||||
Read 2 rows, 32.00 B in 0.000 sec., 4987 rows/sec., 77.93 KiB/sec.
|
||||
1 2
|
||||
3 4
|
||||
```
|
||||
|
||||
Now let's output memory usage for each Unix user:
|
||||
|
||||
``` bash
|
||||
$ ps aux | tail -n +2 | awk '{ printf("%s\t%s\n", $1, $4) }' | clickhouse-local -S "user String, mem Float64" -q "SELECT user, round(sum(mem), 2) as memTotal FROM table GROUP BY user ORDER BY memTotal DESC FORMAT Pretty"
|
||||
Read 186 rows, 4.15 KiB in 0.035 sec., 5302 rows/sec., 118.34 KiB/sec.
|
||||
┏━━━━━━━━━━┳━━━━━━━━━━┓
|
||||
┃ user ┃ memTotal ┃
|
||||
┡━━━━━━━━━━╇━━━━━━━━━━┩
|
||||
│ bayonet │ 113.5 │
|
||||
├──────────┼──────────┤
|
||||
│ root │ 8.8 │
|
||||
├──────────┼──────────┤
|
||||
...
|
||||
```
|
@ -32,7 +32,7 @@ anyHeavy(column)
|
||||
|
||||
**Example**
|
||||
|
||||
Take the [OnTime](../getting_started/example_datasets/ontime.md#example_datasets-ontime) data set and select any frequently occurring value in the `AirlineID` column.
|
||||
Take the [OnTime](../../getting_started/example_datasets/ontime.md#example_datasets-ontime) data set and select any frequently occurring value in the `AirlineID` column.
|
||||
|
||||
```sql
|
||||
SELECT anyHeavy(AirlineID) AS res
|
||||
@ -306,7 +306,7 @@ We recommend using the `N < 10 ` value; performance is reduced with large `N` va
|
||||
|
||||
**Example**
|
||||
|
||||
Take the [OnTime](../getting_started/example_datasets/ontime.md#example_datasets-ontime) data set and select the three most frequently occurring values in the `AirlineID` column.
|
||||
Take the [OnTime](../../getting_started/example_datasets/ontime.md#example_datasets-ontime) data set and select the three most frequently occurring values in the `AirlineID` column.
|
||||
|
||||
```sql
|
||||
SELECT topK(3)(AirlineID) AS res
|
266
docs/en/query_language/alter.md
Normal file
@ -0,0 +1,266 @@
|
||||
<a name="query_language_queries_alter"></a>
|
||||
|
||||
## ALTER
|
||||
|
||||
The `ALTER` query is only supported for `*MergeTree` tables, as well as `Merge` and `Distributed`. The query has several variations.
|
||||
|
||||
### Column manipulations
|
||||
|
||||
Changing the table structure.
|
||||
|
||||
```sql
|
||||
ALTER TABLE [db.]name [ON CLUSTER cluster] ADD|DROP|MODIFY COLUMN ...
|
||||
```
|
||||
|
||||
In the query, specify a list of one or more comma-separated actions.
|
||||
Each action is an operation on a column.
|
||||
|
||||
The following actions are supported:
|
||||
|
||||
```sql
|
||||
ADD COLUMN name [type] [default_expr] [AFTER name_after]
|
||||
```
|
||||
|
||||
Adds a new column to the table with the specified name, type, and `default_expr` (see the section "Default expressions"). If you specify `AFTER name_after` (the name of another column), the column is added after the specified one in the list of table columns. Otherwise, the column is added to the end of the table. Note that there is no way to add a column to the beginning of a table. For a chain of actions, 'name_after' can be the name of a column that is added in one of the previous actions.
|
||||
|
||||
Adding a column just changes the table structure, without performing any actions with data. The data doesn't appear on the disk after ALTER. If the data is missing for a column when reading from the table, it is filled in with default values (by performing the default expression if there is one, or using zeros or empty strings). The column appears on the disk after merging data parts (see MergeTree).
|
||||
|
||||
This approach allows us to complete the ALTER query instantly, without increasing the volume of old data.
|
||||
|
||||
```sql
|
||||
DROP COLUMN name
|
||||
```
|
||||
|
||||
Deletes the column with the name 'name'.
|
||||
Deletes data from the file system. Since this deletes entire files, the query is completed almost instantly.
|
||||
|
||||
```sql
|
||||
MODIFY COLUMN name [type] [default_expr]
|
||||
```
|
||||
|
||||
Changes the 'name' column's type to 'type' and/or the default expression to 'default_expr'. When changing the type, values are converted as if the 'toType' function were applied to them.
|
||||
|
||||
If only the default expression is changed, the query doesn't do anything complex, and is completed almost instantly.
|
||||
|
||||
Changing the column type is the only complex action – it changes the contents of files with data. For large tables, this may take a long time.
|
||||
|
||||
There are several processing stages:
|
||||
|
||||
- Preparing temporary (new) files with modified data.
|
||||
- Renaming old files.
|
||||
- Renaming the temporary (new) files to the old names.
|
||||
- Deleting the old files.
|
||||
|
||||
Only the first stage takes time. If there is a failure at this stage, the data is not changed.
|
||||
If there is a failure during one of the successive stages, data can be restored manually. The exception is if the old files were deleted from the file system but the data for the new files did not get written to the disk and was lost.
|
||||
|
||||
There is no support for changing the column type in arrays and nested data structures.
|
||||
|
||||
The `ALTER` query lets you create and delete separate elements (columns) in nested data structures, but not whole nested data structures. To add a nested data structure, you can add columns with a name like `name.nested_name` and the type `Array(T)`. A nested data structure is equivalent to multiple array columns with a name that has the same prefix before the dot.
|
||||
|
||||
There is no support for deleting columns in the primary key or the sampling key (columns that are in the `ENGINE` expression). Changing the type for columns that are included in the primary key is only possible if this change does not cause the data to be modified (for example, it is allowed to add values to an Enum or change a type from `DateTime` to `UInt32`).
|
||||
|
||||
If the `ALTER` query is not sufficient for making the table changes you need, you can create a new table, copy the data to it using the `INSERT SELECT` query, then switch the tables using the `RENAME` query and delete the old table.
|
||||
|
||||
The `ALTER` query blocks all reads and writes for the table. In other words, if a long `SELECT` is running at the time of the `ALTER` query, the `ALTER` query will wait for it to complete. At the same time, all new queries to the same table will wait while this `ALTER` is running.
|
||||
|
||||
For tables that don't store data themselves (such as `Merge` and `Distributed`), `ALTER` just changes the table structure, and does not change the structure of subordinate tables. For example, when running ALTER for a `Distributed` table, you will also need to run `ALTER` for the tables on all remote servers.
|
||||
|
||||
The `ALTER` query for changing columns is replicated. The instructions are saved in ZooKeeper, then each replica applies them. All `ALTER` queries are run in the same order. The query waits for the appropriate actions to be completed on the other replicas. However, a query to change columns in a replicated table can be interrupted, and all actions will be performed asynchronously.
|
||||
|
||||
### Manipulations with partitions and parts
|
||||
|
||||
It only works for tables in the `MergeTree` family. The following operations are available:
|
||||
|
||||
- `DETACH PARTITION` – Move a partition to the 'detached' directory and forget it.
|
||||
- `DROP PARTITION` – Delete a partition.
|
||||
- `ATTACH PART|PARTITION` – Add a new part or partition from the `detached` directory to the table.
|
||||
- `FREEZE PARTITION` – Create a backup of a partition.
|
||||
- `FETCH PARTITION` – Download a partition from another server.
|
||||
|
||||
Each type of query is covered separately below.
|
||||
|
||||
A partition in a table is data for a single calendar month. This is determined by the values of the date key specified in the table engine parameters. Each month's data is stored separately in order to simplify manipulations with this data.
|
||||
|
||||
A "part" in the table is part of the data from a single partition, sorted by the primary key.
|
||||
|
||||
You can use the `system.parts` table to view the set of table parts and partitions:
|
||||
|
||||
```sql
|
||||
SELECT * FROM system.parts WHERE active
|
||||
```
|
||||
|
||||
`active` – Only count active parts. Inactive parts are, for example, source parts remaining after merging to a larger part – these parts are deleted approximately 10 minutes after merging.
|
||||
|
||||
Another way to view a set of parts and partitions is to go into the directory with table data.
|
||||
Data directory: `/var/lib/clickhouse/data/database/table/`, where `/var/lib/clickhouse/` is the path to the ClickHouse data, 'database' is the database name, and 'table' is the table name. Example:
|
||||
|
||||
```bash
|
||||
$ ls -l /var/lib/clickhouse/data/test/visits/
|
||||
total 48
|
||||
drwxrwxrwx 2 clickhouse clickhouse 20480 May 5 02:58 20140317_20140323_2_2_0
|
||||
drwxrwxrwx 2 clickhouse clickhouse 20480 May 5 02:58 20140317_20140323_4_4_0
|
||||
drwxrwxrwx 2 clickhouse clickhouse 4096 May 5 02:55 detached
|
||||
-rw-rw-rw- 1 clickhouse clickhouse 2 May 5 02:58 increment.txt
|
||||
```
|
||||
|
||||
Here, `20140317_20140323_2_2_0` and `20140317_20140323_4_4_0` are the directories of data parts.
|
||||
|
||||
Let's break down the name of the first part: `20140317_20140323_2_2_0`.
|
||||
|
||||
- `20140317` is the minimum date of the data in the part.
|
||||
- `20140323` is the maximum date of the data in the part.
|
||||
- `2` is the minimum number of the data block.
|
||||
- `2` is the maximum number of the data block.
|
||||
- `0` is the part level (the depth of the merge tree it is formed from).
|
||||
|
||||
Each part relates to a single partition and contains data for just one month.
|
||||
`201403` is the name of the partition. A partition is a set of parts for a single month.
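The naming scheme described above can be illustrated with a small shell helper that splits a part directory name into its fields. This is only a sketch: the function `parse_part_name` is hypothetical, and it assumes the month-based `MINDATE_MAXDATE_MINBLOCK_MAXBLOCK_LEVEL` layout shown in the listing.

```bash
# Hypothetical helper: split a part directory name on underscores.
parse_part_name() {
    # Run in a subshell so the caller's IFS and positional parameters are untouched.
    (
        IFS=_
        set -- $1
        # The partition name is the YYYYMM prefix of the minimum date.
        partition=$(printf '%s' "$1" | cut -c1-6)
        echo "partition=$partition min_date=$1 max_date=$2 min_block=$3 max_block=$4 level=$5"
    )
}

parse_part_name 20140317_20140323_2_2_0
```

For the first part in the listing, this reports partition `201403`, block numbers `2` and `2`, and level `0`.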
|
||||
|
||||
On an operating server, you can't manually change the set of parts or their data on the file system, since the server won't know about it.
|
||||
For non-replicated tables, you can do this when the server is stopped, but we don't recommend it.
|
||||
For replicated tables, the set of parts can't be changed in any case.
|
||||
|
||||
The `detached` directory contains parts that are not used by the server - detached from the table using the `ALTER ... DETACH` query. Parts that are damaged are also moved to this directory, instead of deleting them. You can add, delete, or modify the data in the 'detached' directory at any time – the server won't know about this until you make the `ALTER TABLE ... ATTACH` query.
|
||||
|
||||
```sql
|
||||
ALTER TABLE [db.]table DETACH PARTITION 'name'
|
||||
```
|
||||
|
||||
Move all data for partitions named 'name' to the 'detached' directory and forget about them.
|
||||
The partition name is specified in YYYYMM format. It can be indicated in single quotes or without them.
|
||||
|
||||
After the query is executed, you can do whatever you want with the data in the 'detached' directory — delete it from the file system, or just leave it.
|
||||
|
||||
The query is replicated – data will be moved to the 'detached' directory and forgotten on all replicas. The query can only be sent to a leader replica. To find out if a replica is a leader, run a `SELECT` query on the 'system.replicas' system table. Alternatively, it is easier to make a query on all replicas, and all except one will throw an exception.
|
||||
|
||||
```sql
|
||||
ALTER TABLE [db.]table DROP PARTITION 'name'
|
||||
```
|
||||
|
||||
Works like the `DETACH` operation, except that the data is deleted from the table. Data parts will be tagged as inactive and will be completely deleted in approximately 10 minutes. The query is replicated – data will be deleted on all replicas.
|
||||
|
||||
```sql
|
||||
ALTER TABLE [db.]table ATTACH PARTITION|PART 'name'
|
||||
```
|
||||
|
||||
Adds data to the table from the 'detached' directory.
|
||||
|
||||
It is possible to add data for an entire partition or a separate part. For a part, specify the full name of the part in single quotes.
|
||||
|
||||
The query is replicated. Each replica checks whether there is data in the 'detached' directory. If there is data, it checks the integrity, verifies that it matches the data on the server that initiated the query, and then adds it if everything is correct. If not, it downloads data from the query requestor replica, or from another replica where the data has already been added.
|
||||
|
||||
So you can put data in the 'detached' directory on one replica, and use the ALTER ... ATTACH query to add it to the table on all replicas.
|
||||
|
||||
```sql
|
||||
ALTER TABLE [db.]table FREEZE PARTITION 'name'
|
||||
```
|
||||
|
||||
Creates a local backup of one or multiple partitions. The name can be the full name of the partition (for example, 201403), or its prefix (for example, 2014): then the backup will be created for all the corresponding partitions.
|
||||
|
||||
The query does the following: for a data snapshot at the time of execution, it creates hardlinks to table data in the directory `/var/lib/clickhouse/shadow/N/...`
|
||||
|
||||
`/var/lib/clickhouse/` is the working ClickHouse directory from the config.
|
||||
`N` is the incremental number of the backup.
|
||||
|
||||
The same structure of directories is created inside the backup as inside `/var/lib/clickhouse/`.
|
||||
It also performs 'chmod' for all files, forbidding writes to them.
|
||||
|
||||
The backup is created almost instantly (but first it waits for current queries to the corresponding table to finish running). At first, the backup doesn't take any space on the disk. As the system works, the backup can take disk space, as data is modified. If the backup is made for old enough data, it won't take space on the disk.
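The hardlink behavior described above can be reproduced with ordinary files. The following is a minimal sketch that uses a temporary directory instead of `/var/lib/clickhouse/`; the file name `column.bin` and the directory layout are made up for illustration and are not what ClickHouse actually writes.

```bash
# Assumption: a temporary directory stands in for the ClickHouse working directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/data/part" "$workdir/shadow/1/part"
printf 'column data\n' > "$workdir/data/part/column.bin"

# "Freeze": hardlink the file into the backup and forbid writes to it.
ln "$workdir/data/part/column.bin" "$workdir/shadow/1/part/column.bin"
chmod a-w "$workdir/shadow/1/part/column.bin"

# Both names refer to the same inode, so no extra space is used yet.
link_count=$(ls -l "$workdir/shadow/1/part/column.bin" | awk '{ print $2 }')
echo "link count: $link_count"

# When the original name is removed later, the backup keeps the data.
rm -f "$workdir/data/part/column.bin"
backup_content=$(cat "$workdir/shadow/1/part/column.bin")
echo "backup still holds: $backup_content"

rm -rf "$workdir"
```

From the moment the original name is removed, the backup copy is the only reference to the data, so it starts to count toward disk usage on its own.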
|
||||
|
||||
After creating the backup, data from `/var/lib/clickhouse/shadow/` can be copied to the remote server and then deleted on the local server.
|
||||
The entire backup process is performed without stopping the server.
|
||||
|
||||
The `ALTER ... FREEZE PARTITION` query is not replicated. A local backup is only created on the local server.
|
||||
|
||||
As an alternative, you can manually copy data from the `/var/lib/clickhouse/data/database/table` directory.
|
||||
But if you do this while the server is running, race conditions are possible when copying directories with files being added or changed, and the backup may be inconsistent. You can do this if the server isn't running – then the resulting data will be the same as after the `ALTER TABLE t FREEZE PARTITION` query.
|
||||
|
||||
`ALTER TABLE ... FREEZE PARTITION` only copies data, not table metadata. To make a backup of table metadata, copy the file `/var/lib/clickhouse/metadata/database/table.sql`
|
||||
|
||||
To restore from a backup:
|
||||
|
||||
- Use the CREATE query to create the table if it doesn't exist. The query can be taken from an .sql file (replace `ATTACH` in it with `CREATE`).
|
||||
- Copy the data from the `data/database/table/` directory inside the backup to the `/var/lib/clickhouse/data/database/table/detached/` directory.
|
||||
- Run `ALTER TABLE ... ATTACH PARTITION YYYYMM` queries, where `YYYYMM` is the month, for every month.
|
||||
|
||||
In this way, data from the backup will be added to the table.
|
||||
Restoring from a backup doesn't require stopping the server.
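The last restore step requires one `ATTACH PARTITION` query per month. As a sketch, the statements can be generated from the part directory names found in the backup. The helper `attach_statements`, the table name `test.visits`, and the temporary directory used as a stand-in for the backup are all assumptions made for this example.

```bash
# Assumption: a temporary directory stands in for data/database/table/ inside the backup.
backup_dir=$(mktemp -d)
mkdir "$backup_dir/20140317_20140323_2_2_0" \
      "$backup_dir/20140317_20140323_4_4_0" \
      "$backup_dir/20140401_20140407_6_6_0"

# Hypothetical helper: $1 is the directory with parts, $2 is the table name.
attach_statements() {
    # The first six characters of a part name are the YYYYMM partition name.
    ls "$1" | cut -c1-6 | sort -u | while read -r month; do
        echo "ALTER TABLE $2 ATTACH PARTITION $month;"
    done
}

statements=$(attach_statements "$backup_dir" test.visits)
echo "$statements"

rm -rf "$backup_dir"
```

The generated statements are only printed here; review them before passing them to a client.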
|
||||
|
||||
### Backups and replication
|
||||
|
||||
Replication provides protection from device failures. If all data disappeared on one of your replicas, follow the instructions in the "Restoration after failure" section to restore it.
|
||||
|
||||
For protection from device failures, you must use replication. For more information about replication, see the section "Data replication".
|
||||
|
||||
Backups protect against human error (accidentally deleting data, deleting the wrong data or in the wrong cluster, or corrupting data).
|
||||
For high-volume databases, it can be difficult to copy backups to remote servers. In such cases, to protect from human error, you can keep a backup on the same server (it will reside in `/var/lib/clickhouse/shadow/`).
|
||||
|
||||
```sql
|
||||
ALTER TABLE [db.]table FETCH PARTITION 'name' FROM 'path-in-zookeeper'
|
||||
```
|
||||
|
||||
This query only works for replicated tables.
|
||||
|
||||
It downloads the specified partition from the shard that has its `ZooKeeper path` specified in the `FROM` clause, then puts it in the `detached` directory for the specified table.
|
||||
|
||||
Although the query is called `ALTER TABLE`, it does not change the table structure, and does not immediately change the data available in the table.
|
||||
|
||||
Data is placed in the `detached` directory. You can use the `ALTER TABLE ... ATTACH` query to attach the data.
|
||||
|
||||
The `FROM` clause specifies the path in `ZooKeeper`. For example, `/clickhouse/tables/01-01/visits`.
|
||||
Before downloading, the system checks that the partition exists and the table structure matches. The most appropriate replica is selected automatically from the healthy replicas.
|
||||
|
||||
The `ALTER ... FETCH PARTITION` query is not replicated. The partition will be downloaded to the 'detached' directory only on the local server. Note that if after this you use the `ALTER TABLE ... ATTACH` query to add data to the table, the data will be added on all replicas (on one of the replicas it will be added from the 'detached' directory, and on the rest it will be loaded from neighboring replicas).
|
||||
|
||||
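As a rough sketch of the fetch-then-attach sequence described above: the table name `visits` and the partition name `'201803'` are hypothetical, and the ZooKeeper path is the example path from the text.

```sql
-- Hypothetical table and partition names, for illustration only.
-- Step 1: download the partition into the local `detached` directory.
-- This query is not replicated and does not change the visible table data.
ALTER TABLE visits FETCH PARTITION '201803' FROM '/clickhouse/tables/01-01/visits';

-- Step 2: attach the downloaded data. This query is replicated,
-- so the data is added on all replicas.
ALTER TABLE visits ATTACH PARTITION '201803';
```

Running the two steps separately leaves room to inspect the contents of `detached` before the data becomes visible.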
### Synchronicity of ALTER queries
|
||||
|
||||
For non-replicated tables, all `ALTER` queries are performed synchronously. For replicated tables, the query only adds instructions for the appropriate actions to ZooKeeper, and the actions themselves are performed as soon as possible. However, the query can wait for these actions to be completed on all replicas.
|
||||
|
||||
For `ALTER ... ATTACH|DETACH|DROP` queries, you can use the `replication_alter_partitions_sync` setting to set up waiting.
|
||||
Possible values: `0` – do not wait; `1` – only wait for own execution (default); `2` – wait for all.
|
||||
|
||||
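A minimal sketch of using this setting for a single session, assuming a hypothetical replicated table `visits` with a partition named `'201803'`:

```sql
-- Wait until the partition operation has been executed on all replicas.
SET replication_alter_partitions_sync = 2;

-- Hypothetical table and partition names.
ALTER TABLE visits DETACH PARTITION '201803';
```

With the value `2`, the `ALTER` query does not return until every replica has performed the action, so an unavailable replica can make the query wait.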
<a name="query_language_queries_show_databases"></a>
|
||||
|
||||
### Mutations
|
||||
|
||||
Mutations are an ALTER query variant that allows changing or deleting rows in a table. In contrast to standard `UPDATE` and `DELETE` queries that are intended for point data changes, mutations are intended for heavy operations that change a lot of rows in a table.
|
||||
|
||||
This functionality is in beta and is available starting with version 1.1.54388. Currently, `*MergeTree` table engines are supported (both replicated and unreplicated).
|
||||
|
||||
Existing tables are ready for mutations as-is (no conversion necessary), but after the first mutation is applied to a table, its metadata format becomes incompatible with previous server versions and falling back to a previous version becomes impossible.
|
||||
|
||||
At the moment the `ALTER DELETE` command is available:
|
||||
|
||||
```sql
|
||||
ALTER TABLE [db.]table DELETE WHERE expr
|
||||
```
|
||||
|
||||
The expression `expr` must be of UInt8 type. The query deletes rows for which this expression evaluates to a non-zero value.
|
||||
|
||||
One query can contain several commands separated by commas.
|
||||
|
||||
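For illustration, a sketch of `ALTER DELETE` with one and with several commands. It assumes a hypothetical `*MergeTree` table `visits` with columns `UserID` and `EventDate`; neither name comes from the original text.

```sql
-- Delete all rows for which the expression is non-zero.
-- A comparison returns UInt8, as the command requires.
ALTER TABLE visits DELETE WHERE UserID = 0;

-- Several commands in one query, separated by commas.
ALTER TABLE visits
    DELETE WHERE UserID = 0,
    DELETE WHERE EventDate < toDate('2017-01-01');
```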
For `*MergeTree` tables, mutations are executed by rewriting whole data parts. There is no atomicity: parts are replaced with their mutated versions as soon as they are ready, so a `SELECT` query that starts executing during a mutation will see data from parts that have already been mutated along with data from parts that have not been mutated yet.
|
||||
|
||||
Mutations are totally ordered by their creation order and are applied to each part in that order. Mutations are also partially ordered with INSERTs - data that was inserted into the table before the mutation was submitted will be mutated and data that was inserted after that will not be mutated. Note that mutations do not block INSERTs in any way.
|
||||
|
||||
A mutation query returns immediately after the mutation entry is added (to ZooKeeper for replicated tables, to the filesystem for non-replicated tables). The mutation itself executes asynchronously using the system profile settings. To track the progress of mutations, you can use the `system.mutations` table. A mutation that was successfully submitted will continue to execute even if ClickHouse servers are restarted. There is no way to roll back a mutation once it is submitted.
|
||||
|
||||
#### system.mutations table
|
||||
|
||||
The table contains information about mutations of MergeTree tables and their progress. Each mutation command is represented by a single row. The table has the following columns:
|
||||
|
||||
**database**, **table** - The name of the database and table to which the mutation was applied.
|
||||
|
||||
**mutation_id** - The ID of the mutation. For replicated tables these IDs correspond to znode names in the `<table_path_in_zookeeper>/mutations/` directory in ZooKeeper. For unreplicated tables the IDs correspond to file names in the data directory of the table.
|
||||
|
||||
**command** - The mutation command string (the part of the query after `ALTER TABLE [db.]table`).
|
||||
|
||||
**create_time** - When this mutation command was submitted for execution.
|
||||
|
||||
**block_numbers.partition_id**, **block_numbers.number** - A Nested column. For mutations of replicated tables, it contains one record for each partition: the partition ID and the block number that was acquired by the mutation (in each partition, only parts that contain blocks with numbers less than the block number acquired by the mutation in that partition will be mutated). Because block numbers in all partitions of a non-replicated table form a single sequence, for mutations of non-replicated tables the column contains one record with a single block number acquired by the mutation.
|
||||
|
||||
**parts_to_do** - The number of data parts that need to be mutated for the mutation to finish.
|
||||
|
||||
**is_done** - Is the mutation done? Note that even if `parts_to_do = 0` it is possible that a mutation of a replicated table is not done yet because of a long-running INSERT that will create a new data part that will need to be mutated.
|
||||
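A short sketch of tracking unfinished mutations with the columns listed above. The filter on `is_done` is just one possible choice.

```sql
-- List mutations that have not finished yet, oldest first.
SELECT
    database,
    table,
    mutation_id,
    command,
    create_time,
    parts_to_do,
    is_done
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time;
```

Because `parts_to_do = 0` does not guarantee completion for replicated tables, filtering on `is_done` is the safer check.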
|
155
docs/en/query_language/create.md
Normal file
@ -0,0 +1,155 @@
|
||||
## CREATE DATABASE
|
||||
|
||||
Creates a database named `db_name`.
|
||||
|
||||
```sql
|
||||
CREATE DATABASE [IF NOT EXISTS] db_name
|
||||
```
|
||||
|
||||
A database is just a directory for tables.
|
||||
If `IF NOT EXISTS` is included, the query won't return an error if the database already exists.
|
||||
|
||||
<a name="query_language-queries-create_table"></a>
|
||||
|
||||
## CREATE TABLE
|
||||
|
||||
The `CREATE TABLE` query can have several forms.
|
||||
|
||||
```sql
|
||||
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name [ON CLUSTER cluster]
|
||||
(
|
||||
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
|
||||
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
|
||||
...
|
||||
) ENGINE = engine
|
||||
```
|
||||
|
||||
Creates a table named `name` in the `db` database, or in the current database if `db` is not set, with the structure specified in parentheses and the `engine` engine.
|
||||
The structure of the table is a list of column descriptions. If indexes are supported by the engine, they are indicated as parameters for the table engine.
|
||||
|
||||
A column description is `name type` in the simplest case. Example: `RegionID UInt32`.
|
||||
Expressions can also be defined for default values (see below).
|
||||
|
||||
```sql
|
||||
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name AS [db2.]name2 [ENGINE = engine]
|
||||
```
|
||||
|
||||
Creates a table with the same structure as another table. You can specify a different engine for the table. If the engine is not specified, the same engine will be used as for the `db2.name2` table.
|
||||
|
||||
```sql
|
||||
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db.]name ENGINE = engine AS SELECT ...
|
||||
```
|
||||
|
||||
Creates a table with a structure like the result of the `SELECT` query, with the 'engine' engine, and fills it with data from SELECT.
|
||||
|
||||
In all cases, if `IF NOT EXISTS` is specified, the query won't return an error if the table already exists. In this case, the query won't do anything.
|
||||
|
||||
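The three forms can be sketched side by side as follows. All table names here are hypothetical, and the `Memory` and `Log` engines are chosen only to keep the example small.

```sql
-- Form 1: explicit column list.
CREATE TABLE IF NOT EXISTS regions
(
    RegionID UInt32,
    Name String
) ENGINE = Memory;

-- Form 2: same structure as another table. The engine is inherited
-- unless another one is given.
CREATE TABLE regions_copy AS regions;
CREATE TABLE regions_log AS regions ENGINE = Log;

-- Form 3: structure taken from a SELECT result, filled with its data.
CREATE TABLE big_regions ENGINE = Memory AS
SELECT * FROM regions WHERE RegionID > 1000;
```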
### Default values
|
||||
|
||||
The column description can specify an expression for a default value, in one of the following ways: `DEFAULT expr`, `MATERIALIZED expr`, `ALIAS expr`.
|
||||
Example: `URLDomain String DEFAULT domain(URL)`.
|
||||
|
||||
If an expression for the default value is not defined, the default values will be set to zeros for numbers, empty strings for strings, empty arrays for arrays, and `0000-00-00` for dates or `0000-00-00 00:00:00` for dates with time. NULLs are not supported.
|
||||
|
||||
If the default expression is defined, the column type is optional. If there isn't an explicitly defined type, the default expression type is used. Example: `EventDate DEFAULT toDate(EventTime)` – the 'Date' type will be used for the 'EventDate' column.
|
||||
|
||||
If the data type and default expression are defined explicitly, this expression will be cast to the specified type using type casting functions. Example: `Hits UInt32 DEFAULT 0` means the same thing as `Hits UInt32 DEFAULT toUInt32(0)`.
|
||||
|
||||
Default expressions may be defined as an arbitrary expression from table constants and columns. When the table structure is created or changed, the server checks that the expressions don't contain loops. For INSERT, it checks that the expressions are resolvable, that is, that all columns they can be calculated from have been passed.
|
||||
|
||||
`DEFAULT expr`
|
||||
|
||||
Normal default value. If the INSERT query doesn't specify the corresponding column, it will be filled in by computing the corresponding expression.
|
||||
|
||||
`MATERIALIZED expr`
|
||||
|
||||
Materialized expression. Such a column can't be specified for INSERT, because it is always calculated.
|
||||
For an INSERT without a list of columns, these columns are not considered.
|
||||
In addition, this column is not substituted when using an asterisk in a SELECT query. This is to preserve the invariant that the dump obtained using `SELECT *` can be inserted back into the table using INSERT without specifying the list of columns.
|
||||
|
||||
`ALIAS expr`
|
||||
|
||||
Synonym. Such a column isn't stored in the table at all.
|
||||
Its values can't be inserted in a table, and it is not substituted when using an asterisk in a SELECT query.
|
||||
It can be used in SELECT queries, in which case the alias is expanded during query parsing.
|
||||
|
||||
When using the ALTER query to add new columns, old data for these columns is not written. Instead, when reading old data that does not have values for the new columns, the default expressions are computed on the fly. However, if computing the expressions requires other columns that are not referenced in the query, these columns will additionally be read, but only for the blocks of data that need it.
|
||||
|
||||
If you add a new column to a table but later change its default expression, the values used for old data will change (for data where values were not stored on the disk). Note that when running background merges, data for columns that are missing in one of the merging parts is written to the merged part.
|
||||
|
||||
It is not possible to set default values for elements in nested data structures.
|
||||
|
||||
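The three kinds of default expressions can be compared in one table. This is a sketch only: the table `visits_example` and the column `URLLength` are hypothetical, while the other column expressions reuse the examples from the text.

```sql
CREATE TABLE visits_example
(
    EventTime DateTime,
    EventDate DEFAULT toDate(EventTime),        -- type omitted, so Date is inferred
    URL String,
    Hits UInt32 DEFAULT 0,                      -- same as DEFAULT toUInt32(0)
    URLDomain String MATERIALIZED domain(URL),  -- always calculated, never inserted
    URLLength ALIAS length(URL)                 -- not stored at all
) ENGINE = Memory;

-- Only ordinary columns may be listed. EventDate and Hits fall back to their defaults.
INSERT INTO visits_example (EventTime, URL)
VALUES ('2018-07-01 12:00:00', 'https://example.com/page');

-- The asterisk skips the MATERIALIZED and ALIAS columns.
SELECT * FROM visits_example;

-- They can still be read when named explicitly.
SELECT URLDomain, URLLength FROM visits_example;
```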
### Temporary tables
|
||||
|
||||
In all cases, if `TEMPORARY` is specified, a temporary table will be created. Temporary tables have the following characteristics:
|
||||
|
||||
- Temporary tables disappear when the session ends, including if the connection is lost.
|
||||
- A temporary table is created with the Memory engine. The other table engines are not supported.
|
||||
- The DB can't be specified for a temporary table. It is created outside of databases.
|
||||
- If a temporary table has the same name as another one and a query specifies the table name without specifying the DB, the temporary table will be used.
|
||||
- For distributed query processing, temporary tables used in a query are passed to remote servers.
|
||||
|
||||
In most cases, temporary tables are not created manually, but are created when using external data for a query, or for distributed `(GLOBAL) IN`. For more information, see the appropriate sections.
|
||||
|
||||
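For the less common case of creating a temporary table by hand, a minimal sketch follows. The table name `tmp_ids` is hypothetical, and the statements are assumed to run within one session.

```sql
-- No database and no ENGINE clause: the table lives outside databases
-- and uses the Memory engine.
CREATE TEMPORARY TABLE tmp_ids (id UInt64);

INSERT INTO tmp_ids VALUES (1), (2), (3);

-- An unqualified name resolves to the temporary table first.
SELECT count() FROM tmp_ids;
```

The table disappears when the session ends, so no explicit cleanup is needed.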
## Distributed DDL queries (ON CLUSTER clause)
|
||||
|
||||
The `CREATE`, `DROP`, `ALTER`, and `RENAME` queries support distributed execution on a cluster.
|
||||
For example, the following query creates the `all_hits` `Distributed` table on each host in `cluster`:
|
||||
|
||||
```sql
|
||||
CREATE TABLE IF NOT EXISTS all_hits ON CLUSTER cluster (p Date, i Int32) ENGINE = Distributed(cluster, default, hits)
|
||||
```
|
||||
|
||||
In order to run these queries correctly, each host must have the same cluster definition (to simplify syncing configs, you can use substitutions from ZooKeeper). Each host must also be able to connect to the ZooKeeper servers.
|
||||
The local version of the query will eventually be executed on each host in the cluster, even if some hosts are currently not available. The order for executing queries within a single host is guaranteed.
|
||||
`ALTER` queries are not yet supported for replicated tables.
|
||||
|
||||
## CREATE VIEW
|
||||
|
||||
```sql
|
||||
CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]name [TO [db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
|
||||
```
|
||||
|
||||
Creates a view. There are two types of views: normal and MATERIALIZED.
|
||||
|
||||
Normal views don't store any data, but just perform a read from another table. In other words, a normal view is nothing more than a saved query. When reading from a view, this saved query is used as a subquery in the FROM clause.
|
||||
|
||||
As an example, assume you've created a view:
|
||||
|
||||
```sql
|
||||
CREATE VIEW view AS SELECT ...
|
||||
```
|
||||
|
||||
and written a query:
|
||||
|
||||
```sql
|
||||
SELECT a, b, c FROM view
|
||||
```
|
||||
|
||||
This query is fully equivalent to using the subquery:
|
||||
|
||||
```sql
|
||||
SELECT a, b, c FROM (SELECT ...)
|
||||
```
|
||||
|
||||
Materialized views store data transformed by the corresponding SELECT query.
|
||||
|
||||
When creating a materialized view, you must specify ENGINE – the table engine for storing data.
|
||||
|
||||
A materialized view works as follows: when data is inserted into the table specified in SELECT, each block of inserted data is converted by this SELECT query, and the result is inserted into the view.
|
||||
|
||||
If you specify POPULATE, the existing table data is inserted into the view when it is created, as if by a `CREATE TABLE ... AS SELECT ...` query. Otherwise, the view contains only the data inserted into the table after the view was created. We don't recommend using POPULATE, since data inserted into the table while the view is being created will not be inserted into the view.
|
||||
|
||||
A `SELECT` query can contain `DISTINCT`, `GROUP BY`, `ORDER BY`, `LIMIT`... Note that the corresponding conversions are performed independently on each block of inserted data. For example, if `GROUP BY` is set, data is aggregated during insertion, but only within a single packet of inserted data. The data won't be further aggregated. The exception is when using an ENGINE that independently performs data aggregation, such as `SummingMergeTree`.
|
||||
|
||||
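To make the per-block aggregation concrete, here is a sketch with hypothetical names (`hits_raw`, `hits_per_day`). It assumes the `PARTITION BY`/`ORDER BY` engine syntax is available in your server version.

```sql
-- Source table that receives the inserts.
CREATE TABLE hits_raw
(
    EventDate Date,
    URL String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, URL);

-- Each inserted block is aggregated separately by the SELECT.
-- SummingMergeTree later sums rows with the same key during merges.
CREATE MATERIALIZED VIEW hits_per_day
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY EventDate
AS SELECT EventDate, count() AS Hits
FROM hits_raw
GROUP BY EventDate;

-- Merges happen in the background, so aggregate again when reading.
SELECT EventDate, sum(Hits) AS Hits
FROM hits_per_day
GROUP BY EventDate;
```

The final `GROUP BY` is needed because rows from different inserted blocks may not have been merged yet at read time.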
The execution of `ALTER` queries on materialized views has not been fully developed, so they might be inconvenient. If the materialized view uses the construction `TO [db.]name`, you can `DETACH` the view, run `ALTER` for the target table, and then `ATTACH` the previously detached view.
|
||||
|
||||
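The workaround above might look like this, assuming a hypothetical view `hits_mv` created with `TO hits_target` and a hypothetical new column `Referer`:

```sql
-- Stop the view from receiving data while the target table is changed.
DETACH TABLE hits_mv;

-- Alter the target table that the view writes to.
ALTER TABLE hits_target ADD COLUMN Referer String;

-- Re-attach the view. Its stored definition is reused.
ATTACH TABLE hits_mv;
```

Note that this sketch changes only the target table. The view's own `SELECT` stays as it was.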
Views look the same as normal tables. For example, they are listed in the result of the `SHOW TABLES` query.
|
||||
|
||||
There isn't a separate query for deleting views. To delete a view, use `DROP TABLE`.
|
||||
|
@ -9,9 +9,9 @@ ClickHouse:
|
||||
> - Fully or partially stores dictionaries in RAM.
|
||||
- Periodically updates dictionaries and dynamically loads missing values. In other words, dictionaries can be loaded dynamically.
|
||||
|
||||
The configuration of external dictionaries is located in one or more files. The path to the configuration is specified in the [dictionaries_config](../operations/server_settings/settings.md#server_settings-dictionaries_config) parameter.
|
||||
The configuration of external dictionaries is located in one or more files. The path to the configuration is specified in the [dictionaries_config](../../operations/server_settings/settings.md#server_settings-dictionaries_config) parameter.
|
||||
|
||||
Dictionaries can be loaded at server startup or at first use, depending on the [dictionaries_lazy_load](../operations/server_settings/settings.md#server_settings-dictionaries_lazy_load) setting.
|
||||
Dictionaries can be loaded at server startup or at first use, depending on the [dictionaries_lazy_load](../../operations/server_settings/settings.md#server_settings-dictionaries_lazy_load) setting.
|
||||
|
||||
The dictionary config file has the following format:
|
||||
|
@ -52,7 +52,7 @@ Example of settings:
|
||||
Setting fields:
|
||||
|
||||
- `path` – The absolute path to the file.
|
||||
- `format` – The file format. All the formats described in "[Formats](../formats/index.md#formats)" are supported.
|
||||
- `format` – The file format. All the formats described in "[Formats](../../interfaces/formats.md#formats)" are supported.
|
||||
|
||||
<a name="dicts-external_dicts_dict_sources-executable"></a>
|
||||
|
||||
@ -74,7 +74,7 @@ Example of settings:
|
||||
Setting fields:
|
||||
|
||||
- `command` – The absolute path to the executable file, or the file name (if the program directory is written to `PATH`).
|
||||
- `format` – The file format. All the formats described in "[Formats](../formats/index.md#formats)" are supported.
|
||||
- `format` – The file format. All the formats described in "[Formats](../../interfaces/formats.md#formats)" are supported.
|
||||
|
||||
<a name="dicts-external_dicts_dict_sources-http"></a>
|
||||
|
||||
@ -93,12 +93,12 @@ Example of settings:
|
||||
</source>
|
||||
```
|
||||
|
||||
In order for ClickHouse to access an HTTPS resource, you must [configure openSSL](../operations/server_settings/settings.md#server_settings-openSSL) in the server configuration.
|
||||
In order for ClickHouse to access an HTTPS resource, you must [configure openSSL](../../operations/server_settings/settings.md#server_settings-openSSL) in the server configuration.
|
||||
|
||||
Setting fields:
|
||||
|
||||
- `url` – The source URL.
|
||||
- `format` – The file format. All the formats described in "[Formats](../formats/index.md#formats)" are supported.
|
||||
- `format` – The file format. All the formats described in "[Formats](../../interfaces/formats.md#formats)" are supported.
|
||||
|
||||
<a name="dicts-external_dicts_dict_sources-odbc"></a>
|
||||
|
||||
@ -361,7 +361,7 @@ Example of settings:
|
||||
|
||||
Setting fields:
|
||||
|
||||
- `host` – The ClickHouse host. If it is a local host, the query is processed without any network activity. To improve fault tolerance, you can create a [Distributed](../table_engines/distributed.md#table_engines-distributed) table and enter it in subsequent configurations.
|
||||
- `host` – The ClickHouse host. If it is a local host, the query is processed without any network activity. To improve fault tolerance, you can create a [Distributed](../../operations/table_engines/distributed.md#table_engines-distributed) table and enter it in subsequent configurations.
|
||||
- `port` – The port on the ClickHouse server.
|
||||
- `user` – Name of the ClickHouse user.
|
||||
- `password` – Password of the ClickHouse user.
|
@ -242,7 +242,7 @@ arrayPushBack(array, single_value)
|
||||
**Arguments**
|
||||
|
||||
- `array` – Array.
|
||||
- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../data_types/index.md#data_types)".
|
||||
- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../../data_types/index.md#data_types)".
|
||||
|
||||
**Example**
|
||||
|
||||
@ -267,7 +267,7 @@ arrayPushFront(array, single_value)
|
||||
**Arguments**
|
||||
|
||||
- `array` – Array.
|
||||
- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../data_types/index.md#data_types)".
|
||||
- `single_value` – A single value. Only numbers can be added to an array with numbers, and only strings can be added to an array of strings. When adding numbers, ClickHouse automatically sets the `single_value` type for the data type of the array. For more information about ClickHouse data types, read the section "[Data types](../../data_types/index.md#data_types)".
|
||||
|
||||
**Example**
|
||||
|
Some files were not shown because too many files have changed in this diff.