Mirror of https://github.com/ClickHouse/ClickHouse.git (synced 2024-11-24 16:42:05 +00:00)

Update clickhouse-v22.2-released.md

This looks great. I just proposed some grammatical changes.

Parent: a66a217b3f, commit: a736c58f87
author: 'Alexey Milovidov'
tags: ['company', 'community']
---
On 2022-02-22, we released version 22.2 of ClickHouse! (That's a lot of 2's!) This latest release includes 2,140 new commits from 118 contributors, including 41 new contributors:
> Aaron Katz, Andre Marianiello, Andrew, Andrii Buriachevskyi, Brian Hunter, CoolT2, Federico Rodriguez, Filippov Denis, Gaurav Kumar, Geoff Genz, HarryLeeIBM, Heena Bansal, ILya Limarenko, Igor Nikonov, IlyaTsoi, Jake Liu, JaySon-Huang, Lemore, Leonid Krylov, Michail Safronov, Mikhail Fursov, Nikita, RogerYK, Roy Bellingan, Saad Ur Rahman, W, Yakov Olkhovskiy, alexeypavlenko, cnmade, grantovsky, hanqf-git, liuneng1994, mlkui, s-kat, tesw yew isal, vahid-sohrabloo, yakov-olkhovskiy, zhifeng, zkun, zxealous, 박동철.

Let me tell you what is most interesting in 22.2...

## Projections are production ready
Projections allow you to have multiple data representations in the same table. For example, you can have data aggregations along with the raw data. There are no restrictions on which aggregate functions can be used - you can have count distinct, quantiles, or whatever you want. You can have data in multiple different sorting orders. ClickHouse will automatically select the most suitable projection for your query, so the query will be automatically optimized.
Projections are somewhat similar to Materialized Views, which also allow you to have incremental aggregation and multiple sorting orders. But unlike Materialized Views, projections are updated atomically and consistently with the main table. The data for projections is being stored in the same "data parts" of the table and is being merged in the same way as the main data.
The feature was developed by **Amos Bird**, a prominent ClickHouse contributor. The [prototype](https://github.com/ClickHouse/ClickHouse/pull/20202) has been available since February 2021. It was merged into the main codebase by **Nikolai Kochetov** in May 2021 under an experimental flag, and after 21 follow-up pull requests we ensured that it passed the full set of test suites and enabled it by default.
Read an example of how to optimize queries with projections [in our docs](https://clickhouse.com/docs/en/getting-started/example-datasets/uk-price-paid/#speedup-with-projections).
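
As an illustration, a projection can be added to an existing table and backfilled. This is a minimal sketch, not taken from the docs: the `hits` table, its columns, and the projection name are hypothetical.

```sql
-- Add an aggregating projection: a second representation of the data,
-- pre-grouped by day, stored inside the same data parts as the table.
ALTER TABLE hits ADD PROJECTION daily_visits
(
    SELECT
        toStartOfDay(event_time) AS day,
        count() AS views,
        uniq(user_id) AS visitors
    GROUP BY day
);

-- Backfill the projection for data parts that already exist.
ALTER TABLE hits MATERIALIZE PROJECTION daily_visits;

-- Aggregation queries matching the projection can now be answered from it;
-- ClickHouse picks the projection automatically.
SELECT toStartOfDay(event_time) AS day, count(), uniq(user_id)
FROM hits
GROUP BY day;
```
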
## Control of file creation and rewriting on data export
When you export your data with an `INSERT INTO TABLE FUNCTION` statement into `file`, `s3` or `hdfs` and the target file already exists, you can now control how to deal with it: you can append new data to the file if possible, rewrite it with new data, or create another file with a similar name like 'data.1.parquet.gz'.
Some storage systems like `s3` and some formats like `Parquet` don't support appending data. In previous ClickHouse versions, if you inserted multiple times into a file with the Parquet data format, you would end up with a file that is not recognized by other systems. Now you can choose between throwing an exception on subsequent inserts or creating more files.
So, new settings were introduced: `s3_truncate_on_insert`, `s3_create_new_file_on_insert`, `hdfs_truncate_on_insert`, `hdfs_create_new_file_on_insert`, `engine_file_allow_create_multiple_files`.
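
For example, with the `file` table function the behavior on repeated inserts might be chosen like this. A sketch: the path and schema are made up, and `engine_file_truncate_on_insert` as the truncation counterpart for the `file` engine is an assumption, since only `engine_file_allow_create_multiple_files` is listed above.

```sql
-- Default in 22.2: a second insert into an existing Parquet file throws an error.

-- Overwrite the file on each insert:
INSERT INTO TABLE FUNCTION file('data.parquet', 'Parquet', 'id UInt32, s String')
SELECT number, toString(number) FROM numbers(10)
SETTINGS engine_file_truncate_on_insert = 1;

-- Or keep old data and write each insert to a new file
-- (data.1.parquet, data.2.parquet, ...):
INSERT INTO TABLE FUNCTION file('data.parquet', 'Parquet', 'id UInt32, s String')
SELECT number, toString(number) FROM numbers(10)
SETTINGS engine_file_allow_create_multiple_files = 1;
```
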
This feature [was developed](https://github.com/ClickHouse/ClickHouse/pull/33302) by **Pavel Kruglov**.
## Custom deduplication token
`ReplicatedMergeTree` and `MergeTree` types of tables implement block-level deduplication. When a block of data is inserted, its cryptographic hash is calculated, and if the same block was already inserted before, the duplicate is skipped and the insert query succeeds. This makes it possible to implement exactly-once semantics for inserts.
In ClickHouse version 22.2 you can provide your own deduplication token instead of an automatically calculated hash. This makes sense if you already have batch identifiers from some other system and you want to reuse them. It also makes sense when blocks can be identical but they should actually be inserted multiple times. Or the opposite - when blocks contain some random data and you want to deduplicate only by significant columns.
This is implemented by adding the setting `insert_deduplication_token`. The feature was contributed by **Igor Nikonov**.
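
As a sketch, a retried batch insert might look like this (the table and token names are hypothetical):

```sql
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-2022-02-22-001'
VALUES (1, 'click'), (2, 'view');

-- A retry of the same batch with the same token is treated as a duplicate
-- and skipped, so retries on network errors are safe:
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-2022-02-22-001'
VALUES (1, 'click'), (2, 'view');

-- A different token forces the same rows to be inserted again,
-- even though the block contents (and hash) are identical:
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-2022-02-22-002'
VALUES (1, 'click'), (2, 'view');
```
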
## DEFAULT keyword for INSERT
A small addition for SQL compatibility - now we allow using the `DEFAULT` keyword instead of a value in an `INSERT INTO ... VALUES` statement. It looks like this:
`INSERT INTO test VALUES (1, 'Hello', DEFAULT)`
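
In context, assuming a hypothetical table where the third column has a default expression:

```sql
CREATE TABLE test
(
    id UInt32,
    greeting String,
    created DateTime DEFAULT now()
)
ENGINE = MergeTree ORDER BY id;

-- The DEFAULT keyword stands in for the column's default expression:
INSERT INTO test VALUES (1, 'Hello', DEFAULT);  -- created is set to now()
```
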
Thanks to **Andrii Buriachevskyi** for this feature.
## EPHEMERAL columns
A column in a table can have a `DEFAULT` expression like `c INT DEFAULT a + b`. In ClickHouse you can also use `MATERIALIZED` instead of `DEFAULT` if you want the column to be always calculated with the provided expression instead of allowing a user to insert data. And you can use `ALIAS` if you don't want the column to be stored at all but instead to be calculated on the fly if referenced.
Since version 22.2, a new type of column has been added: the `EPHEMERAL` column. A user can insert data into this column, but the column is not stored in the table; it's ephemeral. The purpose of this column is to provide data to calculate other columns that reference it in `DEFAULT` or `MATERIALIZED` expressions.
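
A sketch of how this might be used, assuming the syntax `name Type EPHEMERAL`; the table and the use of JSON extraction functions are illustrative, not from the release notes.

```sql
CREATE TABLE users
(
    raw String EPHEMERAL,
    name String DEFAULT JSONExtractString(raw, 'name'),
    age UInt8 DEFAULT JSONExtractUInt(raw, 'age')
)
ENGINE = MergeTree ORDER BY name;

-- 'raw' is accepted on insert and used to compute 'name' and 'age',
-- but is never stored:
INSERT INTO users (raw) VALUES ('{"name":"Alice","age":30}');

SELECT name, age FROM users;  -- only the derived columns remain
```
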
This feature was developed by **Yakov Olkhovskiy**.
## Improvements for multi-disk configuration
Since version 22.2, ClickHouse can automatically repair broken disks without a server restart.
This feature was implemented by **Amos Bird** and has already been used in production at KuaiShou for more than 1.5 years.
Another improvement is the option to specify TTL MOVE TO DISK/VOLUME **IF EXISTS**. It allows replicas to have non-uniform disk configurations: one replica can move old data to cold storage while another replica keeps all the data on hot storage. Data is moved only on replicas that have the specified disk or volume, hence *if exists*. This was developed by **Anton Popov**.
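
A sketch of such a TTL clause; the table, column, and volume name are hypothetical, and the exact placement of `IF EXISTS` in the grammar is an assumption.

```sql
-- On replicas that have a 'cold' volume, month-old data moves there;
-- replicas without such a volume keep the data and don't fail the TTL.
ALTER TABLE hits MODIFY TTL
    event_time + INTERVAL 1 MONTH TO VOLUME IF EXISTS 'cold';
```
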
## Flexible memory limits
This experimental feature was implemented by **Dmitry Novik** and is continuing to be developed.
Now we allow comments starting with `# ` or `#!`, similar to MySQL. The variant with `#!` allows using shell scripts with a "shebang" interpreted by `clickhouse-local`.
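
For example, a query file like the following might be made executable and run directly, with `clickhouse-local` treating the shebang line as a comment. A sketch only: the interpreter path and the `--queries-file` invocation are assumptions, not taken from the release notes.

```sql
#!/usr/bin/clickhouse-local --queries-file
# Both comment styles are ignored by the parser, so the file is valid SQL.
SELECT 'hello from a shebang script';
```
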
This feature was contributed by **Aaron Katz**. Very nice.
## And many more...
Maxim Kita, Danila Kutenin, Anton Popov, zhanglistar, Federico Rodriguez, Raúl Marín, Amos Bird and Alexey Milovidov have contributed a ton of performance optimizations for this release. We are obsessed with high performance, as usual. :)
Read the [full changelog](https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md) for the 22.2 release and follow [the roadmap](https://github.com/ClickHouse/ClickHouse/issues/32513).