---
title: 'What's New in ClickHouse 21.12'
image: 'https://blog-images.clickhouse.com/en/2021/clickhouse-v21-12/featured.jpg'
date: '2021-12-16'
author: '[Alexey Milovidov](https://github.com/alexey-milovidov)', '[Christoph Wurm](https://github.com/cwurm)'
tags: ['company', 'community']
---

We're continuing our monthly release cadence. The 21.12 Christmas release includes 2460 new commits from 125 contributors, including 42 new contributors:

Alex Cao, Amr Alaa, Andrey Torsunov, Constantine Peresypkin, Dmitriy Dorofeev, Egor O'Sten, Elykov Alexandr, Evgeny, Frank Chen, LB, Natasha Murashkina, Peignon Melvyn, Rich Raposa, Roman Chyrva, Roman, SuperDJY, Thom O'Connor, Timur Magomedov, Tom Risse, Tomáš Hromada, cfcz48, cgp, cms, cmsxbc, congbaoyangrou, dongyifeng, frank chen, freedomDR, jus1096, khamadiev, laurieliyang, leosunli, liyang830, loneylee, michael1589, msaf1980, p0ny, qieqieplus, spume, sunlisheng, yandd, zhanghuajie.

If you are wondering, this list is generated by the following command:

```
clickhouse-local --query "
    SELECT arrayStringConcat(groupArray(s), ', ')
    FROM file('contributors-21.12.txt', LineAsString, 's String')
    WHERE s NOT IN (
        SELECT *
        FROM file('contributors-21.11.txt', LineAsString, 's String'))
    FORMAT TSVRaw"
```

And to list the contributors, you can always run the
```
SELECT * FROM system.contributors
```
query on your production server.
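
If you don't have `clickhouse-local` at hand, the set difference computed by the command above can be sketched in Python. The two lists are hypothetical stand-ins for the contents of `contributors-21.12.txt` and `contributors-21.11.txt` (one name per line). Reading the files is left out.

```python
def new_contributors(current: list[str], previous: list[str]) -> str:
    """Names in the current release list that are absent from the previous one,
    kept in the order of the current list and joined with ', '."""
    seen_before = set(previous)
    return ", ".join(name for name in current if name not in seen_before)


# Hypothetical file contents, one contributor per line (like LineAsString).
previous_release = ["Alexey Milovidov", "Anton Popov"]
current_release = ["Alex Cao", "Alexey Milovidov", "Amr Alaa", "Anton Popov"]
print(new_contributors(current_release, previous_release))  # Alex Cao, Amr Alaa
```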

Let's highlight some of the new capabilities in 21.12:


## ClickHouse Keeper is Feature Complete

In 21.12, `clickhouse-keeper` started to support ["four letter commands"](https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands) for status and monitoring. This feature was contributed by **JackyWoo** and reviewed by **Alexander Sapin** (the author of ClickHouse Keeper).

This was the last missing feature. In this release, clickhouse-keeper is still considered to be in the pre-production stage, but many companies have already started to evaluate and use it as a replacement for ZooKeeper. You can also start using clickhouse-keeper in your testing environments, and we would appreciate your feedback.
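
A four letter command is just four ASCII bytes sent over a plain TCP connection. The server writes its reply and closes the connection, which is why `echo ruok | nc host port` works. Here is a minimal Python client sketch of that exchange. The port in the usage comment is an assumption: use whatever client port your Keeper configuration specifies.

```python
import socket


def four_letter_command(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a ZooKeeper-style four letter command (e.g. 'ruok', 'mntr')
    and return the reply; the server closes the connection when done."""
    if len(command) != 4:
        raise ValueError("four letter commands are exactly four characters long")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Usage (assumes Keeper listens on localhost:9181; adjust to your config):
#   four_letter_command("localhost", 9181, "ruok")  # a healthy server answers "imok"
```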

ClickHouse Keeper development started in September 2020, more than a year ago. It was a long road, and most of the effort went into ensuring correctness and stability in unusual and exceptional scenarios. It is covered by [Jepsen](https://jepsen.io/) tests (including the ZooKeeper tests and [newly introduced tests](https://github.com/ClickHouse/ClickHouse/tree/master/tests/jepsen.clickhouse-keeper)) and by continuous randomized stress testing with ClickHouse functional and integration tests. It is also being tested in Yandex Cloud and by our best friends. If you'd like to count yourself among our best friends, you can test it too.

**How does this help you?**

ClickHouse Keeper is a drop-in replacement for ZooKeeper. It implements the ZooKeeper wire protocol and data model, but does it better.

In contrast to ZooKeeper, there are no issues with zxid overflow or packet sizes. It uses memory more efficiently and does not require JVM tuning (because it does not use the JVM). Logs and snapshots are compressed (typically about 10x) and checksummed. It can run as a separate process or directly inside clickhouse-server. You can use it with ClickHouse, or with your Kafkas and Hadoops as well.

[More info](http://presentations.clickhouse.tech/meetup54/keeper.pdf).


## Partitions For INSERT INTO File, URL And HDFS Storages

ClickHouse now supports partitions for the `File`, `URL`, and `HDFS` table engines. When creating a table, you can specify the partition key using the `PARTITION BY` clause, e.g. `CREATE TABLE hits_files (...) ENGINE = File(TabSeparated) PARTITION BY toYYYYMM(EventDate)`.

Similarly, when exporting data from ClickHouse using the `file`, `url`, and `hdfs` table functions you can now specify that the data is to be partitioned into multiple files using a `PARTITION BY` clause. For example, `INSERT INTO TABLE FUNCTION file('path/hits_{_partition_id}', 'TSV', 'columns...') PARTITION BY toYYYYMM(EventDate) VALUES ...` will create as many files as there are unique months in the dataset.

The `s3` table function has supported partitioned writes since ClickHouse 21.10.
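
Conceptually, a partitioned write groups the rows by the partition expression. Each group then goes to the path you get by replacing `{_partition_id}` with that group's key. Here is a small Python model of that idea. The row layout and the `partition_rows` helper are made up for illustration, and the sketch does not reproduce how ClickHouse formats partition IDs for every key type.

```python
from collections import defaultdict
from datetime import date


def to_yyyymm(d: date) -> int:
    """Same result as ClickHouse toYYYYMM for a date: 2014-03-17 -> 201403."""
    return d.year * 100 + d.month


def partition_rows(rows, key, path_template):
    """Group rows by partition key and substitute the key into the
    '{_partition_id}' placeholder: one output file per distinct key."""
    files = defaultdict(list)
    for row in rows:
        partition_id = str(key(row))
        files[path_template.replace("{_partition_id}", partition_id)].append(row)
    return dict(files)


rows = [
    {"EventDate": date(2014, 3, 17), "UserID": 1},
    {"EventDate": date(2014, 3, 19), "UserID": 2},
    {"EventDate": date(2014, 4, 2), "UserID": 3},
]
files = partition_rows(rows, lambda r: to_yyyymm(r["EventDate"]), "path/hits_{_partition_id}")
print(sorted(files))  # ['path/hits_201403', 'path/hits_201404']
```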

**How does this help you?**

If data is split into multiple files, then a `SELECT` query over them is automatically parallelized. Example:

```
SELECT user_id, count() FROM s3(
    'https://s3.us-east-2.amazonaws.com/.../*.csv.zstd',
    '...', '...',
    CSV,
    'user_id UInt64, ...')
```
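
Why does splitting into files help? Each file can be read and partially aggregated independently, and the partial results are then merged. Below is a minimal Python sketch of that two-phase pattern for `count()` grouped by `user_id`. In-memory lists of `(user_id, ...)` tuples stand in for the downloaded CSV files, and ClickHouse's own reader and thread scheduling are not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_per_file(rows):
    """Partial aggregation for one file: count() grouped by user_id (first field)."""
    return Counter(row[0] for row in rows)


def parallel_count_by_user(files, max_workers=4):
    """Aggregate each file in its own thread, then merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(count_per_file, files):
            total.update(partial)  # merge step: add up counts for matching keys
    return total
```

Because the merge only adds counts, the result does not depend on which file finishes first.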

You can even parallelize data processing across a distributed compute cluster if you use the `s3Cluster` table function:

```
SELECT user_id, count() FROM s3Cluster(
    my_cluster,
    'https://s3.us-east-2.amazonaws.com/.../*.csv.zstd',
    '...',
    '...', CSV,
    'user_id UInt64, ...')
```

Partitioned writes can also be used for integration with external data processing tools that consume data from `s3`.


## FROM INFILE in clickhouse-client now supports glob patterns and parallel reading

Just write:

```
INSERT INTO my_table FROM INFILE '*.csv.gz' FORMAT CSV
```

Glob patterns support `*`, `?`, and `{n..m}` in both the `{1..10}` and the aligned (zero-padded) `{01..10}` forms.
The query is parallelized automatically; it also detects the compression format from the file extension and decompresses the data transparently.
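
To make the `{n..m}` forms concrete, here is a small Python sketch that expands such ranges into file names. It is our own illustration of the syntax, not ClickHouse's matcher. It assumes `n <= m` and pads to the width of the left bound when that bound has a leading zero. `*` and `?` are not handled here; Python's `fnmatch` module covers those.

```python
import re

RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_ranges(pattern: str) -> list[str]:
    """Expand every {n..m} range in a glob pattern into concrete names.
    A left bound with a leading zero (as in {01..10}) turns on zero-padding."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]
    left, right = match.group(1), match.group(2)
    width = len(left) if len(left) > 1 and left.startswith("0") else 0
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for n in range(int(left), int(right) + 1):
        for rest in expand_ranges(suffix):  # handle any further ranges
            expanded.append(prefix + str(n).zfill(width) + rest)
    return expanded
```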

**How does this help you?**

Now you don't have to remember how to write a parallel for loop in your command-line shell. clickhouse-client does everything for you, and it works intuitively and fast.


## Support for INTERVAL operator inside WITH FILL modifier for ORDER BY clause

What's the... `WITH FILL` modifier in the `ORDER BY` clause? Just look at the example.

```
:) SELECT EventDate, count() FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate ORDER BY EventDate

┌──EventDate─┬─count()─┐
│ 2014-03-17 │       3 │
│ 2014-03-19 │       6 │
│ 2014-03-21 │       7 │
│ 2014-03-22 │       6 │
└────────────┴─────────┘
```

We have a report for Mar 17th, 19th, 21st, and 22nd. But Mar 18th and 20th are missing because there is no data for these dates.
And this is how it works in all SQL databases.

But ClickHouse also has a rather unique and neat `WITH FILL` modifier for the `ORDER BY` clause.

You just write:
```
SELECT EventDate, count() FROM test.hits WHERE CounterID = 2841673 GROUP BY EventDate
ORDER BY EventDate WITH FILL STEP 1

┌──EventDate─┬─count()─┐
│ 2014-03-17 │       3 │
│ 2014-03-18 │       0 │
│ 2014-03-19 │       6 │
│ 2014-03-20 │       0 │
│ 2014-03-21 │       7 │
│ 2014-03-22 │       6 │
└────────────┴─────────┘
```

And missing data is automatically filled.
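
For intuition, the filled report above can be reproduced in Python. The sketch walks from the first date to the last one day at a time and emits a zero count wherever the query returned no row. It only models `STEP 1` on a `Date` column after `GROUP BY` (one row per date); it is not how ClickHouse implements the modifier.

```python
from datetime import date, timedelta


def fill_daily(rows):
    """rows: (EventDate, count) pairs sorted by date, one row per date.
    Returns the rows plus (date, 0) for every missing day between the
    first and last date, like ORDER BY EventDate WITH FILL STEP 1."""
    if not rows:
        return []
    present = dict(rows)
    filled, current, last = [], rows[0][0], rows[-1][0]
    while current <= last:
        filled.append((current, present.get(current, 0)))  # filled rows get the default 0
        current += timedelta(days=1)
    return filled


report = [(date(2014, 3, 17), 3), (date(2014, 3, 19), 6), (date(2014, 3, 21), 7), (date(2014, 3, 22), 6)]
for event_date, count in fill_daily(report):
    print(event_date, count)
```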

You can also add `FROM` and `TO`:

```
ORDER BY EventDate WITH FILL FROM '2014-03-01'::Date TO '2014-03-31'::Date STEP 1
```

And it will automatically fill missing rows in the report.

The STEP can be an arbitrary number. But what if you want to fill missing dates in a report by month? You cannot just write STEP 30 or STEP 31, because months contain different numbers of days...

Since ClickHouse version 21.12 you can do it like this:

```
ORDER BY EventDate WITH FILL STEP INTERVAL 1 MONTH
```

`INTERVAL` is a standard SQL operator; you can use SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, and YEAR.
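
Code makes it easy to see why a fixed day count fails: the distance between consecutive month starts varies from 28 to 31 days. Here is a Python sketch of stepping by one calendar month. It assumes the report is grouped by month start (for example with `toStartOfMonth(EventDate)`), which sidesteps the question of how end-of-month days should be clamped.

```python
from datetime import date


def add_one_month(d: date) -> date:
    """Next month start; assumes d is already a month start (day == 1)."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def fill_monthly(rows):
    """rows: (month_start, value) pairs sorted by date. Fills missing months
    with 0, analogous to WITH FILL STEP INTERVAL 1 MONTH."""
    if not rows:
        return []
    present = dict(rows)
    filled, current, last = [], rows[0][0], rows[-1][0]
    while current <= last:
        filled.append((current, present.get(current, 0)))
        current = add_one_month(current)
    return filled
```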

**How does this help you?**

It lets you avoid a post-processing step for your reports.


## Add Support For "Identifier" Table and Database Query Parameters

ClickHouse has support for parameterized queries.

```
SELECT uniq(user_id) FROM table WHERE website = {name:String}
```

It allows you to substitute parameters safely, without the risk of SQL injection:

```
curl https://clickhouse-server:8443/?param_name=upyachka -d 'SELECT uniq(user_id) FROM table WHERE website = {name:String}'
```

You can even create customized API handlers for clickhouse-server based on prepared queries.

Since version 21.12, you can also use parameters for table and database names in your queries. This is implemented with the `Identifier` parameter type:

```
SELECT uniq(user_id) FROM {tbl:Identifier}
```
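
On the client side, only the URL changes. Each parameter travels as `param_<name>`, and the SQL text is sent as is, `{name:Type}` placeholders included. The following Python sketch builds such a URL with the standard library. The host and the `hits` table name are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, params: dict) -> str:
    """Encode each query parameter as param_<name>=<value> in the URL.
    The SQL text keeps its {name:Type} placeholders and is sent unchanged in
    the request body; the server performs the substitution."""
    encoded = urlencode({f"param_{name}": value for name, value in params.items()})
    return f"{base_url}?{encoded}"


query = "SELECT uniq(user_id) FROM {tbl:Identifier} WHERE website = {name:String}"
url = build_query_url("https://clickhouse-server:8443/", {"tbl": "hits", "name": "upyachka"})
# Sending it would be a POST of `query` to `url`, for example with
# urllib.request.urlopen(urllib.request.Request(url, data=query.encode()))
```

Even a hostile value such as `x' OR 1=1 --` only ends up URL-encoded in a parameter. It is never spliced into the SQL string on the client.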

**How does this help you?**

Let ClickHouse do the heavy lifting and keep your scripts safe and secure.


## Bool Data Type

This feature is experimental in version 21.12.

ClickHouse now natively supports a `Bool` data type. Values are represented as "true"/"false" during data import and export in text formats. The representation can be changed to anything else using the settings `bool_true_representation` and `bool_false_representation` (for example, "yes" and "no").
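
The two settings simply choose the strings written for each value. Here is a Python sketch of that round trip. The parameters play the role of `bool_true_representation` and `bool_false_representation`. The sketch only recognizes the two configured strings and does not try to reproduce the full set of inputs ClickHouse accepts.

```python
def format_bool(value: bool, true_repr: str = "true", false_repr: str = "false") -> str:
    """Text output for a Bool value using the configured representations."""
    return true_repr if value else false_repr


def parse_bool(text: str, true_repr: str = "true", false_repr: str = "false") -> bool:
    """Inverse of format_bool, limited to the two configured strings."""
    if text == true_repr:
        return True
    if text == false_repr:
        return False
    raise ValueError(f"cannot parse {text!r} as Bool")
```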

**How does this help you?**

Native boolean data types exist today in other databases that are often integrated with ClickHouse, such as PostgreSQL. The `Bool` data type in ClickHouse will make it more compatible with existing code and ease migration from other databases.

It also simplifies data ingestion from various text sources.


## Query Optimizations With Table Constraints

This feature was [contributed](https://github.com/ClickHouse/ClickHouse/pull/18787) by **Nikita Vasilev**. Nikita is one of the most notable ClickHouse contributors. He started in 2019 by introducing data skipping indices into ClickHouse, continued in 2020 with SSD-optimized key-value dictionaries, and has now contributed new advancements to the query optimizer. The feature was reviewed by **Anton Popov**.

So, what optimizations? ClickHouse already allows you to specify constraints for tables:

```
CREATE TABLE hits
(
    URL String,
    Domain String,
    CONSTRAINT validate CHECK isValidUTF8(URL) AND length(URL) BETWEEN 10 AND 10000,
    CONSTRAINT my_constraint CHECK Domain = domainWithoutWWW(URL)
) ...
```

Constraints are checked on INSERT. In this example, we validate the URL and check that the `Domain` column actually contains the domain of `URL`.

Since version 21.12 constraints can also automatically optimize your queries! For example, if you write:

```
SELECT count() FROM hits WHERE domainWithoutWWW(URL) = 'ghe.clickhouse.tech'
```

The query can be automatically rewritten to:

```
SELECT count() FROM hits WHERE Domain = 'ghe.clickhouse.tech'
```

because the `Domain` column is smaller and more compressible, so it is faster to read, and it does not require calculating the domain from `URL`.
All you need to do is enable the `optimize_using_constraints` and `optimize_substitute_columns` settings.
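
At its core, this rewrite substitutes equal expressions. The constraint says that `Domain` and `domainWithoutWWW(URL)` are always equal, so any occurrence of the latter can be replaced with the cheaper column. Here is a toy Python model over a tuple-based expression tree. The tree representation is invented for illustration; ClickHouse's optimizer works on its own AST.

```python
# Expressions as nested tuples: ("col", name), ("lit", value) or (function_name, *args).


def substitute(expr, replacements):
    """Replace every subexpression equal to a key of `replacements` with its value."""
    if expr in replacements:
        return replacements[expr]
    if expr[0] in ("col", "lit"):
        return expr
    return (expr[0], *(substitute(arg, replacements) for arg in expr[1:]))


# Derived from the constraint `Domain = domainWithoutWWW(URL)`:
url_domain = ("domainWithoutWWW", ("col", "URL"))
replacements = {url_domain: ("col", "Domain")}

where = ("equals", url_domain, ("lit", "ghe.clickhouse.tech"))
print(substitute(where, replacements))
# ('equals', ('col', 'Domain'), ('lit', 'ghe.clickhouse.tech'))
```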

As a bonus, a new type of constraint is introduced: `ASSUME`.

```
CONSTRAINT my_constraint ASSUME Domain = domainWithoutWWW(URL)
```

This type of constraint does not check anything on INSERT, but the assumption is still used to optimize queries.

The optimizer can also do logical inference: it simplifies conditions and removes the ones that are proven to be satisfied by the constraints.
This is controlled by the `convert_query_to_cnf` setting. You can also enable the `optimize_append_index` setting, which lets ClickHouse derive additional conditions on the table's primary key.
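
One of the simplest inferences of this kind can be shown with sets. Suppose both the `WHERE` condition and the constraints are in conjunctive normal form (an AND of OR-clauses). Then a query clause that contains every literal of some constraint clause is always true and can be removed. Below is a Python sketch of just that subsumption step. Literals are opaque strings, and ClickHouse's actual inference goes beyond this.

```python
# A CNF formula is a set of clauses; a clause is a frozenset of literals (their OR).
Clause = frozenset


def drop_implied_clauses(query_cnf: set, constraint_cnf: set) -> set:
    """Remove query clauses that are guaranteed by a constraint clause.
    If constraint clause C is a subset of query clause Q, then C implies Q
    (Q is an OR over a superset of C's literals), so Q is always true."""
    return {q for q in query_cnf if not any(c <= q for c in constraint_cnf)}


constraints = {Clause({"a < b"})}
query = {Clause({"a < b", "c > 0"}), Clause({"x = 1"})}
print(drop_implied_clauses(query, constraints))  # {frozenset({'x = 1'})}
```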

The idea is so powerful that we could not resist adding one more feature: *hypothesis indices*.

```
INDEX my_index (a < b) TYPE hypothesis GRANULARITY 1
```

The expression is checked and the result (true/false) is written as an index for query optimization.
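
Here is a simplified Python model of how such an index can prune data. Each granule stores whether the hypothesis held for all of its rows. A query looking for rows that violate the hypothesis (such as `WHERE a >= b` for the index above) can then skip every granule flagged `True`. This is our reading of the idea, not ClickHouse's code. The `rows_per_granule` parameter is a hypothetical stand-in for the table's granule size.

```python
def build_hypothesis_index(rows, hypothesis, rows_per_granule):
    """One flag per granule: True if the hypothesis holds for every row in it."""
    return [
        all(hypothesis(row) for row in rows[start:start + rows_per_granule])
        for start in range(0, len(rows), rows_per_granule)
    ]


def granules_to_read_for_violations(index):
    """For a filter on NOT hypothesis (e.g. WHERE a >= b), granules where the
    hypothesis held for every row cannot match and are skipped."""
    return [i for i, always_true in enumerate(index) if not always_true]


rows = [(1, 2), (3, 4), (5, 1), (2, 9)]  # (a, b) pairs
index = build_hypothesis_index(rows, lambda r: r[0] < r[1], rows_per_granule=2)
print(index)                                   # [True, False]
print(granules_to_read_for_violations(index))  # [1]
```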

**How does this help you?**

Especially in large ClickHouse deployments with many complex tables it can be hard for users to always be up to date on the best way to query a given dataset. Constraints can help optimize queries without having to change the query structure itself. They can also make it easier to make changes to tables. 

For example, let's say you have a table containing web requests, with a `URL` column that holds the full URL of each request. Users often want to know the top-level domain (.com, .uk, etc.), which ClickHouse calculates with the `topLevelDomain` function. If you discover that many people are using this function, you might decide to create a new materialized column that pre-calculates the top-level domain for each record.

Rather than telling all your users to change their queries, you can use a table constraint to tell ClickHouse that whenever a query calls the `topLevelDomain` function, it should be rewritten to use the new materialized column.


## Read Large Remote Files In Chunks

ClickHouse combines a fast query engine with efficient data storage. It also lets you integrate external data sources for data import and export, or even process external datasets on the fly without importing or preprocessing them.

When reading large files in `Parquet`, `ORC`, and `Arrow` format using the `s3`, `url`, and `hdfs` table functions, ClickHouse will now automatically choose whether to read the entire file at once or read parts of it incrementally. This is now enabled by default and the setting `remote_read_min_bytes_for_seek` controls when to switch from reading it all to reading in chunks. The default is 1MiB.

`Parquet`, `ORC`, and `Arrow` are column-oriented formats (quite similar to the ClickHouse Native format), and now we can read only the requested columns, even from a remote HTTP server with the `url` table function (range requests are performed to skip unneeded data).
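
The range requests mentioned above follow a familiar pattern. The reader fetches only the byte ranges that hold the requested columns, but it reads through small gaps instead of issuing a new request for each one. Here is a Python sketch of that planning step. The chunk offsets are made up. Treating `min_bytes_for_seek` as the gap threshold is our assumption about how a setting like `remote_read_min_bytes_for_seek` is applied, not a description of ClickHouse internals.

```python
def plan_range_requests(chunks, min_bytes_for_seek):
    """chunks: (offset, length) byte ranges of the needed column chunks, sorted by offset.
    Chunks separated by a gap smaller than min_bytes_for_seek are merged (the gap is
    read and discarded); a larger gap starts a new range request (a seek)."""
    merged = []
    for offset, length in chunks:
        if merged and offset - (merged[-1][0] + merged[-1][1]) < min_bytes_for_seek:
            start = merged[-1][0]
            merged[-1] = (start, max(merged[-1][1], offset + length - start))
        else:
            merged.append((offset, length))
    return merged


def range_header(offset, length):
    """HTTP Range header value for one request (byte positions are inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"
```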

This feature was implemented by **Kseniia Sumarokova**.

**How does this help you?**

In previous versions, when reading files in Arrow-based formats from remote locations with the `s3`, `url`, and `hdfs` table functions, ClickHouse would always read the entire file into memory. This works well when the files are small but will cause excessive memory usage or not work at all when the files are large. With this change, ClickHouse will read large files in chunks to keep memory usage in check and is now able to read even very large files.


## ... And Many More

Read the [full changelog](https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md) for the 21.12 "Christmas" release for the full list of gifts from the [ClickHouse Team](https://clickhouse.com/careers/).