Merge branch 'master' into orc_load

This commit is contained in:
Alexey Milovidov 2023-06-29 23:10:34 +03:00 committed by GitHub
commit 846a08af2a
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
10 changed files with 129 additions and 9 deletions

View File

@ -21,7 +21,7 @@
* Added `Overlay` database engine to combine multiple databases into one. Added `Filesystem` database engine to represent a directory in the filesystem as a set of implicitly available tables with auto-detected formats and structures. A new `S3` database engine allows to read-only interact with s3 storage by representing a prefix as a set of tables. A new `HDFS` database engine allows to interact with HDFS storage in the same way. [#48821](https://github.com/ClickHouse/ClickHouse/pull/48821) ([alekseygolub](https://github.com/alekseygolub)).
* The function `transform` as well as `CASE` with value matching started to support all data types. This closes [#29730](https://github.com/ClickHouse/ClickHouse/issues/29730). This closes [#32387](https://github.com/ClickHouse/ClickHouse/issues/32387). This closes [#50827](https://github.com/ClickHouse/ClickHouse/issues/50827). This closes [#31336](https://github.com/ClickHouse/ClickHouse/issues/31336). This closes [#40493](https://github.com/ClickHouse/ClickHouse/issues/40493). [#51351](https://github.com/ClickHouse/ClickHouse/pull/51351) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Added option `--rename_files_after_processing <pattern>`. This closes [#34207](https://github.com/ClickHouse/ClickHouse/issues/34207). [#49626](https://github.com/ClickHouse/ClickHouse/pull/49626) ([alekseygolub](https://github.com/alekseygolub)).
* Add support for `APPEND` modifier in `INTO OUTFILE` clause. Suggest using `APPEND` or `TRUNCATE` for `INTO OUTFILE` when file exists. [#50950](https://github.com/ClickHouse/ClickHouse/pull/50950) ([alekar](https://github.com/alekar)).
* Add support for `TRUNCATE` modifier in `INTO OUTFILE` clause. Suggest using `APPEND` or `TRUNCATE` for `INTO OUTFILE` when file exists. [#50950](https://github.com/ClickHouse/ClickHouse/pull/50950) ([alekar](https://github.com/alekar)).
* Add table engine `Redis` and table function `redis`. It allows querying external Redis servers. [#50150](https://github.com/ClickHouse/ClickHouse/pull/50150) ([JackyWoo](https://github.com/JackyWoo)).
* Allow to skip empty files in file/s3/url/hdfs table functions using settings `s3_skip_empty_files`, `hdfs_skip_empty_files`, `engine_file_skip_empty_files`, `engine_url_skip_empty_files`. [#50364](https://github.com/ClickHouse/ClickHouse/pull/50364) ([Kruglov Pavel](https://github.com/Avogar)).
* Add a new setting named `use_mysql_types_in_show_columns` to alter the `SHOW COLUMNS` SQL statement to display MySQL equivalent types when a client is connected via the MySQL compatibility port. [#49577](https://github.com/ClickHouse/ClickHouse/pull/49577) ([Thomas Panetti](https://github.com/tpanetti)).
@ -40,12 +40,12 @@
* Make multiple list requests to ZooKeeper in parallel to speed up reading from system.zookeeper table. [#51042](https://github.com/ClickHouse/ClickHouse/pull/51042) ([Alexander Gololobov](https://github.com/davenger)).
* Speedup initialization of DateTime lookup tables for time zones. This should reduce startup/connect time of clickhouse-client especially in debug build as it is rather heavy. [#51347](https://github.com/ClickHouse/ClickHouse/pull/51347) ([Alexander Gololobov](https://github.com/davenger)).
* Fix data lakes slowness because of synchronous head requests. (Related to Iceberg/Deltalake/Hudi being slow with a lot of files). [#50976](https://github.com/ClickHouse/ClickHouse/pull/50976) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Do not replicate `ALTER PARTITION` queries and mutations through `Replicated` database if it has only one shard and the underlying table is `ReplicatedMergeTree`. [#51049](https://github.com/ClickHouse/ClickHouse/pull/51049) ([Alexander Tokmakov](https://github.com/tavplubix)).
* Do not read all the columns from right GLOBAL JOIN table. [#50721](https://github.com/ClickHouse/ClickHouse/pull/50721) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
#### Experimental Feature
* Support parallel replicas with the analyzer. [#50441](https://github.com/ClickHouse/ClickHouse/pull/50441) ([Raúl Marín](https://github.com/Algunenano)).
* Add random sleep before large merges/mutations execution to split load more evenly between replicas in case of zero-copy replication. [#51282](https://github.com/ClickHouse/ClickHouse/pull/51282) ([alesapin](https://github.com/alesapin)).
* Do not replicate `ALTER PARTITION` queries and mutations through `Replicated` database if it has only one shard and the underlying table is `ReplicatedMergeTree`. [#51049](https://github.com/ClickHouse/ClickHouse/pull/51049) ([Alexander Tokmakov](https://github.com/tavplubix)).
#### Improvement
* Relax the thresholds for "too many parts" to be more modern. Return the backpressure during long-running insert queries. [#50856](https://github.com/ClickHouse/ClickHouse/pull/50856) ([Alexey Milovidov](https://github.com/alexey-milovidov)).

View File

@ -756,6 +756,17 @@ If you perform the `SELECT` query between merges, you may get expired data. To a
- [ttl_only_drop_parts](/docs/en/operations/settings/settings.md/#ttl_only_drop_parts) setting
## Disk types
In addition to local block devices, ClickHouse supports these storage types:
- [`s3` for S3 and MinIO](#table_engine-mergetree-s3)
- [`gcs` for GCS](/docs/en/integrations/data-ingestion/gcs/index.md/#creating-a-disk)
- [`blob_storage_disk` for Azure Blob Storage](#table_engine-mergetree-azure-blob-storage)
- [`hdfs` for HDFS](#hdfs-storage)
- [`web` for read-only from web](#web-storage)
- [`cache` for local caching](/docs/en/operations/storing-data.md/#using-local-cache)
- [`s3_plain` for backups to S3](/docs/en/operations/backup#backuprestore-using-an-s3-disk)
## Using Multiple Block Devices for Data Storage {#table_engine-mergetree-multiple-volumes}
### Introduction {#introduction}
@ -936,6 +947,8 @@ configuration files; all the settings are in the CREATE/ATTACH query.
The example uses `type=web`, but any disk type can be configured as dynamic, even Local disk. Local disks require a path argument to be inside the server config parameter `custom_local_disks_base_directory`, which has no default, so set that also when using local disk.
:::
#### Example dynamic web storage
```sql
ATTACH TABLE uk_price_paid UUID 'cf712b4f-2ca8-435c-ac23-c4393efe52f7'
(
@ -1238,6 +1251,93 @@ Examples of working configurations can be found in integration tests directory (
Zero-copy replication is disabled by default in ClickHouse version 22.8 and higher. This feature is not recommended for production use.
:::
## HDFS storage {#hdfs-storage}
In this sample configuration:
- the disk is of type `hdfs`
- the data is hosted at `hdfs://hdfs1:9000/clickhouse/`
```xml
<clickhouse>
<storage_configuration>
<disks>
<hdfs>
<type>hdfs</type>
<endpoint>hdfs://hdfs1:9000/clickhouse/</endpoint>
<skip_access_check>true</skip_access_check>
</hdfs>
<hdd>
<type>local</type>
<path>/</path>
</hdd>
</disks>
<policies>
<hdfs>
<volumes>
<main>
<disk>hdfs</disk>
</main>
<external>
<disk>hdd</disk>
</external>
</volumes>
</hdfs>
</policies>
</storage_configuration>
</clickhouse>
```
## Web storage (read-only) {#web-storage}
Web storage can be used for read-only purposes. An example use is for hosting sample
data, or for migrating data.
:::tip
Storage can also be configured temporarily within a query, if a web dataset is not expected
to be used routinely, see [dynamic storage](#dynamic-storage) and skip editing the
configuration file.
:::
In this sample configuration:
- the disk is of type `web`
- the data is hosted at `http://nginx:80/test1/`
- a cache on local storage is used
```xml
<clickhouse>
<storage_configuration>
<disks>
<web>
<type>web</type>
<endpoint>http://nginx:80/test1/</endpoint>
</web>
<cached_web>
<type>cache</type>
<disk>web</disk>
<path>cached_web_cache/</path>
<max_size>100000000</max_size>
</cached_web>
</disks>
<policies>
<web>
<volumes>
<main>
<disk>web</disk>
</main>
</volumes>
</web>
<cached_web>
<volumes>
<main>
<disk>cached_web</disk>
</main>
</volumes>
</cached_web>
</policies>
</storage_configuration>
</clickhouse>
```
## Virtual Columns {#virtual-columns}
- `_part` — Name of a part.

View File

@ -1322,7 +1322,7 @@ Connection pool size for PostgreSQL table engine and database engine.
Default value: 16
## postgresql_connection_pool_size {#postgresql-connection-pool-size}
## postgresql_connection_pool_wait_timeout {#postgresql-connection-pool-wait-timeout}
Connection pool push/pop timeout on empty pool for PostgreSQL table engine and database engine. By default it will block on empty pool.

View File

@ -230,13 +230,15 @@ hasAll(set, subset)
**Arguments**
- `set` Array of any type with a set of elements.
- `subset` Array of any type with elements that should be tested to be a subset of `set`.
- `subset` Array of any type that shares a common supertype with `set` containing elements that should be tested to be a subset of `set`.
**Return values**
- `1`, if `set` contains all of the elements from `subset`.
- `0`, otherwise.
Raises an exception `NO_COMMON_TYPE` if the set and subset elements do not share a common supertype.
**Peculiar properties**
- An empty array is a subset of any array.
@ -253,7 +255,7 @@ hasAll(set, subset)
`SELECT hasAll(['a', 'b'], ['a'])` returns 1.
`SELECT hasAll([1], ['a'])` returns 0.
`SELECT hasAll([1], ['a'])` raises a `NO_COMMON_TYPE` exception.
`SELECT hasAll([[1, 2], [3, 4]], [[1, 2], [3, 5]])` returns 0.
@ -268,13 +270,15 @@ hasAny(array1, array2)
**Arguments**
- `array1` Array of any type with a set of elements.
- `array2` Array of any type with a set of elements.
- `array2` Array of any type that shares a common supertype with `array1`.
**Return values**
- `1`, if `array1` and `array2` have one similar element at least.
- `0`, otherwise.
Raises an exception `NO_COMMON_TYPE` if the array1 and array2 elements do not share a common supertype.
**Peculiar properties**
- `Null` processed as a value.
@ -288,7 +292,7 @@ hasAny(array1, array2)
`SELECT hasAny([-128, 1., 512], [1])` returns `1`.
`SELECT hasAny([[1, 2], [3, 4]], ['a', 'c'])` returns `0`.
`SELECT hasAny([[1, 2], [3, 4]], ['a', 'c'])` raises a `NO_COMMON_TYPE` exception.
`SELECT hasAll([[1, 2], [3, 4]], [[1, 2], [1, 2]])` returns `1`.
@ -318,6 +322,8 @@ For Example:
- `1`, if `array1` contains `array2`.
- `0`, otherwise.
Raises an exception `NO_COMMON_TYPE` if the array1 and array2 elements do not share a common supertype.
**Peculiar properties**
- The function will return `1` if `array2` is empty.
@ -339,6 +345,9 @@ For Example:
`SELECT hasSubstr(['a', 'b' , 'c'], ['a', 'c'])` returns 0.
`SELECT hasSubstr([[1, 2], [3, 4], [5, 6]], [[1, 2], [3, 4]])` returns 1.
i
`SELECT hasSubstr([1, 2, NULL, 3, 4], ['a'])` raises a `NO_COMMON_TYPE` exception.
## indexOf(arr, x)

View File

@ -154,6 +154,8 @@ private:
using ColVecType = ColumnVectorOrDecimal<Type>;
const auto col_vec = checkAndGetColumn<ColVecType>(col.column.get());
if (col_vec == nullptr)
return false;
return (res = execute<Type, ReturnType>(col_vec)) != nullptr;
};

View File

@ -1,6 +1,6 @@
. {
hosts /example.com {
reload "200ms"
reload "20ms"
fallthrough
}
forward . 127.0.0.11

View File

@ -1,6 +1,6 @@
. {
hosts /example.com {
reload "200ms"
reload "20ms"
fallthrough
}
forward . 127.0.0.11

View File

@ -0,0 +1,2 @@
1
1

View File

@ -0,0 +1,6 @@
DROP TABLE IF EXISTS t10;
CREATE TABLE t10 (`c0` Int32) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO t10 (c0) FORMAT Values (-1);
SELECT 1 FROM t10 GROUP BY erf(-sign(t10.c0));
SELECT 1 FROM t10 GROUP BY -sign(t10.c0);
DROP TABLE t10;

View File

@ -2264,6 +2264,7 @@ summap
summingmergetree
sumwithoverflow
superaggregates
supertype
supremum
symlink
symlinks