mirror of https://github.com/ClickHouse/ClickHouse.git
synced 2024-11-22 07:31:57 +00:00

Merge branch 'master' into FawnD2-switch-upstream-for-arrow-submodule

This commit is contained in:
commit 252f2ae876

@ -1,4 +1,4 @@
-## system.table_name {#system-tables_table-name}
+# system.table_name {#system-tables_table-name}

Description.

@ -16,4 +16,6 @@ You can also use the following database engines:

- [Lazy](../../engines/database-engines/lazy.md)

+- [MaterializeMySQL](../../engines/database-engines/materialize-mysql.md)

[Original article](https://clickhouse.tech/docs/en/database_engines/) <!--hide-->

156 docs/en/engines/database-engines/materialize-mysql.md Normal file
@ -0,0 +1,156 @@

---
toc_priority: 29
toc_title: MaterializeMySQL
---

# MaterializeMySQL {#materialize-mysql}

Creates a ClickHouse database with all the tables existing in MySQL, and all the data in those tables.

The ClickHouse server works as a MySQL replica. It reads the binlog and performs DDL and DML queries.

## Creating a Database {#creating-a-database}

``` sql
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster]
ENGINE = MaterializeMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
```

**Engine Parameters**

- `host:port` — MySQL server endpoint.
- `database` — MySQL database name.
- `user` — MySQL user.
- `password` — User password.

## Virtual columns {#virtual-columns}

When working with the `MaterializeMySQL` database engine, [ReplacingMergeTree](../../engines/table-engines/mergetree-family/replacingmergetree.md) tables are used with virtual `_sign` and `_version` columns.

- `_version` — Transaction counter. Type [UInt64](../../sql-reference/data-types/int-uint.md).
- `_sign` — Deletion mark. Type [Int8](../../sql-reference/data-types/int-uint.md). Possible values:
    - `1` — Row is not deleted,
    - `-1` — Row is deleted.
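
A hedged sketch of inspecting these columns explicitly (the `mysql.test` table comes from the examples further below):

``` sql
SELECT a, b, _sign, _version FROM mysql.test;
```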

## Data Types Support {#data_types-support}

| MySQL                   | ClickHouse                                                   |
|-------------------------|--------------------------------------------------------------|
| TINY                    | [Int8](../../sql-reference/data-types/int-uint.md)           |
| SHORT                   | [Int16](../../sql-reference/data-types/int-uint.md)          |
| INT24                   | [Int32](../../sql-reference/data-types/int-uint.md)          |
| LONG                    | [UInt32](../../sql-reference/data-types/int-uint.md)         |
| LONGLONG                | [UInt64](../../sql-reference/data-types/int-uint.md)         |
| FLOAT                   | [Float32](../../sql-reference/data-types/float.md)           |
| DOUBLE                  | [Float64](../../sql-reference/data-types/float.md)           |
| DECIMAL, NEWDECIMAL     | [Decimal](../../sql-reference/data-types/decimal.md)         |
| DATE, NEWDATE           | [Date](../../sql-reference/data-types/date.md)               |
| DATETIME, TIMESTAMP     | [DateTime](../../sql-reference/data-types/datetime.md)       |
| DATETIME2, TIMESTAMP2   | [DateTime64](../../sql-reference/data-types/datetime64.md)   |
| STRING                  | [String](../../sql-reference/data-types/string.md)           |
| VARCHAR, VAR_STRING     | [String](../../sql-reference/data-types/string.md)           |
| BLOB                    | [String](../../sql-reference/data-types/string.md)           |

Other types are not supported. If a MySQL table contains a column of an unsupported type, ClickHouse throws the "Unhandled data type" exception and stops replication.

[Nullable](../../sql-reference/data-types/nullable.md) is supported.

## Specifics and Recommendations {#specifics-and-recommendations}

### DDL Queries {#ddl-queries}

MySQL DDL queries are converted into the corresponding ClickHouse DDL queries ([ALTER](../../sql-reference/statements/alter/index.md), [CREATE](../../sql-reference/statements/create/index.md), [DROP](../../sql-reference/statements/drop.md), [RENAME](../../sql-reference/statements/rename.md)). If ClickHouse cannot parse some DDL query, the query is ignored.

### Data Replication {#data-replication}

MaterializeMySQL does not support direct `INSERT`, `DELETE` and `UPDATE` queries. However, they are supported in terms of data replication:

- A MySQL `INSERT` query is converted into an `INSERT` with `_sign=1`.

- A MySQL `DELETE` query is converted into an `INSERT` with `_sign=-1`.

- A MySQL `UPDATE` query is converted into an `INSERT` with `_sign=-1` and an `INSERT` with `_sign=1`.

### Selecting from MaterializeMySQL Tables {#select}

A `SELECT` query from MaterializeMySQL tables has some specifics:

- If `_version` is not specified in the `SELECT` query, the [FINAL](../../sql-reference/statements/select/from.md#select-from-final) modifier is used, so only rows with `MAX(_version)` are selected.

- If `_sign` is not specified in the `SELECT` query, `WHERE _sign=1` is used by default, so deleted rows are not included in the result set.
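
In other words, the following two queries are equivalent; a minimal sketch using the `mysql.test` table from the examples below:

``` sql
SELECT a, b FROM mysql.test;
SELECT a, b FROM mysql.test FINAL WHERE _sign = 1;
```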

### Index Conversion {#index-conversion}

MySQL `PRIMARY KEY` and `INDEX` clauses are converted into `ORDER BY` tuples in ClickHouse tables.

ClickHouse has only one physical order, which is determined by the `ORDER BY` clause. To create a new physical order, use [materialized views](../../sql-reference/statements/create/view.md#materialized).
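
For instance, a hedged sketch of a materialized view that maintains a second physical order over the replicated table (the view name and ordering column are illustrative):

``` sql
CREATE MATERIALIZED VIEW test_sorted_by_b
ENGINE = MergeTree ORDER BY b
AS SELECT * FROM mysql.test;
```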

**Notes**

- Rows with `_sign=-1` are not deleted physically from the tables.
- Cascade `UPDATE/DELETE` queries are not supported by the `MaterializeMySQL` engine.
- Replication can be easily broken.
- Manual operations on the database and tables are forbidden.

## Examples of Use {#examples-of-use}

Queries in MySQL:

``` sql
mysql> CREATE DATABASE db;
mysql> CREATE TABLE db.test (a INT PRIMARY KEY, b INT);
mysql> INSERT INTO db.test VALUES (1, 11), (2, 22);
mysql> DELETE FROM db.test WHERE a=1;
mysql> ALTER TABLE db.test ADD COLUMN c VARCHAR(16);
mysql> UPDATE db.test SET c='Wow!', b=222;
mysql> SELECT * FROM db.test;
```

```text
+---+------+------+
| a | b    | c    |
+---+------+------+
| 2 | 222  | Wow! |
+---+------+------+
```

Database in ClickHouse, exchanging data with the MySQL server:

The database and the table created:

``` sql
CREATE DATABASE mysql ENGINE = MaterializeMySQL('localhost:3306', 'db', 'user', '***');
SHOW TABLES FROM mysql;
```

``` text
┌─name─┐
│ test │
└──────┘
```

After inserting data:

``` sql
SELECT * FROM mysql.test;
```

``` text
┌─a─┬──b─┐
│ 1 │ 11 │
│ 2 │ 22 │
└───┴────┘
```

After deleting data, adding the column and updating:

``` sql
SELECT * FROM mysql.test;
```

``` text
┌─a─┬───b─┬─c────┐
│ 2 │ 222 │ Wow! │
└───┴─────┴──────┘
```

[Original article](https://clickhouse.tech/docs/en/database_engines/materialize-mysql/) <!--hide-->

317 docs/en/getting-started/example-datasets/recipes.md Normal file
@ -0,0 +1,317 @@

---
toc_priority: 16
toc_title: Recipes Dataset
---

# Recipes Dataset

The RecipeNLG dataset is available for download [here](https://recipenlg.cs.put.poznan.pl/dataset). It contains 2.2 million recipes. The size is slightly less than 1 GB.

## Download and unpack the dataset

Accept the Terms and Conditions and download the dataset [here](https://recipenlg.cs.put.poznan.pl/dataset). Unpack the zip file with `unzip`. You will get the `full_dataset.csv` file.

## Create a table

Run `clickhouse-client` and execute the following `CREATE TABLE` query:

```
CREATE TABLE recipes
(
    title String,
    ingredients Array(String),
    directions Array(String),
    link String,
    source LowCardinality(String),
    NER Array(String)
) ENGINE = MergeTree ORDER BY title;
```

## Insert the data

Run the following command:

```
clickhouse-client --query "
    INSERT INTO recipes
    SELECT
        title,
        JSONExtract(ingredients, 'Array(String)'),
        JSONExtract(directions, 'Array(String)'),
        link,
        source,
        JSONExtract(NER, 'Array(String)')
    FROM input('num UInt32, title String, ingredients String, directions String, link String, source LowCardinality(String), NER String')
    FORMAT CSVWithNames
" --input_format_with_names_use_header 0 --format_csv_allow_single_quote 0 --input_format_allow_errors_num 10 < full_dataset.csv
```

This is a showcase of how to parse custom CSV, as it requires several tuning parameters.

Explanation:

- the dataset is in CSV format, but it requires some preprocessing on insertion; we use the table function [input](../../sql-reference/table-functions/input/) to perform the preprocessing;
- the structure of the CSV file is specified in the argument of the table function `input`;
- the field `num` (row number) is unneeded; we parse it from the file and ignore it;
- we use `FORMAT CSVWithNames`, but the header in the CSV is ignored (via the command-line parameter `--input_format_with_names_use_header 0`), because the header does not contain the name of the first field;
- the file uses only double quotes to enclose CSV strings; some strings are not enclosed in double quotes, and a single quote must not be parsed as a string delimiter, which is why we also add the `--format_csv_allow_single_quote 0` parameter;
- some strings from the CSV cannot be parsed, because they contain the `\M/` sequence at the beginning of a value; the only value that can start with a backslash in CSV is `\N`, which is parsed as SQL NULL; we add the `--input_format_allow_errors_num 10` parameter, so up to ten malformed records can be skipped;
- the ingredients, directions and NER fields are arrays; these arrays are represented in an unusual form: they are serialized into a string as JSON and then placed into the CSV, so we parse them as String and then use the [JSONExtract](../../sql-reference/functions/json-functions/) function to transform them into Array (see the sketch below).
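
As a minimal sketch of that last step, `JSONExtract` turns a JSON-array string into a ClickHouse `Array(String)` directly (the literal below is illustrative):

```
SELECT JSONExtract('["eggs", "salt", "flour"]', 'Array(String)') AS parsed
-- parsed = ['eggs','salt','flour']
```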

## Validate the inserted data

Check the row count:

```
SELECT count() FROM recipes

┌─count()─┐
│ 2231141 │
└─────────┘
```

## Example queries

### Top components by the number of recipes

```
SELECT
    arrayJoin(NER) AS k,
    count() AS c
FROM recipes
GROUP BY k
ORDER BY c DESC
LIMIT 50

┌─k────────────────────┬──────c─┐
│ salt                 │ 890741 │
│ sugar                │ 620027 │
│ butter               │ 493823 │
│ flour                │ 466110 │
│ eggs                 │ 401276 │
│ onion                │ 372469 │
│ garlic               │ 358364 │
│ milk                 │ 346769 │
│ water                │ 326092 │
│ vanilla              │ 270381 │
│ olive oil            │ 197877 │
│ pepper               │ 179305 │
│ brown sugar          │ 174447 │
│ tomatoes             │ 163933 │
│ egg                  │ 160507 │
│ baking powder        │ 148277 │
│ lemon juice          │ 146414 │
│ Salt                 │ 122557 │
│ cinnamon             │ 117927 │
│ sour cream           │ 116682 │
│ cream cheese         │ 114423 │
│ margarine            │ 112742 │
│ celery               │ 112676 │
│ baking soda          │ 110690 │
│ parsley              │ 102151 │
│ chicken              │ 101505 │
│ onions               │  98903 │
│ vegetable oil        │  91395 │
│ oil                  │  85600 │
│ mayonnaise           │  84822 │
│ pecans               │  79741 │
│ nuts                 │  78471 │
│ potatoes             │  75820 │
│ carrots              │  75458 │
│ pineapple            │  74345 │
│ soy sauce            │  70355 │
│ black pepper         │  69064 │
│ thyme                │  68429 │
│ mustard              │  65948 │
│ chicken broth        │  65112 │
│ bacon                │  64956 │
│ honey                │  64626 │
│ oregano              │  64077 │
│ ground beef          │  64068 │
│ unsalted butter      │  63848 │
│ mushrooms            │  61465 │
│ Worcestershire sauce │  59328 │
│ cornstarch           │  58476 │
│ green pepper         │  58388 │
│ Cheddar cheese       │  58354 │
└──────────────────────┴────────┘

50 rows in set. Elapsed: 0.112 sec. Processed 2.23 million rows, 361.57 MB (19.99 million rows/s., 3.24 GB/s.)
```

In this example, we use the [arrayJoin](../../sql-reference/functions/array-join/) function to expand each row into multiple rows, one per array element.
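
On its own, `arrayJoin` works like this (a minimal sketch):

```
SELECT arrayJoin(['salt', 'sugar']) AS k

┌─k─────┐
│ salt  │
│ sugar │
└───────┘
```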

### The most complex recipes with strawberry

```
SELECT
    title,
    length(NER),
    length(directions)
FROM recipes
WHERE has(NER, 'strawberry')
ORDER BY length(directions) DESC
LIMIT 10

┌─title────────────────────────────────────────────────────────────┬─length(NER)─┬─length(directions)─┐
│ Chocolate-Strawberry-Orange Wedding Cake                         │          24 │                126 │
│ Strawberry Cream Cheese Crumble Tart                             │          19 │                 47 │
│ Charlotte-Style Ice Cream                                        │          11 │                 45 │
│ Sinfully Good a Million Layers Chocolate Layer Cake, With Strawb │          31 │                 45 │
│ Sweetened Berries With Elderflower Sherbet                       │          24 │                 44 │
│ Chocolate-Strawberry Mousse Cake                                 │          15 │                 42 │
│ Rhubarb Charlotte with Strawberries and Rum                      │          20 │                 42 │
│ Chef Joey's Strawberry Vanilla Tart                              │           7 │                 37 │
│ Old-Fashioned Ice Cream Sundae Cake                              │          17 │                 37 │
│ Watermelon Cake                                                  │          16 │                 36 │
└──────────────────────────────────────────────────────────────────┴─────────────┴────────────────────┘

10 rows in set. Elapsed: 0.215 sec. Processed 2.23 million rows, 1.48 GB (10.35 million rows/s., 6.86 GB/s.)
```

In this example, we use the [has](../../sql-reference/functions/array-functions/#hasarr-elem) function to filter by array elements and sort by the number of directions.
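
A minimal sketch of `has` by itself:

```
SELECT has(['strawberry', 'sugar'], 'strawberry') AS found

┌─found─┐
│     1 │
└───────┘
```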

There is a wedding cake that requires a whole 126 steps to produce!

Let's look at those directions:

```
SELECT arrayJoin(directions)
FROM recipes
WHERE title = 'Chocolate-Strawberry-Orange Wedding Cake'

┌─arrayJoin(directions)───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Position 1 rack in center and 1 rack in bottom third of oven and preheat to 350F. │
│ Butter one 5-inch-diameter cake pan with 2-inch-high sides, one 8-inch-diameter cake pan with 2-inch-high sides and one 12-inch-diameter cake pan with 2-inch-high sides. │
│ Dust pans with flour; line bottoms with parchment. │
│ Combine 1/3 cup orange juice and 2 ounces unsweetened chocolate in heavy small saucepan. │
│ Stir mixture over medium-low heat until chocolate melts. │
│ Remove from heat. │
│ Gradually mix in 1 2/3 cups orange juice. │
│ Sift 3 cups flour, 2/3 cup cocoa, 2 teaspoons baking soda, 1 teaspoon salt and 1/2 teaspoon baking powder into medium bowl. │
│ using electric mixer, beat 1 cup (2 sticks) butter and 3 cups sugar in large bowl until blended (mixture will look grainy). │
│ Add 4 eggs, 1 at a time, beating to blend after each. │
│ Beat in 1 tablespoon orange peel and 1 tablespoon vanilla extract. │
│ Add dry ingredients alternately with orange juice mixture in 3 additions each, beating well after each addition. │
│ Mix in 1 cup chocolate chips. │
│ Transfer 1 cup plus 2 tablespoons batter to prepared 5-inch pan, 3 cups batter to prepared 8-inch pan and remaining batter (about 6 cups) to 12-inch pan. │
│ Place 5-inch and 8-inch pans on center rack of oven. │
│ Place 12-inch pan on lower rack of oven. │
│ Bake cakes until tester inserted into center comes out clean, about 35 minutes. │
│ Transfer cakes in pans to racks and cool completely. │
│ Mark 4-inch diameter circle on one 6-inch-diameter cardboard cake round. │
│ Cut out marked circle. │
│ Mark 7-inch-diameter circle on one 8-inch-diameter cardboard cake round. │
│ Cut out marked circle. │
│ Mark 11-inch-diameter circle on one 12-inch-diameter cardboard cake round. │
│ Cut out marked circle. │
│ Cut around sides of 5-inch-cake to loosen. │
│ Place 4-inch cardboard over pan. │
│ Hold cardboard and pan together; turn cake out onto cardboard. │
│ Peel off parchment.Wrap cakes on its cardboard in foil. │
│ Repeat turning out, peeling off parchment and wrapping cakes in foil, using 7-inch cardboard for 8-inch cake and 11-inch cardboard for 12-inch cake. │
│ Using remaining ingredients, make 1 more batch of cake batter and bake 3 more cake layers as described above. │
│ Cool cakes in pans. │
│ Cover cakes in pans tightly with foil. │
│ (Can be prepared ahead. │
│ Let stand at room temperature up to 1 day or double-wrap all cake layers and freeze up to 1 week. │
│ Bring cake layers to room temperature before using.) │
│ Place first 12-inch cake on its cardboard on work surface. │
│ Spread 2 3/4 cups ganache over top of cake and all the way to edge. │
│ Spread 2/3 cup jam over ganache, leaving 1/2-inch chocolate border at edge. │
│ Drop 1 3/4 cups white chocolate frosting by spoonfuls over jam. │
│ Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. │
│ Rub some cocoa powder over second 12-inch cardboard. │
│ Cut around sides of second 12-inch cake to loosen. │
│ Place cardboard, cocoa side down, over pan. │
│ Turn cake out onto cardboard. │
│ Peel off parchment. │
│ Carefully slide cake off cardboard and onto filling on first 12-inch cake. │
│ Refrigerate. │
│ Place first 8-inch cake on its cardboard on work surface. │
│ Spread 1 cup ganache over top all the way to edge. │
│ Spread 1/4 cup jam over, leaving 1/2-inch chocolate border at edge. │
│ Drop 1 cup white chocolate frosting by spoonfuls over jam. │
│ Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. │
│ Rub some cocoa over second 8-inch cardboard. │
│ Cut around sides of second 8-inch cake to loosen. │
│ Place cardboard, cocoa side down, over pan. │
│ Turn cake out onto cardboard. │
│ Peel off parchment. │
│ Slide cake off cardboard and onto filling on first 8-inch cake. │
│ Refrigerate. │
│ Place first 5-inch cake on its cardboard on work surface. │
│ Spread 1/2 cup ganache over top of cake and all the way to edge. │
│ Spread 2 tablespoons jam over, leaving 1/2-inch chocolate border at edge. │
│ Drop 1/3 cup white chocolate frosting by spoonfuls over jam. │
│ Gently spread frosting over jam, leaving 1/2-inch chocolate border at edge. │
│ Rub cocoa over second 6-inch cardboard. │
│ Cut around sides of second 5-inch cake to loosen. │
│ Place cardboard, cocoa side down, over pan. │
│ Turn cake out onto cardboard. │
│ Peel off parchment. │
│ Slide cake off cardboard and onto filling on first 5-inch cake. │
│ Chill all cakes 1 hour to set filling. │
│ Place 12-inch tiered cake on its cardboard on revolving cake stand. │
│ Spread 2 2/3 cups frosting over top and sides of cake as a first coat. │
│ Refrigerate cake. │
│ Place 8-inch tiered cake on its cardboard on cake stand. │
│ Spread 1 1/4 cups frosting over top and sides of cake as a first coat. │
│ Refrigerate cake. │
│ Place 5-inch tiered cake on its cardboard on cake stand. │
│ Spread 3/4 cup frosting over top and sides of cake as a first coat. │
│ Refrigerate all cakes until first coats of frosting set, about 1 hour. │
│ (Cakes can be made to this point up to 1 day ahead; cover and keep refrigerate.) │
│ Prepare second batch of frosting, using remaining frosting ingredients and following directions for first batch. │
│ Spoon 2 cups frosting into pastry bag fitted with small star tip. │
│ Place 12-inch cake on its cardboard on large flat platter. │
│ Place platter on cake stand. │
│ Using icing spatula, spread 2 1/2 cups frosting over top and sides of cake; smooth top. │
│ Using filled pastry bag, pipe decorative border around top edge of cake. │
│ Refrigerate cake on platter. │
│ Place 8-inch cake on its cardboard on cake stand. │
│ Using icing spatula, spread 1 1/2 cups frosting over top and sides of cake; smooth top. │
│ Using pastry bag, pipe decorative border around top edge of cake. │
│ Refrigerate cake on its cardboard. │
│ Place 5-inch cake on its cardboard on cake stand. │
│ Using icing spatula, spread 3/4 cup frosting over top and sides of cake; smooth top. │
│ Using pastry bag, pipe decorative border around top edge of cake, spooning more frosting into bag if necessary. │
│ Refrigerate cake on its cardboard. │
│ Keep all cakes refrigerated until frosting sets, about 2 hours. │
│ (Can be prepared 2 days ahead. │
│ Cover loosely; keep refrigerated.) │
│ Place 12-inch cake on platter on work surface. │
│ Press 1 wooden dowel straight down into and completely through center of cake. │
│ Mark dowel 1/4 inch above top of frosting. │
│ Remove dowel and cut with serrated knife at marked point. │
│ Cut 4 more dowels to same length. │
│ Press 1 cut dowel back into center of cake. │
│ Press remaining 4 cut dowels into cake, positioning 3 1/2 inches inward from cake edges and spacing evenly. │
│ Place 8-inch cake on its cardboard on work surface. │
│ Press 1 dowel straight down into and completely through center of cake. │
│ Mark dowel 1/4 inch above top of frosting. │
│ Remove dowel and cut with serrated knife at marked point. │
│ Cut 3 more dowels to same length. │
│ Press 1 cut dowel back into center of cake. │
│ Press remaining 3 cut dowels into cake, positioning 2 1/2 inches inward from edges and spacing evenly. │
│ Using large metal spatula as aid, place 8-inch cake on its cardboard atop dowels in 12-inch cake, centering carefully. │
│ Gently place 5-inch cake on its cardboard atop dowels in 8-inch cake, centering carefully. │
│ Using citrus stripper, cut long strips of orange peel from oranges. │
│ Cut strips into long segments. │
│ To make orange peel coils, wrap peel segment around handle of wooden spoon; gently slide peel off handle so that peel keeps coiled shape. │
│ Garnish cake with orange peel coils, ivy or mint sprigs, and some berries. │
│ (Assembled cake can be made up to 8 hours ahead. │
│ Let stand at cool room temperature.) │
│ Remove top and middle cake tiers. │
│ Remove dowels from cakes. │
│ Cut top and middle cakes into slices. │
│ To cut 12-inch cake: Starting 3 inches inward from edge and inserting knife straight down, cut through from top to bottom to make 6-inch-diameter circle in center of cake. │
│ Cut outer portion of cake into slices; cut inner portion into slices and serve with strawberries. │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

126 rows in set. Elapsed: 0.011 sec. Processed 8.19 thousand rows, 5.34 MB (737.75 thousand rows/s., 480.59 MB/s.)
```

### Online playground

The dataset is also available in the [Playground](https://gh-api.clickhouse.tech/play?user=play#U0VMRUNUCiAgICBhcnJheUpvaW4oTkVSKSBBUyBrLAogICAgY291bnQoKSBBUyBjCkZST00gcmVjaXBlcwpHUk9VUCBCWSBrCk9SREVSIEJZIGMgREVTQwpMSU1JVCA1MA==).

@ -11,9 +11,9 @@ ClickHouse provides a native command-line client: `clickhouse-client`. The clien

``` bash
$ clickhouse-client
-ClickHouse client version 19.17.1.1579 (official build).
+ClickHouse client version 20.13.1.5273 (official build).
Connecting to localhost:9000 as user default.
-Connected to ClickHouse server version 19.17.1 revision 54428.
+Connected to ClickHouse server version 20.13.1 revision 54442.

:)
```

@ -127,6 +127,8 @@ You can pass parameters to `clickhouse-client` (all parameters have a default va
- `--history_file` — Path to a file containing command history.
- `--param_<name>` — Value for a [query with parameters](#cli-queries-with-parameters).

+Since version 20.5, `clickhouse-client` has automatic syntax highlighting (always enabled).

### Configuration Files {#configuration_files}

`clickhouse-client` uses the first existing file of the following:

@ -60,5 +60,7 @@ toc_title: Client Libraries
- [pillar](https://github.com/sofakingworld/pillar)
- Nim
    - [nim-clickhouse](https://github.com/leonardoce/nim-clickhouse)
+- Haskell
+    - [hdbc-clickhouse](https://github.com/zaneli/hdbc-clickhouse)

[Original article](https://clickhouse.tech/docs/en/interfaces/third-party/client_libraries/) <!--hide-->

@ -53,6 +53,11 @@ Merges the intermediate aggregation states in the same way as the -Merge combina

Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array items and returns an array of results. For example, `sumForEach` for the arrays `[1, 2]`, `[3, 4, 5]` and `[6, 7]` returns the result `[10, 13, 5]` after adding together the corresponding array items.
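
A minimal runnable sketch of that example, with the arrays supplied via a subquery:

``` sql
SELECT sumForEach(arr)
FROM (SELECT arrayJoin([[1, 2], [3, 4, 5], [6, 7]]) AS arr)
-- returns [10, 13, 5]
```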

## -Distinct {#agg-functions-combinator-distinct}

Every unique combination of arguments is aggregated only once. Repeated values are ignored.
Examples: `sum(DISTINCT x)`, `groupArray(DISTINCT x)`, `corrStableDistinct(DISTINCT x, y)` and so on.
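
For instance, a sketch over a generated sequence:

``` sql
SELECT sum(DISTINCT number % 3) FROM numbers(10)
-- the distinct values of number % 3 are 0, 1 and 2, so the result is 3
```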

## -OrDefault {#agg-functions-combinator-ordefault}

Changes the behavior of an aggregate function.
@ -244,4 +249,5 @@ FROM people
└────────┴───────────────────────────┘
```

[Original article](https://clickhouse.tech/docs/en/query_language/agg_functions/combinators/) <!--hide-->

@ -11,9 +11,9 @@ ClickHouse provides its own command-line client

``` bash
$ clickhouse-client
-ClickHouse client version 19.17.1.1579 (official build).
+ClickHouse client version 20.13.1.5273 (official build).
Connecting to localhost:9000 as user default.
-Connected to ClickHouse server version 19.17.1 revision 54428.
+Connected to ClickHouse server version 20.13.1 revision 54442.

:)
```

@ -131,6 +131,8 @@ $ clickhouse-client --param_tuple_in_tuple="(10, ('dt', 10))" -q "SELECT * FROM
- `--secure` — if specified, a secure channel is used.
- `--param_<name>` — the value of a parameter for a [query with parameters](#cli-queries-with-parameters).

+Starting with version 20.5, `clickhouse-client` has automatic syntax highlighting (always enabled).

### Configuration files {#konfiguratsionnye-faily}

`clickhouse-client` uses the first existing file of the following:

@ -51,6 +51,11 @@ toc_title: "\u041a\u043e\u043c\u0431\u0438\u043d\u0430\u0442\u043e\u0440\u044b\u

Converts an aggregate function for tables into an aggregate function for arrays that aggregates the corresponding array elements and returns an array of results. For example, `sumForEach` for the arrays `[1, 2]`, `[3, 4, 5]` and `[6, 7]` returns the result `[10, 13, 5]` after adding together the corresponding array elements.

## -Distinct {#agg-functions-combinator-distinct}

With the Distinct combinator, every unique combination of argument values is taken into account by the aggregate function only once.
Examples: `sum(DISTINCT x)`, `groupArray(DISTINCT x)`, `corrStableDistinct(DISTINCT x, y)` and so on.

## -OrDefault {#agg-functions-combinator-ordefault}

Changes the behavior of an aggregate function.
@ -59,7 +64,7 @@ toc_title: "\u041a\u043e\u043c\u0431\u0438\u043d\u0430\u0442\u043e\u0440\u044b\u

`-OrDefault` can be used with other combinators.

**Syntax**

``` sql
<aggFunction>OrDefault(x)
@ -69,8 +74,8 @@ toc_title: "\u041a\u043e\u043c\u0431\u0438\u043d\u0430\u0442\u043e\u0440\u044b\u

- `x` — Aggregate function parameters.

**Returned values**

Returns the default value of the aggregate function's return type if there is nothing to aggregate.

The data type depends on the aggregate function used.
@ -116,11 +121,11 @@ FROM

Changes the behavior of an aggregate function.

The combinator converts the result of an aggregate function to the [Nullable](../data-types/nullable.md) data type. If the aggregate function has no input data, with the combinator it returns [NULL](../syntax.md#null-literal).

`-OrNull` can be used with other combinators.

**Syntax**

``` sql
<aggFunction>OrNull(x)
@ -129,8 +134,8 @@ FROM
**Parameters**

- `x` — Aggregate function parameters.

**Returned values**

- The result of the aggregate function, converted to the `Nullable` data type.
- `NULL` if the aggregate function has no input data.

@ -376,7 +376,7 @@ bool ContextAccess::checkAccessImpl2(const AccessFlags & flags, const Args &...
        return true;
    };

-    auto access_denied = [&](const String & error_msg, int error_code)
+    auto access_denied = [&](const String & error_msg, int error_code [[maybe_unused]])
    {
        if (trace_log)
            LOG_TRACE(trace_log, "Access denied: {}{}", (AccessRightsElement{flags, args...}.toString()),

@ -558,7 +558,7 @@ bool ContextAccess::checkAdminOptionImpl2(const Container & role_ids, const GetN
    if (!std::size(role_ids) || is_full_access)
        return true;

-    auto show_error = [this](const String & msg, int error_code)
+    auto show_error = [this](const String & msg, int error_code [[maybe_unused]])
    {
        UNUSED(this);
        if constexpr (throw_if_denied)

@ -13,12 +13,12 @@ struct MultiEnum
    MultiEnum() = default;

    template <typename ... EnumValues, typename = std::enable_if_t<std::conjunction_v<std::is_same<EnumTypeT, EnumValues>...>>>
-    explicit MultiEnum(EnumValues ... v)
+    constexpr explicit MultiEnum(EnumValues ... v)
        : MultiEnum((toBitFlag(v) | ... | 0u))
    {}

    template <typename ValueType, typename = std::enable_if_t<std::is_convertible_v<ValueType, StorageType>>>
-    explicit MultiEnum(ValueType v)
+    constexpr explicit MultiEnum(ValueType v)
        : bitset(v)
    {
        static_assert(std::is_unsigned_v<ValueType>);

@ -95,5 +95,5 @@ struct MultiEnum
private:
    StorageType bitset = 0;

-    static StorageType toBitFlag(EnumType v) { return StorageType{1} << static_cast<StorageType>(v); }
+    static constexpr StorageType toBitFlag(EnumType v) { return StorageType{1} << static_cast<StorageType>(v); }
};

@ -14,6 +14,7 @@
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeDateTime.h>
#include <DataTypes/DataTypeDateTime64.h>
+#include <DataTypes/DataTypeEnum.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypesDecimal.h>

@ -360,9 +361,9 @@ DataTypePtr getLeastSupertype(const DataTypes & types)
            maximize(max_bits_of_unsigned_integer, 64);
        else if (typeid_cast<const DataTypeUInt256 *>(type.get()))
            maximize(max_bits_of_unsigned_integer, 256);
-        else if (typeid_cast<const DataTypeInt8 *>(type.get()))
+        else if (typeid_cast<const DataTypeInt8 *>(type.get()) || typeid_cast<const DataTypeEnum8 *>(type.get()))
            maximize(max_bits_of_signed_integer, 8);
-        else if (typeid_cast<const DataTypeInt16 *>(type.get()))
+        else if (typeid_cast<const DataTypeInt16 *>(type.get()) || typeid_cast<const DataTypeEnum16 *>(type.get()))
            maximize(max_bits_of_signed_integer, 16);
        else if (typeid_cast<const DataTypeInt32 *>(type.get()))
            maximize(max_bits_of_signed_integer, 32);

@ -27,8 +27,14 @@ void ExpressionInfoMatcher::visit(const ASTFunction & ast_function, const ASTPtr
        const auto & function = FunctionFactory::instance().tryGet(ast_function.name, data.context);

        /// Skip lambda, tuple and other special functions
-        if (function && function->isStateful())
-            data.is_stateful_function = true;
+        if (function)
+        {
+            if (function->isStateful())
+                data.is_stateful_function = true;
+
+            if (!function->isDeterministicInScopeOfQuery())
+                data.is_deterministic_function = false;
+        }
    }
}

@ -21,6 +21,7 @@ struct ExpressionInfoMatcher
    bool is_array_join = false;
    bool is_stateful_function = false;
    bool is_aggregate_function = false;
+    bool is_deterministic_function = true;
    std::unordered_set<size_t> unique_reference_tables_pos = {};
};

@ -34,6 +34,7 @@
#include <Interpreters/QueryLog.h>
#include <Interpreters/TranslateQualifiedNamesVisitor.h>
#include <Interpreters/getTableExpressions.h>
+#include <Interpreters/processColumnTransformers.h>

namespace
{

@ -52,7 +53,6 @@ namespace ErrorCodes
    extern const int LOGICAL_ERROR;
}

InterpreterInsertQuery::InterpreterInsertQuery(
    const ASTPtr & query_ptr_, const Context & context_, bool allow_materialized_, bool no_squash_, bool no_destination_)
    : query_ptr(query_ptr_)

@ -95,27 +95,7 @@ Block InterpreterInsertQuery::getSampleBlock(

    Block table_sample = metadata_snapshot->getSampleBlock();

-    /// Process column transformers (e.g. * EXCEPT(a)), asterisks and qualified columns.
-    const auto & columns = metadata_snapshot->getColumns();
-    auto names_and_types = columns.getOrdinary();
-    removeDuplicateColumns(names_and_types);
-    auto table_expr = std::make_shared<ASTTableExpression>();
-    table_expr->database_and_table_name = createTableIdentifier(table->getStorageID());
-    table_expr->children.push_back(table_expr->database_and_table_name);
-    TablesWithColumns tables_with_columns;
-    tables_with_columns.emplace_back(DatabaseAndTableWithAlias(*table_expr, context.getCurrentDatabase()), names_and_types);
-
-    tables_with_columns[0].addHiddenColumns(columns.getMaterialized());
-    tables_with_columns[0].addHiddenColumns(columns.getAliases());
-    tables_with_columns[0].addHiddenColumns(table->getVirtuals());
-
-    NameSet source_columns_set;
-    for (const auto & identifier : query.columns->children)
-        source_columns_set.insert(identifier->getColumnName());
-    TranslateQualifiedNamesVisitor::Data visitor_data(source_columns_set, tables_with_columns);
-    TranslateQualifiedNamesVisitor visitor(visitor_data);
-    auto columns_ast = query.columns->clone();
-    visitor.visit(columns_ast);
+    const auto columns_ast = processColumnTransformers(context.getCurrentDatabase(), table, metadata_snapshot, query.columns);

    /// Form the block based on the column names from the query
    Block res;

@ -5,13 +5,18 @@
#include <Interpreters/InterpreterOptimizeQuery.h>
#include <Access/AccessRightsElement.h>
#include <Common/typeid_cast.h>
#include <Parsers/ASTExpressionList.h>

#include <Interpreters/processColumnTransformers.h>

#include <memory>

namespace DB
{

namespace ErrorCodes
{
    extern const int THERE_IS_NO_COLUMN;
}

@ -27,7 +32,44 @@ BlockIO InterpreterOptimizeQuery::execute()
    auto table_id = context.resolveStorageID(ast, Context::ResolveOrdinary);
    StoragePtr table = DatabaseCatalog::instance().getTable(table_id, context);
    auto metadata_snapshot = table->getInMemoryMetadataPtr();
-    table->optimize(query_ptr, metadata_snapshot, ast.partition, ast.final, ast.deduplicate, context);
+
+    // An empty list of names means we deduplicate by all columns, but the user can explicitly state which columns to use.
+    Names column_names;
+    if (ast.deduplicate_by_columns)
+    {
+        // The user requested a custom set of columns for deduplication, possibly with a Column Transformer expression.
+        {
+            // Expand asterisk, column transformers, etc. into a list of column names.
+            const auto cols = processColumnTransformers(context.getCurrentDatabase(), table, metadata_snapshot, ast.deduplicate_by_columns);
+            for (const auto & col : cols->children)
+                column_names.emplace_back(col->getColumnName());
+        }
+
+        metadata_snapshot->check(column_names, NamesAndTypesList{}, table_id);
+        Names required_columns;
+        {
+            required_columns = metadata_snapshot->getColumnsRequiredForSortingKey();
+            const auto partitioning_cols = metadata_snapshot->getColumnsRequiredForPartitionKey();
+            required_columns.reserve(required_columns.size() + partitioning_cols.size());
+            required_columns.insert(required_columns.end(), partitioning_cols.begin(), partitioning_cols.end());
+        }
+        for (const auto & required_col : required_columns)
+        {
+            // Deduplication is performed only for adjacent rows in a block,
+            // and all rows in a block are in the sorting key order within a single partition,
+            // hence deduplication always implicitly takes sorting keys and partition keys into account.
+            // So we just explicitly state that limitation in order to avoid confusion.
+            if (std::find(column_names.begin(), column_names.end(), required_col) == column_names.end())
+                throw Exception(ErrorCodes::THERE_IS_NO_COLUMN,
+                    "DEDUPLICATE BY expression must include all columns used in table's"
+                    " ORDER BY, PRIMARY KEY, or PARTITION BY but '{}' is missing."
+                    " Expanded DEDUPLICATE BY columns expression: ['{}']",
+                    required_col, fmt::join(column_names, "', '"));
+        }
+    }
+
+    table->optimize(query_ptr, metadata_snapshot, ast.partition, ast.final, ast.deduplicate, column_names, context);

    return {};
}
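
For reference, a hedged sketch of the SQL shape this interpreter now handles (table and column names are illustrative; the parser tests below list the accepted forms):

``` sql
OPTIMIZE TABLE t FINAL DEDUPLICATE BY * EXCEPT value;
```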

@ -90,8 +90,8 @@ std::vector<ASTs> PredicateExpressionsOptimizer::extractTablesPredicates(const A
        ExpressionInfoVisitor::Data expression_info{.context = context, .tables = tables_with_columns};
        ExpressionInfoVisitor(expression_info).visit(predicate_expression);

-        if (expression_info.is_stateful_function)
-            return {}; /// give up the optimization when the predicate contains stateful function
+        if (expression_info.is_stateful_function || !expression_info.is_deterministic_function)
+            return {}; /// Not optimized when the predicate contains a stateful or non-deterministic function

        if (!expression_info.is_array_join)
        {

49 src/Interpreters/processColumnTransformers.cpp Normal file
@ -0,0 +1,49 @@

#include <Interpreters/processColumnTransformers.h>

#include <Interpreters/DatabaseAndTableWithAlias.h>
#include <Interpreters/TranslateQualifiedNamesVisitor.h>
#include <Interpreters/getTableExpressions.h>
#include <Parsers/ASTIdentifier.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/IAST.h>
#include <Storages/IStorage.h>
#include <Storages/StorageInMemoryMetadata.h>

namespace DB
{

ASTPtr processColumnTransformers(
    const String & current_database,
    const StoragePtr & table,
    const StorageMetadataPtr & metadata_snapshot,
    ASTPtr query_columns)
{
    const auto & columns = metadata_snapshot->getColumns();
    auto names_and_types = columns.getOrdinary();
    removeDuplicateColumns(names_and_types);

    TablesWithColumns tables_with_columns;
    {
        auto table_expr = std::make_shared<ASTTableExpression>();
        table_expr->database_and_table_name = createTableIdentifier(table->getStorageID());
        table_expr->children.push_back(table_expr->database_and_table_name);
        tables_with_columns.emplace_back(DatabaseAndTableWithAlias(*table_expr, current_database), names_and_types);
    }

    tables_with_columns[0].addHiddenColumns(columns.getMaterialized());
    tables_with_columns[0].addHiddenColumns(columns.getAliases());
    tables_with_columns[0].addHiddenColumns(table->getVirtuals());

    NameSet source_columns_set;
    for (const auto & identifier : query_columns->children)
        source_columns_set.insert(identifier->getColumnName());

    TranslateQualifiedNamesVisitor::Data visitor_data(source_columns_set, tables_with_columns);
    TranslateQualifiedNamesVisitor visitor(visitor_data);
    auto columns_ast = query_columns->clone();
    visitor.visit(columns_ast);

    return columns_ast;
}

}

19 src/Interpreters/processColumnTransformers.h Normal file
@ -0,0 +1,19 @@

#pragma once

#include <Parsers/IAST_fwd.h>
#include <Storages/IStorage_fwd.h>

namespace DB
{

struct StorageInMemoryMetadata;
using StorageMetadataPtr = std::shared_ptr<const StorageInMemoryMetadata>;

/// Process column transformers (e.g. * EXCEPT(a)), asterisks and qualified columns.
ASTPtr processColumnTransformers(
    const String & current_database,
    const StoragePtr & table,
    const StorageMetadataPtr & metadata_snapshot,
    ASTPtr query_columns);

}

@ -157,6 +157,7 @@ SRCS(
    interpretSubquery.cpp
    join_common.cpp
    loadMetadata.cpp
+    processColumnTransformers.cpp
    sortBlock.cpp

)

@ -23,6 +23,12 @@ void ASTOptimizeQuery::formatQueryImpl(const FormatSettings & settings, FormatSt

    if (deduplicate)
        settings.ostr << (settings.hilite ? hilite_keyword : "") << " DEDUPLICATE" << (settings.hilite ? hilite_none : "");

+    if (deduplicate_by_columns)
+    {
+        settings.ostr << (settings.hilite ? hilite_keyword : "") << " BY " << (settings.hilite ? hilite_none : "");
+        deduplicate_by_columns->formatImpl(settings, state, frame);
+    }
}

}

@ -16,9 +16,11 @@ public:
    /// The partition to optimize can be specified.
    ASTPtr partition;
    /// A flag can be specified - perform optimization "to the end" instead of one step.
-    bool final;
+    bool final = false;
    /// Do deduplicate (default: false)
-    bool deduplicate;
+    bool deduplicate = false;
+    /// Deduplicate by columns.
+    ASTPtr deduplicate_by_columns;

    /** Get the text that identifies this element. */
    String getID(char delim) const override

@ -37,6 +39,12 @@ public:
            res->children.push_back(res->partition);
        }

+        if (deduplicate_by_columns)
+        {
+            res->deduplicate_by_columns = deduplicate_by_columns->clone();
+            res->children.push_back(res->deduplicate_by_columns);
+        }
+
        return res;
    }

@ -1259,7 +1259,7 @@ bool ParserColumnsMatcher::parseImpl(Pos & pos, ASTPtr & node, Expected & expect
        res->children.push_back(regex_node);
    }

-    ParserColumnsTransformers transformers_p;
+    ParserColumnsTransformers transformers_p(allowed_transformers);
    ASTPtr transformer;
    while (transformers_p.parse(pos, transformer, expected))
    {

@ -1278,7 +1278,7 @@ bool ParserColumnsTransformers::parseImpl(Pos & pos, ASTPtr & node, Expected & e
    ParserKeyword as("AS");
    ParserKeyword strict("STRICT");

-    if (apply.ignore(pos, expected))
+    if (allowed_transformers.isSet(ColumnTransformer::APPLY) && apply.ignore(pos, expected))
    {
        bool with_open_round_bracket = false;

@ -1331,7 +1331,7 @@ bool ParserColumnsTransformers::parseImpl(Pos & pos, ASTPtr & node, Expected & e
        node = std::move(res);
        return true;
    }
-    else if (except.ignore(pos, expected))
+    else if (allowed_transformers.isSet(ColumnTransformer::EXCEPT) && except.ignore(pos, expected))
    {
        if (strict.ignore(pos, expected))
            is_strict = true;

@ -1371,7 +1371,7 @@ bool ParserColumnsTransformers::parseImpl(Pos & pos, ASTPtr & node, Expected & e
        node = std::move(res);
        return true;
    }
-    else if (replace.ignore(pos, expected))
+    else if (allowed_transformers.isSet(ColumnTransformer::REPLACE) && replace.ignore(pos, expected))
    {
        if (strict.ignore(pos, expected))
            is_strict = true;

@ -1434,7 +1434,7 @@ bool ParserAsterisk::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
{
    ++pos;
    auto asterisk = std::make_shared<ASTAsterisk>();
-    ParserColumnsTransformers transformers_p;
+    ParserColumnsTransformers transformers_p(allowed_transformers);
    ASTPtr transformer;
    while (transformers_p.parse(pos, transformer, expected))
    {

@ -1,6 +1,7 @@
#pragma once

#include <Core/Field.h>
+#include <Core/MultiEnum.h>
#include <Parsers/IParserBase.h>

@ -70,12 +71,47 @@ protected:
    bool allow_query_parameter;
};

+/** *, t.*, db.table.*, COLUMNS('<regular expression>') APPLY(...) or EXCEPT(...) or REPLACE(...)
+  */
+class ParserColumnsTransformers : public IParserBase
+{
+public:
+    enum class ColumnTransformer : UInt8
+    {
+        APPLY,
+        EXCEPT,
+        REPLACE,
+    };
+    using ColumnTransformers = MultiEnum<ColumnTransformer, UInt8>;
+    static constexpr auto AllTransformers = ColumnTransformers{ColumnTransformer::APPLY, ColumnTransformer::EXCEPT, ColumnTransformer::REPLACE};
+
+    ParserColumnsTransformers(ColumnTransformers allowed_transformers_ = AllTransformers, bool is_strict_ = false)
+        : allowed_transformers(allowed_transformers_)
+        , is_strict(is_strict_)
+    {}
+
+protected:
+    const char * getName() const override { return "COLUMNS transformers"; }
+    bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
+    ColumnTransformers allowed_transformers;
+    bool is_strict;
+};
+
/// Just *
class ParserAsterisk : public IParserBase
{
public:
+    using ColumnTransformers = ParserColumnsTransformers::ColumnTransformers;
+    ParserAsterisk(ColumnTransformers allowed_transformers_ = ParserColumnsTransformers::AllTransformers)
+        : allowed_transformers(allowed_transformers_)
+    {}
+
protected:
    const char * getName() const override { return "asterisk"; }
    bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
+
+    ColumnTransformers allowed_transformers;
};

/** Something like t.* or db.table.*

@ -91,21 +127,17 @@ protected:
  */
class ParserColumnsMatcher : public IParserBase
{
public:
+    using ColumnTransformers = ParserColumnsTransformers::ColumnTransformers;
+    ParserColumnsMatcher(ColumnTransformers allowed_transformers_ = ParserColumnsTransformers::AllTransformers)
+        : allowed_transformers(allowed_transformers_)
+    {}
+
protected:
    const char * getName() const override { return "COLUMNS matcher"; }
    bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
};

-/** *, t.*, db.table.*, COLUMNS('<regular expression>') APPLY(...) or EXCEPT(...) or REPLACE(...)
-  */
-class ParserColumnsTransformers : public IParserBase
-{
-public:
-    ParserColumnsTransformers(bool is_strict_ = false): is_strict(is_strict_) {}
-protected:
-    const char * getName() const override { return "COLUMNS transformers"; }
-    bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
-    bool is_strict;
-};

/** A function, for example, f(x, y + 1, g(z)).

@ -4,11 +4,24 @@

#include <Parsers/ASTOptimizeQuery.h>
#include <Parsers/ASTIdentifier.h>
+#include <Parsers/ExpressionListParsers.h>

namespace DB
{

+bool ParserOptimizeQueryColumnsSpecification::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
+{
+    // Do not allow APPLY and REPLACE transformers.
+    // We use Columns Transformers only to get a list of columns,
+    // and we can't actually modify the content of the columns for deduplication.
+    const auto allowed_transformers = ParserColumnsTransformers::ColumnTransformers{ParserColumnsTransformers::ColumnTransformer::EXCEPT};
+
+    return ParserColumnsMatcher(allowed_transformers).parse(pos, node, expected)
+        || ParserAsterisk(allowed_transformers).parse(pos, node, expected)
+        || ParserIdentifier(false).parse(pos, node, expected);
+}
+
bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expected)
{

@ -16,6 +29,7 @@ bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte
    ParserKeyword s_partition("PARTITION");
    ParserKeyword s_final("FINAL");
    ParserKeyword s_deduplicate("DEDUPLICATE");
+    ParserKeyword s_by("BY");
    ParserToken s_dot(TokenType::Dot);
    ParserIdentifier name_p;
    ParserPartition partition_p;

@ -55,6 +69,14 @@ bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte
    if (s_deduplicate.ignore(pos, expected))
        deduplicate = true;

+    ASTPtr deduplicate_by_columns;
+    if (deduplicate && s_by.ignore(pos, expected))
+    {
+        if (!ParserList(std::make_unique<ParserOptimizeQueryColumnsSpecification>(), std::make_unique<ParserToken>(TokenType::Comma), false)
+            .parse(pos, deduplicate_by_columns, expected))
+            return false;
+    }
+
    auto query = std::make_shared<ASTOptimizeQuery>();
    node = query;

@ -66,6 +88,7 @@ bool ParserOptimizeQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expecte
        query->children.push_back(partition);
    query->final = final;
    query->deduplicate = deduplicate;
+    query->deduplicate_by_columns = deduplicate_by_columns;

    return true;
}

@ -7,6 +7,13 @@
namespace DB
{

+class ParserOptimizeQueryColumnsSpecification : public IParserBase
+{
+protected:
+    const char * getName() const override { return "column specification for OPTIMIZE ... DEDUPLICATE BY"; }
+    bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected) override;
+};
+
/** Query OPTIMIZE TABLE [db.]name [PARTITION partition] [FINAL] [DEDUPLICATE]
  */
class ParserOptimizeQuery : public IParserBase

134 src/Parsers/tests/gtest_Parser.cpp Normal file
@ -0,0 +1,134 @@
|
||||
#include <Parsers/ParserOptimizeQuery.h>
|
||||
|
||||
#include <Parsers/ParserQueryWithOutput.h>
|
||||
#include <Parsers/parseQuery.h>
|
||||
#include <Parsers/formatAST.h>
|
||||
#include <IO/WriteBufferFromOStream.h>
|
||||
|
||||
#include <string_view>
|
||||
|
||||
#include <gtest/gtest.h>
|
namespace
{

using namespace DB;
using namespace std::literals;

}


struct ParserTestCase
{
    std::shared_ptr<IParser> parser;
    const std::string_view input_text;
    const char * expected_ast = nullptr;
};

std::ostream & operator<<(std::ostream & ostr, const ParserTestCase & test_case)
{
    return ostr << "parser: " << test_case.parser->getName() << ", input: " << test_case.input_text;
}


class ParserTest : public ::testing::TestWithParam<ParserTestCase>
{};

TEST_P(ParserTest, parseQuery)
{
    const auto & [parser, input_text, expected_ast] = GetParam();

    ASSERT_NE(nullptr, parser);

    if (expected_ast)
    {
        ASTPtr ast;
        ASSERT_NO_THROW(ast = parseQuery(*parser, input_text.begin(), input_text.end(), 0, 0));
        EXPECT_EQ(expected_ast, serializeAST(*ast->clone(), false));
    }
    else
    {
        ASSERT_THROW(parseQuery(*parser, input_text.begin(), input_text.end(), 0, 0), DB::Exception);
    }
}


INSTANTIATE_TEST_SUITE_P(ParserOptimizeQuery, ParserTest, ::testing::Values(
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('a, b')",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('a, b')"
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]')",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]')"
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]') EXCEPT b",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]') EXCEPT b"
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]') EXCEPT (a, b)",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]') EXCEPT (a, b)"
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY a, b, c",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY a, b, c"
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY *",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY *"
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY * EXCEPT a",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY * EXCEPT a"
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY * EXCEPT (a, b)",
        "OPTIMIZE TABLE table_name DEDUPLICATE BY * EXCEPT (a, b)"
    }
));

INSTANTIATE_TEST_SUITE_P(ParserOptimizeQuery_FAIL, ParserTest, ::testing::Values(
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY",
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]') APPLY(x)",
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY COLUMNS('[a]') REPLACE(y)",
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY * APPLY(x)",
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY * REPLACE(y)",
    },
    ParserTestCase
    {
        std::make_shared<ParserOptimizeQuery>(),
        "OPTIMIZE TABLE table_name DEDUPLICATE BY db.a, db.b, db.c",
    }
));
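In SQL terms, the two suites above pin the grammar down to roughly the following forms (a sketch; `t`, `a`, `b`, `x` are placeholder names):

``` sql
-- Accepted: a column list, an asterisk, or COLUMNS('regex'), each optionally with EXCEPT.
OPTIMIZE TABLE t DEDUPLICATE BY a, b;
OPTIMIZE TABLE t DEDUPLICATE BY *;
OPTIMIZE TABLE t DEDUPLICATE BY * EXCEPT (a, b);
OPTIMIZE TABLE t DEDUPLICATE BY COLUMNS('[ab]') EXCEPT a;

-- Rejected: an empty BY list, APPLY/REPLACE column transformers, qualified column names.
OPTIMIZE TABLE t DEDUPLICATE BY;             -- syntax error
OPTIMIZE TABLE t DEDUPLICATE BY * APPLY(x);  -- syntax error
OPTIMIZE TABLE t DEDUPLICATE BY db.a, db.b;  -- syntax error
```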
@@ -380,6 +380,7 @@ public:
         const ASTPtr & /*partition*/,
         bool /*final*/,
         bool /*deduplicate*/,
+        const Names & /* deduplicate_by_columns */,
         const Context & /*context*/)
     {
         throw Exception("Method optimize is not supported by storage " + getName(), ErrorCodes::NOT_IMPLEMENTED);
@@ -652,7 +652,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
     time_t time_of_merge,
     const Context & context,
     const ReservationPtr & space_reservation,
-    bool deduplicate)
+    bool deduplicate,
+    const Names & deduplicate_by_columns)
 {
     static const String TMP_PREFIX = "tmp_merge_";

@@ -667,6 +668,13 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
     const MergeTreeData::DataPartsVector & parts = future_part.parts;

     LOG_DEBUG(log, "Merging {} parts: from {} to {} into {}", parts.size(), parts.front()->name, parts.back()->name, future_part.type.toString());
+    if (deduplicate)
+    {
+        if (deduplicate_by_columns.empty())
+            LOG_DEBUG(log, "DEDUPLICATE BY all columns");
+        else
+            LOG_DEBUG(log, "DEDUPLICATE BY ('{}')", fmt::join(deduplicate_by_columns, "', '"));
+    }

     auto disk = space_reservation->getDisk();
     String part_path = data.relative_data_path;

@@ -891,7 +899,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
     BlockInputStreamPtr merged_stream = std::make_shared<PipelineExecutingBlockInputStream>(std::move(pipeline));

     if (deduplicate)
-        merged_stream = std::make_shared<DistinctSortedBlockInputStream>(merged_stream, sort_description, SizeLimits(), 0 /*limit_hint*/, Names());
+        merged_stream = std::make_shared<DistinctSortedBlockInputStream>(merged_stream, sort_description, SizeLimits(), 0 /*limit_hint*/, deduplicate_by_columns);

     if (need_remove_expired_values)
         merged_stream = std::make_shared<TTLBlockInputStream>(merged_stream, data, metadata_snapshot, new_data_part, time_of_merge, force_ttl);
@@ -127,7 +127,8 @@ public:
         time_t time_of_merge,
         const Context & context,
         const ReservationPtr & space_reservation,
-        bool deduplicate);
+        bool deduplicate,
+        const Names & deduplicate_by_columns);

     /// Mutate a single data part with the specified commands. Will create and return a temporary part.
     MergeTreeData::MutableDataPartPtr mutatePartToTemporaryPart(
@@ -6,6 +6,7 @@
 #include <IO/ReadBufferFromString.h>
 #include <IO/WriteBufferFromString.h>
+#include <IO/ReadHelpers.h>
 #include <IO/WriteHelpers.h>


 namespace DB

@@ -16,15 +17,29 @@ namespace ErrorCodes
     extern const int LOGICAL_ERROR;
 }

+enum FormatVersion : UInt8
+{
+    FORMAT_WITH_CREATE_TIME = 2,
+    FORMAT_WITH_BLOCK_ID = 3,
+    FORMAT_WITH_DEDUPLICATE = 4,
+    FORMAT_WITH_UUID = 5,
+    FORMAT_WITH_DEDUPLICATE_BY_COLUMNS = 6,
+
+    FORMAT_LAST
+};
+
+
 void ReplicatedMergeTreeLogEntryData::writeText(WriteBuffer & out) const
 {
-    UInt8 format_version = 4;
+    UInt8 format_version = FORMAT_WITH_DEDUPLICATE;

+    if (!deduplicate_by_columns.empty())
+        format_version = std::max<UInt8>(format_version, FORMAT_WITH_DEDUPLICATE_BY_COLUMNS);
+
     /// Conditionally bump format_version only when uuid has been assigned.
     /// If some other feature requires bumping format_version to >= 5 then this code becomes no-op.
     if (new_part_uuid != UUIDHelpers::Nil)
-        format_version = std::max(format_version, static_cast<UInt8>(5));
+        format_version = std::max<UInt8>(format_version, FORMAT_WITH_UUID);

     out << "format version: " << format_version << "\n"
         << "create_time: " << LocalDateTime(create_time ? create_time : time(nullptr)) << "\n"

@@ -50,6 +65,17 @@ void ReplicatedMergeTreeLogEntryData::writeText(WriteBuffer & out) const
             if (new_part_uuid != UUIDHelpers::Nil)
                 out << "\ninto_uuid: " << new_part_uuid;

+            if (!deduplicate_by_columns.empty())
+            {
+                out << "\ndeduplicate_by_columns: ";
+                for (size_t i = 0; i < deduplicate_by_columns.size(); ++i)
+                {
+                    out << quote << deduplicate_by_columns[i];
+                    if (i != deduplicate_by_columns.size() - 1)
+                        out << ",";
+                }
+            }
+
             break;

         case DROP_RANGE:

@@ -129,10 +155,10 @@ void ReplicatedMergeTreeLogEntryData::readText(ReadBuffer & in)

     in >> "format version: " >> format_version >> "\n";

-    if (format_version < 1 || format_version > 5)
+    if (format_version < 1 || format_version >= FORMAT_LAST)
         throw Exception("Unknown ReplicatedMergeTreeLogEntry format version: " + DB::toString(format_version), ErrorCodes::UNKNOWN_FORMAT_VERSION);

-    if (format_version >= 2)
+    if (format_version >= FORMAT_WITH_CREATE_TIME)
     {
         LocalDateTime create_time_dt;
         in >> "create_time: " >> create_time_dt >> "\n";

@@ -141,7 +167,7 @@ void ReplicatedMergeTreeLogEntryData::readText(ReadBuffer & in)

     in >> "source replica: " >> source_replica >> "\n";

-    if (format_version >= 3)
+    if (format_version >= FORMAT_WITH_BLOCK_ID)
     {
         in >> "block_id: " >> escape >> block_id >> "\n";
     }

@@ -167,7 +193,7 @@ void ReplicatedMergeTreeLogEntryData::readText(ReadBuffer & in)
             }
             in >> new_part_name;

-            if (format_version >= 4)
+            if (format_version >= FORMAT_WITH_DEDUPLICATE)
             {
                 in >> "\ndeduplicate: " >> deduplicate;

@@ -184,6 +210,20 @@ void ReplicatedMergeTreeLogEntryData::readText(ReadBuffer & in)
                 }
                 else if (checkString("into_uuid: ", in))
                     in >> new_part_uuid;
+                else if (checkString("deduplicate_by_columns: ", in))
+                {
+                    Strings new_deduplicate_by_columns;
+                    for (;;)
+                    {
+                        String tmp_column_name;
+                        in >> quote >> tmp_column_name;
+                        new_deduplicate_by_columns.emplace_back(std::move(tmp_column_name));
+                        if (!checkString(",", in))
+                            break;
+                    }
+
+                    deduplicate_by_columns = std::move(new_deduplicate_by_columns);
+                }
                 else
                     trailing_newline_found = true;
             }
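Read together, writeText/readText define a line-oriented text format. For illustration, a format-version-6 MERGE_PARTS entry would serialize roughly like this (a hand-assembled sketch based only on the fields read back above, not output captured from a real replication queue; the part names, replica and timestamp are invented, and the UUID is the one used by the tests below):

``` text
format version: 6
create_time: 2020-01-02 03:04:05
source replica: r1
block_id: 
merge
all_1_1_0
all_2_2_0
into
all_1_2_1
deduplicate: 1
into_uuid: 00000000-075b-cd15-0000-093233447e0c
deduplicate_by_columns: 'foo','bar','qux'
```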
@@ -81,6 +81,7 @@ struct ReplicatedMergeTreeLogEntryData

     Strings source_parts;
     bool deduplicate = false; /// Do deduplicate on merge
+    Strings deduplicate_by_columns = {}; // Which columns should be checked for duplicates, empty means 'all' (default).
     MergeType merge_type = MergeType::REGULAR;
     String column_name;
     String index_name;

@@ -111,10 +112,10 @@ struct ReplicatedMergeTreeLogEntryData
     /// Version of metadata which will be set after this alter
     /// Also present in MUTATE_PART command, to track mutations
     /// required for complete alter execution.
-    int alter_version; /// May be equal to -1, if it's normal mutation, not metadata update.
+    int alter_version = -1; /// May be equal to -1, if it's normal mutation, not metadata update.

     /// only ALTER METADATA command
-    bool have_mutation; /// If this alter requires additional mutation step, for data update
+    bool have_mutation = false; /// If this alter requires additional mutation step, for data update

     String columns_str; /// New columns data corresponding to alter_version
     String metadata_str; /// New metadata corresponding to alter_version
@@ -0,0 +1,348 @@
#include <Storages/MergeTree/ReplicatedMergeTreeLogEntry.h>

#include <IO/ReadBufferFromString.h>

#include <Core/iostream_debug_helpers.h>

#include <type_traits>
#include <regex>

#include <gtest/gtest.h>

namespace DB
{

std::ostream & operator<<(std::ostream & ostr, const MergeTreeDataPartType & type)
{
    return ostr << type.toString();
}

std::ostream & operator<<(std::ostream & ostr, const UInt128 & v)
{
    return ostr << v.toHexString();
}

template <typename T, typename Tag>
std::ostream & operator<<(std::ostream & ostr, const StrongTypedef<T, Tag> & v)
{
    return ostr << v.toUnderType();
}

std::ostream & operator<<(std::ostream & ostr, const MergeType & v)
{
    return ostr << toString(v);
}

}

namespace std
{

std::ostream & operator<<(std::ostream & ostr, const std::exception_ptr & exception)
{
    try
    {
        if (exception)
        {
            std::rethrow_exception(exception);
        }
        return ostr << "<NULL EXCEPTION>";
    }
    catch (const std::exception & e)
    {
        return ostr << e.what();
    }
}

template <typename T>
inline std::ostream & operator<<(std::ostream & ostr, const std::vector<T> & v)
{
    ostr << "[";
    for (size_t i = 0; i < v.size(); ++i)
    {
        ostr << v[i];
        if (i != v.size() - 1)
            ostr << ", ";
    }
    return ostr << "] (" << v.size() << ") items";
}

}

namespace
{

using namespace DB;

template <typename T>
void compareAttributes(::testing::AssertionResult & result, const char * name, const T & expected_value, const T & actual_value);

#define CMP_ATTRIBUTE(attribute) compareAttributes(result, #attribute, expected.attribute, actual.attribute)

::testing::AssertionResult compare(
    const ReplicatedMergeTreeLogEntryData::ReplaceRangeEntry & expected,
    const ReplicatedMergeTreeLogEntryData::ReplaceRangeEntry & actual)
{
    auto result = ::testing::AssertionSuccess();

    CMP_ATTRIBUTE(drop_range_part_name);
    CMP_ATTRIBUTE(from_database);
    CMP_ATTRIBUTE(from_table);
    CMP_ATTRIBUTE(src_part_names);
    CMP_ATTRIBUTE(new_part_names);
    CMP_ATTRIBUTE(part_names_checksums);
    CMP_ATTRIBUTE(columns_version);

    return result;
}

template <typename T>
bool compare(const T & expected, const T & actual)
{
    return expected == actual;
}

template <typename T>
::testing::AssertionResult compare(const std::shared_ptr<T> & expected, const std::shared_ptr<T> & actual)
{
    if (!!expected != !!actual)
        return ::testing::AssertionFailure()
                << "expected : " << static_cast<const void *>(expected.get())
                << "\nactual   : " << static_cast<const void *>(actual.get());

    if (expected && actual)
        return compare(*expected, *actual);

    return ::testing::AssertionSuccess();
}

template <typename T>
void compareAttributes(::testing::AssertionResult & result, const char * name, const T & expected_value, const T & actual_value)
{
    const auto cmp_result = compare(expected_value, actual_value);
    if (cmp_result == false)
    {
        if (result)
            result = ::testing::AssertionFailure();

        result << "\nMismatching attribute: \"" << name << "\"";
        if constexpr (std::is_same_v<std::decay_t<decltype(cmp_result)>, ::testing::AssertionResult>)
            result << "\n" << cmp_result.message();
        else
            result << "\n\texpected: " << expected_value
                   << "\n\tactual  : " << actual_value;
    }
}

::testing::AssertionResult compare(const ReplicatedMergeTreeLogEntryData & expected, const ReplicatedMergeTreeLogEntryData & actual)
{
    ::testing::AssertionResult result = ::testing::AssertionSuccess();

    CMP_ATTRIBUTE(znode_name);
    CMP_ATTRIBUTE(type);
    CMP_ATTRIBUTE(source_replica);
    CMP_ATTRIBUTE(new_part_name);
    CMP_ATTRIBUTE(new_part_type);
    CMP_ATTRIBUTE(block_id);
    CMP_ATTRIBUTE(actual_new_part_name);
    CMP_ATTRIBUTE(new_part_uuid);
    CMP_ATTRIBUTE(source_parts);
    CMP_ATTRIBUTE(deduplicate);
    CMP_ATTRIBUTE(deduplicate_by_columns);
    CMP_ATTRIBUTE(merge_type);
    CMP_ATTRIBUTE(column_name);
    CMP_ATTRIBUTE(index_name);
    CMP_ATTRIBUTE(detach);
    CMP_ATTRIBUTE(replace_range_entry);
    CMP_ATTRIBUTE(alter_version);
    CMP_ATTRIBUTE(have_mutation);
    CMP_ATTRIBUTE(columns_str);
    CMP_ATTRIBUTE(metadata_str);
    CMP_ATTRIBUTE(currently_executing);
    CMP_ATTRIBUTE(removed_by_other_entry);
    CMP_ATTRIBUTE(num_tries);
    CMP_ATTRIBUTE(exception);
    CMP_ATTRIBUTE(last_attempt_time);
    CMP_ATTRIBUTE(num_postponed);
    CMP_ATTRIBUTE(postpone_reason);
    CMP_ATTRIBUTE(last_postpone_time);
    CMP_ATTRIBUTE(create_time);
    CMP_ATTRIBUTE(quorum);

    return result;
}

}


class ReplicatedMergeTreeLogEntryDataTest : public ::testing::TestWithParam<std::tuple<ReplicatedMergeTreeLogEntryData, const char * /* serialized RE */>>
{};

TEST_P(ReplicatedMergeTreeLogEntryDataTest, transcode)
{
    const auto & [expected, match_regex] = GetParam();
    const auto str = expected.toString();

    if (match_regex)
    {
        try
        {
            // egrep since "." matches newline and we can also use "\n" explicitly
            std::regex re(match_regex, std::regex::egrep);
            EXPECT_TRUE(std::regex_match(str, re))
                << "Failed to match serialized ReplicatedMergeTreeLogEntryData: {\n"
                << str << "} \nwith regex: \"" << match_regex << "\"\n";
        }
        catch (const std::regex_error & e)
        {
            FAIL() << e.what()
                   << " on regex: " << match_regex
                   << " (" << strlen(match_regex) << " bytes)" << std::endl;
        }
        catch (...)
        {
            throw;
        }
    }

    ReplicatedMergeTreeLogEntryData actual;
    {
        DB::ReadBufferFromString buffer(str);
        EXPECT_NO_THROW(actual.readText(buffer)) << "While reading:\n" << str;
    }

    ASSERT_TRUE(compare(expected, actual)) << "Via text:\n" << str;
}

// Enabling this warning would ruin test brevity without adding anything in return,
// since most of the fields have default constructors or will be zero-initialized as per the standard,
// so values are predictable and stable across runs.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"

INSTANTIATE_TEST_SUITE_P(Merge, ReplicatedMergeTreeLogEntryDataTest,
    ::testing::ValuesIn(std::initializer_list<std::tuple<ReplicatedMergeTreeLogEntryData, const char *>>{
        {
            {
                // Basic: minimal set of attributes.
                .type = ReplicatedMergeTreeLogEntryData::MERGE_PARTS,
                .new_part_type = MergeTreeDataPartType::WIDE,
                .create_time = 123, // 0 means 'now' which could cause flaky tests.
            },
            R"re(^format version: 4.+merge.+into.+deduplicate: 0.+$)re"
        },
        {
            {
                .type = ReplicatedMergeTreeLogEntryData::MERGE_PARTS,
                .new_part_type = MergeTreeDataPartType::WIDE,

                // Format version 4
                .deduplicate = true,

                .create_time = 123,
            },
            R"re(^format version: 4.+merge.+into.+deduplicate: 1.+$)re"
        },
        {
            {
                .type = ReplicatedMergeTreeLogEntryData::MERGE_PARTS,
                .new_part_type = MergeTreeDataPartType::WIDE,

                // Format version 5
                .new_part_uuid = UUID(UInt128(123456789, 10111213141516)),

                .create_time = 123,
            },
            R"re(^format version: 5.+merge.+into.+deduplicate: 0.+into_uuid: 00000000-075b-cd15-0000-093233447e0c.+$)re"
        },
        {
            {
                .type = ReplicatedMergeTreeLogEntryData::MERGE_PARTS,
                .new_part_type = MergeTreeDataPartType::WIDE,

                // Format version 6
                .deduplicate = true,
                .deduplicate_by_columns = {"foo", "bar", "qux"},

                .create_time = 123,
            },
            R"re(^format version: 6.+merge.+into.+deduplicate: 1.+deduplicate_by_columns: 'foo','bar','qux'.*$)re"
        },
        {
            {
                .type = ReplicatedMergeTreeLogEntryData::MERGE_PARTS,
                .new_part_type = MergeTreeDataPartType::WIDE,

                // Mixing features
                .new_part_uuid = UUID(UInt128(123456789, 10111213141516)),
                .deduplicate = true,
                .deduplicate_by_columns = {"foo", "bar", "qux"},

                .create_time = 123,
            },
            R"re(^format version: 6.+merge.+into.+deduplicate: 1.+into_uuid: 00000000-075b-cd15-0000-093233447e0c.+deduplicate_by_columns: 'foo','bar','qux'.*$)re"
        },
        {
            // Validate that exotic column names are serialized/deserialized properly
            {
                .type = ReplicatedMergeTreeLogEntryData::MERGE_PARTS,
                .new_part_type = MergeTreeDataPartType::WIDE,

                // Mixing features
                .new_part_uuid = UUID(UInt128(123456789, 10111213141516)),
                .deduplicate = true,
                .deduplicate_by_columns = {"name with space", "\"column\"", "'column'", "колонка", "\u30ab\u30e9\u30e0", "\x01\x03 column \x10\x11\x12"},

                .create_time = 123,
            },
            R"re(^format version: 6.+merge.+deduplicate_by_columns: 'name with space','"column"','\\'column\\'','колонка')re"
            ",'\u30ab\u30e9\u30e0','\x01\x03 column \x10\x11\x12'.*$"
        },
    }));

#pragma GCC diagnostic pop

// This is just an example of how to set all fields. Can't be used as is since, depending on type,
// only some fields are serialized/deserialized, and even if everything works perfectly,
// some fields in the deserialized object would be unset (hence differ from expected).
// INSTANTIATE_TEST_SUITE_P(Full, ReplicatedMergeTreeLogEntryDataTest,
//     ::testing::ValuesIn(std::initializer_list<ReplicatedMergeTreeLogEntryData>{
//         {
//             .znode_name = "znode name",
//             .type = ReplicatedMergeTreeLogEntryData::MERGE_PARTS,
//             .source_replica = "source replica",
//             .new_part_name = "new part name",
//             .new_part_type = MergeTreeDataPartType::WIDE,
//             .block_id = "block id",
//             .actual_new_part_name = "new part name",
//             .new_part_uuid = UUID(UInt128(123456789, 10111213141516)),
//             .source_parts = {"part1", "part2"},
//             .deduplicate = true,
//             .deduplicate_by_columns = {"col1", "col2"},
//             .merge_type = MergeType::REGULAR,
//             .column_name = "column name",
//             .index_name = "index name",
//             .detach = false,
//             .replace_range_entry = std::make_shared<ReplicatedMergeTreeLogEntryData::ReplaceRangeEntry>(
//                 ReplicatedMergeTreeLogEntryData::ReplaceRangeEntry
//                 {
//                     .drop_range_part_name = "drop range part name",
//                     .from_database = "from database",
//                     .src_part_names = {"src part name1", "src part name2"},
//                     .new_part_names = {"new part name1", "new part name2"},
//                     .columns_version = 123456,
//                 }),
//             .alter_version = 56789,
//             .have_mutation = false,
//             .columns_str = "columns str",
//             .metadata_str = "metadata str",
//             // Those attributes are not serialized to string, hence it makes no sense to set them.
//             // .currently_executing
//             // .removed_by_other_entry
//             // .num_tries
//             // .exception
//             // .last_attempt_time
//             // .num_postponed
//             // .postpone_reason
//             // .last_postpone_time,
//             .create_time = static_cast<time_t>(123456789),
//             .quorum = 321,
//         },
//     }));
@@ -583,7 +583,7 @@ void StorageBuffer::shutdown()

    try
    {
-        optimize(nullptr /*query*/, getInMemoryMetadataPtr(), {} /*partition*/, false /*final*/, false /*deduplicate*/, global_context);
+        optimize(nullptr /*query*/, getInMemoryMetadataPtr(), {} /*partition*/, false /*final*/, false /*deduplicate*/, {}, global_context);
    }
    catch (...)
    {

@@ -608,6 +608,7 @@ bool StorageBuffer::optimize(
    const ASTPtr & partition,
    bool final,
    bool deduplicate,
+    const Names & /* deduplicate_by_columns */,
    const Context & /*context*/)
{
    if (partition)

@@ -906,7 +907,7 @@ void StorageBuffer::alter(const AlterCommands & params, const Context & context,
    /// Flush all buffers to storages, so that no non-empty blocks of the old
    /// structure remain. Structure of empty blocks will be updated during first
    /// insert.
-    optimize({} /*query*/, metadata_snapshot, {} /*partition_id*/, false /*final*/, false /*deduplicate*/, context);
+    optimize({} /*query*/, metadata_snapshot, {} /*partition_id*/, false /*final*/, false /*deduplicate*/, {}, context);

    StorageInMemoryMetadata new_metadata = *metadata_snapshot;
    params.apply(new_metadata, context);
@@ -87,6 +87,7 @@ public:
        const ASTPtr & partition,
        bool final,
        bool deduplicate,
+        const Names & deduplicate_by_columns,
        const Context & context) override;

    bool supportsSampling() const override { return true; }
@@ -233,12 +233,13 @@ bool StorageMaterializedView::optimize(
    const ASTPtr & partition,
    bool final,
    bool deduplicate,
+    const Names & deduplicate_by_columns,
    const Context & context)
{
    checkStatementCanBeForwarded();
    auto storage_ptr = getTargetTable();
    auto metadata_snapshot = storage_ptr->getInMemoryMetadataPtr();
-    return getTargetTable()->optimize(query, metadata_snapshot, partition, final, deduplicate, context);
+    return getTargetTable()->optimize(query, metadata_snapshot, partition, final, deduplicate, deduplicate_by_columns, context);
}

void StorageMaterializedView::alter(
@@ -46,6 +46,7 @@ public:
        const ASTPtr & partition,
        bool final,
        bool deduplicate,
+        const Names & deduplicate_by_columns,
        const Context & context) override;

    void alter(const AlterCommands & params, const Context & context, TableLockHolder & table_lock_holder) override;
@@ -741,6 +741,7 @@ bool StorageMergeTree::merge(
    const String & partition_id,
    bool final,
    bool deduplicate,
+    const Names & deduplicate_by_columns,
    String * out_disable_reason,
    bool optimize_skip_merged_partitions)
{

@@ -758,10 +759,15 @@ bool StorageMergeTree::merge(
    if (!merge_mutate_entry)
        return false;

-    return mergeSelectedParts(metadata_snapshot, deduplicate, *merge_mutate_entry, table_lock_holder);
+    return mergeSelectedParts(metadata_snapshot, deduplicate, deduplicate_by_columns, *merge_mutate_entry, table_lock_holder);
}

-bool StorageMergeTree::mergeSelectedParts(const StorageMetadataPtr & metadata_snapshot, bool deduplicate, MergeMutateSelectedEntry & merge_mutate_entry, TableLockHolder & table_lock_holder)
+bool StorageMergeTree::mergeSelectedParts(
+    const StorageMetadataPtr & metadata_snapshot,
+    bool deduplicate,
+    const Names & deduplicate_by_columns,
+    MergeMutateSelectedEntry & merge_mutate_entry,
+    TableLockHolder & table_lock_holder)
{
    auto & future_part = merge_mutate_entry.future_part;
    Stopwatch stopwatch;

@@ -786,7 +792,7 @@ bool StorageMergeTree::mergeSelectedParts(const StorageMetadataPtr & metadata_sn
    {
        new_part = merger_mutator.mergePartsToTemporaryPart(
            future_part, metadata_snapshot, *(merge_list_entry), table_lock_holder, time(nullptr),
-            global_context, merge_mutate_entry.tagger->reserved_space, deduplicate);
+            global_context, merge_mutate_entry.tagger->reserved_space, deduplicate, deduplicate_by_columns);

        merger_mutator.renameMergedTemporaryPart(new_part, future_part.parts, nullptr);
        write_part_log({});

@@ -953,7 +959,7 @@ std::optional<JobAndPool> StorageMergeTree::getDataProcessingJob()
    return JobAndPool{[this, metadata_snapshot, merge_entry, mutate_entry, share_lock] () mutable
    {
        if (merge_entry)
-            mergeSelectedParts(metadata_snapshot, false, *merge_entry, share_lock);
+            mergeSelectedParts(metadata_snapshot, false, {}, *merge_entry, share_lock);
        else if (mutate_entry)
            mutateSelectedPart(metadata_snapshot, *mutate_entry, share_lock);
    }, PoolType::MERGE_MUTATE};

@@ -1036,8 +1042,17 @@ bool StorageMergeTree::optimize(
    const ASTPtr & partition,
    bool final,
    bool deduplicate,
+    const Names & deduplicate_by_columns,
    const Context & context)
{
+    if (deduplicate)
+    {
+        if (deduplicate_by_columns.empty())
+            LOG_DEBUG(log, "DEDUPLICATE BY all columns");
+        else
+            LOG_DEBUG(log, "DEDUPLICATE BY ('{}')", fmt::join(deduplicate_by_columns, "', '"));
+    }
+
    String disable_reason;
    if (!partition && final)
    {

@@ -1049,7 +1064,7 @@ bool StorageMergeTree::optimize(

        for (const String & partition_id : partition_ids)
        {
-            if (!merge(true, partition_id, true, deduplicate, &disable_reason, context.getSettingsRef().optimize_skip_merged_partitions))
+            if (!merge(true, partition_id, true, deduplicate, deduplicate_by_columns, &disable_reason, context.getSettingsRef().optimize_skip_merged_partitions))
            {
                constexpr const char * message = "Cannot OPTIMIZE table: {}";
                if (disable_reason.empty())

@@ -1068,7 +1083,7 @@ bool StorageMergeTree::optimize(
        if (partition)
            partition_id = getPartitionIDFromQuery(partition, context);

-        if (!merge(true, partition_id, final, deduplicate, &disable_reason, context.getSettingsRef().optimize_skip_merged_partitions))
+        if (!merge(true, partition_id, final, deduplicate, deduplicate_by_columns, &disable_reason, context.getSettingsRef().optimize_skip_merged_partitions))
        {
            constexpr const char * message = "Cannot OPTIMIZE table: {}";
            if (disable_reason.empty())
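As a quick sanity check of the logging added above, the query-to-log mapping is direct (a sketch; `t`, `pk`, `sk` are placeholder names, and the log lines are what the format strings above would produce, not captured server output):

``` sql
OPTIMIZE TABLE t FINAL DEDUPLICATE BY pk, sk;
-- expected debug log line: DEDUPLICATE BY ('pk', 'sk')

OPTIMIZE TABLE t FINAL DEDUPLICATE;
-- expected debug log line: DEDUPLICATE BY all columns
```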
@@ -70,6 +70,7 @@ public:
        const ASTPtr & partition,
        bool final,
        bool deduplicate,
+        const Names & deduplicate_by_columns,
        const Context & context) override;

    void mutate(const MutationCommands & commands, const Context & context) override;

@@ -132,7 +133,7 @@ private:
     * If aggressive - when selecting parts, don't take into account their ratio size and novelty (used for the OPTIMIZE query).
     * Returns true if merge is finished successfully.
     */
-    bool merge(bool aggressive, const String & partition_id, bool final, bool deduplicate, String * out_disable_reason = nullptr, bool optimize_skip_merged_partitions = false);
+    bool merge(bool aggressive, const String & partition_id, bool final, bool deduplicate, const Names & deduplicate_by_columns, String * out_disable_reason = nullptr, bool optimize_skip_merged_partitions = false);

    ActionLock stopMergesAndWait();

@@ -183,7 +184,8 @@ private:
        TableLockHolder & table_lock_holder,
        bool optimize_skip_merged_partitions = false,
        SelectPartsDecision * select_decision_out = nullptr);
-    bool mergeSelectedParts(const StorageMetadataPtr & metadata_snapshot, bool deduplicate, MergeMutateSelectedEntry & entry, TableLockHolder & table_lock_holder);
+
+    bool mergeSelectedParts(const StorageMetadataPtr & metadata_snapshot, bool deduplicate, const Names & deduplicate_by_columns, MergeMutateSelectedEntry & entry, TableLockHolder & table_lock_holder);

    std::shared_ptr<MergeMutateSelectedEntry> selectPartsToMutate(const StorageMetadataPtr & metadata_snapshot, String * disable_reason, TableLockHolder & table_lock_holder);
    bool mutateSelectedPart(const StorageMetadataPtr & metadata_snapshot, MergeMutateSelectedEntry & entry, TableLockHolder & table_lock_holder);
@@ -122,9 +122,10 @@ public:
        const ASTPtr & partition,
        bool final,
        bool deduplicate,
+        const Names & deduplicate_by_columns,
        const Context & context) override
    {
-        return getNested()->optimize(query, metadata_snapshot, partition, final, deduplicate, context);
+        return getNested()->optimize(query, metadata_snapshot, partition, final, deduplicate, deduplicate_by_columns, context);
    }

    void mutate(const MutationCommands & commands, const Context & context) override { getNested()->mutate(commands, context); }
@@ -1508,7 +1508,7 @@ bool StorageReplicatedMergeTree::tryExecuteMerge(const LogEntry & entry)
    {
        part = merger_mutator.mergePartsToTemporaryPart(
            future_merged_part, metadata_snapshot, *merge_entry,
-            table_lock, entry.create_time, global_context, reserved_space, entry.deduplicate);
+            table_lock, entry.create_time, global_context, reserved_space, entry.deduplicate, entry.deduplicate_by_columns);

        merger_mutator.renameMergedTemporaryPart(part, parts, &transaction);

@@ -2712,6 +2712,7 @@ void StorageReplicatedMergeTree::mergeSelectingTask()

    const auto storage_settings_ptr = getSettings();
    const bool deduplicate = false; /// TODO: read deduplicate option from table config
+    const Names deduplicate_by_columns = {};
    CreateMergeEntryResult create_result = CreateMergeEntryResult::Other;

    try

@@ -2762,6 +2763,7 @@ void StorageReplicatedMergeTree::mergeSelectingTask()
                    future_merged_part.uuid,
                    future_merged_part.type,
                    deduplicate,
+                    deduplicate_by_columns,
                    nullptr,
                    merge_pred.getVersion(),
                    future_merged_part.merge_type);

@@ -2851,6 +2853,7 @@ StorageReplicatedMergeTree::CreateMergeEntryResult StorageReplicatedMergeTree::c
    const UUID & merged_part_uuid,
    const MergeTreeDataPartType & merged_part_type,
    bool deduplicate,
+    const Names & deduplicate_by_columns,
    ReplicatedMergeTreeLogEntryData * out_log_entry,
    int32_t log_version,
    MergeType merge_type)

@@ -2888,6 +2891,7 @@ StorageReplicatedMergeTree::CreateMergeEntryResult StorageReplicatedMergeTree::c
    entry.new_part_type = merged_part_type;
    entry.merge_type = merge_type;
    entry.deduplicate = deduplicate;
+    entry.deduplicate_by_columns = deduplicate_by_columns;
    entry.merge_type = merge_type;
    entry.create_time = time(nullptr);

@@ -3862,6 +3866,7 @@ BlockOutputStreamPtr StorageReplicatedMergeTree::write(const ASTPtr & /*query*/,
    const Settings & query_settings = context.getSettingsRef();
    bool deduplicate = storage_settings_ptr->replicated_deduplication_window != 0 && query_settings.insert_deduplicate;

+    // TODO: should we also somehow pass list of columns to deduplicate on to the ReplicatedMergeTreeBlockOutputStream ?
    return std::make_shared<ReplicatedMergeTreeBlockOutputStream>(
        *this, metadata_snapshot, query_settings.insert_quorum,
        query_settings.insert_quorum_timeout.totalMilliseconds(),

@@ -3878,6 +3883,7 @@ bool StorageReplicatedMergeTree::optimize(
    const ASTPtr & partition,
    bool final,
    bool deduplicate,
+    const Names & deduplicate_by_columns,
    const Context & query_context)
{
    assertNotReadonly();

@@ -3935,7 +3941,8 @@ bool StorageReplicatedMergeTree::optimize(
            ReplicatedMergeTreeLogEntryData merge_entry;
            CreateMergeEntryResult create_result = createLogEntryToMergeParts(
                zookeeper, future_merged_part.parts,
-                future_merged_part.name, future_merged_part.uuid, future_merged_part.type, deduplicate,
+                future_merged_part.name, future_merged_part.uuid, future_merged_part.type,
+                deduplicate, deduplicate_by_columns,
                &merge_entry, can_merge.getVersion(), future_merged_part.merge_type);

            if (create_result == CreateMergeEntryResult::MissingPart)

@@ -3996,7 +4003,8 @@ bool StorageReplicatedMergeTree::optimize(
            ReplicatedMergeTreeLogEntryData merge_entry;
            CreateMergeEntryResult create_result = createLogEntryToMergeParts(
                zookeeper, future_merged_part.parts,
-                future_merged_part.name, future_merged_part.uuid, future_merged_part.type, deduplicate,
+                future_merged_part.name, future_merged_part.uuid, future_merged_part.type,
+                deduplicate, deduplicate_by_columns,
                &merge_entry, can_merge.getVersion(), future_merged_part.merge_type);

            if (create_result == CreateMergeEntryResult::MissingPart)
@@ -120,6 +120,7 @@ public:
        const ASTPtr & partition,
        bool final,
        bool deduplicate,
+        const Names & deduplicate_by_columns,
        const Context & query_context) override;

    void alter(const AlterCommands & commands, const Context & query_context, TableLockHolder & table_lock_holder) override;

@@ -470,6 +471,7 @@ private:
        const UUID & merged_part_uuid,
        const MergeTreeDataPartType & merged_part_type,
        bool deduplicate,
+        const Names & deduplicate_by_columns,
        ReplicatedMergeTreeLogEntryData * out_log_entry,
        int32_t log_version,
        MergeType merge_type);
@@ -37,7 +37,7 @@
    <substitution>
        <name>table</name>
        <values>
-            <value>numbers(200000)</value>
+            <value>numbers(2000000)</value>
        </values>
    </substitution>
    <substitution>
@@ -9,7 +9,7 @@ ${CLICKHOUSE_CLIENT} -q "SELECT 'CREATE TABLE test_' || hex(randomPrintableASCII
function stress()
{
    while true; do
-        "${CURDIR}"/01526_client_start_and_exit.expect | grep -v -P 'ClickHouse client|Connecting|Connected|:\) Bye\.|^\s*$|spawn bash|^0\s*$'
+        "${CURDIR}"/01526_client_start_and_exit.expect | grep -v -P 'ClickHouse client|Connecting|Connected|:\) Bye\.|new year|^\s*$|spawn bash|^0\s*$'
    done
}
@@ -0,0 +1,39 @@
TOTAL rows 4
OLD DEDUPLICATE
0 0 0 1
1 1 2 1
1 1 3 1
DEDUPLICATE BY *
0 0 0 1
1 1 2 1
1 1 3 1
DEDUPLICATE BY * EXCEPT mat
0 0 0 1
1 1 2 1
1 1 3 1
DEDUPLICATE BY pk,sk,val,mat,partition_key
0 0 0 1
1 1 2 1
1 1 3 1
Can not remove full duplicates
OLD DEDUPLICATE
4
DEDUPLICATE BY pk,sk,val,mat
4
Remove partial duplicates
DEDUPLICATE BY *
3
DEDUPLICATE BY * EXCEPT mat
0 0 0 1
1 1 2 1
1 1 3 1
DEDUPLICATE BY COLUMNS("*") EXCEPT mat
0 0 0 1
1 1 2 1
1 1 3 1
DEDUPLICATE BY pk,sk
0 0 0 1
1 1 2 1
DEDUPLICATE BY COLUMNS(".*k")
0 0 0 1
1 1 2 1
tests/queries/0_stateless/01581_deduplicate_by_columns_local.sql (new file)
@@ -0,0 +1,122 @@
--- See also tests/queries/0_stateless/01581_deduplicate_by_columns_replicated.sql

--- local case

-- Just in case previous test runs left some stuff behind.
DROP TABLE IF EXISTS source_data;

CREATE TABLE source_data (
    pk Int32, sk Int32, val UInt32, partition_key UInt32 DEFAULT 1,
    PRIMARY KEY (pk)
) ENGINE=MergeTree
ORDER BY (pk, sk);

INSERT INTO source_data (pk, sk, val) VALUES (0, 0, 0), (0, 0, 0), (1, 1, 2), (1, 1, 3);

SELECT 'TOTAL rows', count() FROM source_data;

DROP TABLE IF EXISTS full_duplicates;
-- table with duplicates on MATERIALIZED columns
CREATE TABLE full_duplicates (
    pk Int32, sk Int32, val UInt32, partition_key UInt32, mat UInt32 MATERIALIZED 12345, alias UInt32 ALIAS 2,
    PRIMARY KEY (pk)
) ENGINE=MergeTree
PARTITION BY (partition_key + 1) -- ensure that a column inside an expression is properly handled when deduplicating. See [1] below.
ORDER BY (pk, toString(sk * 10)); -- silly order key to ensure that a key column is checked even when it is part of an expression. See [1] below.

-- ERROR cases
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY pk, sk, val, mat, alias; -- { serverError 16 } -- alias column is present
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY sk, val; -- { serverError 8 } -- primary key column is missing
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY * EXCEPT(pk, sk, val, mat, alias, partition_key); -- { serverError 51 } -- list is empty
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY * EXCEPT(pk); -- { serverError 8 } -- primary key column is missing [1]
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY * EXCEPT(sk); -- { serverError 8 } -- sorting key column is missing [1]
OPTIMIZE TABLE full_duplicates DEDUPLICATE BY * EXCEPT(partition_key); -- { serverError 8 } -- partitioning column is missing [1]

OPTIMIZE TABLE full_duplicates DEDUPLICATE BY; -- { clientError 62 } -- empty list is a syntax error
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk,sk,val,mat EXCEPT mat; -- { clientError 62 } -- invalid syntax
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk APPLY(pk + 1); -- { clientError 62 } -- APPLY column transformer is not supported
OPTIMIZE TABLE partial_duplicates DEDUPLICATE BY pk REPLACE(pk + 1); -- { clientError 62 } -- REPLACE column transformer is not supported

-- Valid cases
-- NOTE: here and below we need FINAL to force deduplication on such a small data set that fits in only 1 part.

SELECT 'OLD DEDUPLICATE';
INSERT INTO full_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE full_duplicates FINAL DEDUPLICATE;
SELECT * FROM full_duplicates;
TRUNCATE full_duplicates;

SELECT 'DEDUPLICATE BY *';
INSERT INTO full_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE full_duplicates FINAL DEDUPLICATE BY *;
SELECT * FROM full_duplicates;
TRUNCATE full_duplicates;

SELECT 'DEDUPLICATE BY * EXCEPT mat';
INSERT INTO full_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE full_duplicates FINAL DEDUPLICATE BY * EXCEPT mat;
SELECT * FROM full_duplicates;
TRUNCATE full_duplicates;

SELECT 'DEDUPLICATE BY pk,sk,val,mat,partition_key';
INSERT INTO full_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE full_duplicates FINAL DEDUPLICATE BY pk,sk,val,mat,partition_key;
SELECT * FROM full_duplicates;
TRUNCATE full_duplicates;

--DROP TABLE full_duplicates;

-- Now on to the partial duplicates, where the MATERIALIZED column always has a unique value.
DROP TABLE IF EXISTS partial_duplicates;
CREATE TABLE partial_duplicates (
    pk Int32, sk Int32, val UInt32, partition_key UInt32 DEFAULT 1, mat UInt32 MATERIALIZED rand(), alias UInt32 ALIAS 2,
    PRIMARY KEY (pk)
) ENGINE=MergeTree
ORDER BY (pk, sk);

SELECT 'Can not remove full duplicates';

-- should not remove anything
SELECT 'OLD DEDUPLICATE';
INSERT INTO partial_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE partial_duplicates FINAL DEDUPLICATE;
SELECT count() FROM partial_duplicates;
TRUNCATE partial_duplicates;

SELECT 'DEDUPLICATE BY pk,sk,val,mat';
INSERT INTO partial_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE partial_duplicates FINAL DEDUPLICATE BY pk,sk,val,mat;
SELECT count() FROM partial_duplicates;
TRUNCATE partial_duplicates;

SELECT 'Remove partial duplicates';

SELECT 'DEDUPLICATE BY *'; -- all except MATERIALIZED columns, hence will reduce the number of rows.
INSERT INTO partial_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE partial_duplicates FINAL DEDUPLICATE BY *;
SELECT count() FROM partial_duplicates;
TRUNCATE partial_duplicates;

SELECT 'DEDUPLICATE BY * EXCEPT mat';
INSERT INTO partial_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE partial_duplicates FINAL DEDUPLICATE BY * EXCEPT mat;
SELECT * FROM partial_duplicates;
TRUNCATE partial_duplicates;

SELECT 'DEDUPLICATE BY COLUMNS("*") EXCEPT mat';
INSERT INTO partial_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE partial_duplicates FINAL DEDUPLICATE BY COLUMNS('.*') EXCEPT mat;
SELECT * FROM partial_duplicates;
TRUNCATE partial_duplicates;

SELECT 'DEDUPLICATE BY pk,sk';
INSERT INTO partial_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE partial_duplicates FINAL DEDUPLICATE BY pk,sk;
SELECT * FROM partial_duplicates;
TRUNCATE partial_duplicates;

SELECT 'DEDUPLICATE BY COLUMNS(".*k")';
INSERT INTO partial_duplicates SELECT * FROM source_data;
OPTIMIZE TABLE partial_duplicates FINAL DEDUPLICATE BY COLUMNS('.*k');
SELECT * FROM partial_duplicates;
TRUNCATE partial_duplicates;
@@ -0,0 +1,47 @@
check that we have a data
r1 1 1001 3 2 2
r1 1 2001 1 1 1
r1 2 1002 1 1 1
r1 2 2002 1 1 1
r1 3 1003 2 2 2
r1 4 1004 2 2 2
r1 5 2005 2 2 1
r1 9 1002 1 1 1
r2 1 1001 3 2 2
r2 1 2001 1 1 1
r2 2 1002 1 1 1
r2 2 2002 1 1 1
r2 3 1003 2 2 2
r2 4 1004 2 2 2
r2 5 2005 2 2 1
r2 9 1002 1 1 1
after old OPTIMIZE DEDUPLICATE
r1 1 1001 3 2 2
r1 1 2001 1 1 1
r1 2 1002 1 1 1
r1 2 2002 1 1 1
r1 3 1003 2 2 2
r1 4 1004 2 2 2
r1 5 2005 2 2 1
r1 9 1002 1 1 1
r2 1 1001 3 2 2
r2 1 2001 1 1 1
r2 2 1002 1 1 1
r2 2 2002 1 1 1
r2 3 1003 2 2 2
r2 4 1004 2 2 2
r2 5 2005 2 2 1
r2 9 1002 1 1 1
check data again after multiple deduplications with new syntax
r1 1 1001 1 1 1
r1 2 1002 1 1 1
r1 3 1003 1 1 1
r1 4 1004 1 1 1
r1 5 2005 1 1 1
r1 9 1002 1 1 1
r2 1 1001 1 1 1
r2 2 1002 1 1 1
r2 3 1003 1 1 1
r2 4 1004 1 1 1
r2 5 2005 1 1 1
r2 9 1002 1 1 1
@@ -0,0 +1,60 @@
--- See also tests/queries/0_stateless/01581_deduplicate_by_columns_local.sql

--- replicated case

-- Just in case previous test runs left some stuff behind.
DROP TABLE IF EXISTS replicated_deduplicate_by_columns_r1;
DROP TABLE IF EXISTS replicated_deduplicate_by_columns_r2;

SET replication_alter_partitions_sync = 2;

-- In real life insert_replica_id would be filled from the hostname.
CREATE TABLE IF NOT EXISTS replicated_deduplicate_by_columns_r1 (
    id Int32, val UInt32, unique_value UInt64 MATERIALIZED rowNumberInBlock(), insert_replica_id UInt8 MATERIALIZED randConstant()
) ENGINE=ReplicatedMergeTree('/clickhouse/tables/test_01581/replicated_deduplicate', 'r1') ORDER BY id;

CREATE TABLE IF NOT EXISTS replicated_deduplicate_by_columns_r2 (
    id Int32, val UInt32, unique_value UInt64 MATERIALIZED rowNumberInBlock(), insert_replica_id UInt8 MATERIALIZED randConstant()
) ENGINE=ReplicatedMergeTree('/clickhouse/tables/test_01581/replicated_deduplicate', 'r2') ORDER BY id;


SYSTEM STOP REPLICATED SENDS;
SYSTEM STOP FETCHES;
SYSTEM STOP REPLICATION QUEUES;

-- insert some data; 2 records, (3, 1003) and (4, 1004), are duplicated and differ in unique_value / insert_replica_id,
-- while (1, 1001) and (5, 2005) have full duplicates
INSERT INTO replicated_deduplicate_by_columns_r1 VALUES (1, 1001), (1, 1001), (2, 1002), (3, 1003), (4, 1004), (1, 2001), (9, 1002);
INSERT INTO replicated_deduplicate_by_columns_r2 VALUES (1, 1001), (2, 2002), (3, 1003), (4, 1004), (5, 2005), (5, 2005);

SYSTEM START REPLICATION QUEUES;
SYSTEM START FETCHES;
SYSTEM START REPLICATED SENDS;

-- wait for the replicas to sync
SYSTEM SYNC REPLICA replicated_deduplicate_by_columns_r2;
SYSTEM SYNC REPLICA replicated_deduplicate_by_columns_r1;

SELECT 'check that we have a data';
SELECT 'r1', id, val, count(), uniqExact(unique_value), uniqExact(insert_replica_id) FROM replicated_deduplicate_by_columns_r1 GROUP BY id, val ORDER BY id, val;
SELECT 'r2', id, val, count(), uniqExact(unique_value), uniqExact(insert_replica_id) FROM replicated_deduplicate_by_columns_r2 GROUP BY id, val ORDER BY id, val;

-- NOTE: here and below we need FINAL to force deduplication on such a small data set that fits in only 1 part.
-- this should remove the full duplicates
OPTIMIZE TABLE replicated_deduplicate_by_columns_r1 FINAL DEDUPLICATE;

SELECT 'after old OPTIMIZE DEDUPLICATE';
SELECT 'r1', id, val, count(), uniqExact(unique_value), uniqExact(insert_replica_id) FROM replicated_deduplicate_by_columns_r1 GROUP BY id, val ORDER BY id, val;
SELECT 'r2', id, val, count(), uniqExact(unique_value), uniqExact(insert_replica_id) FROM replicated_deduplicate_by_columns_r2 GROUP BY id, val ORDER BY id, val;

OPTIMIZE TABLE replicated_deduplicate_by_columns_r1 FINAL DEDUPLICATE BY id, val;
OPTIMIZE TABLE replicated_deduplicate_by_columns_r1 FINAL DEDUPLICATE BY COLUMNS('[id, val]');
OPTIMIZE TABLE replicated_deduplicate_by_columns_r1 FINAL DEDUPLICATE BY COLUMNS('[i]') EXCEPT(unique_value, insert_replica_id);

SELECT 'check data again after multiple deduplications with new syntax';
SELECT 'r1', id, val, count(), uniqExact(unique_value), uniqExact(insert_replica_id) FROM replicated_deduplicate_by_columns_r1 GROUP BY id, val ORDER BY id, val;
SELECT 'r2', id, val, count(), uniqExact(unique_value), uniqExact(insert_replica_id) FROM replicated_deduplicate_by_columns_r2 GROUP BY id, val ORDER BY id, val;

-- cleanup the mess
DROP TABLE replicated_deduplicate_by_columns_r1;
DROP TABLE replicated_deduplicate_by_columns_r2;
@@ -0,0 +1,11 @@
SELECT count()
FROM
(
    SELECT number
    FROM
    (
        SELECT number
        FROM numbers(1000000)
    )
    WHERE rand64() < (0.01 * 18446744073709552000.)
)

@@ -0,0 +1 @@
EXPLAIN SYNTAX SELECT count(*) FROM ( SELECT number FROM ( SELECT number FROM numbers(1000000) ) WHERE rand64() < (0.01 * 18446744073709552000.));
@@ -0,0 +1 @@
one

@@ -0,0 +1,4 @@
drop table if exists enum;
create table enum engine MergeTree order by enum as select cast(1, 'Enum8(\'zero\'=0, \'one\'=1)') AS enum;
select * from enum where enum = 1;
drop table if exists enum;