Commit Graph

1094 Commits

Author SHA1 Message Date
Vasily Nemkov
e166aae3f9 Use CSV-like strings instead of JSON-like notation for the list of columns to deduplicate by. 2020-12-18 13:44:56 +02:00
Vasily Nemkov
59fc301344 Fixed test to be less flaky
Also log the expanded list of columns passed from `DEDUPLICATE BY` to the actual deduplication routines.
2020-12-08 19:44:34 +03:00
Vasily Nemkov
8c5daf0925 Fixed building tests with GCC-10
Also a minor cleanup of the test code.
2020-12-08 13:15:18 +03:00
Vasily Nemkov
168155eeec Minor: cleanup 2020-12-07 18:07:40 +03:00
Vasily Nemkov
dbdc018ab8 Fixed and refined unit test
* no more undefined values for attributes in ReplicatedMergeTreeLogEntry
* validation of string serialization format
2020-12-07 17:38:25 +03:00
Vasily Nemkov
70ea507dae OPTIMIZE DEDUPLICATE BY columns
Extended OPTIMIZE ... DEDUPLICATE syntax to allow an explicit (or implicit, via asterisk/column transformers) list of columns to check for duplicates.

The following syntax variants are now supported:

OPTIMIZE TABLE table DEDUPLICATE; -- the old one
OPTIMIZE TABLE table DEDUPLICATE BY *;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY col1,col2,col3;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);

Note that * behaves just as in SELECT: MATERIALIZED and ALIAS columns are not used for expansion.
Also, it is an error to specify an empty list of columns, to write an expression that results in an empty list of columns, or to deduplicate by an ALIAS column.
Column transformers other than EXCEPT are not supported.
2020-12-07 09:44:07 +03:00
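The column-resolution rules described in the commit above can be illustrated with a small sketch. This is not ClickHouse code: the `Column` type, its `kind` values and the `resolve_dedup_columns` function are hypothetical, `COLUMNS('regex')` matching is approximated with Python's `re.search`, and it is an assumption that the regex matches against the same ordinary columns that * expands to.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class Column:
    name: str
    kind: str = "ORDINARY"  # hypothetical kinds: "ORDINARY", "MATERIALIZED", "ALIAS"


def resolve_dedup_columns(
    table: Sequence[Column],
    explicit: Optional[Iterable[str]] = None,
    regex: Optional[str] = None,
    except_: Iterable[str] = (),
) -> List[str]:
    """Hypothetical sketch of DEDUPLICATE BY column resolution.

    explicit: names from `BY col1,col2,...`; when None, the list is expanded
    from `BY *` (regex is None) or from `BY COLUMNS('regex')`.
    except_: names removed by an `EXCEPT` transformer after expansion.
    """
    by_name = {c.name: c for c in table}
    if explicit is not None:
        names = list(explicit)
        for name in names:
            if name not in by_name:
                raise ValueError(f"unknown column {name!r}")
            if by_name[name].kind == "ALIAS":
                raise ValueError(f"cannot deduplicate by ALIAS column {name!r}")
    else:
        # As with SELECT *, MATERIALIZED and ALIAS columns are not expanded.
        # Assumption: COLUMNS('regex') matches against the same ordinary columns.
        names = [c.name for c in table if c.kind == "ORDINARY"]
        if regex is not None:
            names = [n for n in names if re.search(regex, n)]
        excluded = set(except_)
        names = [n for n in names if n not in excluded]
    # An empty list, whether written explicitly or produced by expansion, is an error.
    if not names:
        raise ValueError("empty list of columns to deduplicate by")
    return names
```

For a table with ordinary columns `a` and `b`, a MATERIALIZED column `m` and an ALIAS column `al`, this sketch resolves `BY *` to `['a', 'b']` and `BY * EXCEPT a` to `['b']`, and it rejects `BY al` as well as `BY * EXCEPT (a, b)`.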
feng lv
a634e4ed80 fix incorrect initialization of MergeTreeWriterSettings 2020-12-06 08:57:02 +00:00
alesapin
0eec52b1fd
Merge pull request #17737 from ClickHouse/fix_segfault_in_distributed_out_stream
Fix segfault when 'not enough space'
2020-12-05 16:14:30 +03:00
Anton Popov
d1ef6a897e
Merge pull request #17802 from detailyang/patch-1
hotfix:check in_memory_parts_enable_wal
2020-12-05 03:05:38 +03:00
Anton Popov
60b0cbb1c1
Merge pull request #15939 from Avogar/optimize_final_optimization
Optimize final optimization
2020-12-05 02:26:27 +03:00
detailyang
e7a151fd1e
hotfix:check in_memory_parts_enable_wal 2020-12-04 23:14:27 +08:00
alesapin
42544a9833 Merge branch 'master' into fix_segfault_in_distributed_out_stream 2020-12-04 17:41:10 +03:00
Pavel Kruglov
9dbced0474 Pass setting instead of context 2020-12-04 17:01:59 +03:00
Ivan
315ff4f0d9
ANTLR4 Grammar for ClickHouse and new parser (#11298) 2020-12-04 05:15:44 +03:00
Anton Popov
cd1917c7a6
Merge branch 'master' into optimize_final_optimization 2020-12-03 16:52:51 +03:00
Alexander Tokmakov
bfbf150c67 fix segfault when 'not enough space' 2020-12-02 17:49:43 +03:00
Alexey Milovidov
6590fdfdd2 Fix logging 2020-11-30 11:44:01 +03:00
alesapin
25f40db2fb
Merge pull request #17499 from ClickHouse/concurrent_mutation_and_random_kill
Fix kill mutation on concurrent alter queries
2020-11-30 10:51:50 +03:00
alexey-milovidov
fabceebbce
Merge pull request #17145 from amosbird/cddt
Fix unmatched type comparison in KeyCondition
2020-11-29 14:29:35 +03:00
alexey-milovidov
f4a61ac3c3
Merge pull request #17527 from ucasFL/spelling
fix spelling errors
2020-11-29 13:45:42 +03:00
feng lv
7e3524caa1 fix spelling errors 2020-11-28 08:17:20 +00:00
alexey-milovidov
c189f6405f
Merge pull request #16767 from azat/optimize_trivial_count_query-fix
Fix optimize_trivial_count_query with partition predicate
2020-11-28 08:43:12 +03:00
Nikita Mikhaylov
6f3db7ff50
Merge pull request #17330 from CurtizJ/move-to-prewhere
Allow moving conditions to PREWHERE with compact parts
2020-11-28 02:02:42 +03:00
alesapin
6567796014 Fix kill mutation on concurrent alter queries 2020-11-27 18:46:52 +03:00
Anton Popov
1a4ed07a98 add some more comments 2020-11-27 15:47:27 +03:00
alexey-milovidov
dfae1efbbd
Merge pull request #17070 from fastio/master
Support multiple ZooKeeper clusters
2020-11-27 10:38:01 +03:00
Amos Bird
022ba2b0a9
Fix unmatched type comparison in KeyCondition 2020-11-26 16:15:50 +08:00
alesapin
960d077612 Better functions 2020-11-26 10:25:57 +03:00
alexey-milovidov
75a78e6c20
Update BackgroundJobsExecutor.h 2020-11-26 07:09:05 +03:00
Azat Khuzhin
0b47f4a9e9 Fix optimize_trivial_count_query with partition predicate
Consider the following example:

    CREATE TABLE test(p DateTime, k int) ENGINE MergeTree PARTITION BY toDate(p) ORDER BY k;
    INSERT INTO test VALUES ('2020-09-01 00:01:02', 1), ('2020-09-01 20:01:03', 2), ('2020-09-02 00:01:03', 3);

- SELECT count() FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
  In this case the RPN will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN (due to strict mode), FUNCTION_AND),
  and optimize_trivial_count_query cannot use the index if there is at least one FUNCTION_UNKNOWN,
  since there is no post-processing, and returning count() based only on the first predicate is wrong.

  Before this patch, FUNCTION_UNKNOWN was allowed for optimize_trivial_count_query, and the result was wrong.

The two examples below only show the difference; their behaviour has not been changed by this patch:

- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND p <= '2020-09-01 00:00:00'
  In this case the RPN will be (FUNCTION_IN_RANGE, FUNCTION_IN_RANGE (due to non-strict mode), FUNCTION_AND),
  so everything will be pruned and nothing will be read.

- SELECT * FROM test WHERE toDate(p) >= '2020-09-01' AND toUnixTimestamp(p)%5==0
  In this case the RPN will be (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN, FUNCTION_AND),
  and all (both) partitions will be scanned, but due to later filtering none of the rows will match.
2020-11-25 23:09:17 +03:00
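The rule enforced by the fix above can be reduced to a check over the RPN. This is an illustrative Python sketch only: the element names mirror those in the commit message, but the `RPNElement` enum and the `can_use_for_trivial_count` function are hypothetical and do not reproduce ClickHouse's KeyCondition code.

```python
from enum import Enum, auto
from typing import Sequence


class RPNElement(Enum):
    # Hypothetical subset of the RPN element kinds named in the commit message.
    FUNCTION_IN_RANGE = auto()
    FUNCTION_UNKNOWN = auto()
    FUNCTION_AND = auto()


def can_use_for_trivial_count(rpn: Sequence[RPNElement]) -> bool:
    """Return True only if count() may be answered from the key condition alone.

    The trivial-count path does no post-filtering, so a single FUNCTION_UNKNOWN
    (a predicate the index cannot evaluate) means an index-only count could
    include rows that the remaining predicate would reject.
    """
    return RPNElement.FUNCTION_UNKNOWN not in rpn
```

With the RPN from the first example, (FUNCTION_IN_RANGE, FUNCTION_UNKNOWN, FUNCTION_AND), this check returns False, so the optimization is skipped and the query is executed normally.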
tavplubix
6477251ea1
Update BackgroundJobsExecutor.h 2020-11-25 23:06:00 +03:00
Anton Popov
852a08eacb allow moving conditions to prewhere with compact parts 2020-11-23 21:27:59 +03:00
Amos Bird
172b7e9ed1
global in set index. 2020-11-23 22:05:08 +08:00
tavplubix
5cc9cb01cd
Merge pull request #16751 from amosbird/globalcontext
Make global_context consistent.
2020-11-22 18:46:17 +03:00
alesapin
9c8b0da382
Merge pull request #16033 from nvartolomei/nv/parts-uuid
Add unique identifiers to IMergeTreeDataPart structure
2020-11-22 16:13:19 +03:00
Pavel Kruglov
623dc2df7a Update comment 2020-11-20 17:32:39 +03:00
Pavel Kruglov
ca3fe49a2a Make setting global 2020-11-20 17:29:13 +03:00
Nicolae Vartolomei
7c8bc1c04e Use JSON metadata in WAL 2020-11-20 13:49:17 +00:00
Nicolae Vartolomei
040aba9f85 Add uuid.txt to checksums for parts stored on disk
We are breaking backwards compatibility anyway (but gated by a setting)
2020-11-20 13:49:17 +00:00
Nicolae Vartolomei
94293ca3ce Assign UUIDs to parts only when configured to do so
Avoid breaking backwards compatibility by default for now.
2020-11-20 13:49:17 +00:00
Amos Bird
1d9d586e20
Make global_context consistent. 2020-11-20 18:23:14 +08:00
Pavel Kruglov
4c30857759 Minor change 2020-11-20 01:22:40 +03:00
alesapin
3f01096c86 Less verbose logging when fetch is impossible 2020-11-19 20:03:20 +03:00
alesapin
2623d35f68
Merge pull request #17120 from ClickHouse/fix_granularity_on_block_borders
Fix index granularity calculation on block borders
2020-11-19 18:36:07 +03:00
Nicolae Vartolomei
746f8e45f5 All new parts must have uuids 2020-11-19 13:18:03 +00:00
Nicolae Vartolomei
425dc4b11b Add unique identifiers to IMergeTreeDataPart structure
For now, uuids are not generated at all; they are present only if the
part is updated manually (as you can see in the integration test).

The only place where an end user can see them today is the
`system.parts` table. I looked for a way to hide this column behind an
option but couldn't find an easy one.

This is likely also required for WAL, but we need to work out how to do it
without breaking compatibility.

Relates to #13574, https://github.com/ClickHouse/ClickHouse/issues/13574

Next 1: In the upcoming PR, the plan is to integrate de-duplication based on
these fingerprints into the query pipeline.

Next 2: We'll enable automatic generation of uuids and come up with a
way to conditionally send uuids when processing distributed queries,
only while part movement is in progress.
2020-11-19 13:14:25 +00:00
Peng Jian
a0683ce460 Support multiple ZooKeeper clusters 2020-11-19 15:44:47 +08:00
alesapin
7080f424e2 Fix bug for skip indices and make code more complex 2020-11-18 15:04:13 +03:00
filimonov
258170b325
Update ReplicatedMergeTreeMergeStrategyPicker.cpp 2020-11-18 08:45:44 +01:00
Mikhail Filimonov
234c671e52
After CR fixes 2020-11-18 08:45:44 +01:00