39 Commits

Author SHA1 Message Date
zvonand
a9499eed79 moved getting server TZ DateLUT to separate place, upd tests and fix 2023-04-12 12:47:05 +02:00
youennL-cs
6526c2a8ab
[RFC] Replacing merge tree new engine (#41005)
* Add new engine to ReplacingMergeTree corresponding to the ReplacingCollapsingMergeTree

* Add new test for the new ReplacingMergeTree engine

* Limit sign value to -1/1

* Replace sign column(Int8) by is_deleted(UInt8)

* Add keyword 'CLEANUP' when OPTIMIZE

* Clean up only when it's a ReplacingMergeTree

* Propagate CLEANUP information and change from 'with_cleanup' to 'cleanup'

* Cleanup data flagged as 'is_deleted'

* Fix merge when optimize and add a test

* Fix OPTIMIZE and INSERT + add tests

* New fix for cleanup at the merge

* Cleanup debug logs

* Add the SETTINGS option 'clean_deleted_rows' that can be 'never' or 'always'

* Fix regression bug; now ReplicatedMergeTree can be called as before without 'is_deleted'

* Add Replicated tests

* Disable tag 'long' for our test and cleanup some white spaces

* Update tests

* Fix tests and remove additional useless whitespace

* Fix replica test

* Style clean && add condition check for is_deleted values

* clean_deleted_rows setting is now an enum

* Add valid default value to the clean_deleted_rows settings

* Update cleanup checkers to use the enum and fix typos in the test

* Fix submodule contrib/AMQP-CPP pointer

* Add missing messages in test reference and remove a print with non-deterministic order

* fix replica test reference

* Fix edge case

* Fix a typo for the spell checker

* Fix reference

* Fix a condition to raise an error if is_deleted differs from 0/1 and cleanup

* Change tests file name and update number

* This should fix the ReplacingMergeTree parameter set

* Fix replicated parameters

* Disable allow_deprecated_syntax_for_merge_tree for our new column

* Fix a test

* Remove non-deterministic order print in the test

* Test on replicas

* Remove a condition, when checking optional parameters, that should not be useful since we disabled the deprecated syntax

* Revert "Remove a condition, when checking optional parameters, that should not be useful since we disabled the deprecated syntax"

This reverts commit b65d64c05e.

* Fix replica management and limit the number of arguments to two maximum, due to the possibility of deprecated table create/attach failing otherwise

* Test a fix for replicated log information error

* Try to add sync to have consistent results

* Change path of replicas that should cause one issue and add few prints in case it's not that

* Get cleanup info on replicas only if information found

* Fix style issues

* Try to avoid replication error 'cannot select parts...' and replica read/write field order

* Cleanup according to PR reviews and add tests on error raised.

* Update src/Storages/MergeTree/registerStorageMergeTree.cpp

Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>

* SELECT ... FINAL doesn't show rows with is_deleted = true

* Update and fix SELECT ... FINAL merge parameter

* Remove is_deleted rows only on the version inserted when merge

* Fix (master) updates issues

* Revert changes that should not be committed

* Add changes according to review

* Revert changes that should not be committed - part 2

---------

Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
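
The merge behaviour described in the bullets above can be modelled roughly as follows. This is only a simplified sketch in Python, not the engine's code: the `Row` type, the `merge_replacing` function and the rule that a later row wins a version tie are assumptions made for illustration. It keeps the highest version per key, rejects `is_deleted` values other than 0/1, and drops surviving rows flagged `is_deleted` only when cleanup is requested.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical row layout: sorting key, version column, is_deleted flag.
    key: str
    version: int
    is_deleted: int  # expected to be 0 or 1


def merge_replacing(rows, cleanup=False):
    """Keep one row per key (highest version); optionally drop deleted rows."""
    latest = {}
    for row in rows:
        if row.is_deleted not in (0, 1):
            raise ValueError("is_deleted must be 0 or 1")
        current = latest.get(row.key)
        # Assumption: on equal versions the row seen later replaces the earlier one.
        if current is None or row.version >= current.version:
            latest[row.key] = row
    result = [latest[k] for k in sorted(latest)]
    if cleanup:
        # Only the surviving (latest) version decides whether the key is removed.
        result = [r for r in result if r.is_deleted == 0]
    return result


rows = [Row("a", 1, 0), Row("b", 1, 0), Row("a", 2, 1)]
merged = merge_replacing(rows)                 # key "a" survives as a deleted marker
cleaned = merge_replacing(rows, cleanup=True)  # key "a" is removed entirely
```

Without cleanup the deleted marker for key "a" is retained so that older versions arriving later can still be superseded; with cleanup the marker itself is dropped.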
2023-02-16 16:03:16 +03:00
Alexander Tokmakov
2e8100b6e4 make separate DROP_PART log entry type 2023-01-31 13:37:56 +01:00
Anton Popov
8e3698c91f refactoring of code near merge tree parts 2023-01-25 17:34:09 +00:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
Alexey Milovidov
fdbb5b75b2
Merge branch 'master' into minor-renames 2022-05-07 14:18:50 +03:00
Alexander Tokmakov
d149ac8bd7 better logs for virtual parts 2022-04-26 16:58:40 +02:00
Anton Popov
630182b2b1 minor renames 2022-03-14 14:42:09 +00:00
Nicolae Vartolomei
0381c634d4
Add support for user defined identifier on log entries
Sometimes we want to push a log entry once and only once. Because it is
not possible to create a sequential node in ZooKeeper and store its name
to a well known location in the same transaction we'll do it in the
other order. First somehow generate a unique identifier, then submit a
log entry with that identifier. Later, we can search through log entries
using the identifier we provided to find the node.

Required for part movement between shards.
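
The ordering described above (identifier first, log entry second, lookup by identifier later) might look like the sketch below. It uses a hypothetical in-memory `InMemoryLog` class as a stand-in for ZooKeeper sequential nodes; the field name `log_entry_id`, the `push_once` helper and the command string are illustrative assumptions, not names taken from the commit.

```python
import uuid


class InMemoryLog:
    """Hypothetical stand-in for a log directory with sequential nodes."""

    def __init__(self):
        self._entries = {}  # node name -> entry payload
        self._counter = 0

    def create_sequential(self, payload):
        # The store, not the caller, chooses the node name.
        name = f"log-{self._counter:010d}"
        self._counter += 1
        self._entries[name] = payload
        return name

    def find_by_identifier(self, identifier):
        # Scan the entries for the caller-chosen identifier.
        for name in sorted(self._entries):
            if self._entries[name].get("log_entry_id") == identifier:
                return name
        return None


def push_once(log, identifier, command):
    """Submit `command` unless an entry with `identifier` already exists."""
    existing = log.find_by_identifier(identifier)
    if existing is not None:
        return existing
    return log.create_sequential({"log_entry_id": identifier, "command": command})


log = InMemoryLog()
entry_id = str(uuid.uuid4())                         # generate the identifier first
first = push_once(log, entry_id, "example-command")  # then submit the entry
retry = push_once(log, entry_id, "example-command")  # a retry finds the same node
```

In this toy version the check and the create are not atomic; a real coordination service would need its own guarantees for concurrent writers.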
2021-09-17 15:32:35 +01:00
alesapin
a8fdc41193 Fix bug and add more trash to test 2021-07-06 19:51:23 +03:00
alesapin
1a4ccab8e6 Fix style 2021-07-01 15:12:27 +03:00
alesapin
2eb27540b2 Some test version 2021-06-30 15:29:09 +03:00
alesapin
1e69128443 Trying to fix 'Tagging already tagged part' 2021-06-04 14:49:00 +03:00
Alexander Tokmakov
5969891611 do not crash on intersecting parts 2021-06-01 16:25:23 +03:00
Alexander Tokmakov
cdd46aa117 Revert "try fix intersecting virtual parts"
This reverts commit 2571ad7d43.
2021-06-01 14:52:25 +03:00
Alexander Tokmakov
2571ad7d43 try fix intersecting virtual parts 2021-05-31 00:30:50 +03:00
alesapin
67d34c0136 merge with master 2021-05-17 14:13:18 +03:00
alesapin
17f229857c Merge branch 'master' into nvartolomei-parts-move 2021-05-17 13:52:48 +03:00
Alexander Tokmakov
df5f3fbc9d review suggestions 2021-05-14 19:11:40 +03:00
Alexander Tokmakov
e114c7eb8b fix virtual parts in REPLACE_RANGE 2021-05-13 14:29:59 +03:00
Nicolae Vartolomei
1fa5871ff7 Fix bad rebase and introduce part_moves_between_shards_enable setting 2021-04-27 14:20:13 +01:00
alesapin
b930ca5d59 Followup fix 2021-04-27 14:20:12 +01:00
Nicolae Vartolomei
53d57ffb52 Part movement between shards
Integrate query deduplication from #17348
2021-04-27 14:20:12 +01:00
Mike Kot
285af08949 Merge remote-tracking branch 'upstream/master' into feature/attach-partition-local 2021-03-24 22:34:20 +03:00
Mike Kot
c55a73b752 Added the solution to handle the corruption case
When the part data (e.g. data.bin) is corrupted, but the checksums.txt
is present -- explicitly deleting the checksums.txt.

Removed the extra logging, changed some exception messages.
2021-03-22 17:23:43 +03:00
Alexey Milovidov
671395e8c8 Most likely improve performance 2021-03-15 22:23:27 +03:00
Mike Kot
6ea574525c Small fixes regarding the review 2021-03-03 16:51:41 +03:00
Mike Kot
f1ef382cf9 Added part_checksum to Replicated...Entry serialization. 2021-02-19 16:04:12 +03:00
Mike Kot
feff4c6a22 Started adding the new "ATTACH_PART" command into the replicated log
The original ticket idea was to search for the possibly available data
in the /detached folders for the GET_PART command, but
@tavplubix pointed out this would be quite expensive for every
fetch.

So a new command is going to be introduced, ATTACH_PART, which will
cover ALTER TABLE ATTACH PART and only for which the search will start.
2021-02-15 01:59:13 +03:00
Vasily Nemkov
e5ec81f7cd Single quotes around column names 2020-12-19 20:58:23 +02:00
Vasily Nemkov
e166aae3f9 Using CSV-like strings for list of columns to deduplicate by instead of JSON-like notation. 2020-12-18 13:44:56 +02:00
Vasily Nemkov
70ea507dae OPTIMIZE DEDUPLICATE BY columns
Extended OPTIMIZE ... DEDUPLICATE syntax to allow explicit (or implicit with asterisk/column transformers) list of columns to check for duplicates on.

Following syntax variants are now supported:

OPTIMIZE TABLE table DEDUPLICATE; -- the old one
OPTIMIZE TABLE table DEDUPLICATE BY *;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY col1,col2,col3;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);

Note that * behaves just like in SELECT: MATERIALIZED and ALIAS columns are not used for expansion.
Also, it is an error to specify an empty list of columns, or write an expression that results in an empty list of columns, or deduplicate by an ALIAS column.
Column transformers other than EXCEPT are not supported.
2020-12-07 09:44:07 +03:00
Nicolae Vartolomei
040aba9f85 Add uuid.txt to checksums for parts stored on disk
We are breaking backwards compatibility anyway (but gated by a setting)
2020-11-20 13:49:17 +00:00
Nicolae Vartolomei
94293ca3ce Assign UUIDs to parts only when configured to do so
Avoid breaking backwards compatibility by default for now.
2020-11-20 13:49:17 +00:00
Nicolae Vartolomei
746f8e45f5 All new parts must have uuids 2020-11-19 13:18:03 +00:00
alesapin
e42d0f60da Fix several bugs 2020-09-04 14:27:27 +03:00
alesapin
f4c7ff0376 Add fixed size of Merge TTLS 2020-09-03 16:00:13 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00