Commit Graph

18 Commits

Author SHA1 Message Date
Mike Kot
4c391f8e99
SYSTEM RESTORE REPLICA replica [ON CLUSTER cluster] (#13652)
* initial commit: add setting and stub

* typo

* added test stub

* fix

* wip merging new integration test and code proto

* adding steps interpreters

* adding firstly proposed solution (moving parts etc)

* added checking zookeeper path existence

* fixing the include

* fixing and sorting includes

* fixing outdated struct

* fix the name

* added ast ptr as level of indirection

* fix ref

* updating the changes

* working on test stub

* fix iterator -> reference

* revert rocksdb submodule update

* fixed show privileges test

* updated the test stub

* replaced rand() with thread_local_rng(), updated the tests

updated the test

fixed test config path

test fix

removed error messages

fixed the test

updated the test

fixed string literal

fixed literal

typo: =

* fixed the empty replica error message

* updated the test and the code with logs

* updated the possible test cases, updated

* added the code/test milestone comments

* updated the test (added more testcases)

* replaced native assert with CH one

* individual replicas recursive delete fix

* updated the AS db.name AST

* two small logging fixes

* manually generated AST fixes

* Updated the test, added the possible algo change

* Some thoughts about optimizing the solution:

ALTER MOVE PARTITION .. TO TABLE -> move to detached/ + ALTER ... ATTACH

* fix

* Removed the replica sync in test as it's invalid

* Some test tweaks

* tmp

* Rewrote the algo by using the executeQuery instead of hand-crafting the ASTPtr.

Two questions still active.

* tr: logging active parts

* Extracted the parts moving algo into a separate helper function

* Fixed the test data and the queries slightly

* Replaced the query to system.parts with a direct invocation, started building the test that breaks on various parts.

* Added the case for tables when at least one replica is alive

* Updated the test to test replicas restoration by detaching/attaching

* Altered the test to check restoration without replica restart

* Added the tables swap in the start if the server failed last time

* Hotfix when only /replicas/replica... path was deleted

* Restore ZK paths while creating a replicated MergeTree table

* Updated the docs, fixed the algo for individual replicas restoration case

* Initial parts table storage fix, tests sync fix

* Reverted individual replica restoration to general algo

* Slightly optimised getDataParts

* Trying another solution with parts detaching

* Rewrote algo without any steps, added ON CLUSTER support

* Attaching parts from other replica on restoration

* Getting part checksums from ZK

* Removed ON CLUSTER, finished working solution

* Multiple small changes after review

* Fixing parallel test

* Supporting rewritten form on cluster

* Test fix

* Moar logging

* Using source replica as checksum provider

* improve test, remove some code from parser

* Trying solution with move to detached + forget

* Moving all parts (not only Committed) to detached

* Edited docs for RESTORE REPLICA

* Re-merging

* minor fixes

Co-authored-by: Alexander Tokmakov <avtokmakov@yandex-team.ru>
2021-06-20 11:24:43 +03:00
Yatsishin Ilya
9282c7470c better 2021-05-25 19:53:55 +03:00
Yatsishin Ilya
e9ccf906c4 improvements 2021-05-25 16:40:22 +03:00
Yatsishin Ilya
893cd47bd2 better 2021-05-25 15:40:59 +03:00
Yatsishin Ilya
c191a631e8 touch 2021-05-24 18:30:51 +03:00
Yatsishin Ilya
40e63646ca more 2021-05-17 14:16:16 +03:00
tavplubix
541b601317
Try fix flaky test 2021-04-03 13:25:40 +03:00
Mike Kot
da67e06aa0 Added another test case to handle missing part data 2021-03-22 17:52:21 +03:00
Mike Kot
c55a73b752 Added the solution to handle the corruption case
When the part data (e.g. data.bin) is corrupted, but the checksums.txt
is present -- explicitly deleting the checksums.txt.

Removed the extra logging, changed some exception messages.
2021-03-22 17:23:43 +03:00
Mike Kot
5789507e8b Investigating, why the checksums may match when they shouldn't. 2021-03-22 17:23:43 +03:00
Mike Kot
2ccdb7ef5c Multiple small code and test updates
- Updated the docs to make everything clear.
- Multiple small logger fixes.
- Changed the attach_part command: it now runs after the check for the
covering parts, so that less work is spent on fetching the checksums.
- Better logging in the integration test.
2021-03-17 16:52:35 +03:00
Mike Kot
6ea574525c Small fixes regarding the review 2021-03-03 16:51:41 +03:00
Mike Kot
6191580fe1 Added the PartitionManager to check that replica 1 attaches the local data
2021-03-01 20:54:02 +03:00
Mike Kot
f3e340fcdf Fixing the tests 2021-03-01 20:41:31 +03:00
Mike Kot
f088dd445d Extended the test to check both the ALTER PARTITION and PART
Added some notes about the SYSTEM SYNC REPLICA and ALTER ... DROP / ATTACH.
2021-03-01 20:41:31 +03:00
Mike Kot
5281314ac0 Finished the test draft for ATTACH PARTITION,
Extracted the part data corruption function into the helper.
2021-03-01 16:42:31 +03:00
Mike Kot
fefc7234df Switched the part lookup algo to "by hash only", comments on test stub 2021-02-16 16:00:26 +03:00
Mike Kot
8182482cbd Add test stub 2021-02-15 21:06:20 +03:00