ClickHouse® is a real-time analytics DBMS
Go to file
Nicolae Vartolomei f35e6eee19 Avoid deleting old parts from FS on shutdown for replicated engine
This was introduced in https://github.com/ClickHouse/ClickHouse/pull/8602.
The idea was to avoid data re-appearing in ClickHouse after DROP/DETACH
PARTITION. This problem was only present in MergeTree engine and I don't
understand why we need to do the same in ReplicatedMergeTree.

For ReplicatedMergeTree the state of truth is stored in ZK, deleting
things from filesystem just introduces inconsistencies and this is the
main source for errors like "No active replica has part X or covering
part".

The resulting problem is fixed by
https://github.com/ClickHouse/ClickHouse/pull/25820, but in my opinion
we would better avoid introducing the ZK/FS inconsistency in the first
place.

When does this inconsistency appear? Often the sequence is like this:

0. Write 2 parts to ZK [all_0_0_0, all_1_1_0]
1. A merge gets scheduled
2. New part replaces old parts [new: all_0_1_1, old: all_0_0_0, all_1_1_0]
3. Replica gets shutdown and old parts are removed from filesystem
4. Replica comes back online, metadata about all parts is still stored in ZK for this new replica.
5. Other replica after cleanup thread runs will have only [all_0_1_1] in
   ZK
5. User triggers a DROP_RANGE after a while (drop range is for all_0_1_9999*)
6. Each replica deletes from ZK only [all_0_1_1]. The replica that got
   restarted uses its in-memory state to choose nodes to delete from ZK.
7. Restart the replica again. It will now think that there are 2 parts
   that it lost and needs to fetch them [all_0_0_0, all_1_1_0].

`clearOldPartsAndRemoveFromZK` which is triggered from cleanup thread
runs cleanup sequence correctly, it first removes things from ZK and
then from filesystem. I don't see much benefit of triggering it on
shutdown and would rather have it called only from a single place.

---

This is a very, very edge case situation but it proves that the current
"fix" (https://github.com/ClickHouse/ClickHouse/pull/25820) isn't
complete.

```
create table test(
    v UInt64
)
engine=ReplicatedMergeTree('/clickhouse/test', 'one')
order by v
settings old_parts_lifetime = 30;

create table test2(
    v UInt64
)
engine=ReplicatedMergeTree('/clickhouse/test', 'two')
order by v
settings old_parts_lifetime = 30;

create table test3(
    v UInt64
)
engine=ReplicatedMergeTree('/clickhouse/test', 'three')
order by v
settings old_parts_lifetime = 30;

insert into table test values (1), (2), (3);
insert into table test values (4);

optimize table test final;

detach table test;
detach table test2;

alter table test3 drop partition tuple();

attach table test;
attach table test2;
```

```
(CONNECTED [localhost:9181]) /> ls /clickhouse/test/replicas/one/parts
all_0_0_0
all_1_1_0
(CONNECTED [localhost:9181]) /> ls /clickhouse/test/replicas/two/parts
all_0_0_0
all_1_1_0
(CONNECTED [localhost:9181]) /> ls /clickhouse/test/replicas/three/parts
```

```
detach table test;
attach table test;
```

`test` will now figure out that parts exist only in ZK and will issue `GET_PART`
after first removing parts from ZK.

`test2` will receive fetch for unknown parts and will trigger part checks itself.
Because `test` doesn't have the parts anymore in ZK `test2` will mark them as LostForever.
It will also not insert empty parts, because the partition is empty.

`test` is left with `GET_PART` in the queue and stuck.

```
SELECT
    table,
    type,
    replica_name,
    new_part_name,
    last_exception
FROM system.replication_queue

Query id: 74c5aa00-048d-4bc1-a2ea-6f69501c11a0

Row 1:
──────
table:          test
type:           GET_PART
replica_name:   one
new_part_name:  all_0_0_0
last_exception: Code: 234. DB::Exception: No active replica has part all_0_0_0 or covering part. (NO_REPLICA_HAS_PART) (version 21.9.1.1)

Row 2:
──────
table:          test
type:           GET_PART
replica_name:   one
new_part_name:  all_1_1_0
last_exception: Code: 234. DB::Exception: No active replica has part all_1_1_0 or covering part. (NO_REPLICA_HAS_PART) (version 21.9.1.1)
```
2021-07-22 17:48:16 +01:00
.github Update PULL_REQUEST_TEMPLATE.md 2021-07-20 01:50:26 +03:00
base Fix history file conversion if file is empty 2021-07-20 21:00:53 +03:00
benchmark Convert to python3 (#15007) 2020-10-02 19:54:07 +03:00
cmake Revert "Auto version update to [21.9.1.7485] [54454]" 2021-07-17 13:17:33 +03:00
contrib done 2021-07-19 17:47:24 +03:00
debian Revert "Auto version update to [21.10.1.1] [54455]" 2021-07-17 13:17:30 +03:00
docker Merge pull request #26545 from Kylinrix/patch-3 2021-07-20 21:20:11 +03:00
docs Merge pull request #26527 from ka1bi4/romanzhukov-DOCSUP-11658-merge_selecting_sleep_ms 2021-07-22 13:08:43 +03:00
programs Merge pull request #25674 from amosbird/distributedreturnconnection 2021-07-22 11:04:49 +03:00
src Avoid deleting old parts from FS on shutdown for replicated engine 2021-07-22 17:48:16 +01:00
tests Merge pull request #26684 from ClickHouse/fix-flaky-test-24 2021-07-22 15:37:38 +03:00
utils Update version_date.tsv after release 21.7.4.18 2021-07-19 14:34:22 +03:00
website [ImgBot] Optimize images 2021-07-07 19:12:16 +00:00
.arcignore Added .arcignore 2020-05-21 09:17:03 +03:00
.clang-format Fixed wrong code around Memory Profiler 2020-03-03 03:24:44 +03:00
.clang-tidy Enable clang-tidy for programs and utils 2020-05-18 04:19:50 +03:00
.editorconfig Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.gitattributes Union merge for arcadia_skip_list.txt to avoid frequent conflicts 2021-03-10 08:50:32 +03:00
.gitignore Add *.log/*.stderr/*.stdout into gitignore 2021-06-08 09:14:47 +03:00
.gitmodules bump 2021-07-13 10:50:38 +00:00
.potato.yml Fix yamllint issues 2021-02-20 23:25:21 +03:00
.pylintrc Add pylintrc config 2021-01-26 23:35:56 +03:00
.vimrc Changed tabs to spaces in editor configs and in style guide [#CLICKHOUSE-3]. 2017-04-01 11:35:09 +03:00
.yamllint Drop truthy.check-keys from yamllint (does not supported on CI) 2021-02-21 06:15:36 +03:00
AUTHORS Update AUTHORS 2020-01-23 17:36:05 +03:00
CHANGELOG.md Update CHANGELOG.md 2021-07-09 03:00:49 +03:00
CMakeLists.txt better 2021-07-13 10:50:12 +00:00
CODE_OF_CONDUCT.md Add minimal code of conduct #9676 2020-03-16 12:44:28 +03:00
CONTRIBUTING.md Update CONTRIBUTING.md 2020-01-27 21:03:30 +03:00
docker-compose.yml Updated docker-compose.yml #1025 2017-07-26 20:05:32 +03:00
format_sources allow several <graphite> targets (#603) 2017-03-21 23:08:09 +04:00
LICENSE Update LICENSE 2021-01-25 17:00:13 +03:00
PreLoad.cmake Check if XCODE_IDE is true and avoid enforcing ninja in that case 2021-01-06 03:06:03 +04:00
README.md Update README.md 2021-07-21 18:44:20 +03:00
release Remove rotten parts of release script 2021-04-25 02:11:31 +03:00
SECURITY.md Update SECURITY.md (outdated long time ago) 2021-01-21 16:48:28 +03:00
uncrustify.cfg Better .clang-format and uncrustify.cfg 2018-11-29 15:45:34 +03:00
ya.make Changes required for auto-sync with Arcadia 2020-04-16 15:31:57 +03:00

ClickHouse — open source distributed column-oriented DBMS

ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real time.

  • Official website has quick high-level overview of ClickHouse on main page.
  • Tutorial shows how to set up and query small ClickHouse cluster.
  • Documentation provides more in-depth information.
  • YouTube channel has a lot of content about ClickHouse in video format.
  • Slack and Telegram allow to chat with ClickHouse users in real-time.
  • Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
  • Code Browser with syntax highlight and navigation.
  • Contacts can help to get your questions answered if there are any.
  • You can also fill this form to meet Yandex ClickHouse team in person.

Upcoming Events