Commit Graph

50 Commits

Author SHA1 Message Date
Alexander Tokmakov
7cba97aeab Merge branch 'master' into resubmit_21474 2022-03-21 12:09:00 +01:00
Alexander Tokmakov
897e94c16c make restarting thread less bad 2022-02-03 23:29:24 +03:00
Alexander Tokmakov
b3ddc601a5 fix race between mergeSelectingTask and queue reinitialization 2022-01-28 15:50:58 +03:00
Alexander Tokmakov
187c43eba8 rename Committed state to Active 2021-12-30 23:45:38 +03:00
tavplubix
4f46ac6b30
Remove LeaderElection (#32140)
* remove LeaderElection

* try fix tests

* Update test.py

* Update test.py
2021-12-07 19:55:55 +03:00
Vitaly Baranov
6634fcbac7 Rename Quota::ResourceType -> QuotaType and move it to Access/Common. 2021-11-19 00:14:23 +03:00
Alexander Tokmakov
89fe606d15 try fix 'some fetches may stuck' 2021-10-18 23:16:02 +03:00
alesapin
ab41384f63 Move queue initialization to restarting thread 2021-09-13 11:00:07 +03:00
Nikita Mikhaylov
6062dd0021 Better 2021-09-08 00:21:21 +00:00
nvartolomei
c09c90125f
Merge branch 'master' into nv/last-queue-update-exception 2021-08-19 10:29:16 +01:00
Alexander Tokmakov
09ff66da0e fix a couple of bugs that may cause replicas to diverge 2021-08-18 12:50:46 +03:00
mergify[bot]
f11e396151
Merge branch 'master' into nv/last-queue-update-exception 2021-08-18 07:00:50 +00:00
Nicolae Vartolomei
3f291b024a
Use plain mutex instead of MultiVersion 2021-08-09 13:58:23 +01:00
Nicolae Vartolomei
8b07a7f180 Store exception generated when we tried to update the queue last time
The use case is to alert when queue contains broken entries. Especially
important when ClickHouse breaks backwards compatibility between
versions and log entries written by newer versions aren't parseable by
old versions.

```
Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected 'quorum: ' before: 'merge_type: 2\n'
```
2021-07-27 15:42:40 +01:00
Amos Bird
623faf47e4
Make sure table is readonly when restarting fails. 2021-07-26 19:21:14 +08:00
tavplubix
60c8dc1c0b
Update ReplicatedMergeTreeRestartingThread.cpp 2021-06-16 00:48:38 +03:00
Alexander Tokmakov
7d7c5638a5 Merge branch 'master' into fix_intersection_with_lost_part 2021-06-01 16:32:59 +03:00
Alexander Tokmakov
16647fe8ce some unrelated fixes 2021-05-31 00:29:37 +03:00
kssenii
9b8df78fdd Merge branch 'master' of github.com:ClickHouse/ClickHouse into poco-file-to-std-fs 2021-05-17 17:42:05 +03:00
alesapin
d4c6a5a05e Better logging 2021-05-14 11:38:53 +03:00
alesapin
67e4393769 If table was not active set readonly mode 2021-05-14 11:32:41 +03:00
kssenii
02288359c5 Less manual concatenation of paths 2021-05-08 13:59:55 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Nicolae Vartolomei
1d83c596f8 RFC: Throw exception if removing parts from ZooKeeper fails.
This is used for removing part metadata from ZooKeeper when executing
queue events like `DROP_RANGE` triggered when a user tries to drop a
part or a partition. There are other uses but I'll focus only on this
one.

Before this change the method was giving up silently if it was unable to
remove parts from ZooKeeper and this behaviour seems to be problematic.
It could lead to operation being reported as successful at first but
data reappearing later (very rarely) or "stuck" events in replication
queue.

Here is one particular scenario which I think we've hit:

* Execute a DETACH PARTITION
* DROP_RANGE event put in the queue
* Replicas try to execute dropRange but some of them get disconnected
  from ZK and 5 retries aren't enough (ZK is miss-behaving), return code
  (false) is ignored and log pointer advances.
* One of the replica where dropRange failed is restarted.
* checkParts is executed and it finds parts that weren't removed from
  ZK, logs `Removing locally missing part from ZooKeeper and queueing a
  fetch` and puts GET_PART on the queue.
* Few things can happen from here:
  * There is a lagging replica that din't execute DROP_RANGE yet: part will be
    fetched. The other replica will execute DROP_RANGE later and we'll
    get diverging set of parts on replicas.
  * Another replica also silently failed to remove parts from ZK: both
    of them are left with GET_PART in the queue and none of them can
    make progress, logging: `No active replica has part ... or covering
    part`.
2021-03-05 09:50:26 +00:00
Peng Jian
a0683ce460 Support mulitple ZooKeeper clusters 2020-11-19 15:44:47 +08:00
Kruglov Pavel
5aba639430 Update test 2020-10-15 22:01:18 +03:00
sundy-li
153be93544 sub ReadonlyReplica when detach readonly tables 2020-10-15 22:01:18 +03:00
Alexandra Latysheva
f6f33168aa fix compilation error 2020-10-09 12:11:21 +00:00
Alexandra Latysheva
0594a77b57 fix thread restart for parallel quorum inserts 2020-10-09 11:20:20 +00:00
Alexandra Latysheva
a43cac2c1a changing the condition for updateQuorum, specialized it and reduced the number of calls to the zk 2020-10-07 11:28:48 +00:00
Alexandra Latysheva
644091459f style fixes and remove small changes 2020-10-06 22:01:30 +00:00
Alexandra Latysheva
f549ecf9d1 insert_quorum_parallel: status nodes in /quorum/parallel and allow merges 2020-10-06 21:49:48 +00:00
Alexandra Latysheva
8263e62298 working copy (with some debug info) 2020-10-04 19:55:39 +00:00
Alexandra Latysheva
e89a56969f first part 2020-09-30 23:16:27 +00:00
Alexey Milovidov
46caa211d5 Better diagnostics of "Replica {} appears to be already active" message 2020-06-27 16:55:00 +03:00
Alexey Milovidov
5211a42c04 Remove leader election, step 3 2020-06-19 17:18:58 +03:00
Alexey Milovidov
72257061d5 Avoid errors due to implicit int<->bool conversions when using ZK API 2020-06-12 18:09:12 +03:00
Alexey Milovidov
25f941020b Remove namespace pollution 2020-05-31 00:57:37 +03:00
Alexey Milovidov
7e1813825b Return old names of macros 2020-05-24 01:24:01 +03:00
Alexey Milovidov
7e2fb9ad65 Apply all transformations again 2020-05-23 22:38:30 +03:00
Alexey Milovidov
29762240de Remove duplicate whitespaces (preparation) 2020-05-23 22:31:54 +03:00
Alexey Milovidov
9d2a0d2dd7 Apply all transformations again 2020-05-23 21:59:49 +03:00
Alexey Milovidov
a2ad11897f Remove duplicate whitespaces (preparation) 2020-05-23 21:53:58 +03:00
Alexey Milovidov
1f13515a65 Make all LOG in single line (preparation) 2020-05-23 21:31:37 +03:00
Alexey Milovidov
533f86278a find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5{}\7", \4, \6);/' 2020-05-23 20:00:41 +03:00
Alexey Milovidov
8042e5febe find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+" << [^<]+\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)" << ([^<]+)\);/\1_FORMATTED(\2, "\3{}\5{}", \4, \6);/' 2020-05-23 19:58:15 +03:00
Alexey Milovidov
e391b77d81 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+" << [^<]+ << "[^"]+"\);' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+), "([^"]+)" << ([^<]+) << "([^"]+)"\);/\1_FORMATTED(\2, "\3{}\5", \4);/' 2020-05-23 19:56:05 +03:00
Alexey Milovidov
8d2e80a5e2 find {base,src,programs} -name '*.h' -or -name '*.cpp' | xargs grep -l -P 'LOG_\w+\([^,]+, "[^"]+"\)' | xargs sed -i -r -e 's/(LOG_\w+)\(([^,]+, "[^"]+")\)/\1_FORMATTED(\2)/' 2020-05-23 19:42:39 +03:00
Azat Khuzhin
d93b9a57f6 Forward declaration for Context as much as possible.
Now after changing Context.h 488 modules will be recompiled instead of 582.
2020-05-21 01:53:18 +03:00
Ivan Lezhankin
06446b4f08 dbms/ → src/ 2020-04-03 18:14:31 +03:00