If the connection is closed (for whatever reason), the setting will not be
re-applied after a transparent reconnect: only the native clickhouse-client
can do that, because it parses the query, but perf tests use the Python
driver.
Just use an in-place SETTINGS clause or <settings> instead.
So now you will get an additional message:
2020.11.21 13:37:15.429767 [ 380840 ] {47e1540d-f4cd-4f2f-9383-f1216e8328dc} <Trace> StorageDistributed (dist_01247): (127.0.0.2:9000) Cancelling query
Otherwise the flaky check would run them both, which is not a problem; this
is just to trigger the flaky check, since there is a tricky issue [1]:
2020-11-20 00:35:51 01247_optimize_distributed_group_by_sharding_key: [ FAIL ] 0.67 sec. - return code 101
2020-11-20 00:35:51 Received exception from server (version 20.12.1):
2020-11-20 00:35:51 Code: 101. DB::Exception: Received from localhost:9000. DB::Exception: Received from 127.0.0.2:9000. DB::Exception: Unexpected packet Data received from client.
[1]: https://clickhouse-test-reports.s3.yandex.net/16996/8d71564056925df1415880f382aaa169cbdf37b0/functional_stateless_tests_flaky_check_(address)/test_run.txt.out.log
While reading from an AggregatingMergeTree with
SimpleAggregateFunction(String) in the primary key and
optimize_aggregation_in_order enabled, perf top shows:
```
Samples: 1M of event 'cycles', 4000 Hz, Event count (approx.): 287759760270 lost: 0/0 drop: 0/0
  Children      Self  Shared Object  Symbol
+   12.64%    11.39%  clickhouse     [.] memcpy
+    9.08%     0.23%  [unknown]      [.] 0000000000000000
+    8.45%     8.40%  clickhouse     [.] ProfileEvents::increment  # <-- this one; in debug builds its overhead is 5.8x, not 0.08x
+    7.68%     7.67%  clickhouse     [.] LZ4_compress_fast_extState
+    5.29%     5.22%  clickhouse     [.] DB::IAggregateFunctionHelper<DB::AggregateFunctionNullUnary<true, true> >::addFree
```
The reason is obvious: ProfileEvents are atomic counters (and they are
also nested):
<details>
```
Samples: 7M of event 'cycles', 4000 Hz, Event count (approx.): 450726149337
ProfileEvents::increment /usr/bin/clickhouse [Percent: local period]
Percent│
│
│
│ Disassembly of section .text:
│
│ 00000000078d8900 <ProfileEvents::increment(unsigned long, unsigned long)@@Base>:
│ ProfileEvents::increment(unsigned long, unsigned long):
0.17 │ push %rbp
0.00 │ mov %rsi,%rbp
0.04 │ push %rbx
0.20 │ mov %rdi,%rbx
0.17 │ sub $0x8,%rsp
0.26 │ → callq DB::CurrentThread::getProfileEvents
│ ProfileEvents::Counters::increment(unsigned long, unsigned long):
0.00 │ lea 0x0(,%rbx,8),%rdi
0.05 │ nop
│ unsigned long std::__1::__cxx_atomic_fetch_add<unsigned long, unsigned long>(std::__1::__cxx_atomic_base_impl<unsigned long>*, unsigned long, std::__1::memory_order):
1.02 │ mov (%rax),%rdx
97.04 │ lock add %rbp,(%rdx,%rdi,1)
│ ProfileEvents::Counters::increment(unsigned long, unsigned long):
0.21 │ mov 0x10(%rax),%rax
0.04 │ test %rax,%rax
0.00 │ → jne 78d8920 <ProfileEvents::increment(unsigned long, unsigned long)@@Base+0x20>
│ ProfileEvents::increment(unsigned long, unsigned long):
0.38 │ add $0x8,%rsp
0.00 │ pop %rbx
0.04 │ pop %rbp
0.38 │ ← retq
```
</details>
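The disassembly suggests that each counters level performs a `lock add` and then follows a pointer to its parent level (the `mov 0x10(%rax),%rax; test; jne` loop). A minimal C++ sketch of that pattern follows, assuming a simplified, hypothetical class `NestedCounters`; it is not ClickHouse's `ProfileEvents::Counters`, only a model of why one logical event costs several atomic operations.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of nested profile counters (not the real
// ProfileEvents::Counters): each level owns one atomic per event and keeps
// a pointer to its parent level (e.g. thread -> query -> global).
class NestedCounters {
public:
    NestedCounters(std::size_t num_events, NestedCounters * parent = nullptr)
        : counters_(num_events), parent_(parent) {}

    // One logical event costs one atomic read-modify-write per level;
    // on x86 each fetch_add typically becomes a lock-prefixed instruction.
    void increment(std::size_t event, std::uint64_t amount = 1) {
        for (NestedCounters * level = this; level != nullptr; level = level->parent_) {
            assert(event < level->counters_.size());
            level->counters_[event].fetch_add(amount, std::memory_order_relaxed);
        }
    }

    std::uint64_t get(std::size_t event) const {
        return counters_[event].load(std::memory_order_relaxed);
    }

private:
    std::vector<std::atomic<std::uint64_t>> counters_;  // zero-initialized
    NestedCounters * parent_;
};
```

In this model, chaining thread, query and global levels means a single `increment()` performs three atomic additions, and the upper levels are shared by all threads, so their cache lines are contended.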
These ProfileEvents were ArenaAllocChunks (about 1.5M events per
second), and the reason is that the table has
SimpleAggregateFunction(String) in the PK, which requires an Arena.
But most of the time the Arena was not even used, so avoid this cost by
re-creating the Arena only if it was "used" (i.e. it has new chunks).
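The idea of re-creating the arena only when it was actually used can be sketched as follows. `ToyArena`, `resetArenaIfUsed` and the global `chunk_allocations` counter are hypothetical names for illustration, not ClickHouse's `Arena` API; the sketch assumes "used" means "allocated chunks beyond the one created in the constructor", and uses a plain counter to stand in for the ArenaAllocChunks profile event.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stands in (hypothetically) for the ArenaAllocChunks profile event.
inline std::size_t chunk_allocations = 0;

// Hypothetical, heavily simplified arena: like the real one, it allocates
// a head chunk in its constructor, so merely creating it has a cost.
class ToyArena {
public:
    explicit ToyArena(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) { addChunk(); }

    char * alloc(std::size_t size) {
        assert(size <= chunk_size_);  // no oversized allocations in this sketch
        if (used_ + size > chunk_size_)
            addChunk();
        char * result = chunks_.back().get() + used_;
        used_ += size;
        return result;
    }

    // "Used" in the sense of the commit message: chunks were added after construction.
    bool hasNewChunks() const { return chunks_.size() > 1; }

private:
    void addChunk() {
        chunks_.push_back(std::make_unique<char[]>(chunk_size_));
        used_ = 0;
        ++chunk_allocations;
    }

    std::size_t chunk_size_;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<char[]>> chunks_;
};

// Instead of unconditionally creating a fresh arena (and a head chunk)
// every time, replace it only when it actually grew.
void resetArenaIfUsed(std::unique_ptr<ToyArena> & arena) {
    if (arena->hasNewChunks())
        arena = std::make_unique<ToyArena>();
}
```

The check itself is a cheap, non-atomic comparison, so an idle arena no longer triggers any chunk allocation (and, in the real code, no ProfileEvents increment).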
Another possibility is to avoid populating Arena::head in the ctor, but that
would make the Arena code more complex, so for now this approach was preferred.
Also, as a long-term solution, it is worth looking at implementing them via
RCU (to move the extra overhead out of the write path and into the read
side).
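The message proposes RCU; the sketch below swaps in a simpler technique, per-writer counter slots that are summed on read, purely to illustrate the same goal of moving cost from the write path to the read path. It is not RCU and not how ClickHouse implements ProfileEvents; `ShardedCounter`, `registerWriter`, `add` and `read` are hypothetical names.

```cpp
#include <atomic>
#include <cstdint>
#include <deque>
#include <mutex>

// Not RCU: a swapped-in "one slot per writer, sum on read" counter.
// Each writer owns exactly one slot, so it can update it with a relaxed
// load + store instead of a lock-prefixed read-modify-write.
class ShardedCounter {
public:
    using Slot = std::atomic<std::uint64_t>;

    // Each writer thread calls this once and keeps the returned slot.
    Slot & registerWriter() {
        std::lock_guard<std::mutex> lock(mutex_);
        return slots_.emplace_back(0);
    }

    // Write path: only the owning thread modifies its slot, so no RMW is needed.
    static void add(Slot & slot, std::uint64_t amount) {
        slot.store(slot.load(std::memory_order_relaxed) + amount, std::memory_order_relaxed);
    }

    // Read path: takes the mutex and visits every slot; cost grows with writers.
    std::uint64_t read() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::uint64_t total = 0;
        for (const Slot & slot : slots_)
            total += slot.load(std::memory_order_relaxed);
        return total;
    }

private:
    mutable std::mutex mutex_;
    std::deque<Slot> slots_;  // deque: emplace_back keeps existing references valid
};
```

The trade-off matches the one described above: writes become cheap uncontended stores, while reads pay for locking and summation. Unlike RCU, nothing here publishes or reclaims shared data structures; it only redistributes the cost of counting.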
* Create replicated_fetches.md
Documented the system table system.replicated_fetches.
* Edit and translate replicated_fetches.md
Fixed the English version and translated it into Russian.
* Update replicated_fetches.md
Made corrections.
* Update replicated_fetches.md
Made changes to the Russian version.
Co-authored-by: Dmitriy <sevirov@yandex-team.ru>