Default settings are not very efficient, since they do not even reuse
connections.
And when each query requires connection you can have only ~80 QPS, while
by simply enabling connection reuse (connection_auto_close=false) you
can have ~500 QPS (and by increasing connection_pool_size you can have
better QPS throughput).
So this patch allows to pass through some connection related settings
for the StorageMySQL engine, like:
- connection_pool_size=16
- connection_max_tries=3
- connection_auto_close=true
v2: remove connection_pool_default_size
v3: remove num_tries_on_connection_loss
TODO (suggested by Nikolai)
1. Build query plan fro current query (inside storage::read) up to WithMergableState
2. Check, that plan is simple enough: Aggregating - Expression - Filter - ReadFromStorage (or simplier)
3. Check, that filter is the same as filter in projection, and also expression calculates the same aggregation keys as in projection
4. Return WithMergableState if projection applies
3 will be easier to do with ActionsDAG, cause it sees all functions, and dependencies are direct (but it is possible with ExpressionActions also)
Also need to figure out how prewhere works for projections, and
row_filter_policies.
wip
* add the query data deduplication excluding duplicated parts in MergeTree family engines.
query deduplication is based on parts' UUID which should be enabled first with merge_tree setting
assign_part_uuids=1
allow_experimental_query_deduplication setting is to enable part deduplication, default ot false.
data part UUID is a mechanism of giving a data part a unique identifier.
Having UUID and deduplication mechanism provides a potential of moving parts
between shards preserving data consistency on a read path:
duplicated UUIDs will cause root executor to retry query against on of the replica explicitly
asking to exclude encountered duplicated fingerprints during a distributed query execution.
NOTE: this implementation don't provide any knobs to lock part and hence its UUID. Any mutations/merge will
update part's UUID.
* add _part_uuid virtual column, allowing to use UUIDs in predicates.
Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>
address comments
Two new settings (by analogy with MergeTree family) has been added:
- `fsync_after_insert` - Do fsync for every inserted. Will decreases
performance of inserts.
- `fsync_tmp_directory` - Do fsync for temporary directory (that is used
for async INSERT only) after all part operations (writes, renames,
etc.).
Refs: #17380 (p1)
Contains error codes with number of times they have been triggered.
Columns:
- `name` ([String](../../sql-reference/data-types/string.md)) — name of the error (`errorCodeToName`).
- `code` ([Int32](../../sql-reference/data-types/int-uint.md)) — code number of the error.
- `value` ([UInt64](../../sql-reference/data-types/int-uint.md)) - number of times this error has been happened.
**Example**
``` sql
SELECT *
FROM system.errors
WHERE value > 0
ORDER BY code ASC
LIMIT 1
┌─name─────────────┬─code─┬─value─┐
│ CANNOT_OPEN_FILE │ 76 │ 1 │
└──────────────────┴──────┴───────┘