Commit Graph

59 Commits

Author SHA1 Message Date
Antonio Andelic
b11f744252
Correctly disable async insert with deduplication when it's not needed (#50663)
* Correctly disable async insert when it's not used

* Better

* Add comment

* Better

* Fix tests

---------

Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
2023-06-07 20:33:08 +02:00
youennL-cs
6526c2a8ab
[RFC] Replacing merge tree new engine (#41005)
* Add new engine to ReplacingMergeTree corresponding to the ReplacingCollapsingMergeTree

* Add new test for the new ReplacingMergeTree engine

* Limit sign value to -1/1

* Add new engine to ReplacingMergeTree corresponding to the ReplacingCollapsingMergeTree

* Add new test for the new ReplacingMergeTree engine

* Limit sign value to -1/1

* Replace sign column(Int8) by is_deleted(UInt8)

* Add new engine to ReplacingMergeTree corresponding to the ReplacingCollapsingMergeTree

* Add new test for the new ReplacingMergeTree engine

* Limit sign value to -1/1

* Replace sign column(Int8) by is_deleted(UInt8)

* Add new engine to ReplacingMergeTree corresponding to the ReplacingCollapsingMergeTree

* Add new test for the new ReplacingMergeTree engine

* Limit sign value to -1/1

* Replace sign column(Int8) by is_deleted(UInt8)

* Add keyword 'CLEANUP' when OPTIMIZE

* Cleanup uniquely when it's a replacingMergeTree

* Propagate CLEANUP information and change from 'with_cleanup' to 'cleanup'

* Cleanup data flagged as 'is_deleted'

* Fix merge when optimize and add a test

* Fix OPTIMIZE and INSERT + add tests

* New fix for cleanup at the merge

* Cleanup debug logs

* Add the SETTINGS option 'clean_deleted_rows' that can be 'never' or 'always'

* Fix regression bug; Now REplicatedMergeTree can be called as before without 'is_deleted'

* Add Replicated tests

* Disable tag 'long' for our test and cleanup some white spaces

* Update tests

* Fix tests and remove additional useless whitespace

* Fix replica test

* Style clean && add condition check for is_deleted values

* clean_deleted_rows settings is nom an enum

* Add valid default value to the clean_deleted_rows settings

* Update cleanup checkers to use the enum and fix typos in the test

* Fix submodule contrib/AMQP-CPP pointer

* Add missing messages in test reference and remove a print with non derterministic order

* fix replica test reference

* Fix edge case

* Fix a typo for the spell checker

* Fix reference

* Fix a condition to raise an error if is_deleted differ from 0/1 and cleanup

* Change tests file name and update number

* This should fix the ReplacingMergeTree parameter set

* Fix replicated parameters

* Disable allow_deprecated_syntax_for_merge_tree for our new column

* Fix a test

* Remove non deterministic order print in the test

* Test on replicas

* Remove a condition, when checking optional parameters, that should not be sueful since we disabled the deprected_syntaxe

* Revert "Remove a condition, when checking optional parameters, that should not be useful since we disabled the deprected_syntaxe"

This reverts commit b65d64c05e.

* Fix replica management and limit the number of argument to two maximum, due to the possiblity of deprecated table create/attach failing otherwise

* Test a fix for replicated log information error

* Try to add sync to have consistent results

* Change path of replicas that should cause one issue and add few prints in case it's not that

* Get cleanup info on replicas only if information found

* Fix style issues

* Try to avoid replication error 'cannot select parts...' and and replica read/write field order

* Cleanup according to PR reviews
 and add tests on error raised.

* Update src/Storages/MergeTree/registerStorageMergeTree.cpp

Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>

* Select ... FINAL don't show rows with is_deleted = true

* Update and fix SELECT ... FINAL merge parameter

* Remove is_deleted rows only on the version inserted when merge

* Fix (master) updates issues

* Revert changes that should not be commited

* Add changes according to review

* Revert changes that should not be commited - part 2

---------

Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2023-02-16 16:03:16 +03:00
Azat Khuzhin
4e76629aaf Fixes for -Wshorten-64-to-32
- lots of static_cast
- add safe_cast
- types adjustments
  - config
  - IStorage::read/watch
  - ...
- some TODO's (to convert types in future)

P.S. That was quite a journey...

v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-10-21 13:25:19 +02:00
Alexey Milovidov
5b5a16320b Suggestions from @azat 2022-08-26 03:00:56 +02:00
Nikolai Kochetov
c71256ea38 Remove some commented code. 2022-05-30 13:18:20 +00:00
Nikolai Kochetov
fd97a9d885 Move some resources 2022-05-23 19:47:32 +00:00
Robert Schulze
777b5bc15b
Don't let storages inherit from boost::noncopyable
... IStorage has deleted copy ctor / assignment already
2022-05-03 09:07:08 +02:00
Robert Schulze
330212e0f4
Remove inherited create() method + disallow copying
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
   previously allowed.

Hence, this change

- removes shared_ptr_helper and as a result all inherited create() methods,

- instead, Storage objects are now created using make_shared<>() by the
  caller (for that to work, many constructors had to be made public), and

- all Storage classes were marked as noncopyable using boost::noncopyable.

In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
2022-05-02 08:46:52 +02:00
Anton Popov
836a348a9c Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-01 15:23:07 +03:00
Azat Khuzhin
9948525816 Simplify different block sturcture (i.e. after ALTER) support for Buffer
v2: fix empty block in case of flush

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-01-28 11:09:56 +03:00
Anton Popov
a20922b2d3 Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-11-09 15:36:25 +03:00
Alexander Tokmakov
2e7e195e77 change alter_lock to std::timed_mutex 2021-10-26 13:37:00 +03:00
Nikolai Kochetov
ab28c6c855 Remove BlockInputStream interfaces. 2021-10-14 13:25:43 +03:00
Alexey Milovidov
fe6b7c77c7 Rename "common" to "base" 2021-10-02 10:13:14 +03:00
Anton Popov
e36736b50c Merge remote-tracking branch 'origin/sparse-serialization' into HEAD 2021-08-02 22:52:02 +03:00
Nikolai Kochetov
2dc5c89b66 Update Storage::write 2021-07-23 17:25:35 +03:00
Anton Popov
3ed7f5a6cc dynamic subcolumns: add snapshot for storage 2021-07-09 06:15:41 +03:00
Maksim Kita
67e9b85951 Merge ext into common 2021-06-16 23:28:41 +03:00
Azat Khuzhin
f20799f683 Destroy Buffer() tables before others (within one database)
This will reduce amount of data that may be lost at shutdown.

v2: replace dynamic_cast with IStorage::isBuffer (@kitaisreal)
v3: replace IStorage::isBuffer with IStorage::flush (@alexey-milovidov)
v4: flush() for StorageProxy/StorageTableFunction
2021-05-16 13:23:53 +03:00
Kseniia Sumarokova
99e2f83c69
Merge pull request #23548 from ucasFL/table-comment
Implement table comments
2021-05-15 19:56:16 +03:00
Maksim Kita
f3ee14d24a
Merge pull request #24066 from azat/buffer-total-lock-contention
Do not acquire lock for total_bytes/total_rows for Buffer engine
2021-05-13 11:15:06 +03:00
feng lv
c6f8ab9826 fix 2021-05-13 02:05:53 +00:00
Azat Khuzhin
074b57fe82 Do not acquire lock for total_bytes/total_rows for Buffer engine
When Buffer() is under preassure, acquiring per-layer lock may take
significant time. And so the following query may take significant amount of time:

    SELECT total_bytes, total_rows FROM system.tables WHERE engine='Buffer'
2021-05-12 23:38:00 +03:00
Amos Bird
cd6414639e
add metadata_snapshot to getQueryProcessingStage 2021-05-11 18:12:26 +08:00
feng lv
4ffe199d39 Implement table comments 2021-04-23 12:18:23 +00:00
Azat Khuzhin
19e0439629 Add ability to flush buffer only in background for StorageBuffer
Add 3 new engine arguments:
- flush_time
- flush_rows
- flush_bytes

That will be checked only for background flush, this maybe useful if
INSERT latency is "crucial".
2021-04-15 21:22:13 +03:00
Ivan
495c6e03aa
Replace all Context references with std::weak_ptr (#22297)
* Replace all Context references with std::weak_ptr

* Fix shared context captured by value

* Fix build

* Fix Context with named sessions

* Fix copy context

* Fix gcc build

* Merge with master and fix build

* Fix gcc-9 build
2021-04-11 02:33:54 +03:00
Azat Khuzhin
c4a7e81287 Add metric to track how much time is spend during waiting for Buffer layer lock
It uses very fast CLOCK_MONOTONIC_COARSE, so this should not be a
problem.
Also note that there is no sense in using microseconds/nanoseconds since
accuracy of CLOCK_MONOTONIC_COARSE usually milliseconds.
2021-04-06 21:13:24 +03:00
feng lv
51021c1164 forbid to drop a column if it's referenced by materialized view 2021-02-28 05:24:39 +00:00
Azat Khuzhin
935870b2c2 Add separate config directive for Buffer profile
If you push data via Buffer engine then all your queries will be done
from one user, however this is not always desired behavior, since this
will not allow to limit queries with max_concurrent_queries_for_user and
similar.
2021-02-10 21:40:26 +03:00
Anton Popov
11283e3d81 Merge remote-tracking branch 'upstream/master' into HEAD 2020-12-25 21:25:59 +03:00
Alexey Milovidov
a671f13595 Fix flaky test 01584_distributed_buffer_cannot_find_column 2020-12-25 04:20:09 +03:00
Anton Popov
1be39fddac fix subcolumns with some storages 2020-12-22 19:42:37 +03:00
Vasily Nemkov
70ea507dae OPTIMIZE DEDUPLICATE BY columns
Extended OPTIMIZE ... DEDUPLICATE syntax to allow explicit (or implicit with asterisk/column transformers) list of columns to check for duplicates on.

Following syntax variants are now supported:

OPTIMIZE TABLE table DEDUPLICATE; -- the old one
OPTIMIZE TABLE table DEDUPLICATE BY *;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY col1,col2,col3;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);

Note that * behaves just like in SELECT: MATERIALIZED, and ALIAS columns are not used for expansion.
Also, it is an error to specify empty list of columns, or write an expression that results in an empty list of columns, or deduplicate by an ALIAS column.
Column transformers other than EXCEPT are not supported.
2020-12-07 09:44:07 +03:00
nikitamikhaylov
72c7cd6693 replace Context& to Settings& 2020-11-25 16:47:32 +03:00
nikitamikhaylov
68bef22fda Merge branch 'master' of github.com:ClickHouse/ClickHouse into merging-sequential-consistency 2020-11-23 16:28:35 +03:00
Amos Bird
1d9d586e20
Make global_context consistent. 2020-11-20 18:23:14 +08:00
Nikolai Kochetov
195c941c4e Merge branch 'master' into storage-read-query-plan 2020-11-10 15:02:22 +03:00
Azat Khuzhin
b838214a35 Pass non-const SelectQueryInfo (and drop mutable qualifiers) 2020-10-02 22:42:35 +03:00
Azat Khuzhin
587cde853e Avoid skipping unused shards twice (for query processing stage and read itself) 2020-10-02 22:42:09 +03:00
hchen9
b3949db00f Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into trivial_count_seq_consistency 2020-10-01 21:15:26 -07:00
hchen9
a5ac39b564 Pass Context parameter for IStorage.totalRows and IStorage.totalBytes 2020-09-30 16:47:42 -07:00
Nikolai Kochetov
5ac6bc071d QueryPlan for StorageBuffer and StorageMaterializedView read. 2020-09-30 15:22:57 +03:00
alesapin
f404925397 More optimal 2020-09-23 15:19:45 +03:00
Alexey Milovidov
2a09aa53cc Support parallel INSERT for more table engines 2020-08-26 19:41:30 +03:00
Nikolai Kochetov
2cca4d5fcf Refactor Pipe [part 2]. 2020-08-03 16:54:14 +03:00
Azat Khuzhin
3bee98c6f0 Fix lifetime_bytes/lifetime_rows for Buffer direct block write 2020-07-12 01:16:05 +03:00
Azat Khuzhin
32a45d0dee Implement lifetime_rows/lifetime_bytes for Buffer engine
Buffer engine is usually used on INSERTs, but right now there is no way
to track number of INSERTed rows per-table, since only summary metrics
exists:
- StorageBufferRows
- StorageBufferBytes

But it can be pretty useful to track INSERTed rows rate (and it can be
exposed via http_handlers for i.e. prometheus)
2020-07-11 16:06:11 +03:00
alesapin
d79982f497 Better locks in Storages 2020-06-18 19:10:47 +03:00
alesapin
ed8f3b2fc4 TTL in storage in memory metadata 2020-06-17 16:39:26 +03:00