* Add new engine to ReplacingMergeTree corresponding to the ReplacingCollapsingMergeTree
* Add new test for the new ReplacingMergeTree engine
* Limit sign value to -1/1
* Replace sign column(Int8) by is_deleted(UInt8)
* Add keyword 'CLEANUP' when OPTIMIZE
* Clean up only when the table is a ReplacingMergeTree
* Propagate CLEANUP information and change from 'with_cleanup' to 'cleanup'
* Cleanup data flagged as 'is_deleted'
* Fix merge when optimize and add a test
* Fix OPTIMIZE and INSERT + add tests
* New fix for cleanup at the merge
* Cleanup debug logs
* Add the SETTINGS option 'clean_deleted_rows' that can be 'never' or 'always'
* Fix regression bug: ReplicatedMergeTree can now be used as before, without 'is_deleted'
* Add Replicated tests
* Disable tag 'long' for our test and clean up some whitespace
* Update tests
* Fix tests and remove unnecessary whitespace
* Fix replica test
* Style cleanup and add a condition check for is_deleted values
* The clean_deleted_rows setting is now an enum
* Add a valid default value to the clean_deleted_rows setting
* Update cleanup checkers to use the enum and fix typos in the test
* Fix submodule contrib/AMQP-CPP pointer
* Add missing messages in test reference and remove a print with non-deterministic order
* fix replica test reference
* Fix edge case
* Fix a typo for the spell checker
* Fix reference
* Fix a condition to raise an error if is_deleted differs from 0/1 and cleanup is requested
* Change test file names and update their numbers
* This should fix the ReplacingMergeTree parameter set
* Fix replicated parameters
* Disable allow_deprecated_syntax_for_merge_tree for our new column
* Fix a test
* Remove non deterministic order print in the test
* Test on replicas
* Remove a condition, when checking optional parameters, that should not be useful since we disabled the deprecated syntax
* Revert "Remove a condition, when checking optional parameters, that should not be useful since we disabled the deprecated syntax"
This reverts commit b65d64c05e.
* Fix replica management and limit the number of arguments to two at most, since creating/attaching tables with the deprecated syntax could fail otherwise
* Test a fix for replicated log information error
* Try to add sync to have consistent results
* Change the replica path, which may be causing one issue, and add a few prints in case it is not
* Get cleanup info on replicas only if information found
* Fix style issues
* Try to avoid replication error 'cannot select parts...' and adjust the replica read/write field order
* Clean up according to PR reviews and add tests for the errors raised
* Update src/Storages/MergeTree/registerStorageMergeTree.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
* SELECT ... FINAL doesn't show rows with is_deleted = true
* Update and fix SELECT ... FINAL merge parameter
* When merging, remove is_deleted rows only for the inserted version
* Fix issues caused by updates from master
* Revert changes that should not be committed
* Add changes according to review
* Revert changes that should not be committed - part 2
---------
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
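The merge behavior described in the bullets above (keep the newest version per key, drop rows flagged is_deleted during a cleanup merge, reject is_deleted values other than 0/1) can be modeled roughly as follows. This is an illustrative sketch with hypothetical names (`Row`, `mergeReplacing`), not the actual MergeTree merge code, which streams over sorted parts rather than building an in-memory map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory row: sorting key, version column and is_deleted flag.
struct Row {
    std::string key;
    std::uint64_t version;
    std::uint8_t is_deleted;  // only 0 or 1 is valid
};

// Sketch of ReplacingMergeTree(version, is_deleted) semantics: for every key
// keep the row with the highest version; when cleanup is requested, drop the
// surviving row if it is flagged as deleted.
std::vector<Row> mergeReplacing(const std::vector<Row>& rows, bool cleanup) {
    std::map<std::string, Row> latest;
    for (const Row& row : rows) {
        if (row.is_deleted > 1)
            throw std::invalid_argument("is_deleted must be 0 or 1");
        auto it = latest.find(row.key);
        if (it == latest.end())
            latest.emplace(row.key, row);
        else if (row.version >= it->second.version)
            it->second = row;  // on equal versions the later input row wins
    }
    std::vector<Row> result;
    for (const auto& entry : latest)
        if (!(cleanup && entry.second.is_deleted))
            result.push_back(entry.second);
    return result;
}
```

Without cleanup the deleted row survives as the newest version, which is why SELECT ... FINAL has to filter it out separately.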
- lots of static_cast
- add safe_cast
- type adjustments
- config
- IStorage::read/watch
- ...
- some TODOs (to convert types in the future)
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
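The message lists `safe_cast` without describing it. The code below is a minimal sketch of a checked integer conversion, assuming that is the kind of helper meant to replace silent narrowing `static_cast`s. The name `checked_integer_cast` is hypothetical and this is not the ClickHouse implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical checked narrowing cast for integer types: converts, then
// throws if the value does not survive the round trip or its sign changed.
template <typename To, typename From>
To checked_integer_cast(From from) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "integer types only");
    const To to = static_cast<To>(from);
    const bool round_trips = static_cast<From>(to) == from;
    const bool same_sign = (to < To{}) == (from < From{});
    if (!round_trips || !same_sign)
        throw std::overflow_error("integer value out of range for target type");
    return to;
}
```

The sign check catches cases such as `-1` converted to an unsigned type, where the round trip alone would succeed.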
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>(), which does two heap allocations, instead of
make_shared<>(), which does a single allocation. It turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
previously allowed.
Hence, this change
- removes shared_ptr_helper and as a result all inherited create() methods,
- instead, Storage objects are now created using make_shared<>() by the
caller (for that to work, many constructors had to be made public), and
- all Storage classes were marked as noncopyable using boost::noncopyable.
In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
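A minimal C++ sketch of the resulting pattern, using a hypothetical `StorageExample` class. The public constructor lets callers use std::make_shared directly, and explicitly deleted copy operations stand in for boost::noncopyable so that the snippet does not depend on Boost.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical storage class: public constructor, copying forbidden
// (the same effect as inheriting from boost::noncopyable).
class StorageExample {
public:
    explicit StorageExample(std::string name_) : name(std::move(name_)) {}

    StorageExample(const StorageExample&) = delete;
    StorageExample& operator=(const StorageExample&) = delete;

    const std::string& getName() const { return name; }

private:
    std::string name;
};

// make_shared allocates the object and its control block together,
// unlike std::shared_ptr<T>(new T(...)), which needs two allocations.
std::shared_ptr<StorageExample> createStorage(const std::string& name) {
    return std::make_shared<StorageExample>(name);
}

static_assert(!std::is_copy_constructible<StorageExample>::value,
              "storages must not be copyable");
```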
This will reduce the amount of data that may be lost at shutdown.
v2: replace dynamic_cast with IStorage::isBuffer (@kitaisreal)
v3: replace IStorage::isBuffer with IStorage::flush (@alexey-milovidov)
v4: flush() for StorageProxy/StorageTableFunction
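The approach the revisions converge on can be sketched as a virtual flush() with a no-op default, so that shutdown code can call it on every table without a dynamic_cast. The class names here are hypothetical simplifications, not ClickHouse's IStorage, StorageBuffer or StorageProxy.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified interface: flush() does nothing by default.
class IStorageSketch {
public:
    virtual ~IStorageSketch() = default;
    virtual void flush() {}
};

// A buffering table overrides flush() to write out its pending rows.
class BufferSketch : public IStorageSketch {
public:
    void insert(int row) { pending.push_back(row); }
    void flush() override {
        flushed.insert(flushed.end(), pending.begin(), pending.end());
        pending.clear();
    }
    std::vector<int> pending;
    std::vector<int> flushed;  // stands in for the destination table
};

// A proxy forwards flush() to the storage it wraps.
class ProxySketch : public IStorageSketch {
public:
    explicit ProxySketch(std::shared_ptr<IStorageSketch> nested_) : nested(std::move(nested_)) {}
    void flush() override { nested->flush(); }

private:
    std::shared_ptr<IStorageSketch> nested;
};

// At shutdown, flush every table before tearing it down.
void shutdownAll(const std::vector<std::shared_ptr<IStorageSketch>>& tables) {
    for (const auto& table : tables)
        table->flush();
}
```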
When Buffer() is under pressure, acquiring the per-layer locks may take
a significant amount of time, and so the following query may be slow:
SELECT total_bytes, total_rows FROM system.tables WHERE engine='Buffer'
Add 3 new engine arguments:
- flush_time
- flush_rows
- flush_bytes
These are checked only by the background flush; this may be useful if
INSERT latency is "crucial".
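A minimal sketch of how such optional thresholds could fit into the flush decision. It assumes the documented Buffer rule (flush when all min_* thresholds or at least one max_* threshold is reached) and treats 0 as "not set" for the flush_* arguments. The struct and function names, and the use of strict comparisons, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical thresholds mirroring the Buffer engine arguments.
struct Thresholds {
    std::uint64_t time = 0;  // seconds
    std::uint64_t rows = 0;
    std::uint64_t bytes = 0;
};

struct BufferState {
    std::uint64_t seconds_since_last_flush = 0;
    std::uint64_t rows = 0;
    std::uint64_t bytes = 0;
};

// min_*/max_* apply everywhere; flush_* are consulted only when called from
// the background flush, so they never make an INSERT flush synchronously.
bool shouldFlush(const BufferState& s, const Thresholds& min_thr, const Thresholds& max_thr,
                 const Thresholds& flush_thr, bool is_background) {
    const bool all_min = s.seconds_since_last_flush > min_thr.time
        && s.rows > min_thr.rows && s.bytes > min_thr.bytes;
    const bool any_max = s.seconds_since_last_flush > max_thr.time
        || s.rows > max_thr.rows || s.bytes > max_thr.bytes;
    if (all_min || any_max)
        return true;
    if (!is_background)
        return false;
    return (flush_thr.time && s.seconds_since_last_flush > flush_thr.time)
        || (flush_thr.rows && s.rows > flush_thr.rows)
        || (flush_thr.bytes && s.bytes > flush_thr.bytes);
}
```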
It uses very fast CLOCK_MONOTONIC_COARSE, so this should not be a
problem.
Also note that there is no sense in using microseconds/nanoseconds, since
the accuracy of CLOCK_MONOTONIC_COARSE is usually milliseconds.
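For reference, reading the coarse clock and its resolution looks like this. This is a POSIX/Linux sketch with hypothetical helper names. The #ifdef fallback to CLOCK_MONOTONIC exists only so that the snippet also builds on systems that do not define CLOCK_MONOTONIC_COARSE.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Monotonic timestamp in whole seconds; the coarse clock is cheap to read
// but only tick-accurate, which is enough for second-granularity thresholds.
inline std::uint64_t monotonicSeconds() {
    timespec ts{};
#ifdef CLOCK_MONOTONIC_COARSE
    clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
#else
    clock_gettime(CLOCK_MONOTONIC, &ts);  // portable fallback
#endif
    return static_cast<std::uint64_t>(ts.tv_sec);
}

// Resolution of the clock actually used, in nanoseconds.
inline long long clockResolutionNs() {
    timespec res{};
#ifdef CLOCK_MONOTONIC_COARSE
    clock_getres(CLOCK_MONOTONIC_COARSE, &res);
#else
    clock_getres(CLOCK_MONOTONIC, &res);
#endif
    return static_cast<long long>(res.tv_sec) * 1000000000LL + res.tv_nsec;
}
```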
If you push data via the Buffer engine, then all your queries will be
executed as one user. However, this is not always desirable, since it
does not allow limiting queries with max_concurrent_queries_for_user and
similar settings.
Extended OPTIMIZE ... DEDUPLICATE syntax to allow explicit (or implicit with asterisk/column transformers) list of columns to check for duplicates on.
Following syntax variants are now supported:
OPTIMIZE TABLE table DEDUPLICATE; -- the old one
OPTIMIZE TABLE table DEDUPLICATE BY *;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY * EXCEPT (colX, colY);
OPTIMIZE TABLE table DEDUPLICATE BY col1,col2,col3;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex');
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT colX;
OPTIMIZE TABLE table DEDUPLICATE BY COLUMNS('column-matched-by-regex') EXCEPT (colX, colY);
Note that * behaves just like in SELECT: MATERIALIZED and ALIAS columns are not used for expansion.
Also, it is an error to specify an empty list of columns, to write an expression that results in an empty list of columns, or to deduplicate by an ALIAS column.
Column transformers other than EXCEPT are not supported.
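The column-resolution rules above can be sketched in C++ as follows. The types and functions are hypothetical, and the "unknown column" check is an added assumption rather than something stated in this message. `*` expands to ordinary columns only, EXCEPT removes names, and both an empty result and an explicit ALIAS column are rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

enum class ColumnKind { Ordinary, Materialized, Alias };

struct Column {
    std::string name;
    ColumnKind kind;
};

// `DEDUPLICATE BY * [EXCEPT ...]`: like SELECT *, skip MATERIALIZED and ALIAS.
std::vector<std::string> expandAsterisk(const std::vector<Column>& columns,
                                        const std::set<std::string>& except) {
    std::vector<std::string> result;
    for (const Column& column : columns)
        if (column.kind == ColumnKind::Ordinary && !except.count(column.name))
            result.push_back(column.name);
    if (result.empty())
        throw std::invalid_argument("DEDUPLICATE BY list is empty");
    return result;
}

// `DEDUPLICATE BY col1, col2`: must be non-empty and must not name an ALIAS column.
void checkExplicitList(const std::vector<Column>& columns,
                       const std::vector<std::string>& names) {
    if (names.empty())
        throw std::invalid_argument("DEDUPLICATE BY list is empty");
    for (const std::string& name : names) {
        auto it = std::find_if(columns.begin(), columns.end(),
                               [&](const Column& c) { return c.name == name; });
        if (it == columns.end())
            throw std::invalid_argument("unknown column: " + name);
        if (it->kind == ColumnKind::Alias)
            throw std::invalid_argument("cannot deduplicate by ALIAS column: " + name);
    }
}
```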
The Buffer engine is usually used for INSERTs, but right now there is no
way to track the number of INSERTed rows per table, since only summary
metrics exist:
- StorageBufferRows
- StorageBufferBytes
But it can be pretty useful to track the rate of INSERTed rows (and it
can be exposed via http_handlers for e.g. Prometheus).
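A minimal sketch of per-table counters, assuming monotonically increasing totals are kept alongside the existing summary metrics. The struct and function names are hypothetical. A scraper can derive a rate from two samples of such a counter, which a current-value metric like StorageBufferRows does not provide.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-table totals, updated on every INSERT into the buffer.
struct BufferInsertCounters {
    std::atomic<std::uint64_t> inserted_rows{0};
    std::atomic<std::uint64_t> inserted_bytes{0};

    void onInsert(std::uint64_t rows, std::uint64_t bytes) {
        // Relaxed ordering is enough for independent statistics counters.
        inserted_rows.fetch_add(rows, std::memory_order_relaxed);
        inserted_bytes.fetch_add(bytes, std::memory_order_relaxed);
    }
};

// Rate between two samples of a monotonically increasing counter.
inline double ratePerSecond(std::uint64_t earlier, std::uint64_t later, double seconds) {
    return seconds > 0 ? static_cast<double>(later - earlier) / seconds : 0.0;
}
```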