Previously, RESTORE could produce the following warning in the logs:
<Warning> TemporaryFileOnDisk: Temporary path 'tmp/21672kaaaaa' does not exist in './disks/s3_common_disk/'
The reason is that it used the path
disks/s3_common_disk/disks/s3_common_disk/tmp instead of
disks/s3_common_disk/tmp.
Fix this by adding TemporaryFileOnDisk::getRelativePath() and using it
where appropriate.
Now `find disks` no longer shows any temporary leftovers.
v2: rename getPath to getAbsolutePath
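The path mix-up above can be illustrated with a minimal sketch. It is not the actual ClickHouse code: `DiskSketch`, `TemporaryFileSketch` and `demonstratePathDoubling` are hypothetical names, and the sketch only assumes that a disk resolves whatever it is given against its own root. Note that the "absolute" path here is merely root-prefixed, mirroring the paths in the warning.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical stand-in for a disk: it resolves disk-relative paths against its root.
struct DiskSketch
{
    fs::path root;

    // Callers are expected to pass a path relative to the disk root.
    fs::path resolve(const fs::path & relative_path) const
    {
        return (root / relative_path).lexically_normal();
    }
};

// Hypothetical stand-in for a temporary file created on a disk.
struct TemporaryFileSketch
{
    fs::path disk_root;
    fs::path relative_path;

    // Path relative to the disk root, suitable for disk operations.
    fs::path getRelativePath() const { return relative_path; }

    // Path that already includes the disk root.
    fs::path getAbsolutePath() const { return disk_root / relative_path; }
};

void demonstratePathDoubling()
{
    const DiskSketch disk{"./disks/s3_common_disk/"};
    const TemporaryFileSketch tmp{disk.root, "tmp/21672kaaaaa"};

    // Passing the root-prefixed path to the disk repeats the disk root.
    assert(disk.resolve(tmp.getAbsolutePath())
           == fs::path("disks/s3_common_disk/disks/s3_common_disk/tmp/21672kaaaaa"));

    // Passing the disk-relative path resolves to the intended location.
    assert(disk.resolve(tmp.getRelativePath())
           == fs::path("disks/s3_common_disk/tmp/21672kaaaaa"));
}
```

Keeping two clearly named accessors makes it harder to hand a root-prefixed path to an API that expects a disk-relative one.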
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Looks like everything is OK with opentelemetry, and the reason for the
flakiness is this:
$ gg opentelemetry_start_trace_probability tests/**.xml
tests/config/users.d/opentelemetry.xml: <opentelemetry_start_trace_probability>0.1</opentelemetry_start_trace_probability>
So let's simply disable it.
Also, let's stop the distributed sends to increase the failure rate
in case some problem remains.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Virtual columns did not support queries with OR. For example, a query like
this (here `m` is the `Merge` table, see the test):
select key from m where (value = 10 and _table = 'v1') or (value = 20 and _table = 'v1');
would always lead to:
Cannot find column `value` in source stream, there are only columns ...
The reason for this is that it actually executes the following queries:
SELECT key, value FROM default.d1 WHERE ((value = 10) AND ('v1' = 'v1')) OR ((value = 20) AND ('v1' = 'v1'));
SELECT key FROM default.d2 WHERE 0;
This kind of filtering is used not only for the `Merge` table. It applies to:
- `_table` for `Merge` (already mentioned)
- `_file` for `File`
- `_idx` for `S3`
- filtering of `system.*` tables by `database`/`table`/...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Convert hashSets in parallel before merge
Before the merge, if one of lhs and rhs is a singleLevelSet and the other is a twoLevelSet,
the singleLevelSet calls convertToTwoLevel(). The conversion does not run in parallel,
and it costs a lot of cycles if it has to consume all the singleLevelSets.
The idea of the patch is to convert all the singleLevelSets to twoLevelSets in parallel if
the hash sets are neither all singleLevel nor all twoLevel.
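The idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: the set types are stand-ins built on `std::unordered_set`, `NUM_BUCKETS` is an arbitrary value, `parallelizeMergePrepareSketch` is a hypothetical name, and one `std::thread` per set replaces whatever thread pool the real implementation uses.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <unordered_set>
#include <variant>
#include <vector>

constexpr std::size_t NUM_BUCKETS = 256;  // assumption, chosen only for the sketch

using SingleLevelSet = std::unordered_set<std::uint64_t>;
using TwoLevelSet = std::array<SingleLevelSet, NUM_BUCKETS>;
using AnySet = std::variant<SingleLevelSet, TwoLevelSet>;

// Distribute the keys of a single-level set over the buckets of a two-level set.
TwoLevelSet convertToTwoLevel(const SingleLevelSet & src)
{
    TwoLevelSet dst;
    for (const auto key : src)
        dst[std::hash<std::uint64_t>{}(key) % NUM_BUCKETS].insert(key);
    return dst;
}

// If the sets have mixed layouts, convert every single-level set in parallel.
void parallelizeMergePrepareSketch(std::vector<AnySet> & sets)
{
    const auto is_two_level = [](const AnySet & set) { return std::holds_alternative<TwoLevelSet>(set); };
    const auto two_level_count
        = static_cast<std::size_t>(std::count_if(sets.begin(), sets.end(), is_two_level));

    // Nothing to do when all sets already share one layout.
    if (two_level_count == 0 || two_level_count == sets.size())
        return;

    // Each worker touches exactly one element, so no locking is needed.
    std::vector<std::thread> workers;
    for (auto & set : sets)
    {
        if (is_two_level(set))
            continue;
        workers.emplace_back([&set] { set = convertToTwoLevel(std::get<SingleLevelSet>(set)); });
    }
    for (auto & worker : workers)
        worker.join();

    assert(std::all_of(sets.begin(), sets.end(), is_two_level));
}
```

After this step every pairwise merge sees two sets of the same layout, so no conversion happens on the serial merge path.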
I tested the patch on an Intel SPR server with 2 x 112 vCPUs, using ClickBench and the latest
upstream ClickHouse.
Q5 improved by 264%, and 24 queries gained at least 5% in performance.
The overall geomean across the 43 queries improved by 7.4% over the base code.
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Add resize() for the data_vec in parallelizeMergePrepare()
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Add the performance test prepare_hash_before_merge.xml
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Rename the data set from hits_v1 to test.hits to fit the CI
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
* Remove the redundant branch in UniqExactSet
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>
* Remove the empty methods and throw an exception in parallelizeMergePrepare()
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
---------
Signed-off-by: Jiebin Sun <jiebin.sun@intel.com>
Co-authored-by: Nikita Taranov <nickita.taranov@gmail.com>