From 7baddc4cb6b503d93b7588e02553bb773f6dddab Mon Sep 17 00:00:00 2001 From: Alexander Tokmakov Date: Fri, 16 Aug 2019 21:34:54 +0300 Subject: [PATCH 01/11] update changelog --- CHANGELOG.md | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 2a0b69bcc6d..081b7f08a4d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,63 @@ +## ClickHouse release 19.13.2.19, 2019-08-14 + +### New Feature +* Added functions for working with the сustom week number [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh)) +* New query processing pipeline. Use `experimental_use_processors=1` option to enable it. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai)) +* It is possible to select several columns by providing a pattern of three dots before or after of a stem. [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)) +* Allow to specify a list of columns with `COLUMNS('regexp')` expression that works like a more sophisticated variant of `*` asterisk. [#6038](https://github.com/yandex/ClickHouse/pull/6038) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* `CREATE TABLE AS table_function()` is now possible [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000)) +* Throws an exception if `config.d` file doesn't have the corresponding root element as the config file [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000)) +* Poor man's profiler for each query being executed [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) +* Adam optimizer for stochastic descent. Made it default (because it shows good quality without almost any tuning). 
[#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37)) + +### Bug Fix +* For row-level security feature it's crucial for all storages to provide proper database name. Now it's implemented. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7)) +* Now client could receive lags with any desired level. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)) +* Fixed `DoubleDelta` encoding cases for random `Int32` and `Int64`. [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk)) +* Fix client version number which is able to read additional progress data from the server. [#6018](https://github.com/yandex/ClickHouse/pull/6018) ([alesapin](https://github.com/alesapin)) +* Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fix backward compatibility for `CHECK QUERY`. [#6024](https://github.com/yandex/ClickHouse/pull/6024) ([alesapin](https://github.com/alesapin)) +* After introduction of virtual columns in storages `DESCRIBE TABLE` started to show them too, it was unexpected and now is fixed. [#6040](https://github.com/yandex/ClickHouse/pull/6040) ([Ivan](https://github.com/abyss7)) +* Fix non-deterministic result of `uniq` aggregate function in extreme rare cases. The bug was present in all ClickHouse versions. [#6058](https://github.com/yandex/ClickHouse/pull/6058) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fix segfault when we set a little bit too high CIDR on the function `IPv6CIDRToRange`. [#6068](https://github.com/yandex/ClickHouse/pull/6068) ([Guillaume Tassery](https://github.com/YiuRULE)) +* Fix the situation when Kafka consumer got paused before subscription and not resumed afterwards. 
[#6075](https://github.com/yandex/ClickHouse/pull/6075) ([Ivan](https://github.com/abyss7)) +* Fixed useless and incorrect condition on update field for initial loading of external dictionaries via ODBC, MySQL, ClickHouse and HTTP. This fixes [#6069](https://github.com/yandex/ClickHouse/issues/6069). [#6083](https://github.com/yandex/ClickHouse/pull/6083) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed irrelevant exception in cast of `LowCardinality(Nullable)` to not-Nullable column in case if it doesn't contain Nulls (e.g. in query like `SELECT CAST(CAST('Hello' AS LowCardinality(Nullable(String))) AS String)`. This fixes [#6094](https://github.com/yandex/ClickHouse/issues/6094). [#6119](https://github.com/yandex/ClickHouse/pull/6119) ([Nikolai Kochetov](https://github.com/KochetovNicolai)) +* Fix bug with writing secondary indices marks with adaptive granularity. [#6126](https://github.com/yandex/ClickHouse/pull/6126) ([alesapin](https://github.com/alesapin)) + +### Improvement +* The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for ommitted fields in `JSONEachRow` format. It should be the expected behaviour but may lead to neglible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)) + +### Performance Improvement +* Optimize `count()` [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird)) + +### Build/Testing/Packaging Improvement +* Fixed MSan report. [#6144](https://github.com/yandex/ClickHouse/pull/6144) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Report memory usage in performance tests. [#5899](https://github.com/yandex/ClickHouse/pull/5899) ([akuzm](https://github.com/akuzm)) +* Add ability to sign .rpm clickhouse packages. 
[#5977](https://github.com/yandex/ClickHouse/pull/5977) ([alesapin](https://github.com/alesapin)) +* Add dependencies for RPM packages [#6023](https://github.com/yandex/ClickHouse/pull/6023) ([alesapin](https://github.com/alesapin)) +* Fix build with external `libcxx` [#6010](https://github.com/yandex/ClickHouse/pull/6010) ([Ivan](https://github.com/abyss7)) +* Fix shared build with `rdkafka` library [#6101](https://github.com/yandex/ClickHouse/pull/6101) ([Ivan](https://github.com/abyss7)) + +## ClickHouse release 19.11.7.40, 2019-08-14 + +### Bug fix +* Fix segfault when using `arrayReduce` for constant arguments. [#6326](https://github.com/yandex/ClickHouse/pull/6326) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed `toFloat()` monotonicity. [#6374](https://github.com/yandex/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000)) +* Fix segfault with enabled `optimize_skip_unused_shards` and missing sharding key. [#6384](https://github.com/yandex/ClickHouse/pull/6384) ([CurtizJ](https://github.com/CurtizJ)) +* Fixed logic of `arrayEnumerateUniqRanked` function. [#6423](https://github.com/yandex/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Removed extra verbose logging from MySQL handler. [#6389](https://github.com/yandex/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fix wrong behavior and possible segfaults in `topK` and `topKWeighted` aggregated functions. [#6404](https://github.com/yandex/ClickHouse/pull/6404) ([CurtizJ](https://github.com/CurtizJ)) +* Do not expose virtual columns in `system.columns` table. This is required for backward compatibility. [#6406](https://github.com/yandex/ClickHouse/pull/6406) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fix bug with memory allocation for string fields in complex key cache dictionary. 
[#6447](https://github.com/yandex/ClickHouse/pull/6447) ([alesapin](https://github.com/alesapin)) +* Fix bug with enabling adaptive granularity when creating new replica for `Replicated*MergeTree` table. [#6452](https://github.com/yandex/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin)) +* Fix infinite loop when reading Kafka messages. [#6354](https://github.com/yandex/ClickHouse/pull/6354) ([abyss7](https://github.com/abyss7)) +* Fixed the possibility of a fabricated query to cause server crash due to stack overflow in SQL parser and possibility of stack overflow in `Merge` and `Distributed` tables [#6433](https://github.com/yandex/ClickHouse/pull/6433) ([alexey-milovidov](https://github.com/alexey-milovidov)) +* Fixed Gorilla encoding error on small sequences. [#6344](https://github.com/yandex/ClickHouse/pull/6444) ([Enmk](https://github.com/Enmk)) + +### Improvement +* Allow user to override `poll_interval` and `idle_connection_timeout` settings on connection. [#6230](https://github.com/yandex/ClickHouse/pull/6230) ([alexey-milovidov](https://github.com/alexey-milovidov)) + ## ClickHouse release 19.11.5.28, 2019-08-05 ### Bug fix From 4bc8419042ac55d294b1fb4c5ef231348a2896b7 Mon Sep 17 00:00:00 2001 From: Alexander Tokmakov Date: Sun, 18 Aug 2019 21:39:30 +0300 Subject: [PATCH 02/11] fixes --- CHANGELOG.md | 37 +++++++++++++------------------------ 1 file changed, 13 insertions(+), 24 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 081b7f08a4d..6ec0ea442fc 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,41 +1,30 @@ ## ClickHouse release 19.13.2.19, 2019-08-14 ### New Feature +* Allow to specify a list of columns with `COLUMNS('regexp')` expression that works like a more sophisticated variant of `*` asterisk. 
[#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)), ([alexey-milovidov](https://github.com/alexey-milovidov)) +* `CREATE TABLE AS table_function()` is now possible [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000)) +* Poor man's profiler for each query being executed. It stops query execution thread at random time points to collect current backtrace. After some time a developer can analyze which code points profiler visits most often to find probable efficiency issues. [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) +* Adam optimizer for stochastic gradient descent is used by default in `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality without almost any tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37)) + +### Experimental features * Added functions for working with the сustom week number [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh)) * New query processing pipeline. Use `experimental_use_processors=1` option to enable it. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai)) -* It is possible to select several columns by providing a pattern of three dots before or after of a stem. [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)) -* Allow to specify a list of columns with `COLUMNS('regexp')` expression that works like a more sophisticated variant of `*` asterisk. 
[#6038](https://github.com/yandex/ClickHouse/pull/6038) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* `CREATE TABLE AS table_function()` is now possible [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000)) -* Throws an exception if `config.d` file doesn't have the corresponding root element as the config file [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000)) -* Poor man's profiler for each query being executed [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) -* Adam optimizer for stochastic descent. Made it default (because it shows good quality without almost any tuning). [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37)) ### Bug Fix -* For row-level security feature it's crucial for all storages to provide proper database name. Now it's implemented. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7)) -* Now client could receive lags with any desired level. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)) -* Fixed `DoubleDelta` encoding cases for random `Int32` and `Int64`. [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk)) -* Fix client version number which is able to read additional progress data from the server. [#6018](https://github.com/yandex/ClickHouse/pull/6018) ([alesapin](https://github.com/alesapin)) +* `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7)) +* Now client could receive logs from server with any desired level by setting `send_logs_level` regardless to the log level specified in server settings. 
[#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)) +* Fixed `DoubleDelta` encoding of `Int64` for large `DoubleDelta` values, improved `DoubleDelta` encoding for random data for `Int32`. [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk)) * Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fix backward compatibility for `CHECK QUERY`. [#6024](https://github.com/yandex/ClickHouse/pull/6024) ([alesapin](https://github.com/alesapin)) -* After introduction of virtual columns in storages `DESCRIBE TABLE` started to show them too, it was unexpected and now is fixed. [#6040](https://github.com/yandex/ClickHouse/pull/6040) ([Ivan](https://github.com/abyss7)) -* Fix non-deterministic result of `uniq` aggregate function in extreme rare cases. The bug was present in all ClickHouse versions. [#6058](https://github.com/yandex/ClickHouse/pull/6058) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fix segfault when we set a little bit too high CIDR on the function `IPv6CIDRToRange`. [#6068](https://github.com/yandex/ClickHouse/pull/6068) ([Guillaume Tassery](https://github.com/YiuRULE)) -* Fix the situation when Kafka consumer got paused before subscription and not resumed afterwards. [#6075](https://github.com/yandex/ClickHouse/pull/6075) ([Ivan](https://github.com/abyss7)) -* Fixed useless and incorrect condition on update field for initial loading of external dictionaries via ODBC, MySQL, ClickHouse and HTTP. This fixes [#6069](https://github.com/yandex/ClickHouse/issues/6069). 
[#6083](https://github.com/yandex/ClickHouse/pull/6083) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed irrelevant exception in cast of `LowCardinality(Nullable)` to not-Nullable column in case if it doesn't contain Nulls (e.g. in query like `SELECT CAST(CAST('Hello' AS LowCardinality(Nullable(String))) AS String)`. This fixes [#6094](https://github.com/yandex/ClickHouse/issues/6094). [#6119](https://github.com/yandex/ClickHouse/pull/6119) ([Nikolai Kochetov](https://github.com/KochetovNicolai)) -* Fix bug with writing secondary indices marks with adaptive granularity. [#6126](https://github.com/yandex/ClickHouse/pull/6126) ([alesapin](https://github.com/alesapin)) +* Throws an exception if `config.d` file doesn't have the corresponding root element as the config file [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000)) ### Improvement -* The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for ommitted fields in `JSONEachRow` format. It should be the expected behaviour but may lead to neglible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)) +* The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats. It should be the expected behaviour but may lead to negligible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/yandex/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm)) ### Performance Improvement -* Optimize `count()` [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird)) +* Optimize `count()`. 
Now it uses the smallest column (if possible). [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird)) ### Build/Testing/Packaging Improvement -* Fixed MSan report. [#6144](https://github.com/yandex/ClickHouse/pull/6144) ([alexey-milovidov](https://github.com/alexey-milovidov)) * Report memory usage in performance tests. [#5899](https://github.com/yandex/ClickHouse/pull/5899) ([akuzm](https://github.com/akuzm)) -* Add ability to sign .rpm clickhouse packages. [#5977](https://github.com/yandex/ClickHouse/pull/5977) ([alesapin](https://github.com/alesapin)) -* Add dependencies for RPM packages [#6023](https://github.com/yandex/ClickHouse/pull/6023) ([alesapin](https://github.com/alesapin)) * Fix build with external `libcxx` [#6010](https://github.com/yandex/ClickHouse/pull/6010) ([Ivan](https://github.com/abyss7)) * Fix shared build with `rdkafka` library [#6101](https://github.com/yandex/ClickHouse/pull/6101) ([Ivan](https://github.com/abyss7)) @@ -53,7 +42,7 @@ * Fix bug with enabling adaptive granularity when creating new replica for `Replicated*MergeTree` table. [#6452](https://github.com/yandex/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin)) * Fix infinite loop when reading Kafka messages. [#6354](https://github.com/yandex/ClickHouse/pull/6354) ([abyss7](https://github.com/abyss7)) * Fixed the possibility of a fabricated query to cause server crash due to stack overflow in SQL parser and possibility of stack overflow in `Merge` and `Distributed` tables [#6433](https://github.com/yandex/ClickHouse/pull/6433) ([alexey-milovidov](https://github.com/alexey-milovidov)) -* Fixed Gorilla encoding error on small sequences. [#6344](https://github.com/yandex/ClickHouse/pull/6444) ([Enmk](https://github.com/Enmk)) +* Fixed Gorilla encoding error on small sequences. 
[#6444](https://github.com/yandex/ClickHouse/pull/6444) ([Enmk](https://github.com/Enmk)) ### Improvement * Allow user to override `poll_interval` and `idle_connection_timeout` settings on connection. [#6230](https://github.com/yandex/ClickHouse/pull/6230) ([alexey-milovidov](https://github.com/alexey-milovidov)) From 6d29ed99d9b0270dc6e14438f914bd5f04fef001 Mon Sep 17 00:00:00 2001 From: alesapin Date: Mon, 19 Aug 2019 13:37:04 +0300 Subject: [PATCH 03/11] Fix bug with enable_mixed_granularity_parts and mutations --- .../MergeTree/IMergedBlockOutputStream.cpp | 15 ++++++++------- .../Storages/MergeTree/IMergedBlockOutputStream.h | 5 ++++- dbms/src/Storages/MergeTree/MergeTreeData.cpp | 3 ++- .../MergeTree/MergeTreeDataMergerMutator.cpp | 4 +++- .../MergeTree/MergedColumnOnlyOutputStream.cpp | 6 ++++-- .../MergeTree/MergedColumnOnlyOutputStream.h | 3 ++- .../integration/test_adaptive_granularity/test.py | 11 +++++++++++ 7 files changed, 34 insertions(+), 13 deletions(-) diff --git a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp index 4109a5511af..407fcb18ad5 100644 --- a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp +++ b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp @@ -25,25 +25,26 @@ IMergedBlockOutputStream::IMergedBlockOutputStream( size_t aio_threshold_, bool blocks_are_granules_size_, const std::vector & indices_to_recalc, - const MergeTreeIndexGranularity & index_granularity_) + const MergeTreeIndexGranularity & index_granularity_, + const MergeTreeIndexGranularityInfo * index_granularity_info_) : storage(storage_) , part_path(part_path_) , min_compress_block_size(min_compress_block_size_) , max_compress_block_size(max_compress_block_size_) , aio_threshold(aio_threshold_) - , marks_file_extension(storage.canUseAdaptiveGranularity() ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension()) + , can_use_adaptive_granularity(index_granularity_info_ ? 
index_granularity_info_->is_adaptive : storage.canUseAdaptiveGranularity()) + , marks_file_extension(can_use_adaptive_granularity ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension()) , blocks_are_granules_size(blocks_are_granules_size_) , index_granularity(index_granularity_) , compute_granularity(index_granularity.empty()) , codec(std::move(codec_)) , skip_indices(indices_to_recalc) - , with_final_mark(storage.settings.write_final_mark && storage.canUseAdaptiveGranularity()) + , with_final_mark(storage.settings.write_final_mark && can_use_adaptive_granularity) { if (blocks_are_granules_size && !index_granularity.empty()) throw Exception("Can't take information about index granularity from blocks, when non empty index_granularity array specified", ErrorCodes::LOGICAL_ERROR); } - void IMergedBlockOutputStream::addStreams( const String & path, const String & name, @@ -145,7 +146,7 @@ void IMergedBlockOutputStream::fillIndexGranularity(const Block & block) blocks_are_granules_size, index_offset, index_granularity, - storage.canUseAdaptiveGranularity()); + can_use_adaptive_granularity); } void IMergedBlockOutputStream::writeSingleMark( @@ -176,7 +177,7 @@ void IMergedBlockOutputStream::writeSingleMark( writeIntBinary(stream.plain_hashing.count(), stream.marks); writeIntBinary(stream.compressed.offset(), stream.marks); - if (storage.canUseAdaptiveGranularity()) + if (can_use_adaptive_granularity) writeIntBinary(number_of_rows, stream.marks); }, path); } @@ -362,7 +363,7 @@ void IMergedBlockOutputStream::calculateAndSerializeSkipIndices( writeIntBinary(stream.compressed.offset(), stream.marks); /// Actually this numbers is redundant, but we have to store them /// to be compatible with normal .mrk2 file format - if (storage.canUseAdaptiveGranularity()) + if (can_use_adaptive_granularity) writeIntBinary(1UL, stream.marks); ++skip_index_current_mark; diff --git a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h 
b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h index cbf78c1a2ea..97c7922042d 100644 --- a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h +++ b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h @@ -1,6 +1,7 @@ #pragma once #include +#include #include #include #include @@ -23,7 +24,8 @@ public: size_t aio_threshold_, bool blocks_are_granules_size_, const std::vector & indices_to_recalc, - const MergeTreeIndexGranularity & index_granularity_); + const MergeTreeIndexGranularity & index_granularity_, + const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr); using WrittenOffsetColumns = std::set; @@ -141,6 +143,7 @@ protected: size_t current_mark = 0; size_t skip_index_mark = 0; + const bool can_use_adaptive_granularity; const std::string marks_file_extension; const bool blocks_are_granules_size; diff --git a/dbms/src/Storages/MergeTree/MergeTreeData.cpp b/dbms/src/Storages/MergeTree/MergeTreeData.cpp index 3b57c27d3e9..d8871b9e1a8 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeData.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeData.cpp @@ -1581,7 +1581,8 @@ void MergeTreeData::alterDataPart( true /* skip_offsets */, {}, unused_written_offsets, - part->index_granularity); + part->index_granularity, + &part->index_granularity_info); in.readPrefix(); out.writePrefix(); diff --git a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp index 19d775890d8..74193fa7156 100644 --- a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp +++ b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp @@ -934,6 +934,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor new_data_part->relative_path = "tmp_mut_" + future_part.name; new_data_part->is_temp = true; new_data_part->ttl_infos = source_part->ttl_infos; + new_data_part->index_granularity_info = source_part->index_granularity_info; String new_part_tmp_path = 
new_data_part->getFullPath(); @@ -1069,7 +1070,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor /* skip_offsets = */ false, std::vector(indices_to_recalc.begin(), indices_to_recalc.end()), unused_written_offsets, - source_part->index_granularity + source_part->index_granularity, + &source_part->index_granularity_info ); in->readPrefix(); diff --git a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp index e79ec7dd046..3c15bd54df2 100644 --- a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp +++ b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp @@ -8,14 +8,16 @@ MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream( CompressionCodecPtr default_codec_, bool skip_offsets_, const std::vector & indices_to_recalc_, WrittenOffsetColumns & already_written_offset_columns_, - const MergeTreeIndexGranularity & index_granularity_) + const MergeTreeIndexGranularity & index_granularity_, + const MergeTreeIndexGranularityInfo * index_granularity_info_) : IMergedBlockOutputStream( storage_, part_path_, storage_.global_context.getSettings().min_compress_block_size, storage_.global_context.getSettings().max_compress_block_size, default_codec_, storage_.global_context.getSettings().min_bytes_to_use_direct_io, false, indices_to_recalc_, - index_granularity_), + index_granularity_, + index_granularity_info_), header(header_), sync(sync_), skip_offsets(skip_offsets_), already_written_offset_columns(already_written_offset_columns_) { diff --git a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h index b8d637f37fb..8970bf19565 100644 --- a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h +++ b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h @@ -17,7 +17,8 @@ public: CompressionCodecPtr default_codec_, bool skip_offsets_, const std::vector & indices_to_recalc_, 
WrittenOffsetColumns & already_written_offset_columns_, - const MergeTreeIndexGranularity & index_granularity_); + const MergeTreeIndexGranularity & index_granularity_, + const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr); Block getHeader() const override { return header; } void write(const Block & block) override; diff --git a/dbms/tests/integration/test_adaptive_granularity/test.py b/dbms/tests/integration/test_adaptive_granularity/test.py index db653427f02..50b43fc08ec 100644 --- a/dbms/tests/integration/test_adaptive_granularity/test.py +++ b/dbms/tests/integration/test_adaptive_granularity/test.py @@ -288,6 +288,17 @@ def test_mixed_granularity_single_node(start_dynamic_cluster, node): node.exec_in_container(["bash", "-c", "find {p} -name '*.mrk' | grep '.*'".format(p=path_to_old_part)]) # check that we have non adaptive files + node.query("ALTER TABLE table_with_default_granularity UPDATE dummy = dummy + 1 WHERE 1") + # still works + assert node.query("SELECT count() from table_with_default_granularity") == '6\n' + + node.query("ALTER TABLE table_with_default_granularity MODIFY COLUMN dummy String") + node.query("ALTER TABLE table_with_default_granularity ADD COLUMN dummy2 Float64") + + #still works + assert node.query("SELECT count() from table_with_default_granularity") == '6\n' + + def test_version_update_two_nodes(start_dynamic_cluster): node11.query("INSERT INTO table_with_default_granularity VALUES (toDate('2018-10-01'), 1, 333), (toDate('2018-10-02'), 2, 444)") node12.query("SYSTEM SYNC REPLICA table_with_default_granularity") From c0a8b7d547d95e2ffcc035742d37382ed64a6be6 Mon Sep 17 00:00:00 2001 From: Ivan Blinkov Date: Mon, 19 Aug 2019 14:11:29 +0300 Subject: [PATCH 04/11] [experimental] auto-mark documentation PRs with labels (#6544) --- .github/label-pr.yml | 2 ++ .github/main.workflow | 9 +++++++++ 2 files changed, 11 insertions(+) create mode 100644 .github/label-pr.yml create mode 100644 .github/main.workflow diff --git 
a/.github/label-pr.yml b/.github/label-pr.yml
new file mode 100644
index 00000000000..4ae73a2e720
--- /dev/null
+++ b/.github/label-pr.yml
@@ -0,0 +1,2 @@
+- regExp: ".*\\.md$"
+  labels: ["documentation", "pr-documentation"]
diff --git a/.github/main.workflow b/.github/main.workflow
new file mode 100644
index 00000000000..a450195b955
--- /dev/null
+++ b/.github/main.workflow
@@ -0,0 +1,9 @@
+workflow "Main workflow" {
+  resolves = ["Label PR"]
+  on = "pull_request"
+}
+
+action "Label PR" {
+  uses = "decathlon/pull-request-labeler-action@v1.0.0"
+  secrets = ["GITHUB_TOKEN"]
+}

From 04f553c628a061984a6846531f4af891f645a7ce Mon Sep 17 00:00:00 2001
From: alesapin
Date: Mon, 19 Aug 2019 15:16:20 +0300
Subject: [PATCH 05/11] Fix images with coverage

---
 docker/images.json                             | 4 ++--
 docker/test/stateful_with_coverage/Dockerfile  | 2 +-
 docker/test/stateless_with_coverage/Dockerfile | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docker/images.json b/docker/images.json
index e2282fcb653..fef364a942f 100644
--- a/docker/images.json
+++ b/docker/images.json
@@ -7,9 +7,9 @@
     "docker/test/performance": "yandex/clickhouse-performance-test",
     "docker/test/pvs": "yandex/clickhouse-pvs-test",
     "docker/test/stateful": "yandex/clickhouse-stateful-test",
-    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-with-coverage-test",
+    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-test-with-coverage",
     "docker/test/stateless": "yandex/clickhouse-stateless-test",
-    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-with-coverage-test",
+    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-test-with-coverage",
     "docker/test/unit": "yandex/clickhouse-unit-test",
     "docker/test/stress": "yandex/clickhouse-stress-test",
     "dbms/tests/integration/image": "yandex/clickhouse-integration-tests-runner"
diff --git a/docker/test/stateful_with_coverage/Dockerfile b/docker/test/stateful_with_coverage/Dockerfile
index 2a566bdcf01..863e55e6326 100644
--- a/docker/test/stateful_with_coverage/Dockerfile
+++ b/docker/test/stateful_with_coverage/Dockerfile
@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateful-test .
 FROM yandex/clickhouse-stateless-test
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \
diff --git a/docker/test/stateless_with_coverage/Dockerfile b/docker/test/stateless_with_coverage/Dockerfile
index b9da18223ab..afb46533b16 100644
--- a/docker/test/stateless_with_coverage/Dockerfile
+++ b/docker/test/stateless_with_coverage/Dockerfile
@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateless-with-coverage-test .
 FROM yandex/clickhouse-deb-builder
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \

From 646071f360525f831cc5f8066dbcfc3f1b853935 Mon Sep 17 00:00:00 2001
From: Ivan Blinkov
Date: Mon, 19 Aug 2019 16:13:58 +0300
Subject: [PATCH 06/11] Update roadmap.md (#6545)

---
 docs/en/roadmap.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/en/roadmap.md b/docs/en/roadmap.md
index 34307e519b0..11f1f793235 100644
--- a/docs/en/roadmap.md
+++ b/docs/en/roadmap.md
@@ -1,15 +1,14 @@
 # Roadmap
 
-## Q2 2019
+## Q3 2019
 
 - DDL for dictionaries
 - Integration with S3-like object stores
 - Multiple storages for hot/cold data, JBOD support
 
-## Q3 2019
+## Q4 2019
 
-- JOIN execution improvements:
-    - Distributed join not limited by memory
+- JOIN not limited by available memory
 - Resource pools for more precise distribution of cluster capacity between users
 - Fine-grained authorization
 - Integration with external authentication services

From 7e600ccc06fa107bfee18a9604422570c63df1da Mon Sep 17 00:00:00 2001
From: Ivan Blinkov
Date: Mon, 19 Aug 2019 16:15:46 +0300
Subject: [PATCH 07/11] revert #6544 (#6547)

From 943a7480b5829b17f20c47906c083832ff81a231 Mon Sep 17 00:00:00 2001
From: alexey-milovidov
Date: Mon, 19 Aug 2019 16:43:59 +0300
Subject: [PATCH 08/11] Update CHANGELOG.md

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6ec0ea442fc..fe803d3259d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,7 +3,7 @@
 ### New Feature
 * Allow to specify a list of columns with `COLUMNS('regexp')` expression that works like a more sophisticated variant of `*` asterisk. [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)), ([alexey-milovidov](https://github.com/alexey-milovidov))
 * `CREATE TABLE AS table_function()` is now possible [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000))
-* Poor man's profiler for each query being executed. It stops query execution thread at random time points to collect current backtrace. After some time a developer can analyze which code points profiler visits most often to find probable efficiency issues. [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab))
+* Sampling profiler on query level. [Example](https://gist.github.com/alexey-milovidov/92758583dd41c24c360fdb8d6a4da194). [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) [#6124](https://github.com/yandex/ClickHouse/pull/6124) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6250](https://github.com/yandex/ClickHouse/pull/6250) [#6283](https://github.com/yandex/ClickHouse/pull/6283) [#6386](https://github.com/yandex/ClickHouse/pull/6386)
 * Adam optimizer for stochastic gradient descent is used by default in `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality without almost any tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37))
 
 ### Experimental features

From d77c02ecf6ef230d453836e609daba391eb9b62e Mon Sep 17 00:00:00 2001
From: alexey-milovidov
Date: Mon, 19 Aug 2019 16:46:26 +0300
Subject: [PATCH 09/11] Update CHANGELOG.md

---
 CHANGELOG.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index fe803d3259d..d69a6c68e76 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,14 +1,14 @@
 ## ClickHouse release 19.13.2.19, 2019-08-14
 
 ### New Feature
+* Sampling profiler on query level. [Example](https://gist.github.com/alexey-milovidov/92758583dd41c24c360fdb8d6a4da194). [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) [#6124](https://github.com/yandex/ClickHouse/pull/6124) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6250](https://github.com/yandex/ClickHouse/pull/6250) [#6283](https://github.com/yandex/ClickHouse/pull/6283) [#6386](https://github.com/yandex/ClickHouse/pull/6386)
 * Allow to specify a list of columns with `COLUMNS('regexp')` expression that works like a more sophisticated variant of `*` asterisk. [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)), ([alexey-milovidov](https://github.com/alexey-milovidov))
 * `CREATE TABLE AS table_function()` is now possible [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000))
-* Sampling profiler on query level. [Example](https://gist.github.com/alexey-milovidov/92758583dd41c24c360fdb8d6a4da194). [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) [#6124](https://github.com/yandex/ClickHouse/pull/6124) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6250](https://github.com/yandex/ClickHouse/pull/6250) [#6283](https://github.com/yandex/ClickHouse/pull/6283) [#6386](https://github.com/yandex/ClickHouse/pull/6386)
 * Adam optimizer for stochastic gradient descent is used by default in `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality without almost any tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37))
+* Added functions for working with the сustom week number [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh))
 
 ### Experimental features
-* Added functions for working with the сustom week number [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh))
-* New query processing pipeline. Use `experimental_use_processors=1` option to enable it. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+* New query processing pipeline. Use `experimental_use_processors=1` option to enable it. Use for your own trouble. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 
 ### Bug Fix
 * `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7))

From 4ded8deea299ac23e8a20bd1bad32d6cd5fff5c7 Mon Sep 17 00:00:00 2001
From: alexey-milovidov
Date: Mon, 19 Aug 2019 16:48:29 +0300
Subject: [PATCH 10/11] Update CHANGELOG.md

---
 CHANGELOG.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d69a6c68e76..0f8e8166d21 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,19 +6,19 @@
 * `CREATE TABLE AS table_function()` is now possible [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000))
 * Adam optimizer for stochastic gradient descent is used by default in `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality without almost any tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37))
 * Added functions for working with the сustom week number [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh))
+* `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7))
+* Now client receive logs from server with any desired level by setting `send_logs_level` regardless to the log level specified in server settings. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
 
 ### Experimental features
 * New query processing pipeline. Use `experimental_use_processors=1` option to enable it. Use for your own trouble. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 
 ### Bug Fix
-* `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7))
-* Now client could receive logs from server with any desired level by setting `send_logs_level` regardless to the log level specified in server settings. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
 * Fixed `DoubleDelta` encoding of `Int64` for large `DoubleDelta` values, improved `DoubleDelta` encoding for random data for `Int32`. [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk))
 * Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov))
-* Throws an exception if `config.d` file doesn't have the corresponding root element as the config file [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000))
 
 ### Improvement
 * The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats. It should be the expected behaviour but may lead to negligible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/yandex/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))
+* Throws an exception if `config.d` file doesn't have the corresponding root element as the config file [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000))
 
 ### Performance Improvement
 * Optimize `count()`. Now it uses the smallest column (if possible). [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird))

From e4dda4332ef2a9ed7e87b6c681dd2cdd74c534a4 Mon Sep 17 00:00:00 2001
From: alexey-milovidov
Date: Mon, 19 Aug 2019 16:52:16 +0300
Subject: [PATCH 11/11] Update CHANGELOG.md

---
 CHANGELOG.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0f8e8166d21..09774773686 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -13,6 +13,7 @@
 * New query processing pipeline. Use `experimental_use_processors=1` option to enable it. Use for your own trouble. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
 
 ### Bug Fix
+* Kafka integration has been fixed in this version.
 * Fixed `DoubleDelta` encoding of `Int64` for large `DoubleDelta` values, improved `DoubleDelta` encoding for random data for `Int32`. [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk))
 * Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov))
 
@@ -31,6 +32,7 @@
 ## ClickHouse release 19.11.7.40, 2019-08-14
 
 ### Bug fix
+* Kafka integration has been fixed in this version.
 * Fix segfault when using `arrayReduce` for constant arguments. [#6326](https://github.com/yandex/ClickHouse/pull/6326) ([alexey-milovidov](https://github.com/alexey-milovidov))
 * Fixed `toFloat()` monotonicity. [#6374](https://github.com/yandex/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
 * Fix segfault with enabled `optimize_skip_unused_shards` and missing sharding key. [#6384](https://github.com/yandex/ClickHouse/pull/6384) ([CurtizJ](https://github.com/CurtizJ))