diff --git a/.github/label-pr.yml b/.github/label-pr.yml
new file mode 100644
index 00000000000..4ae73a2e720
--- /dev/null
+++ b/.github/label-pr.yml
@@ -0,0 +1,2 @@
+- regExp: ".*\\.md$"
+  labels: ["documentation", "pr-documentation"]
diff --git a/.github/main.workflow b/.github/main.workflow
new file mode 100644
index 00000000000..a450195b955
--- /dev/null
+++ b/.github/main.workflow
@@ -0,0 +1,9 @@
+workflow "Main workflow" {
+  resolves = ["Label PR"]
+  on = "pull_request"
+}
+
+action "Label PR" {
+  uses = "decathlon/pull-request-labeler-action@v1.0.0"
+  secrets = ["GITHUB_TOKEN"]
+}
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2a0b69bcc6d..09774773686 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,54 @@
+## ClickHouse release 19.13.2.19, 2019-08-14
+
+### New Feature
+* Sampling profiler on query level. [Example](https://gist.github.com/alexey-milovidov/92758583dd41c24c360fdb8d6a4da194). [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) [#6124](https://github.com/yandex/ClickHouse/pull/6124) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6250](https://github.com/yandex/ClickHouse/pull/6250) [#6283](https://github.com/yandex/ClickHouse/pull/6283) [#6386](https://github.com/yandex/ClickHouse/pull/6386)
+* Allow specifying a list of columns with the `COLUMNS('regexp')` expression, which works like a more sophisticated variant of the `*` asterisk (see the example below). [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)), ([alexey-milovidov](https://github.com/alexey-milovidov))
+* `CREATE TABLE AS table_function()` is now possible. [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000))
+* The Adam optimizer for stochastic gradient descent is used by default in the `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality almost without any tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37))
+* Added functions for working with custom week numbers. [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh))
+* `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7))
+* The client now receives logs from the server at any desired level by setting `send_logs_level`, regardless of the log level specified in the server settings. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
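For illustration, a minimal sketch of the new query forms listed above, in plain ClickHouse SQL. The table and column names (`hits`, `hits_proxy`, the `^Watch` pattern) and the remote address are invented for the example; only `COLUMNS('regexp')`, `CREATE TABLE ... AS table_function()`, the week functions from #5212, and `send_logs_level` come from the entries themselves.

```sql
-- COLUMNS('regexp'): select every column whose name matches the regular expression.
SELECT COLUMNS('^Watch') FROM hits LIMIT 10;

-- CREATE TABLE AS table_function(): create a table directly on top of a table function.
CREATE TABLE hits_proxy AS remote('replica-host:9000', default, hits);

-- Custom week numbers (assuming the toWeek/toYearWeek functions added by #5212).
SELECT toWeek(toDate('2019-08-14'), 3) AS week, toYearWeek(toDate('2019-08-14'), 3) AS year_week;

-- Receive server logs at any desired level on the client, regardless of the server-side log level.
SET send_logs_level = 'trace';
```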
+
+### Experimental features
+* New query processing pipeline. Enable it with the `experimental_use_processors=1` setting. Use it at your own risk. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
+
+### Bug Fix
+* Kafka integration has been fixed in this version.
+* Fixed `DoubleDelta` encoding of `Int64` for large `DoubleDelta` values, improved `DoubleDelta` encoding of random data for `Int32`. [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk))
+* Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov))
+
+### Improvement
+* The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for omitted fields in the `JSONEachRow` and `CSV*` formats (see the example below). This is the expected behaviour, but it may lead to a negligible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/yandex/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))
+* Throw an exception if a `config.d` file doesn't have the same root element as the main config file. [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000))
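A short sketch of what the now-default `input_format_defaults_for_omitted_fields` behaviour means in practice. The table, its `DEFAULT` expression, and the sample row are invented for illustration.

```sql
-- A table with a non-trivial DEFAULT expression (example schema).
CREATE TABLE events
(
    ts DateTime,
    raw String,
    raw_length UInt32 DEFAULT length(raw)
)
ENGINE = MergeTree() ORDER BY ts;

-- With input_format_defaults_for_omitted_fields = 1 (now the default),
-- the omitted raw_length field is computed from its DEFAULT expression
-- instead of being filled with the type's default value (zero).
INSERT INTO events FORMAT JSONEachRow {"ts": "2019-08-14 00:00:00", "raw": "hello"};
```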
+
+### Performance Improvement
+* Optimize `count()`. Now it uses the smallest column (if possible). [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird))
+
+### Build/Testing/Packaging Improvement
+* Report memory usage in performance tests. [#5899](https://github.com/yandex/ClickHouse/pull/5899) ([akuzm](https://github.com/akuzm))
+* Fix build with external `libcxx`. [#6010](https://github.com/yandex/ClickHouse/pull/6010) ([Ivan](https://github.com/abyss7))
+* Fix shared build with the `rdkafka` library. [#6101](https://github.com/yandex/ClickHouse/pull/6101) ([Ivan](https://github.com/abyss7))
+
+## ClickHouse release 19.11.7.40, 2019-08-14
+
+### Bug fix
+* Kafka integration has been fixed in this version.
+* Fix segfault when using `arrayReduce` for constant arguments. [#6326](https://github.com/yandex/ClickHouse/pull/6326) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed `toFloat()` monotonicity. [#6374](https://github.com/yandex/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
+* Fix segfault with enabled `optimize_skip_unused_shards` and a missing sharding key. [#6384](https://github.com/yandex/ClickHouse/pull/6384) ([CurtizJ](https://github.com/CurtizJ))
+* Fixed logic of the `arrayEnumerateUniqRanked` function. [#6423](https://github.com/yandex/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Removed extra verbose logging from the MySQL handler. [#6389](https://github.com/yandex/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fix wrong behavior and possible segfaults in the `topK` and `topKWeighted` aggregate functions. [#6404](https://github.com/yandex/ClickHouse/pull/6404) ([CurtizJ](https://github.com/CurtizJ))
+* Do not expose virtual columns in the `system.columns` table. This is required for backward compatibility. [#6406](https://github.com/yandex/ClickHouse/pull/6406) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fix a bug with memory allocation for string fields in the complex key cache dictionary. [#6447](https://github.com/yandex/ClickHouse/pull/6447) ([alesapin](https://github.com/alesapin))
+* Fix a bug with enabling adaptive granularity when creating a new replica for a `Replicated*MergeTree` table. [#6452](https://github.com/yandex/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
+* Fix an infinite loop when reading Kafka messages. [#6354](https://github.com/yandex/ClickHouse/pull/6354) ([abyss7](https://github.com/abyss7))
+* Fixed the possibility of a crafted query causing a server crash due to stack overflow in the SQL parser, and the possibility of stack overflow in `Merge` and `Distributed` tables. [#6433](https://github.com/yandex/ClickHouse/pull/6433) ([alexey-milovidov](https://github.com/alexey-milovidov))
+* Fixed a Gorilla encoding error on small sequences. [#6444](https://github.com/yandex/ClickHouse/pull/6444) ([Enmk](https://github.com/Enmk))
+
+### Improvement
+* Allow users to override the `poll_interval` and `idle_connection_timeout` settings on connection (example below). [#6230](https://github.com/yandex/ClickHouse/pull/6230) ([alexey-milovidov](https://github.com/alexey-milovidov))
+
 ## ClickHouse release 19.11.5.28, 2019-08-05
 
 ### Bug fix
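A sketch of the `poll_interval` / `idle_connection_timeout` improvement listed above for 19.11.7.40, assuming the two settings are exposed as ordinary user-level settings that a client can override for its own connection; the setting names come from the entry, while the values and the `SET` form are illustrative only.

```sql
-- Hypothetical per-session overrides of the connection-handling settings
-- named in the 19.11.7.40 Improvement entry above; values are examples only.
SET idle_connection_timeout = 3600;  -- close an idle connection after an hour
SET poll_interval = 10;              -- wake up from the blocking poll every 10 seconds
```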
diff --git a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp
index 4109a5511af..407fcb18ad5 100644
--- a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp
+++ b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp
@@ -25,25 +25,26 @@ IMergedBlockOutputStream::IMergedBlockOutputStream(
     size_t aio_threshold_,
     bool blocks_are_granules_size_,
     const std::vector & indices_to_recalc,
-    const MergeTreeIndexGranularity & index_granularity_)
+    const MergeTreeIndexGranularity & index_granularity_,
+    const MergeTreeIndexGranularityInfo * index_granularity_info_)
     : storage(storage_)
     , part_path(part_path_)
     , min_compress_block_size(min_compress_block_size_)
     , max_compress_block_size(max_compress_block_size_)
     , aio_threshold(aio_threshold_)
-    , marks_file_extension(storage.canUseAdaptiveGranularity() ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
+    , can_use_adaptive_granularity(index_granularity_info_ ? index_granularity_info_->is_adaptive : storage.canUseAdaptiveGranularity())
+    , marks_file_extension(can_use_adaptive_granularity ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
     , blocks_are_granules_size(blocks_are_granules_size_)
     , index_granularity(index_granularity_)
    , compute_granularity(index_granularity.empty())
     , codec(std::move(codec_))
     , skip_indices(indices_to_recalc)
-    , with_final_mark(storage.settings.write_final_mark && storage.canUseAdaptiveGranularity())
+    , with_final_mark(storage.settings.write_final_mark && can_use_adaptive_granularity)
 {
     if (blocks_are_granules_size && !index_granularity.empty())
         throw Exception("Can't take information about index granularity from blocks, when non empty index_granularity array specified", ErrorCodes::LOGICAL_ERROR);
 }
 
-
 void IMergedBlockOutputStream::addStreams(
     const String & path,
     const String & name,
@@ -145,7 +146,7 @@ void IMergedBlockOutputStream::fillIndexGranularity(const Block & block)
         blocks_are_granules_size,
         index_offset,
         index_granularity,
-        storage.canUseAdaptiveGranularity());
+        can_use_adaptive_granularity);
 }
 
 void IMergedBlockOutputStream::writeSingleMark(
@@ -176,7 +177,7 @@ void IMergedBlockOutputStream::writeSingleMark(
         writeIntBinary(stream.plain_hashing.count(), stream.marks);
         writeIntBinary(stream.compressed.offset(), stream.marks);
 
-        if (storage.canUseAdaptiveGranularity())
+        if (can_use_adaptive_granularity)
             writeIntBinary(number_of_rows, stream.marks);
     }, path);
 }
@@ -362,7 +363,7 @@ void IMergedBlockOutputStream::calculateAndSerializeSkipIndices(
         writeIntBinary(stream.compressed.offset(), stream.marks);
         /// Actually this numbers is redundant, but we have to store them
         /// to be compatible with normal .mrk2 file format
-        if (storage.canUseAdaptiveGranularity())
+        if (can_use_adaptive_granularity)
             writeIntBinary(1UL, stream.marks);
 
         ++skip_index_current_mark;
diff --git a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h
index cbf78c1a2ea..97c7922042d 100644
--- a/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h
+++ b/dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h
@@ -1,6 +1,7 @@
 #pragma once
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -23,7 +24,8 @@ public:
         size_t aio_threshold_,
         bool blocks_are_granules_size_,
         const std::vector & indices_to_recalc,
-        const MergeTreeIndexGranularity & index_granularity_);
+        const MergeTreeIndexGranularity & index_granularity_,
+        const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
 
     using WrittenOffsetColumns = std::set;
 
@@ -141,6 +143,7 @@ protected:
     size_t current_mark = 0;
     size_t skip_index_mark = 0;
 
+    const bool can_use_adaptive_granularity;
     const std::string marks_file_extension;
     const bool blocks_are_granules_size;
diff --git a/dbms/src/Storages/MergeTree/MergeTreeData.cpp b/dbms/src/Storages/MergeTree/MergeTreeData.cpp
index 3b57c27d3e9..d8871b9e1a8 100644
--- a/dbms/src/Storages/MergeTree/MergeTreeData.cpp
+++ b/dbms/src/Storages/MergeTree/MergeTreeData.cpp
@@ -1581,7 +1581,8 @@ void MergeTreeData::alterDataPart(
         true /* skip_offsets */,
         {},
         unused_written_offsets,
-        part->index_granularity);
+        part->index_granularity,
+        &part->index_granularity_info);
 
     in.readPrefix();
     out.writePrefix();
diff --git a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
index 19d775890d8..74193fa7156 100644
--- a/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
+++ b/dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
@@ -934,6 +934,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
     new_data_part->relative_path = "tmp_mut_" + future_part.name;
     new_data_part->is_temp = true;
     new_data_part->ttl_infos = source_part->ttl_infos;
+    new_data_part->index_granularity_info = source_part->index_granularity_info;
 
     String new_part_tmp_path = new_data_part->getFullPath();
 
@@ -1069,7 +1070,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
         /* skip_offsets = */ false,
         std::vector(indices_to_recalc.begin(), indices_to_recalc.end()),
         unused_written_offsets,
-        source_part->index_granularity
+        source_part->index_granularity,
+        &source_part->index_granularity_info
     );
 
     in->readPrefix();
diff --git a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp
index e79ec7dd046..3c15bd54df2 100644
--- a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp
+++ b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp
@@ -8,14 +8,16 @@ MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream(
     CompressionCodecPtr default_codec_, bool skip_offsets_,
     const std::vector & indices_to_recalc_,
     WrittenOffsetColumns & already_written_offset_columns_,
-    const MergeTreeIndexGranularity & index_granularity_)
+    const MergeTreeIndexGranularity & index_granularity_,
+    const MergeTreeIndexGranularityInfo * index_granularity_info_)
     : IMergedBlockOutputStream(
         storage_, part_path_, storage_.global_context.getSettings().min_compress_block_size,
         storage_.global_context.getSettings().max_compress_block_size, default_codec_,
         storage_.global_context.getSettings().min_bytes_to_use_direct_io,
         false,
         indices_to_recalc_,
-        index_granularity_),
+        index_granularity_,
+        index_granularity_info_),
     header(header_), sync(sync_), skip_offsets(skip_offsets_),
     already_written_offset_columns(already_written_offset_columns_)
 {
diff --git a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h
index b8d637f37fb..8970bf19565 100644
--- a/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h
+++ b/dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h
@@ -17,7 +17,8 @@ public:
     CompressionCodecPtr default_codec_, bool skip_offsets_,
     const std::vector & indices_to_recalc_,
     WrittenOffsetColumns & already_written_offset_columns_,
-    const MergeTreeIndexGranularity & index_granularity_);
+    const MergeTreeIndexGranularity & index_granularity_,
+    const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
 
     Block getHeader() const override { return header; }
     void write(const Block & block) override;
diff --git a/dbms/tests/integration/test_adaptive_granularity/test.py b/dbms/tests/integration/test_adaptive_granularity/test.py
index db653427f02..50b43fc08ec 100644
--- a/dbms/tests/integration/test_adaptive_granularity/test.py
+++ b/dbms/tests/integration/test_adaptive_granularity/test.py
@@ -288,6 +288,17 @@ def test_mixed_granularity_single_node(start_dynamic_cluster, node):
     node.exec_in_container(["bash", "-c", "find {p} -name '*.mrk' | grep '.*'".format(p=path_to_old_part)]) # check that we have non adaptive files
 
+    node.query("ALTER TABLE table_with_default_granularity UPDATE dummy = dummy + 1 WHERE 1")
+    # still works
+    assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
+
+    node.query("ALTER TABLE table_with_default_granularity MODIFY COLUMN dummy String")
+    node.query("ALTER TABLE table_with_default_granularity ADD COLUMN dummy2 Float64")
+
+    # still works
+    assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
+
+
 def test_version_update_two_nodes(start_dynamic_cluster):
     node11.query("INSERT INTO table_with_default_granularity VALUES (toDate('2018-10-01'), 1, 333), (toDate('2018-10-02'), 2, 444)")
     node12.query("SYSTEM SYNC REPLICA table_with_default_granularity")
diff --git a/docker/images.json b/docker/images.json
index e2282fcb653..fef364a942f 100644
--- a/docker/images.json
+++ b/docker/images.json
@@ -7,9 +7,9 @@
     "docker/test/performance": "yandex/clickhouse-performance-test",
     "docker/test/pvs": "yandex/clickhouse-pvs-test",
     "docker/test/stateful": "yandex/clickhouse-stateful-test",
-    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-with-coverage-test",
+    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-test-with-coverage",
     "docker/test/stateless": "yandex/clickhouse-stateless-test",
-    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-with-coverage-test",
+    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-test-with-coverage",
     "docker/test/unit": "yandex/clickhouse-unit-test",
     "docker/test/stress": "yandex/clickhouse-stress-test",
     "dbms/tests/integration/image": "yandex/clickhouse-integration-tests-runner"
diff --git a/docker/test/stateful_with_coverage/Dockerfile b/docker/test/stateful_with_coverage/Dockerfile
index 2a566bdcf01..863e55e6326 100644
--- a/docker/test/stateful_with_coverage/Dockerfile
+++ b/docker/test/stateful_with_coverage/Dockerfile
@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateful-test .
 FROM yandex/clickhouse-stateless-test
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \
diff --git a/docker/test/stateless_with_coverage/Dockerfile b/docker/test/stateless_with_coverage/Dockerfile
index b9da18223ab..afb46533b16 100644
--- a/docker/test/stateless_with_coverage/Dockerfile
+++ b/docker/test/stateless_with_coverage/Dockerfile
@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateless-with-coverage-test .
 FROM yandex/clickhouse-deb-builder
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \
diff --git a/docs/en/roadmap.md b/docs/en/roadmap.md
index 34307e519b0..11f1f793235 100644
--- a/docs/en/roadmap.md
+++ b/docs/en/roadmap.md
@@ -1,15 +1,14 @@
 # Roadmap
 
-## Q2 2019
+## Q3 2019
 
 - DDL for dictionaries
 - Integration with S3-like object stores
 - Multiple storages for hot/cold data, JBOD support
 
-## Q3 2019
+## Q4 2019
 
-- JOIN execution improvements:
-    - Distributed join not limited by memory
+- JOIN not limited by available memory
 - Resource pools for more precise distribution of cluster capacity between users
 - Fine-grained authorization
 - Integration with external authentication services