Merge branch 'master' of github.com:yandex/ClickHouse

Author: BayoNet
Date: 2019-08-19 17:48:00 +03:00
Commit: a012daebe6

14 changed files with 103 additions and 21 deletions

.github/label-pr.yml (new file)

@@ -0,0 +1,2 @@
+- regExp: ".*\\.md$"
+  labels: ["documentation", "pr-documentation"]

.github/main.workflow (new file)

@@ -0,0 +1,9 @@
+workflow "Main workflow" {
+    resolves = ["Label PR"]
+    on = "pull_request"
+}
+
+action "Label PR" {
+    uses = "decathlon/pull-request-labeler-action@v1.0.0"
+    secrets = ["GITHUB_TOKEN"]
+}

CHANGELOG.md

@@ -1,3 +1,54 @@
## ClickHouse release 19.13.2.19, 2019-08-14
### New Feature
* Sampling profiler at the query level. [Example](https://gist.github.com/alexey-milovidov/92758583dd41c24c360fdb8d6a4da194). [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) [#6124](https://github.com/yandex/ClickHouse/pull/6124) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6250](https://github.com/yandex/ClickHouse/pull/6250) [#6283](https://github.com/yandex/ClickHouse/pull/6283) [#6386](https://github.com/yandex/ClickHouse/pull/6386)
* Allow specifying a list of columns with the `COLUMNS('regexp')` expression, a more sophisticated variant of the `*` asterisk (see the example after this list). [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)), ([alexey-milovidov](https://github.com/alexey-milovidov))
* `CREATE TABLE AS table_function()` is now possible (also shown in the example after this list). [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000))
* The Adam optimizer for stochastic gradient descent is used by default in the `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality with almost no tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37))
* Added functions for working with custom week numbers. [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh))
* `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7))
* The client now receives logs from the server at any desired level by setting `send_logs_level`, regardless of the log level specified in the server settings. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
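For illustration, a minimal sketch of three of the features above; the `orders` table, its columns, and the remote address are hypothetical:

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE orders (order_id UInt64, paid_amount Float64, refund_amount Float64)
ENGINE = MergeTree ORDER BY order_id;

-- COLUMNS('regexp'): select every column whose name matches the regular expression.
SELECT COLUMNS('_amount$') FROM orders;

-- CREATE TABLE AS table_function(): define a table on top of a table function.
CREATE TABLE orders_replica AS remote('replica-host:9000', default.orders);

-- send_logs_level: stream server logs to the client at the chosen level.
SET send_logs_level = 'trace';
```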
### Experimental features
* New query processing pipeline. Use the `experimental_use_processors=1` option to enable it. Use it at your own risk. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
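A minimal sketch of trying the experimental pipeline for the current session (`numbers()` is just a stand-in workload):

```sql
-- Enable the new processors-based query pipeline for this session only.
SET experimental_use_processors = 1;

-- Subsequent queries in the session run through the new pipeline.
SELECT sum(number) FROM numbers(1000000);
```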
### Bug Fix
* Kafka integration has been fixed in this version.
* Fixed `DoubleDelta` encoding of `Int64` for large `DoubleDelta` values, and improved `DoubleDelta` encoding of random data for `Int32` (see the codec sketch after this list). [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk))
* Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov))
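For context on the `DoubleDelta` fix above, a sketch of a hypothetical table applying the codec to the kind of integer sequences it targets:

```sql
-- Hypothetical table: DoubleDelta suits slowly changing integer sequences,
-- such as timestamps and monotonically growing counters.
CREATE TABLE metrics
(
    ts DateTime CODEC(DoubleDelta),
    counter Int64 CODEC(DoubleDelta)
)
ENGINE = MergeTree
ORDER BY ts;
```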
### Improvement
* The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats (sketched below). This is the expected behaviour, but it may lead to a negligible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/yandex/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))
* Throw an exception if a `config.d` file doesn't have the same root element as the main config file. [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000))
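A sketch of what the now-default `input_format_defaults_for_omitted_fields` behaviour means in practice (the `events` table is hypothetical):

```sql
CREATE TABLE events
(
    id UInt64,
    raw String,
    raw_length UInt64 DEFAULT length(raw)  -- complex default expression
)
ENGINE = MergeTree
ORDER BY id;

-- raw_length is omitted below; with the setting enabled (now the default),
-- it is computed from its DEFAULT expression instead of being zero-filled.
INSERT INTO events FORMAT JSONEachRow {"id": 1, "raw": "hello"}
```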
### Performance Improvement
* Optimize `count()`. Now it uses the smallest column (if possible). [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird))
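The effect of the `count()` optimization, sketched on a hypothetical wide table:

```sql
CREATE TABLE hits (user_id UInt64, url String, is_refresh UInt8)
ENGINE = MergeTree ORDER BY user_id;

-- count() now reads only the smallest column (is_refresh, one byte per row)
-- instead of a wider one, reducing the bytes scanned.
SELECT count() FROM hits;
```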
### Build/Testing/Packaging Improvement
* Report memory usage in performance tests. [#5899](https://github.com/yandex/ClickHouse/pull/5899) ([akuzm](https://github.com/akuzm))
* Fix build with external `libcxx` [#6010](https://github.com/yandex/ClickHouse/pull/6010) ([Ivan](https://github.com/abyss7))
* Fix shared build with `rdkafka` library [#6101](https://github.com/yandex/ClickHouse/pull/6101) ([Ivan](https://github.com/abyss7))
## ClickHouse release 19.11.7.40, 2019-08-14
### Bug Fix
* Kafka integration has been fixed in this version.
* Fix segfault when using `arrayReduce` for constant arguments. [#6326](https://github.com/yandex/ClickHouse/pull/6326) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed `toFloat()` monotonicity. [#6374](https://github.com/yandex/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
* Fix segfault with enabled `optimize_skip_unused_shards` and a missing sharding key (see the example after this list). [#6384](https://github.com/yandex/ClickHouse/pull/6384) ([CurtizJ](https://github.com/CurtizJ))
* Fixed logic of `arrayEnumerateUniqRanked` function. [#6423](https://github.com/yandex/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Removed extra verbose logging from MySQL handler. [#6389](https://github.com/yandex/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix wrong behavior and possible segfaults in `topK` and `topKWeighted` aggregate functions. [#6404](https://github.com/yandex/ClickHouse/pull/6404) ([CurtizJ](https://github.com/CurtizJ))
* Do not expose virtual columns in `system.columns` table. This is required for backward compatibility. [#6406](https://github.com/yandex/ClickHouse/pull/6406) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix bug with memory allocation for string fields in complex key cache dictionary. [#6447](https://github.com/yandex/ClickHouse/pull/6447) ([alesapin](https://github.com/alesapin))
* Fix bug with enabling adaptive granularity when creating new replica for `Replicated*MergeTree` table. [#6452](https://github.com/yandex/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
* Fix infinite loop when reading Kafka messages. [#6354](https://github.com/yandex/ClickHouse/pull/6354) ([abyss7](https://github.com/abyss7))
* Fixed the possibility of a fabricated query causing a server crash due to stack overflow in the SQL parser, and the possibility of stack overflow in `Merge` and `Distributed` tables. [#6433](https://github.com/yandex/ClickHouse/pull/6433) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed Gorilla encoding error on small sequences. [#6444](https://github.com/yandex/ClickHouse/pull/6444) ([Enmk](https://github.com/Enmk))
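The `optimize_skip_unused_shards` fix above, sketched against a hypothetical `Distributed` table sharded by `user_id`:

```sql
-- dist_hits is a hypothetical table: Distributed(my_cluster, default, hits, user_id).
SET optimize_skip_unused_shards = 1;

-- Only shards that can contain user_id = 42 are queried; the fix prevents
-- a segfault when the underlying table has no sharding key in its definition.
SELECT count() FROM dist_hits WHERE user_id = 42;
```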
### Improvement
* Allow users to override the `poll_interval` and `idle_connection_timeout` settings on connection. [#6230](https://github.com/yandex/ClickHouse/pull/6230) ([alexey-milovidov](https://github.com/alexey-milovidov))
## ClickHouse release 19.11.5.28, 2019-08-05
### Bug Fix

dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp

@@ -25,25 +25,26 @@ IMergedBlockOutputStream::IMergedBlockOutputStream(
     size_t aio_threshold_,
     bool blocks_are_granules_size_,
     const std::vector<MergeTreeIndexPtr> & indices_to_recalc,
-    const MergeTreeIndexGranularity & index_granularity_)
+    const MergeTreeIndexGranularity & index_granularity_,
+    const MergeTreeIndexGranularityInfo * index_granularity_info_)
     : storage(storage_)
     , part_path(part_path_)
     , min_compress_block_size(min_compress_block_size_)
     , max_compress_block_size(max_compress_block_size_)
     , aio_threshold(aio_threshold_)
-    , marks_file_extension(storage.canUseAdaptiveGranularity() ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
+    , can_use_adaptive_granularity(index_granularity_info_ ? index_granularity_info_->is_adaptive : storage.canUseAdaptiveGranularity())
+    , marks_file_extension(can_use_adaptive_granularity ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
     , blocks_are_granules_size(blocks_are_granules_size_)
     , index_granularity(index_granularity_)
     , compute_granularity(index_granularity.empty())
     , codec(std::move(codec_))
     , skip_indices(indices_to_recalc)
-    , with_final_mark(storage.settings.write_final_mark && storage.canUseAdaptiveGranularity())
+    , with_final_mark(storage.settings.write_final_mark && can_use_adaptive_granularity)
 {
     if (blocks_are_granules_size && !index_granularity.empty())
         throw Exception("Can't take information about index granularity from blocks, when non empty index_granularity array specified", ErrorCodes::LOGICAL_ERROR);
 }
 
 void IMergedBlockOutputStream::addStreams(
     const String & path,
     const String & name,
@@ -145,7 +146,7 @@ void IMergedBlockOutputStream::fillIndexGranularity(const Block & block)
         blocks_are_granules_size,
         index_offset,
         index_granularity,
-        storage.canUseAdaptiveGranularity());
+        can_use_adaptive_granularity);
 }
 
 void IMergedBlockOutputStream::writeSingleMark(
@@ -176,7 +177,7 @@ void IMergedBlockOutputStream::writeSingleMark(
         writeIntBinary(stream.plain_hashing.count(), stream.marks);
         writeIntBinary(stream.compressed.offset(), stream.marks);
 
-        if (storage.canUseAdaptiveGranularity())
+        if (can_use_adaptive_granularity)
             writeIntBinary(number_of_rows, stream.marks);
     }, path);
 }
@@ -362,7 +363,7 @@ void IMergedBlockOutputStream::calculateAndSerializeSkipIndices(
             writeIntBinary(stream.compressed.offset(), stream.marks);
             /// Actually these numbers are redundant, but we have to store them
             /// to be compatible with normal .mrk2 file format
-            if (storage.canUseAdaptiveGranularity())
+            if (can_use_adaptive_granularity)
                 writeIntBinary(1UL, stream.marks);
 
             ++skip_index_current_mark;

dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h

@@ -1,6 +1,7 @@
 #pragma once
 
 #include <Storages/MergeTree/MergeTreeIndexGranularity.h>
+#include <Storages/MergeTree/MergeTreeIndexGranularityInfo.h>
 #include <IO/WriteBufferFromFile.h>
 #include <Compression/CompressedWriteBuffer.h>
 #include <IO/HashingWriteBuffer.h>
@@ -23,7 +24,8 @@ public:
         size_t aio_threshold_,
         bool blocks_are_granules_size_,
         const std::vector<MergeTreeIndexPtr> & indices_to_recalc,
-        const MergeTreeIndexGranularity & index_granularity_);
+        const MergeTreeIndexGranularity & index_granularity_,
+        const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
 
     using WrittenOffsetColumns = std::set<std::string>;
@@ -141,6 +143,7 @@ protected:
     size_t current_mark = 0;
     size_t skip_index_mark = 0;
 
+    const bool can_use_adaptive_granularity;
     const std::string marks_file_extension;
     const bool blocks_are_granules_size;

dbms/src/Storages/MergeTree/MergeTreeData.cpp

@@ -1581,7 +1581,8 @@ void MergeTreeData::alterDataPart(
         true /* skip_offsets */,
         {},
         unused_written_offsets,
-        part->index_granularity);
+        part->index_granularity,
+        &part->index_granularity_info);
 
     in.readPrefix();
     out.writePrefix();

dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp

@@ -934,6 +934,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
     new_data_part->relative_path = "tmp_mut_" + future_part.name;
     new_data_part->is_temp = true;
     new_data_part->ttl_infos = source_part->ttl_infos;
+    new_data_part->index_granularity_info = source_part->index_granularity_info;
 
     String new_part_tmp_path = new_data_part->getFullPath();
@@ -1069,7 +1070,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
         /* skip_offsets = */ false,
         std::vector<MergeTreeIndexPtr>(indices_to_recalc.begin(), indices_to_recalc.end()),
         unused_written_offsets,
-        source_part->index_granularity
+        source_part->index_granularity,
+        &source_part->index_granularity_info
     );
 
     in->readPrefix();

dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp

@@ -8,14 +8,16 @@ MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream(
     CompressionCodecPtr default_codec_, bool skip_offsets_,
     const std::vector<MergeTreeIndexPtr> & indices_to_recalc_,
     WrittenOffsetColumns & already_written_offset_columns_,
-    const MergeTreeIndexGranularity & index_granularity_)
+    const MergeTreeIndexGranularity & index_granularity_,
+    const MergeTreeIndexGranularityInfo * index_granularity_info_)
     : IMergedBlockOutputStream(
         storage_, part_path_, storage_.global_context.getSettings().min_compress_block_size,
         storage_.global_context.getSettings().max_compress_block_size, default_codec_,
         storage_.global_context.getSettings().min_bytes_to_use_direct_io,
         false,
         indices_to_recalc_,
-        index_granularity_),
+        index_granularity_,
+        index_granularity_info_),
     header(header_), sync(sync_), skip_offsets(skip_offsets_),
     already_written_offset_columns(already_written_offset_columns_)
 {

dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h

@@ -17,7 +17,8 @@ public:
         CompressionCodecPtr default_codec_, bool skip_offsets_,
         const std::vector<MergeTreeIndexPtr> & indices_to_recalc_,
         WrittenOffsetColumns & already_written_offset_columns_,
-        const MergeTreeIndexGranularity & index_granularity_);
+        const MergeTreeIndexGranularity & index_granularity_,
+        const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
 
     Block getHeader() const override { return header; }
     void write(const Block & block) override;


@@ -288,6 +288,17 @@ def test_mixed_granularity_single_node(start_dynamic_cluster, node):
     node.exec_in_container(["bash", "-c", "find {p} -name '*.mrk' | grep '.*'".format(p=path_to_old_part)])  # check that we have non-adaptive files
 
+    node.query("ALTER TABLE table_with_default_granularity UPDATE dummy = dummy + 1 WHERE 1")
+    # still works
+    assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
+
+    node.query("ALTER TABLE table_with_default_granularity MODIFY COLUMN dummy String")
+    node.query("ALTER TABLE table_with_default_granularity ADD COLUMN dummy2 Float64")
+    # still works
+    assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
+
+
 def test_version_update_two_nodes(start_dynamic_cluster):
     node11.query("INSERT INTO table_with_default_granularity VALUES (toDate('2018-10-01'), 1, 333), (toDate('2018-10-02'), 2, 444)")
     node12.query("SYSTEM SYNC REPLICA table_with_default_granularity")

docker/images.json

@@ -7,9 +7,9 @@
     "docker/test/performance": "yandex/clickhouse-performance-test",
     "docker/test/pvs": "yandex/clickhouse-pvs-test",
     "docker/test/stateful": "yandex/clickhouse-stateful-test",
-    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-with-coverage-test",
+    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-test-with-coverage",
     "docker/test/stateless": "yandex/clickhouse-stateless-test",
-    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-with-coverage-test",
+    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-test-with-coverage",
    "docker/test/unit": "yandex/clickhouse-unit-test",
    "docker/test/stress": "yandex/clickhouse-stress-test",
    "dbms/tests/integration/image": "yandex/clickhouse-integration-tests-runner"


@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateful-test .
 FROM yandex/clickhouse-stateless-test
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \


@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateless-with-coverage-test .
 FROM yandex/clickhouse-deb-builder
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \


@@ -1,15 +1,14 @@
 # Roadmap
 
-## Q2 2019
+## Q3 2019
 
 - DDL for dictionaries
 - Integration with S3-like object stores
 - Multiple storages for hot/cold data, JBOD support
 
-## Q3 2019
+## Q4 2019
 
 - JOIN execution improvements:
-  - Distributed join not limited by memory
+  - JOIN not limited by available memory
 
 - Resource pools for more precise distribution of cluster capacity between users
 - Fine-grained authorization
 - Integration with external authentication services