Merge branch 'master' of github.com:yandex/ClickHouse

Author: BayoNet
Date: 2019-08-19 17:48:00 +03:00
Commit: a012daebe6

14 changed files with 103 additions and 21 deletions

.github/label-pr.yml (new file)

@@ -0,0 +1,2 @@
+- regExp: ".*\\.md$"
+  labels: ["documentation", "pr-documentation"]

.github/main.workflow (new file)

@@ -0,0 +1,9 @@
+workflow "Main workflow" {
+    resolves = ["Label PR"]
+    on = "pull_request"
+}
+
+action "Label PR" {
+    uses = "decathlon/pull-request-labeler-action@v1.0.0"
+    secrets = ["GITHUB_TOKEN"]
+}

CHANGELOG.md

@@ -1,3 +1,54 @@
## ClickHouse release 19.13.2.19, 2019-08-14
### New Feature
* Sampling profiler at the query level. [Example](https://gist.github.com/alexey-milovidov/92758583dd41c24c360fdb8d6a4da194). [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) [#6124](https://github.com/yandex/ClickHouse/pull/6124) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6250](https://github.com/yandex/ClickHouse/pull/6250) [#6283](https://github.com/yandex/ClickHouse/pull/6283) [#6386](https://github.com/yandex/ClickHouse/pull/6386)
* Allow specifying a list of columns with the `COLUMNS('regexp')` expression, a more sophisticated variant of the `*` asterisk (see the example after this list). [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)), ([alexey-milovidov](https://github.com/alexey-milovidov))
* `CREATE TABLE AS table_function()` is now possible (also shown in the example after this list). [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000))
* The Adam optimizer for stochastic gradient descent is used by default in the `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality with almost no tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37))
* Added functions for working with custom week numbers. [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh))
* `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7))
* The client now receives logs from the server at any desired level by setting `send_logs_level`, regardless of the log level specified in the server settings. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
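For illustration, a minimal sketch of three of the features above; the `orders` table, its columns, and the remote address are hypothetical:

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE orders (order_id UInt64, paid_amount Float64, refund_amount Float64)
ENGINE = MergeTree ORDER BY order_id;

-- COLUMNS('regexp'): select every column whose name matches the regular expression.
SELECT COLUMNS('_amount$') FROM orders;

-- CREATE TABLE AS table_function(): define a table on top of a table function.
CREATE TABLE orders_replica AS remote('replica-host:9000', default.orders);

-- send_logs_level: stream server logs to the client at the chosen level.
SET send_logs_level = 'trace';
```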
### Experimental features
* New query processing pipeline. Use the `experimental_use_processors=1` option to enable it. Use it at your own risk. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
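A minimal sketch of trying the experimental pipeline for the current session (`numbers()` is just a stand-in workload):

```sql
-- Enable the new processors-based query pipeline for this session only.
SET experimental_use_processors = 1;

-- Subsequent queries in the session run through the new pipeline.
SELECT sum(number) FROM numbers(1000000);
```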
### Bug Fix
* Kafka integration has been fixed in this version.
* Fixed `DoubleDelta` encoding of `Int64` for large `DoubleDelta` values, and improved `DoubleDelta` encoding of random data for `Int32` (see the codec sketch after this list). [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk))
* Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov))
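For context on the `DoubleDelta` fix above, a sketch of a hypothetical table applying the codec to the kind of integer sequences it targets:

```sql
-- Hypothetical table: DoubleDelta suits slowly changing integer sequences,
-- such as timestamps and monotonically growing counters.
CREATE TABLE metrics
(
    ts DateTime CODEC(DoubleDelta),
    counter Int64 CODEC(DoubleDelta)
)
ENGINE = MergeTree
ORDER BY ts;
```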
### Improvement
* The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats (sketched below). This is the expected behaviour, but it may lead to a negligible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/yandex/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))
* Throw an exception if a `config.d` file doesn't have the same root element as the main config file. [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000))
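A sketch of what the now-default `input_format_defaults_for_omitted_fields` behaviour means in practice (the `events` table is hypothetical):

```sql
CREATE TABLE events
(
    id UInt64,
    raw String,
    raw_length UInt64 DEFAULT length(raw)  -- complex default expression
)
ENGINE = MergeTree
ORDER BY id;

-- raw_length is omitted below; with the setting enabled (now the default),
-- it is computed from its DEFAULT expression instead of being zero-filled.
INSERT INTO events FORMAT JSONEachRow {"id": 1, "raw": "hello"}
```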
### Performance Improvement
* Optimize `count()`. Now it uses the smallest column (if possible). [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird))
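The effect of the `count()` optimization, sketched on a hypothetical wide table:

```sql
CREATE TABLE hits (user_id UInt64, url String, is_refresh UInt8)
ENGINE = MergeTree ORDER BY user_id;

-- count() now reads only the smallest column (is_refresh, one byte per row)
-- instead of a wider one, reducing the bytes scanned.
SELECT count() FROM hits;
```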
### Build/Testing/Packaging Improvement
* Report memory usage in performance tests. [#5899](https://github.com/yandex/ClickHouse/pull/5899) ([akuzm](https://github.com/akuzm))
* Fix build with external `libcxx` [#6010](https://github.com/yandex/ClickHouse/pull/6010) ([Ivan](https://github.com/abyss7))
* Fix shared build with `rdkafka` library [#6101](https://github.com/yandex/ClickHouse/pull/6101) ([Ivan](https://github.com/abyss7))
## ClickHouse release 19.11.7.40, 2019-08-14
### Bug Fix
* Kafka integration has been fixed in this version.
* Fix segfault when using `arrayReduce` for constant arguments. [#6326](https://github.com/yandex/ClickHouse/pull/6326) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed `toFloat()` monotonicity. [#6374](https://github.com/yandex/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
* Fix segfault with enabled `optimize_skip_unused_shards` and a missing sharding key (see the example after this list). [#6384](https://github.com/yandex/ClickHouse/pull/6384) ([CurtizJ](https://github.com/CurtizJ))
* Fixed logic of `arrayEnumerateUniqRanked` function. [#6423](https://github.com/yandex/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Removed extra verbose logging from MySQL handler. [#6389](https://github.com/yandex/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix wrong behavior and possible segfaults in `topK` and `topKWeighted` aggregate functions. [#6404](https://github.com/yandex/ClickHouse/pull/6404) ([CurtizJ](https://github.com/CurtizJ))
* Do not expose virtual columns in `system.columns` table. This is required for backward compatibility. [#6406](https://github.com/yandex/ClickHouse/pull/6406) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix bug with memory allocation for string fields in complex key cache dictionary. [#6447](https://github.com/yandex/ClickHouse/pull/6447) ([alesapin](https://github.com/alesapin))
* Fix bug with enabling adaptive granularity when creating new replica for `Replicated*MergeTree` table. [#6452](https://github.com/yandex/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
* Fix infinite loop when reading Kafka messages. [#6354](https://github.com/yandex/ClickHouse/pull/6354) ([abyss7](https://github.com/abyss7))
* Fixed the possibility of a fabricated query causing a server crash due to stack overflow in the SQL parser, and the possibility of stack overflow in `Merge` and `Distributed` tables. [#6433](https://github.com/yandex/ClickHouse/pull/6433) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed Gorilla encoding error on small sequences. [#6444](https://github.com/yandex/ClickHouse/pull/6444) ([Enmk](https://github.com/Enmk))
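The `optimize_skip_unused_shards` fix above, sketched against a hypothetical `Distributed` table sharded by `user_id`:

```sql
-- dist_hits is a hypothetical table: Distributed(my_cluster, default, hits, user_id).
SET optimize_skip_unused_shards = 1;

-- Only shards that can contain user_id = 42 are queried; the fix prevents
-- a segfault when the underlying table has no sharding key in its definition.
SELECT count() FROM dist_hits WHERE user_id = 42;
```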
### Improvement
* Allow users to override the `poll_interval` and `idle_connection_timeout` settings on connection. [#6230](https://github.com/yandex/ClickHouse/pull/6230) ([alexey-milovidov](https://github.com/alexey-milovidov))
## ClickHouse release 19.11.5.28, 2019-08-05
### Bug Fix

dbms/src/Storages/MergeTree/IMergedBlockOutputStream.cpp

@@ -25,25 +25,26 @@ IMergedBlockOutputStream::IMergedBlockOutputStream(
     size_t aio_threshold_,
     bool blocks_are_granules_size_,
     const std::vector<MergeTreeIndexPtr> & indices_to_recalc,
-    const MergeTreeIndexGranularity & index_granularity_)
+    const MergeTreeIndexGranularity & index_granularity_,
+    const MergeTreeIndexGranularityInfo * index_granularity_info_)
     : storage(storage_)
     , part_path(part_path_)
     , min_compress_block_size(min_compress_block_size_)
     , max_compress_block_size(max_compress_block_size_)
     , aio_threshold(aio_threshold_)
-    , marks_file_extension(storage.canUseAdaptiveGranularity() ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
+    , can_use_adaptive_granularity(index_granularity_info_ ? index_granularity_info_->is_adaptive : storage.canUseAdaptiveGranularity())
+    , marks_file_extension(can_use_adaptive_granularity ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
     , blocks_are_granules_size(blocks_are_granules_size_)
     , index_granularity(index_granularity_)
     , compute_granularity(index_granularity.empty())
     , codec(std::move(codec_))
     , skip_indices(indices_to_recalc)
-    , with_final_mark(storage.settings.write_final_mark && storage.canUseAdaptiveGranularity())
+    , with_final_mark(storage.settings.write_final_mark && can_use_adaptive_granularity)
 {
     if (blocks_are_granules_size && !index_granularity.empty())
         throw Exception("Can't take information about index granularity from blocks, when non empty index_granularity array specified", ErrorCodes::LOGICAL_ERROR);
 }
 
 void IMergedBlockOutputStream::addStreams(
     const String & path,
     const String & name,
@@ -145,7 +146,7 @@ void IMergedBlockOutputStream::fillIndexGranularity(const Block & block)
         blocks_are_granules_size,
         index_offset,
         index_granularity,
-        storage.canUseAdaptiveGranularity());
+        can_use_adaptive_granularity);
 }
 
 void IMergedBlockOutputStream::writeSingleMark(
@@ -176,7 +177,7 @@ void IMergedBlockOutputStream::writeSingleMark(
         writeIntBinary(stream.plain_hashing.count(), stream.marks);
         writeIntBinary(stream.compressed.offset(), stream.marks);
 
-        if (storage.canUseAdaptiveGranularity())
+        if (can_use_adaptive_granularity)
             writeIntBinary(number_of_rows, stream.marks);
     }, path);
 }
@@ -362,7 +363,7 @@ void IMergedBlockOutputStream::calculateAndSerializeSkipIndices(
             writeIntBinary(stream.compressed.offset(), stream.marks);
             /// Actually these numbers are redundant, but we have to store them
             /// to be compatible with normal .mrk2 file format
-            if (storage.canUseAdaptiveGranularity())
+            if (can_use_adaptive_granularity)
                 writeIntBinary(1UL, stream.marks);
 
             ++skip_index_current_mark;

dbms/src/Storages/MergeTree/IMergedBlockOutputStream.h

@@ -1,6 +1,7 @@
 #pragma once
 
 #include <Storages/MergeTree/MergeTreeIndexGranularity.h>
+#include <Storages/MergeTree/MergeTreeIndexGranularityInfo.h>
 #include <IO/WriteBufferFromFile.h>
 #include <Compression/CompressedWriteBuffer.h>
 #include <IO/HashingWriteBuffer.h>
@@ -23,7 +24,8 @@ public:
         size_t aio_threshold_,
         bool blocks_are_granules_size_,
         const std::vector<MergeTreeIndexPtr> & indices_to_recalc,
-        const MergeTreeIndexGranularity & index_granularity_);
+        const MergeTreeIndexGranularity & index_granularity_,
+        const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
 
     using WrittenOffsetColumns = std::set<std::string>;
@@ -141,6 +143,7 @@ protected:
     size_t current_mark = 0;
     size_t skip_index_mark = 0;
 
+    const bool can_use_adaptive_granularity;
     const std::string marks_file_extension;
     const bool blocks_are_granules_size;

dbms/src/Storages/MergeTree/MergeTreeData.cpp

@@ -1581,7 +1581,8 @@ void MergeTreeData::alterDataPart(
         true /* skip_offsets */,
         {},
         unused_written_offsets,
-        part->index_granularity);
+        part->index_granularity,
+        &part->index_granularity_info);
 
     in.readPrefix();
     out.writePrefix();

dbms/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp

@@ -934,6 +934,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
     new_data_part->relative_path = "tmp_mut_" + future_part.name;
     new_data_part->is_temp = true;
     new_data_part->ttl_infos = source_part->ttl_infos;
+    new_data_part->index_granularity_info = source_part->index_granularity_info;
 
     String new_part_tmp_path = new_data_part->getFullPath();
@@ -1069,7 +1070,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
         /* skip_offsets = */ false,
         std::vector<MergeTreeIndexPtr>(indices_to_recalc.begin(), indices_to_recalc.end()),
         unused_written_offsets,
-        source_part->index_granularity
+        source_part->index_granularity,
+        &source_part->index_granularity_info
     );
 
     in->readPrefix();

dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp

@@ -8,14 +8,16 @@ MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream(
     CompressionCodecPtr default_codec_, bool skip_offsets_,
     const std::vector<MergeTreeIndexPtr> & indices_to_recalc_,
     WrittenOffsetColumns & already_written_offset_columns_,
-    const MergeTreeIndexGranularity & index_granularity_)
+    const MergeTreeIndexGranularity & index_granularity_,
+    const MergeTreeIndexGranularityInfo * index_granularity_info_)
     : IMergedBlockOutputStream(
         storage_, part_path_, storage_.global_context.getSettings().min_compress_block_size,
         storage_.global_context.getSettings().max_compress_block_size, default_codec_,
         storage_.global_context.getSettings().min_bytes_to_use_direct_io,
         false,
         indices_to_recalc_,
-        index_granularity_),
+        index_granularity_,
+        index_granularity_info_),
     header(header_), sync(sync_), skip_offsets(skip_offsets_),
     already_written_offset_columns(already_written_offset_columns_)
 {

dbms/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h

@@ -17,7 +17,8 @@ public:
         CompressionCodecPtr default_codec_, bool skip_offsets_,
         const std::vector<MergeTreeIndexPtr> & indices_to_recalc_,
         WrittenOffsetColumns & already_written_offset_columns_,
-        const MergeTreeIndexGranularity & index_granularity_);
+        const MergeTreeIndexGranularity & index_granularity_,
+        const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
 
     Block getHeader() const override { return header; }
     void write(const Block & block) override;


@@ -288,6 +288,17 @@ def test_mixed_granularity_single_node(start_dynamic_cluster, node):
     node.exec_in_container(["bash", "-c", "find {p} -name '*.mrk' | grep '.*'".format(p=path_to_old_part)])  # check that we have non-adaptive files
 
+    node.query("ALTER TABLE table_with_default_granularity UPDATE dummy = dummy + 1 WHERE 1")
+    # still works
+    assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
+
+    node.query("ALTER TABLE table_with_default_granularity MODIFY COLUMN dummy String")
+    node.query("ALTER TABLE table_with_default_granularity ADD COLUMN dummy2 Float64")
+    # still works
+    assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
+
+
 def test_version_update_two_nodes(start_dynamic_cluster):
     node11.query("INSERT INTO table_with_default_granularity VALUES (toDate('2018-10-01'), 1, 333), (toDate('2018-10-02'), 2, 444)")
     node12.query("SYSTEM SYNC REPLICA table_with_default_granularity")

docker/images.json

@@ -7,9 +7,9 @@
     "docker/test/performance": "yandex/clickhouse-performance-test",
     "docker/test/pvs": "yandex/clickhouse-pvs-test",
     "docker/test/stateful": "yandex/clickhouse-stateful-test",
-    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-with-coverage-test",
+    "docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-test-with-coverage",
     "docker/test/stateless": "yandex/clickhouse-stateless-test",
-    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-with-coverage-test",
+    "docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-test-with-coverage",
    "docker/test/unit": "yandex/clickhouse-unit-test",
    "docker/test/stress": "yandex/clickhouse-stress-test",
    "dbms/tests/integration/image": "yandex/clickhouse-integration-tests-runner"


@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateful-test .
 FROM yandex/clickhouse-stateless-test
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \


@@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-stateless-with-coverage-test .
 FROM yandex/clickhouse-deb-builder
 
-RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
+RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
 
 RUN apt-get update -y \
     && env DEBIAN_FRONTEND=noninteractive \


@@ -1,15 +1,14 @@
 # Roadmap
 
-## Q2 2019
+## Q3 2019
 
 - DDL for dictionaries
 - Integration with S3-like object stores
 - Multiple storages for hot/cold data, JBOD support
 
-## Q3 2019
+## Q4 2019
 
 - JOIN execution improvements:
-  - Distributed join not limited by memory
+  - JOIN not limited by available memory
 
 - Resource pools for more precise distribution of cluster capacity between users
 - Fine-grained authorization
 - Integration with external authentication services