Merge branch 'master' of github.com:yandex/ClickHouse

Ivan Blinkov 2019-08-20 10:02:39 +03:00
commit b5786fec29
39 changed files with 639 additions and 70 deletions

.github/label-pr.yml (new file)

@ -0,0 +1,2 @@
- regExp: ".*\\.md$"
labels: ["documentation", "pr-documentation"]

.github/main.workflow (new file)

@ -0,0 +1,9 @@
workflow "Main workflow" {
resolves = ["Label PR"]
on = "pull_request"
}
action "Label PR" {
uses = "decathlon/pull-request-labeler-action@v1.0.0"
secrets = ["GITHUB_TOKEN"]
}


@ -1,3 +1,54 @@
## ClickHouse release 19.13.2.19, 2019-08-14
### New Feature
* Sampling profiler on query level. [Example](https://gist.github.com/alexey-milovidov/92758583dd41c24c360fdb8d6a4da194). [#4247](https://github.com/yandex/ClickHouse/issues/4247) ([laplab](https://github.com/laplab)) [#6124](https://github.com/yandex/ClickHouse/pull/6124) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6250](https://github.com/yandex/ClickHouse/pull/6250) [#6283](https://github.com/yandex/ClickHouse/pull/6283) [#6386](https://github.com/yandex/ClickHouse/pull/6386)
* Allow specifying a list of columns with the `COLUMNS('regexp')` expression, which works like a more sophisticated variant of the `*` asterisk (see the example after this list). [#5951](https://github.com/yandex/ClickHouse/pull/5951) ([mfridental](https://github.com/mfridental)), ([alexey-milovidov](https://github.com/alexey-milovidov))
* `CREATE TABLE AS table_function()` is now possible (see the example after this list). [#6057](https://github.com/yandex/ClickHouse/pull/6057) ([dimarub2000](https://github.com/dimarub2000))
* The Adam optimizer for stochastic gradient descent is used by default in the `stochasticLinearRegression()` and `stochasticLogisticRegression()` aggregate functions, because it shows good quality with almost no tuning. [#6000](https://github.com/yandex/ClickHouse/pull/6000) ([Quid37](https://github.com/Quid37))
* Added functions for working with custom week numbers. [#5212](https://github.com/yandex/ClickHouse/pull/5212) ([Andy Yang](https://github.com/andyyzh))
* `RENAME` queries now work with all storages. [#5953](https://github.com/yandex/ClickHouse/pull/5953) ([Ivan](https://github.com/abyss7))
* The client now receives logs from the server at any desired level by setting `send_logs_level`, regardless of the log level specified in the server settings. [#5964](https://github.com/yandex/ClickHouse/pull/5964) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
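A minimal sketch of the two features marked above; the table and column names here are illustrative and not taken from this commit:

```sql
-- COLUMNS('regexp') selects every column whose name matches the regular expression.
CREATE TABLE hits_sample (WatchID UInt64, ClientIP UInt32, ClientIP6 FixedString(16), Title String) ENGINE = Memory;
SELECT COLUMNS('^ClientIP') FROM hits_sample;   -- returns ClientIP and ClientIP6

-- CREATE TABLE AS table_function(): the new table behaves like the table function
-- (the quantile test added later in this commit relies on the same syntax).
CREATE TABLE numbers_copy AS numbers(10);
SELECT count() FROM numbers_copy;               -- 10
```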
### Experimental features
* New query processing pipeline. Use the `experimental_use_processors=1` option to enable it. Use it at your own risk; a minimal usage sketch follows below. [#4914](https://github.com/yandex/ClickHouse/pull/4914) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
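A usage sketch for the experimental setting named above; any query can follow the `SET`, the one shown is arbitrary:

```sql
SET experimental_use_processors = 1;
SELECT sum(number) FROM numbers(1000000);
```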
### Bug Fix
* Kafka integration has been fixed in this version.
* Fixed `DoubleDelta` encoding of `Int64` for large `DoubleDelta` values, improved `DoubleDelta` encoding for random data for `Int32`. [#5998](https://github.com/yandex/ClickHouse/pull/5998) ([Vasily Nemkov](https://github.com/Enmk))
* Fixed overestimation of `max_rows_to_read` if the setting `merge_tree_uniform_read_distribution` is set to 0. [#6019](https://github.com/yandex/ClickHouse/pull/6019) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Improvement
* The setting `input_format_defaults_for_omitted_fields` is enabled by default. It enables calculation of complex default expressions for omitted fields in `JSONEachRow` and `CSV*` formats (see the sketch after this list). This should be the expected behaviour, but it may lead to a negligible performance difference or subtle incompatibilities. [#6043](https://github.com/yandex/ClickHouse/pull/6043) ([Artem Zuikov](https://github.com/4ertus2)), [#5625](https://github.com/yandex/ClickHouse/pull/5625) ([akuzm](https://github.com/akuzm))
* Throw an exception if a `config.d` file doesn't have the same root element as the main config file. [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000))
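A sketch of what the now-default `input_format_defaults_for_omitted_fields` behaviour means for `JSONEachRow` input; the table and data are hypothetical:

```sql
CREATE TABLE events (id UInt32, ts DateTime DEFAULT now(), doubled UInt32 DEFAULT id * 2) ENGINE = Memory;
-- 'ts' and 'doubled' are omitted below, so their DEFAULT expressions are evaluated
-- instead of inserting zero/empty values.
INSERT INTO events FORMAT JSONEachRow {"id": 1}
SELECT * FROM events;
```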
### Performance Improvement
* Optimize `count()`. Now it uses the smallest column (if possible). [#6028](https://github.com/yandex/ClickHouse/pull/6028) ([Amos Bird](https://github.com/amosbird))
### Build/Testing/Packaging Improvement
* Report memory usage in performance tests. [#5899](https://github.com/yandex/ClickHouse/pull/5899) ([akuzm](https://github.com/akuzm))
* Fix build with external `libcxx` [#6010](https://github.com/yandex/ClickHouse/pull/6010) ([Ivan](https://github.com/abyss7))
* Fix shared build with `rdkafka` library [#6101](https://github.com/yandex/ClickHouse/pull/6101) ([Ivan](https://github.com/abyss7))
## ClickHouse release 19.11.7.40, 2019-08-14
### Bug Fix
* Kafka integration has been fixed in this version.
* Fix segfault when using `arrayReduce` for constant arguments. [#6326](https://github.com/yandex/ClickHouse/pull/6326) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed `toFloat()` monotonicity. [#6374](https://github.com/yandex/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
* Fix segfault with enabled `optimize_skip_unused_shards` and missing sharding key. [#6384](https://github.com/yandex/ClickHouse/pull/6384) ([CurtizJ](https://github.com/CurtizJ))
* Fixed logic of `arrayEnumerateUniqRanked` function. [#6423](https://github.com/yandex/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Removed extra verbose logging from MySQL handler. [#6389](https://github.com/yandex/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix wrong behavior and possible segfaults in `topK` and `topKWeighted` aggregate functions. [#6404](https://github.com/yandex/ClickHouse/pull/6404) ([CurtizJ](https://github.com/CurtizJ))
* Do not expose virtual columns in `system.columns` table. This is required for backward compatibility. [#6406](https://github.com/yandex/ClickHouse/pull/6406) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix bug with memory allocation for string fields in complex key cache dictionary. [#6447](https://github.com/yandex/ClickHouse/pull/6447) ([alesapin](https://github.com/alesapin))
* Fix bug with enabling adaptive granularity when creating new replica for `Replicated*MergeTree` table. [#6452](https://github.com/yandex/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
* Fix infinite loop when reading Kafka messages. [#6354](https://github.com/yandex/ClickHouse/pull/6354) ([abyss7](https://github.com/abyss7))
* Fixed the possibility of a crafted query causing a server crash due to stack overflow in the SQL parser, and the possibility of stack overflow in `Merge` and `Distributed` tables. [#6433](https://github.com/yandex/ClickHouse/pull/6433) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed Gorilla encoding error on small sequences. [#6444](https://github.com/yandex/ClickHouse/pull/6444) ([Enmk](https://github.com/Enmk))
### Improvement
* Allow user to override `poll_interval` and `idle_connection_timeout` settings on connection. [#6230](https://github.com/yandex/ClickHouse/pull/6230) ([alexey-milovidov](https://github.com/alexey-milovidov))
## ClickHouse release 19.11.5.28, 2019-08-05
### Bug Fix
@ -299,7 +350,7 @@ It allows to set commit mode: after every batch of messages is handled, or after
* Renamed functions `leastSqr` to `simpleLinearRegression`, `LinearRegression` to `linearRegression`, `LogisticRegression` to `logisticRegression`. [#5391](https://github.com/yandex/ClickHouse/pull/5391) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
### Performance Improvements
* Parallelize processing of parts in alter modify query. [#4639](https://github.com/yandex/ClickHouse/pull/4639) ([Ivan Kush](https://github.com/IvanKush))
* Parallelize processing of parts of non-replicated MergeTree tables in ALTER MODIFY query. [#4639](https://github.com/yandex/ClickHouse/pull/4639) ([Ivan Kush](https://github.com/IvanKush))
* Optimizations in regular expressions extraction. [#5193](https://github.com/yandex/ClickHouse/pull/5193) [#5191](https://github.com/yandex/ClickHouse/pull/5191) ([Danila Kutenin](https://github.com/danlark1))
* Do not add right join key column to join result if it's used only in join on section. [#5260](https://github.com/yandex/ClickHouse/pull/5260) ([Artem Zuikov](https://github.com/4ertus2))
* Freeze the Kafka buffer after the first empty response. This avoids multiple invocations of `ReadBuffer::next()` for an empty result in some row-parsing streams. [#5283](https://github.com/yandex/ClickHouse/pull/5283) ([Ivan](https://github.com/abyss7))


@ -5,7 +5,6 @@ set(CLICKHOUSE_ODBC_BRIDGE_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/IdentifierQuoteHandler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/MainHandler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/ODBCBlockInputStream.cpp
${CMAKE_CURRENT_SOURCE_DIR}/odbc-bridge.cpp
${CMAKE_CURRENT_SOURCE_DIR}/ODBCBridge.cpp
${CMAKE_CURRENT_SOURCE_DIR}/PingHandler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/validateODBCConnectionString.cpp


@ -24,6 +24,12 @@ template <typename Value, bool FloatReturn> using FuncQuantilesDeterministic = A
template <typename Value, bool _> using FuncQuantileExact = AggregateFunctionQuantile<Value, QuantileExact<Value>, NameQuantileExact, false, void, false>;
template <typename Value, bool _> using FuncQuantilesExact = AggregateFunctionQuantile<Value, QuantileExact<Value>, NameQuantilesExact, false, void, true>;
template <typename Value, bool _> using FuncQuantileExactExclusive = AggregateFunctionQuantile<Value, QuantileExactExclusive<Value>, NameQuantileExactExclusive, false, Float64, false>;
template <typename Value, bool _> using FuncQuantilesExactExclusive = AggregateFunctionQuantile<Value, QuantileExactExclusive<Value>, NameQuantilesExactExclusive, false, Float64, true>;
template <typename Value, bool _> using FuncQuantileExactInclusive = AggregateFunctionQuantile<Value, QuantileExactInclusive<Value>, NameQuantileExactInclusive, false, Float64, false>;
template <typename Value, bool _> using FuncQuantilesExactInclusive = AggregateFunctionQuantile<Value, QuantileExactInclusive<Value>, NameQuantilesExactInclusive, false, Float64, true>;
template <typename Value, bool _> using FuncQuantileExactWeighted = AggregateFunctionQuantile<Value, QuantileExactWeighted<Value>, NameQuantileExactWeighted, true, void, false>;
template <typename Value, bool _> using FuncQuantilesExactWeighted = AggregateFunctionQuantile<Value, QuantileExactWeighted<Value>, NameQuantilesExactWeighted, true, void, true>;
@ -92,6 +98,12 @@ void registerAggregateFunctionsQuantile(AggregateFunctionFactory & factory)
factory.registerFunction(NameQuantileExact::name, createAggregateFunctionQuantile<FuncQuantileExact>);
factory.registerFunction(NameQuantilesExact::name, createAggregateFunctionQuantile<FuncQuantilesExact>);
factory.registerFunction(NameQuantileExactExclusive::name, createAggregateFunctionQuantile<FuncQuantileExactExclusive>);
factory.registerFunction(NameQuantilesExactExclusive::name, createAggregateFunctionQuantile<FuncQuantilesExactExclusive>);
factory.registerFunction(NameQuantileExactInclusive::name, createAggregateFunctionQuantile<FuncQuantileExactInclusive>);
factory.registerFunction(NameQuantilesExactInclusive::name, createAggregateFunctionQuantile<FuncQuantilesExactInclusive>);
factory.registerFunction(NameQuantileExactWeighted::name, createAggregateFunctionQuantile<FuncQuantileExactWeighted>);
factory.registerFunction(NameQuantilesExactWeighted::name, createAggregateFunctionQuantile<FuncQuantilesExactWeighted>);


@ -199,8 +199,15 @@ struct NameQuantileDeterministic { static constexpr auto name = "quantileDetermi
struct NameQuantilesDeterministic { static constexpr auto name = "quantilesDeterministic"; };
struct NameQuantileExact { static constexpr auto name = "quantileExact"; };
struct NameQuantileExactWeighted { static constexpr auto name = "quantileExactWeighted"; };
struct NameQuantilesExact { static constexpr auto name = "quantilesExact"; };
struct NameQuantileExactExclusive { static constexpr auto name = "quantileExactExclusive"; };
struct NameQuantilesExactExclusive { static constexpr auto name = "quantilesExactExclusive"; };
struct NameQuantileExactInclusive { static constexpr auto name = "quantileExactInclusive"; };
struct NameQuantilesExactInclusive { static constexpr auto name = "quantilesExactInclusive"; };
struct NameQuantileExactWeighted { static constexpr auto name = "quantileExactWeighted"; };
struct NameQuantilesExactWeighted { static constexpr auto name = "quantilesExactWeighted"; };
struct NameQuantileTiming { static constexpr auto name = "quantileTiming"; };


@ -17,8 +17,8 @@ namespace
template <template <typename> class Data>
AggregateFunctionPtr createAggregateFunctionWindowFunnel(const std::string & name, const DataTypes & arguments, const Array & params)
{
if (params.size() != 1)
throw Exception{"Aggregate function " + name + " requires exactly one parameter.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};
if (params.size() < 1)
throw Exception{"Aggregate function " + name + " requires at least one parameter: <window>, [option, [option, ...]]", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};
if (arguments.size() < 2)
throw Exception("Aggregate function " + name + " requires one timestamp argument and at least one event condition.", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);


@ -139,6 +139,7 @@ class AggregateFunctionWindowFunnel final
private:
UInt64 window;
UInt8 events_size;
UInt8 strict;
// Loop through the entire events_list, update the event timestamp value
@ -165,6 +166,10 @@ private:
if (event_idx == 0)
events_timestamp[0] = timestamp;
else if (strict && events_timestamp[event_idx] >= 0)
{
return event_idx + 1;
}
else if (events_timestamp[event_idx - 1] >= 0 && timestamp <= events_timestamp[event_idx - 1] + window)
{
events_timestamp[event_idx] = events_timestamp[event_idx - 1];
@ -191,8 +196,17 @@ public:
{
events_size = arguments.size() - 1;
window = params.at(0).safeGet<UInt64>();
}
strict = 0;
for (size_t i = 1; i < params.size(); ++i)
{
String option = params.at(i).safeGet<String>();
if (option.compare("strict") == 0)
strict = 1;
else
throw Exception{"Aggregate function " + getName() + " doesn't support a parameter: " + option, ErrorCodes::BAD_ARGUMENTS};
}
}
DataTypePtr getReturnType() const override
{

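The hunk above adds an optional `'strict'` parameter to `windowFunnel`: once an already-matched condition fires again, the funnel stops advancing. The sketch below mirrors the SQL test added later in this commit, where `funnel_test_strict` contains event `1005` twice (timestamps 50 and 51), so the strict variant reaches level 6 while the default variant reaches level 7:

```sql
SELECT windowFunnel(10000, 'strict')(timestamp,
    event = 1000, event = 1001, event = 1002, event = 1003,
    event = 1004, event = 1005, event = 1006)
FROM funnel_test_strict;
```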

@ -14,6 +14,7 @@ namespace DB
namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
extern const int BAD_ARGUMENTS;
}
/** Calculates quantile by collecting all values into array
@ -106,16 +107,134 @@ struct QuantileExact
result[i] = Value();
}
}
};
/// The same, but in the case of an empty state, NaN is returned.
Float64 getFloat(Float64) const
/// QuantileExactExclusive is equivalent to Excel PERCENTILE.EXC, R-6, SAS-4, SciPy-(0,0)
template <typename Value>
struct QuantileExactExclusive : public QuantileExact<Value>
{
using QuantileExact<Value>::array;
/// Get the value of the `level` quantile. The level must be between 0 and 1 excluding bounds.
Float64 getFloat(Float64 level)
{
throw Exception("Method getFloat is not implemented for QuantileExact", ErrorCodes::NOT_IMPLEMENTED);
if (!array.empty())
{
if (level == 0. || level == 1.)
throw Exception("QuantileExactExclusive cannot interpolate for the percentiles 1 and 0", ErrorCodes::BAD_ARGUMENTS);
Float64 h = level * (array.size() + 1);
auto n = static_cast<size_t>(h);
if (n >= array.size())
return array[array.size() - 1];
else if (n < 1)
return array[0];
std::nth_element(array.begin(), array.begin() + n - 1, array.end());
auto nth_element = std::min_element(array.begin() + n, array.end());
return array[n - 1] + (h - n) * (*nth_element - array[n - 1]);
}
void getManyFloat(const Float64 *, const size_t *, size_t, Float64 *) const
return std::numeric_limits<Float64>::quiet_NaN();
}
void getManyFloat(const Float64 * levels, const size_t * indices, size_t size, Float64 * result)
{
throw Exception("Method getManyFloat is not implemented for QuantileExact", ErrorCodes::NOT_IMPLEMENTED);
if (!array.empty())
{
size_t prev_n = 0;
for (size_t i = 0; i < size; ++i)
{
auto level = levels[indices[i]];
if (level == 0. || level == 1.)
throw Exception("QuantileExactExclusive cannot interpolate for the percentiles 1 and 0", ErrorCodes::BAD_ARGUMENTS);
Float64 h = level * (array.size() + 1);
auto n = static_cast<size_t>(h);
if (n >= array.size())
result[indices[i]] = array[array.size() - 1];
else if (n < 1)
result[indices[i]] = array[0];
else
{
std::nth_element(array.begin() + prev_n, array.begin() + n - 1, array.end());
auto nth_element = std::min_element(array.begin() + n, array.end());
result[indices[i]] = array[n - 1] + (h - n) * (*nth_element - array[n - 1]);
prev_n = n - 1;
}
}
}
else
{
for (size_t i = 0; i < size; ++i)
result[i] = std::numeric_limits<Float64>::quiet_NaN();
}
}
};
/// QuantileExactInclusive is equivalent to Excel PERCENTILE and PERCENTILE.INC, R-7, SciPy-(1,1)
template <typename Value>
struct QuantileExactInclusive : public QuantileExact<Value>
{
using QuantileExact<Value>::array;
/// Get the value of the `level` quantile. The level must be between 0 and 1 including bounds.
Float64 getFloat(Float64 level)
{
if (!array.empty())
{
Float64 h = level * (array.size() - 1) + 1;
auto n = static_cast<size_t>(h);
if (n >= array.size())
return array[array.size() - 1];
else if (n < 1)
return array[0];
std::nth_element(array.begin(), array.begin() + n - 1, array.end());
auto nth_element = std::min_element(array.begin() + n, array.end());
return array[n - 1] + (h - n) * (*nth_element - array[n - 1]);
}
return std::numeric_limits<Float64>::quiet_NaN();
}
void getManyFloat(const Float64 * levels, const size_t * indices, size_t size, Float64 * result)
{
if (!array.empty())
{
size_t prev_n = 0;
for (size_t i = 0; i < size; ++i)
{
auto level = levels[indices[i]];
Float64 h = level * (array.size() - 1) + 1;
auto n = static_cast<size_t>(h);
if (n >= array.size())
result[indices[i]] = array[array.size() - 1];
else if (n < 1)
result[indices[i]] = array[0];
else
{
std::nth_element(array.begin() + prev_n, array.begin() + n - 1, array.end());
auto nth_element = std::min_element(array.begin() + n, array.end());
result[indices[i]] = array[n - 1] + (h - n) * (*nth_element - array[n - 1]);
prev_n = n - 1;
}
}
}
else
{
for (size_t i = 0; i < size; ++i)
result[i] = std::numeric_limits<Float64>::quiet_NaN();
}
}
};
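For reference, a compact restatement of the interpolation implemented by the two structs above, where $p$ is the requested level, $N$ the number of collected values and $x_{(k)}$ the $k$-th smallest value (1-based):

$$h_{\mathrm{exc}} = p\,(N + 1), \qquad h_{\mathrm{inc}} = p\,(N - 1) + 1, \qquad n = \lfloor h \rfloor$$
$$Q(p) = x_{(n)} + (h - n)\,\bigl(x_{(n+1)} - x_{(n)}\bigr)$$

Both variants clamp the result to $x_{(1)}$ when $n < 1$ and to $x_{(N)}$ when $n \ge N$; the exclusive variant additionally rejects $p = 0$ and $p = 1$. As the comments above state, these correspond to Excel `PERCENTILE.EXC` and `PERCENTILE.INC` respectively.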


@ -348,10 +348,9 @@ bool OPTIMIZE(1) CSVRowInputStream::parseRowAndPrintDiagnosticInfo(MutableColumn
const auto & current_column_type = data_types[table_column];
const bool is_last_file_column =
file_column + 1 == column_indexes_for_input_fields.size();
const bool at_delimiter = *istr.position() == delimiter;
const bool at_delimiter = !istr.eof() && *istr.position() == delimiter;
const bool at_last_column_line_end = is_last_file_column
&& (*istr.position() == '\n' || *istr.position() == '\r'
|| istr.eof());
&& (istr.eof() || *istr.position() == '\n' || *istr.position() == '\r');
out << "Column " << file_column << ", " << std::string((file_column < 10 ? 2 : file_column < 100 ? 1 : 0), ' ')
<< "name: " << header.safeGetByPosition(table_column).name << ", " << std::string(max_length_of_column_name - header.safeGetByPosition(table_column).name.size(), ' ')
@ -514,10 +513,9 @@ void CSVRowInputStream::updateDiagnosticInfo()
bool CSVRowInputStream::readField(IColumn & column, const DataTypePtr & type, bool is_last_file_column, size_t column_idx)
{
const bool at_delimiter = *istr.position() == format_settings.csv.delimiter;
const bool at_delimiter = !istr.eof() && *istr.position() == format_settings.csv.delimiter;
const bool at_last_column_line_end = is_last_file_column
&& (*istr.position() == '\n' || *istr.position() == '\r'
|| istr.eof());
&& (istr.eof() || *istr.position() == '\n' || *istr.position() == '\r');
if (format_settings.csv.empty_as_default
&& (at_delimiter || at_last_column_line_end))


@ -320,13 +320,7 @@ ColumnPtr Set::execute(const Block & block, bool negative) const
return res;
}
if (data_types.size() != num_key_columns)
{
std::stringstream message;
message << "Number of columns in section IN doesn't match. "
<< num_key_columns << " at left, " << data_types.size() << " at right.";
throw Exception(message.str(), ErrorCodes::NUMBER_OF_COLUMNS_DOESNT_MATCH);
}
checkColumnsNumber(num_key_columns);
/// Remember the columns we will work with. Also check that the data types are correct.
ColumnRawPtrs key_columns;
@ -337,11 +331,7 @@ ColumnPtr Set::execute(const Block & block, bool negative) const
for (size_t i = 0; i < num_key_columns; ++i)
{
if (!removeNullable(data_types[i])->equals(*removeNullable(block.safeGetByPosition(i).type)))
throw Exception("Types of column " + toString(i + 1) + " in section IN don't match: "
+ data_types[i]->getName() + " on the right, " + block.safeGetByPosition(i).type->getName() +
" on the left.", ErrorCodes::TYPE_MISMATCH);
checkTypesEqual(i, block.safeGetByPosition(i).type);
materialized_columns.emplace_back(block.safeGetByPosition(i).column->convertToFullColumnIfConst());
key_columns.emplace_back() = materialized_columns.back().get();
}
@ -421,6 +411,24 @@ void Set::executeOrdinary(
}
}
void Set::checkColumnsNumber(size_t num_key_columns) const
{
if (data_types.size() != num_key_columns)
{
std::stringstream message;
message << "Number of columns in section IN doesn't match. "
<< num_key_columns << " at left, " << data_types.size() << " at right.";
throw Exception(message.str(), ErrorCodes::NUMBER_OF_COLUMNS_DOESNT_MATCH);
}
}
void Set::checkTypesEqual(size_t set_type_idx, const DataTypePtr & other_type) const
{
if (!removeNullable(data_types[set_type_idx])->equals(*removeNullable(other_type)))
throw Exception("Types of column " + toString(set_type_idx + 1) + " in section IN don't match: "
+ data_types[set_type_idx]->getName() + " on the right, " + other_type->getName() +
" on the left.", ErrorCodes::TYPE_MISMATCH);
}
MergeTreeSetIndex::MergeTreeSetIndex(const Columns & set_elements, std::vector<KeyTuplePositionMapping> && index_mapping_)
: indexes_mapping(std::move(index_mapping_))

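The extracted `checkColumnsNumber`/`checkTypesEqual` helpers are reused from `KeyCondition::tryPrepareSetIndex` in a later hunk, so mismatches in the `IN` section are reported even when the primary-key set index is used. The queries below are taken from the regression test added further down in this commit (table `bug(d Date, s String)`):

```sql
SELECT * FROM bug WHERE (s, d, s) IN (SELECT s, max(d) FROM bug GROUP BY s);
-- throws: Number of columns in section IN doesn't match
SELECT * FROM bug WHERE (s, toDateTime(d)) IN (SELECT s, max(d) FROM bug GROUP BY s);
-- throws: Types of column 2 in section IN don't match
```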

@ -70,6 +70,9 @@ public:
bool hasExplicitSetElements() const { return fill_set_elements; }
Columns getSetElements() const { return { set_elements.begin(), set_elements.end() }; }
void checkColumnsNumber(size_t num_key_columns) const;
void checkTypesEqual(size_t set_type_idx, const DataTypePtr & other_type) const;
private:
size_t keys_size = 0;
Sizes key_sizes;


@ -349,10 +349,9 @@ bool OPTIMIZE(1) CSVRowInputFormat::parseRowAndPrintDiagnosticInfo(MutableColumn
const auto & current_column_type = data_types[table_column];
const bool is_last_file_column =
file_column + 1 == column_indexes_for_input_fields.size();
const bool at_delimiter = *in.position() == delimiter;
const bool at_delimiter = !in.eof() && *in.position() == delimiter;
const bool at_last_column_line_end = is_last_file_column
&& (*in.position() == '\n' || *in.position() == '\r'
|| in.eof());
&& (in.eof() || *in.position() == '\n' || *in.position() == '\r');
auto & header = getPort().getHeader();
out << "Column " << file_column << ", " << std::string((file_column < 10 ? 2 : file_column < 100 ? 1 : 0), ' ')
@ -516,10 +515,9 @@ void CSVRowInputFormat::updateDiagnosticInfo()
bool CSVRowInputFormat::readField(IColumn & column, const DataTypePtr & type, bool is_last_file_column, size_t column_idx)
{
const bool at_delimiter = *in.position() == format_settings.csv.delimiter;
const bool at_delimiter = !in.eof() && *in.position() == format_settings.csv.delimiter;
const bool at_last_column_line_end = is_last_file_column
&& (*in.position() == '\n' || *in.position() == '\r'
|| in.eof());
&& (in.eof() || *in.position() == '\n' || *in.position() == '\r');
if (format_settings.csv.empty_as_default
&& (at_delimiter || at_last_column_line_end))


@ -25,25 +25,26 @@ IMergedBlockOutputStream::IMergedBlockOutputStream(
size_t aio_threshold_,
bool blocks_are_granules_size_,
const std::vector<MergeTreeIndexPtr> & indices_to_recalc,
const MergeTreeIndexGranularity & index_granularity_)
const MergeTreeIndexGranularity & index_granularity_,
const MergeTreeIndexGranularityInfo * index_granularity_info_)
: storage(storage_)
, part_path(part_path_)
, min_compress_block_size(min_compress_block_size_)
, max_compress_block_size(max_compress_block_size_)
, aio_threshold(aio_threshold_)
, marks_file_extension(storage.canUseAdaptiveGranularity() ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
, can_use_adaptive_granularity(index_granularity_info_ ? index_granularity_info_->is_adaptive : storage.canUseAdaptiveGranularity())
, marks_file_extension(can_use_adaptive_granularity ? getAdaptiveMrkExtension() : getNonAdaptiveMrkExtension())
, blocks_are_granules_size(blocks_are_granules_size_)
, index_granularity(index_granularity_)
, compute_granularity(index_granularity.empty())
, codec(std::move(codec_))
, skip_indices(indices_to_recalc)
, with_final_mark(storage.settings.write_final_mark && storage.canUseAdaptiveGranularity())
, with_final_mark(storage.settings.write_final_mark && can_use_adaptive_granularity)
{
if (blocks_are_granules_size && !index_granularity.empty())
throw Exception("Can't take information about index granularity from blocks, when non empty index_granularity array specified", ErrorCodes::LOGICAL_ERROR);
}
void IMergedBlockOutputStream::addStreams(
const String & path,
const String & name,
@ -145,7 +146,7 @@ void IMergedBlockOutputStream::fillIndexGranularity(const Block & block)
blocks_are_granules_size,
index_offset,
index_granularity,
storage.canUseAdaptiveGranularity());
can_use_adaptive_granularity);
}
void IMergedBlockOutputStream::writeSingleMark(
@ -176,7 +177,7 @@ void IMergedBlockOutputStream::writeSingleMark(
writeIntBinary(stream.plain_hashing.count(), stream.marks);
writeIntBinary(stream.compressed.offset(), stream.marks);
if (storage.canUseAdaptiveGranularity())
if (can_use_adaptive_granularity)
writeIntBinary(number_of_rows, stream.marks);
}, path);
}
@ -362,7 +363,7 @@ void IMergedBlockOutputStream::calculateAndSerializeSkipIndices(
writeIntBinary(stream.compressed.offset(), stream.marks);
/// Actually these numbers are redundant, but we have to store them
/// to be compatible with normal .mrk2 file format
if (storage.canUseAdaptiveGranularity())
if (can_use_adaptive_granularity)
writeIntBinary(1UL, stream.marks);
++skip_index_current_mark;


@ -1,6 +1,7 @@
#pragma once
#include <Storages/MergeTree/MergeTreeIndexGranularity.h>
#include <Storages/MergeTree/MergeTreeIndexGranularityInfo.h>
#include <IO/WriteBufferFromFile.h>
#include <Compression/CompressedWriteBuffer.h>
#include <IO/HashingWriteBuffer.h>
@ -23,7 +24,8 @@ public:
size_t aio_threshold_,
bool blocks_are_granules_size_,
const std::vector<MergeTreeIndexPtr> & indices_to_recalc,
const MergeTreeIndexGranularity & index_granularity_);
const MergeTreeIndexGranularity & index_granularity_,
const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
using WrittenOffsetColumns = std::set<std::string>;
@ -141,6 +143,7 @@ protected:
size_t current_mark = 0;
size_t skip_index_mark = 0;
const bool can_use_adaptive_granularity;
const std::string marks_file_extension;
const bool blocks_are_granules_size;


@ -546,11 +546,13 @@ bool KeyCondition::tryPrepareSetIndex(
}
};
size_t left_args_count = 1;
const auto * left_arg_tuple = left_arg->as<ASTFunction>();
if (left_arg_tuple && left_arg_tuple->name == "tuple")
{
const auto & tuple_elements = left_arg_tuple->arguments->children;
for (size_t i = 0; i < tuple_elements.size(); ++i)
left_args_count = tuple_elements.size();
for (size_t i = 0; i < left_args_count; ++i)
get_key_tuple_position_mapping(tuple_elements[i], i);
}
else
@ -577,6 +579,10 @@ bool KeyCondition::tryPrepareSetIndex(
if (!prepared_set->hasExplicitSetElements())
return false;
prepared_set->checkColumnsNumber(left_args_count);
for (size_t i = 0; i < indexes_mapping.size(); ++i)
prepared_set->checkTypesEqual(indexes_mapping[i].tuple_index, removeLowCardinality(data_types[i]));
out.set_index = std::make_shared<MergeTreeSetIndex>(prepared_set->getSetElements(), std::move(indexes_mapping));
return true;


@ -1581,7 +1581,8 @@ void MergeTreeData::alterDataPart(
true /* skip_offsets */,
{},
unused_written_offsets,
part->index_granularity);
part->index_granularity,
&part->index_granularity_info);
in.readPrefix();
out.writePrefix();


@ -934,6 +934,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
new_data_part->relative_path = "tmp_mut_" + future_part.name;
new_data_part->is_temp = true;
new_data_part->ttl_infos = source_part->ttl_infos;
new_data_part->index_granularity_info = source_part->index_granularity_info;
String new_part_tmp_path = new_data_part->getFullPath();
@ -1069,7 +1070,8 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mutatePartToTempor
/* skip_offsets = */ false,
std::vector<MergeTreeIndexPtr>(indices_to_recalc.begin(), indices_to_recalc.end()),
unused_written_offsets,
source_part->index_granularity
source_part->index_granularity,
&source_part->index_granularity_info
);
in->readPrefix();


@ -8,14 +8,16 @@ MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream(
CompressionCodecPtr default_codec_, bool skip_offsets_,
const std::vector<MergeTreeIndexPtr> & indices_to_recalc_,
WrittenOffsetColumns & already_written_offset_columns_,
const MergeTreeIndexGranularity & index_granularity_)
const MergeTreeIndexGranularity & index_granularity_,
const MergeTreeIndexGranularityInfo * index_granularity_info_)
: IMergedBlockOutputStream(
storage_, part_path_, storage_.global_context.getSettings().min_compress_block_size,
storage_.global_context.getSettings().max_compress_block_size, default_codec_,
storage_.global_context.getSettings().min_bytes_to_use_direct_io,
false,
indices_to_recalc_,
index_granularity_),
index_granularity_,
index_granularity_info_),
header(header_), sync(sync_), skip_offsets(skip_offsets_),
already_written_offset_columns(already_written_offset_columns_)
{


@ -17,7 +17,8 @@ public:
CompressionCodecPtr default_codec_, bool skip_offsets_,
const std::vector<MergeTreeIndexPtr> & indices_to_recalc_,
WrittenOffsetColumns & already_written_offset_columns_,
const MergeTreeIndexGranularity & index_granularity_);
const MergeTreeIndexGranularity & index_granularity_,
const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
Block getHeader() const override { return header; }
void write(const Block & block) override;


@ -176,6 +176,22 @@ IColumn::Selector createSelector(const ClusterPtr cluster, const ColumnWithTypeA
throw Exception{"Sharding key expression does not evaluate to an integer type", ErrorCodes::TYPE_MISMATCH};
}
std::string makeFormattedListOfShards(const ClusterPtr & cluster)
{
std::ostringstream os;
bool head = true;
os << "[";
for (const auto & shard_info : cluster->getShardsInfo())
{
(head ? os : os << ", ") << shard_info.shard_num;
head = false;
}
os << "]";
return os.str();
}
}
@ -311,11 +327,24 @@ BlockInputStreams StorageDistributed::read(
header, processed_stage, QualifiedTableName{remote_database, remote_table}, context.getExternalTables());
if (settings.optimize_skip_unused_shards)
{
if (has_sharding_key)
{
auto smaller_cluster = skipUnusedShards(cluster, query_info);
if (smaller_cluster)
{
cluster = smaller_cluster;
LOG_DEBUG(log, "Reading from " << database_name << "." << table_name << ": "
"Skipping irrelevant shards - the query will be sent to the following shards of the cluster (shard numbers): "
" " << makeFormattedListOfShards(cluster));
}
else
{
LOG_DEBUG(log, "Reading from " << database_name << "." << table_name << ": "
"Unable to figure out irrelevant shards from WHERE/PREWHERE clauses - the query will be sent to all shards of the cluster");
}
}
}
return ClusterProxy::executeQuery(
@ -488,15 +517,32 @@ void StorageDistributed::ClusterNodeData::shutdownAndDropAllData()
}
/// Returns a new cluster with fewer shards if constant folding for `sharding_key_expr` is possible
/// using constraints from "WHERE" condition, otherwise returns `nullptr`
/// using constraints from "PREWHERE" and "WHERE" conditions, otherwise returns `nullptr`
ClusterPtr StorageDistributed::skipUnusedShards(ClusterPtr cluster, const SelectQueryInfo & query_info)
{
if (!has_sharding_key)
{
throw Exception("Internal error: cannot determine shards of a distributed table if no sharding expression is supplied", ErrorCodes::LOGICAL_ERROR);
}
const auto & select = query_info.query->as<ASTSelectQuery &>();
if (!select.where() || !sharding_key_expr)
if (!select.prewhere() && !select.where())
{
return nullptr;
}
const auto & blocks = evaluateExpressionOverConstantCondition(select.where(), sharding_key_expr);
ASTPtr condition_ast;
if (select.prewhere() && select.where())
{
condition_ast = makeASTFunction("and", select.prewhere()->clone(), select.where()->clone());
}
else
{
condition_ast = select.prewhere() ? select.prewhere()->clone() : select.where()->clone();
}
const auto blocks = evaluateExpressionOverConstantCondition(condition_ast, sharding_key_expr);
// Can't get definite answer if we can skip any shards
if (!blocks)

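`skipUnusedShards` now folds constants from both `PREWHERE` and `WHERE`, combining them with `and` when both are present. The queries below come from the shell test added further down in this commit; with sharding key `jumpConsistentHash(a + b, 2)` they resolve to a single shard instead of also hitting the unavailable one:

```sql
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 AND b = 0;
SELECT count(*) FROM distributed_00754 PREWHERE a = 1 WHERE b = 1 AND length(c) = 5;
```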

@ -1956,10 +1956,37 @@ void StorageReplicatedMergeTree::cloneReplica(const String & source_replica, Coo
}
/// Add to the queue jobs to receive all the active parts that the reference/master replica has.
Strings parts = zookeeper->getChildren(source_path + "/parts");
ActiveDataPartSet active_parts_set(format_version, parts);
Strings source_replica_parts = zookeeper->getChildren(source_path + "/parts");
ActiveDataPartSet active_parts_set(format_version, source_replica_parts);
Strings active_parts = active_parts_set.getParts();
/// Remove local parts if source replica does not have them, because such parts will never be fetched by other replicas.
Strings local_parts_in_zk = zookeeper->getChildren(replica_path + "/parts");
Strings parts_to_remove_from_zk;
for (const auto & part : local_parts_in_zk)
{
if (active_parts_set.getContainingPart(part).empty())
{
queue.remove(zookeeper, part);
parts_to_remove_from_zk.emplace_back(part);
LOG_WARNING(log, "Source replica does not have part " << part << ". Removing it from ZooKeeper.");
}
}
tryRemovePartsFromZooKeeperWithRetries(parts_to_remove_from_zk);
auto local_active_parts = getDataParts();
DataPartsVector parts_to_remove_from_working_set;
for (const auto & part : local_active_parts)
{
if (active_parts_set.getContainingPart(part->name).empty())
{
parts_to_remove_from_working_set.emplace_back(part);
LOG_WARNING(log, "Source replica does not have part " << part->name << ". Removing it from working set.");
}
}
removePartsFromWorkingSet(parts_to_remove_from_working_set, true);
for (const String & name : active_parts)
{
LogEntry log_entry;


@ -148,10 +148,9 @@ StoragesInfoStream::StoragesInfoStream(const SelectQueryInfo & query_info, const
StoragesInfo StoragesInfoStream::next()
{
StoragesInfo info;
while (next_row < rows)
{
StoragesInfo info;
info.database = (*database_column)[next_row].get<String>();
info.table = (*table_column)[next_row].get<String>();
@ -198,10 +197,10 @@ StoragesInfo StoragesInfoStream::next()
if (!info.data)
throw Exception("Unknown engine " + info.engine, ErrorCodes::LOGICAL_ERROR);
break;
return info;
}
return info;
return {};
}
BlockInputStreams StorageSystemPartsBase::read(


@ -288,6 +288,17 @@ def test_mixed_granularity_single_node(start_dynamic_cluster, node):
node.exec_in_container(["bash", "-c", "find {p} -name '*.mrk' | grep '.*'".format(p=path_to_old_part)]) # check that we have non adaptive files
node.query("ALTER TABLE table_with_default_granularity UPDATE dummy = dummy + 1 WHERE 1")
# still works
assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
node.query("ALTER TABLE table_with_default_granularity MODIFY COLUMN dummy String")
node.query("ALTER TABLE table_with_default_granularity ADD COLUMN dummy2 Float64")
# still works
assert node.query("SELECT count() from table_with_default_granularity") == '6\n'
def test_version_update_two_nodes(start_dynamic_cluster):
node11.query("INSERT INTO table_with_default_granularity VALUES (toDate('2018-10-01'), 1, 333), (toDate('2018-10-02'), 2, 444)")
node12.query("SYSTEM SYNC REPLICA table_with_default_granularity")


@ -0,0 +1,19 @@
<yandex>
<remote_servers>
<test_cluster>
<shard>
<internal_replication>true</internal_replication>
<replica>
<default_database>shard_0</default_database>
<host>node1</host>
<port>9000</port>
</replica>
<replica>
<default_database>shard_0</default_database>
<host>node2</host>
<port>9000</port>
</replica>
</shard>
</test_cluster>
</remote_servers>
</yandex>


@ -0,0 +1,60 @@
import pytest
from helpers.cluster import ClickHouseCluster
from helpers.network import PartitionManager
from helpers.test_tools import assert_eq_with_retry
def fill_nodes(nodes, shard):
for node in nodes:
node.query(
'''
CREATE DATABASE test;
CREATE TABLE test_table(date Date, id UInt32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/test{shard}/replicated', '{replica}')
ORDER BY id PARTITION BY toYYYYMM(date)
SETTINGS min_replicated_logs_to_keep=3, max_replicated_logs_to_keep=5, cleanup_delay_period=0, cleanup_delay_period_random_add=0;
'''.format(shard=shard, replica=node.name))
cluster = ClickHouseCluster(__file__)
node1 = cluster.add_instance('node1', main_configs=['configs/remote_servers.xml'], with_zookeeper=True)
node2 = cluster.add_instance('node2', main_configs=['configs/remote_servers.xml'], with_zookeeper=True)
@pytest.fixture(scope="module")
def start_cluster():
try:
cluster.start()
fill_nodes([node1, node2], 1)
yield cluster
except Exception as ex:
print ex
finally:
cluster.shutdown()
def test_inconsistent_parts_if_drop_while_replica_not_active(start_cluster):
with PartitionManager() as pm:
# insert into all replicas
for i in range(50):
node1.query("INSERT INTO test_table VALUES ('2019-08-16', {})".format(i))
assert_eq_with_retry(node2, "SELECT count(*) FROM test_table", node1.query("SELECT count(*) FROM test_table"))
# disable network on the first replica
pm.partition_instances(node1, node2)
pm.drop_instance_zk_connections(node1)
# drop all parts on the second replica
node2.query_with_retry("ALTER TABLE test_table DROP PARTITION 201908")
assert_eq_with_retry(node2, "SELECT count(*) FROM test_table", "0")
# insert into the second replica
# DROP_RANGE will be removed from the replication log and the first replica will be lost
for i in range(50):
node2.query("INSERT INTO test_table VALUES ('2019-08-16', {})".format(50 + i))
# the first replica will be cloned from the second
pm.heal_all()
assert_eq_with_retry(node1, "SELECT count(*) FROM test_table", node2.query("SELECT count(*) FROM test_table"))


@ -9,7 +9,6 @@ select 3 = windowFunnel(10000)(timestamp, event = 1000, event = 1001, event = 10
select 4 = windowFunnel(10000)(timestamp, event = 1000, event = 1001, event = 1002, event = 1008) from funnel_test;
select 1 = windowFunnel(1)(timestamp, event = 1000) from funnel_test;
select 3 = windowFunnel(2)(timestamp, event = 1003, event = 1004, event = 1005, event = 1006, event = 1007) from funnel_test;
select 4 = windowFunnel(3)(timestamp, event = 1003, event = 1004, event = 1005, event = 1006, event = 1007) from funnel_test;
@ -39,6 +38,16 @@ select 1 = windowFunnel(10000)(timestamp, event = 1008, event = 1001) from funne
select 5 = windowFunnel(4)(timestamp, event = 1003, event = 1004, event = 1005, event = 1006, event = 1007) from funnel_test_u64;
select 4 = windowFunnel(4)(timestamp, event <= 1007, event >= 1002, event <= 1006, event >= 1004) from funnel_test_u64;
drop table if exists funnel_test_strict;
create table funnel_test_strict (timestamp UInt32, event UInt32) engine=Memory;
insert into funnel_test_strict values (00,1000),(10,1001),(20,1002),(30,1003),(40,1004),(50,1005),(51,1005),(60,1006),(70,1007),(80,1008);
select 6 = windowFunnel(10000, 'strict')(timestamp, event = 1000, event = 1001, event = 1002, event = 1003, event = 1004, event = 1005, event = 1006) from funnel_test_strict;
select 7 = windowFunnel(10000)(timestamp, event = 1000, event = 1001, event = 1002, event = 1003, event = 1004, event = 1005, event = 1006) from funnel_test_strict;
drop table funnel_test;
drop table funnel_test2;
drop table funnel_test_u64;
drop table funnel_test_strict;


@ -0,0 +1,15 @@
OK
OK
1
OK
0
1
4
4
2
4
OK
OK
OK
OK
OK


@ -0,0 +1,100 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS distributed_00754;"
${CLICKHOUSE_CLIENT} --query "DROP TABLE IF EXISTS mergetree_00754;"
${CLICKHOUSE_CLIENT} --query "
CREATE TABLE mergetree_00754 (a Int64, b Int64, c String) ENGINE = MergeTree ORDER BY (a, b);
"
${CLICKHOUSE_CLIENT} --query "
CREATE TABLE distributed_00754 AS mergetree_00754
ENGINE = Distributed(test_unavailable_shard, ${CLICKHOUSE_DATABASE}, mergetree_00754, jumpConsistentHash(a+b, 2));
"
${CLICKHOUSE_CLIENT} --query "INSERT INTO mergetree_00754 VALUES (0, 0, 'Hello');"
${CLICKHOUSE_CLIENT} --query "INSERT INTO mergetree_00754 VALUES (1, 0, 'World');"
${CLICKHOUSE_CLIENT} --query "INSERT INTO mergetree_00754 VALUES (0, 1, 'Hello');"
${CLICKHOUSE_CLIENT} --query "INSERT INTO mergetree_00754 VALUES (1, 1, 'World');"
# Should fail because the second shard is unavailable
${CLICKHOUSE_CLIENT} --query "SELECT count(*) FROM distributed_00754;" 2>&1 \
| fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'
# Should fail without setting `optimize_skip_unused_shards` = 1
${CLICKHOUSE_CLIENT} --query "SELECT count(*) FROM distributed_00754 PREWHERE a = 0 AND b = 0;" 2>&1 \
| fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'
# Should pass now
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 AND b = 0;
"
# Should still fail because of matching unavailable shard
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 2 AND b = 2;
" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'
# Try more complex expressions for constant folding - all should pass.
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 1 AND a = 0 WHERE b = 0;
"
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 1 WHERE b = 1 AND length(c) = 5;
"
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a IN (0, 1) AND b IN (0, 1) WHERE c LIKE '%l%';
"
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a IN (0, 1) WHERE b IN (0, 1) AND c LIKE '%l%';
"
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 AND b = 0 OR a = 1 AND b = 1 WHERE c LIKE '%l%';
"
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE (a = 0 OR a = 1) WHERE (b = 0 OR b = 1);
"
# These should fail.
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 AND b <= 1;
" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 WHERE c LIKE '%l%';
" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 OR a = 1 AND b = 0;
" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 AND b = 0 OR a = 2 AND b = 2;
" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'
${CLICKHOUSE_CLIENT} -n --query="
SET optimize_skip_unused_shards = 1;
SELECT count(*) FROM distributed_00754 PREWHERE a = 0 AND b = 0 OR c LIKE '%l%';
" 2>&1 \ | fgrep -q "All connection tries failed" && echo 'OK' || echo 'FAIL'


@ -0,0 +1,6 @@
[249.25,499.5,749.75,899.9,949.9499999999999,989.99,998.999]
[249.75,499.5,749.25,899.1,949.05,989.01,998.001]
[250,500,750,900,950,990,999]
599.6
599.4
600


@ -0,0 +1,12 @@
DROP TABLE IF EXISTS num;
CREATE TABLE num AS numbers(1000);
SELECT quantilesExactExclusive(0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999)(x) FROM (SELECT number AS x FROM num);
SELECT quantilesExactInclusive(0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999)(x) FROM (SELECT number AS x FROM num);
SELECT quantilesExact(0.25, 0.5, 0.75, 0.9, 0.95, 0.99, 0.999)(x) FROM (SELECT number AS x FROM num);
SELECT quantileExactExclusive(0.6)(x) FROM (SELECT number AS x FROM num);
SELECT quantileExactInclusive(0.6)(x) FROM (SELECT number AS x FROM num);
SELECT quantileExact(0.6)(x) FROM (SELECT number AS x FROM num);
DROP TABLE num;


@ -0,0 +1,7 @@
OK1
OK2
OK3
OK4
OK5
2019-08-11 world
2019-08-12 hello


@ -0,0 +1,21 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
$CLICKHOUSE_CLIENT --query="DROP TABLE IF EXISTS bug";
$CLICKHOUSE_CLIENT --query="CREATE TABLE bug (d Date, s String) ENGINE = MergeTree(d, s, 8192)";
$CLICKHOUSE_CLIENT --query="INSERT INTO bug VALUES ('2019-08-09', 'hello'), ('2019-08-10', 'world'), ('2019-08-11', 'world'), ('2019-08-12', 'hello')";
#SET force_primary_key = 1;
$CLICKHOUSE_CLIENT --query="SELECT * FROM bug WHERE (s, d) IN (SELECT (s, max(d)) FROM bug GROUP BY s) ORDER BY d" 2>&1 | grep "Number of columns in section IN doesn't match" > /dev/null && echo "OK1";
$CLICKHOUSE_CLIENT --query="SELECT * FROM bug WHERE (s, d, s) IN (SELECT s, max(d) FROM bug GROUP BY s)" 2>&1 | grep "Number of columns in section IN doesn't match" > /dev/null && echo "OK2";
$CLICKHOUSE_CLIENT --query="SELECT * FROM bug WHERE (s, d) IN (SELECT s, max(d), s FROM bug GROUP BY s)" 2>&1 | grep "Number of columns in section IN doesn't match" > /dev/null && echo "OK3";
$CLICKHOUSE_CLIENT --query="SELECT * FROM bug WHERE (s, toDateTime(d)) IN (SELECT s, max(d) FROM bug GROUP BY s)" 2>&1 | grep "Types of column 2 in section IN don't match" > /dev/null && echo "OK4";
$CLICKHOUSE_CLIENT --query="SELECT * FROM bug WHERE (s, d) IN (SELECT s, toDateTime(max(d)) FROM bug GROUP BY s)" 2>&1 | grep "Types of column 2 in section IN don't match" > /dev/null && echo "OK5";
$CLICKHOUSE_CLIENT --query="SELECT * FROM bug WHERE (s, d) IN (SELECT s, max(d) FROM bug GROUP BY s) ORDER BY d";
$CLICKHOUSE_CLIENT --query="DROP TABLE bug";


@ -7,9 +7,9 @@
"docker/test/performance": "yandex/clickhouse-performance-test",
"docker/test/pvs": "yandex/clickhouse-pvs-test",
"docker/test/stateful": "yandex/clickhouse-stateful-test",
"docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-with-coverage-test",
"docker/test/stateful_with_coverage": "yandex/clickhouse-stateful-test-with-coverage",
"docker/test/stateless": "yandex/clickhouse-stateless-test",
"docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-with-coverage-test",
"docker/test/stateless_with_coverage": "yandex/clickhouse-stateless-test-with-coverage",
"docker/test/unit": "yandex/clickhouse-unit-test",
"docker/test/stress": "yandex/clickhouse-stress-test",
"dbms/tests/integration/image": "yandex/clickhouse-integration-tests-runner"


@ -1,7 +1,7 @@
# docker build -t yandex/clickhouse-stateful-test .
FROM yandex/clickhouse-stateless-test
RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
RUN apt-get update -y \
&& env DEBIAN_FRONTEND=noninteractive \


@ -1,7 +1,7 @@
# docker build -t yandex/clickhouse-stateless-with-coverage-test .
FROM yandex/clickhouse-deb-builder
RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic main" >> /etc/apt/sources.list
RUN echo "deb [trusted=yes] http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9 main" >> /etc/apt/sources.list
RUN apt-get update -y \
&& env DEBIAN_FRONTEND=noninteractive \


@ -1,15 +1,14 @@
# Roadmap
## Q2 2019
## Q3 2019
- DDL for dictionaries
- Integration with S3-like object stores
- Multiple storages for hot/cold data, JBOD support
## Q3 2019
## Q4 2019
- JOIN execution improvements:
- Distributed join not limited by memory
- JOIN not limited by available memory
- Resource pools for more precise distribution of cluster capacity between users
- Fine-grained authorization
- Integration with external authentication services