diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2ab006bcdd3..e5d1c90bf22 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,11 +1,68 @@
 ## ClickHouse release v20.3
+### ClickHouse release v20.3.7.46, 2020-04-17
+
+#### Bug Fix
+
+* Fix `Logical error: CROSS JOIN has expressions` error for queries with a mix of comma and named joins. [#10311](https://github.com/ClickHouse/ClickHouse/pull/10311) ([Artem Zuikov](https://github.com/4ertus2)).
+* Fix queries with `max_bytes_before_external_group_by`. [#10302](https://github.com/ClickHouse/ClickHouse/pull/10302) ([Artem Zuikov](https://github.com/4ertus2)).
+* Fix move-to-prewhere optimization in presence of `arrayJoin` functions (in certain cases). This fixes [#10092](https://github.com/ClickHouse/ClickHouse/issues/10092). [#10195](https://github.com/ClickHouse/ClickHouse/pull/10195) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add the ability to relax the restriction on non-deterministic functions usage in mutations with the `allow_nondeterministic_mutations` setting. [#10186](https://github.com/ClickHouse/ClickHouse/pull/10186) ([filimonov](https://github.com/filimonov)).
+
+### ClickHouse release v20.3.6.40, 2020-04-16
+
+#### New Feature
+
+* Added function `isConstant`. This function checks whether its argument is a constant expression and returns 1 or 0. It is intended for development, debugging and demonstration purposes (see the short usage sketch at the end of this release's notes). [#10198](https://github.com/ClickHouse/ClickHouse/pull/10198) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+
+#### Bug Fix
+
+* Fix error `Pipeline stuck` with `max_rows_to_group_by` and `group_by_overflow_mode = 'break'`. [#10279](https://github.com/ClickHouse/ClickHouse/pull/10279) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix rare possible exception `Cannot drain connections: cancel first`. [#10239](https://github.com/ClickHouse/ClickHouse/pull/10239) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fixed a bug where ClickHouse would throw the "Unknown function lambda." error message when a user tries to run `ALTER UPDATE/DELETE` on tables with `ENGINE = Replicated*`. The check for nondeterministic functions now handles lambda expressions correctly. [#10237](https://github.com/ClickHouse/ClickHouse/pull/10237) ([Alexander Kazakov](https://github.com/Akazz)).
+* Fixed the "generateRandom" function for the Date type. This fixes [#9973](https://github.com/ClickHouse/ClickHouse/issues/9973). Fix an edge case when dates with year 2106 are inserted into MergeTree tables with old-style partitioning but partitions are named with year 1970. [#10218](https://github.com/ClickHouse/ClickHouse/pull/10218) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Convert types if the table definition of a View does not correspond to the SELECT query. This fixes [#10180](https://github.com/ClickHouse/ClickHouse/issues/10180) and [#10022](https://github.com/ClickHouse/ClickHouse/issues/10022). [#10217](https://github.com/ClickHouse/ClickHouse/pull/10217) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix `parseDateTimeBestEffort` for strings in RFC-2822 when the day of week is Tuesday or Thursday. This fixes [#10082](https://github.com/ClickHouse/ClickHouse/issues/10082). [#10214](https://github.com/ClickHouse/ClickHouse/pull/10214) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix column names of constants inside `JOIN` that may clash with names of constants outside of `JOIN`. [#10207](https://github.com/ClickHouse/ClickHouse/pull/10207) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix possible infinite query execution when the query actually should stop on LIMIT, while reading from an infinite source like `system.numbers` or `system.zeros`. [#10206](https://github.com/ClickHouse/ClickHouse/pull/10206) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix using the current database for access checking when the database isn't specified. [#10192](https://github.com/ClickHouse/ClickHouse/pull/10192) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Convert blocks if structure does not match on INSERT into Distributed(). [#10135](https://github.com/ClickHouse/ClickHouse/pull/10135) ([Azat Khuzhin](https://github.com/azat)).
+* Fix possible incorrect result for extremes in the processors pipeline. [#10131](https://github.com/ClickHouse/ClickHouse/pull/10131) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix some kinds of alters with compact parts. [#10130](https://github.com/ClickHouse/ClickHouse/pull/10130) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix incorrect `index_granularity_bytes` check while creating a new replica. Fixes [#10098](https://github.com/ClickHouse/ClickHouse/issues/10098). [#10121](https://github.com/ClickHouse/ClickHouse/pull/10121) ([alesapin](https://github.com/alesapin)).
+* Fix SIGSEGV on INSERT into a Distributed table when its structure differs from the underlying tables. [#10105](https://github.com/ClickHouse/ClickHouse/pull/10105) ([Azat Khuzhin](https://github.com/azat)).
+* Fix possible row loss for queries with `JOIN` and `UNION ALL`. Fixes [#9826](https://github.com/ClickHouse/ClickHouse/issues/9826), [#10113](https://github.com/ClickHouse/ClickHouse/issues/10113). [#10099](https://github.com/ClickHouse/ClickHouse/pull/10099) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fixed replicated tables startup when updating from an old ClickHouse version where the `/table/replicas/replica_name/metadata` node doesn't exist. Fixes [#10037](https://github.com/ClickHouse/ClickHouse/issues/10037). [#10095](https://github.com/ClickHouse/ClickHouse/pull/10095) ([alesapin](https://github.com/alesapin)).
+* Add some argument checks and support for identifier arguments for the MySQL Database Engine. [#10077](https://github.com/ClickHouse/ClickHouse/pull/10077) ([Winter Zhang](https://github.com/zhang2014)).
+* Fix a bug in the clickhouse dictionary source from a localhost clickhouse server. The bug may lead to memory corruption if types in the dictionary and the source are not compatible. [#10071](https://github.com/ClickHouse/ClickHouse/pull/10071) ([alesapin](https://github.com/alesapin)).
+* Fix a bug in the `CHECK TABLE` query when the table contains skip indices. [#10068](https://github.com/ClickHouse/ClickHouse/pull/10068) ([alesapin](https://github.com/alesapin)).
+* Fix error `Cannot clone block with columns because block has 0 columns ... While executing GroupingAggregatedTransform`. It happened when the setting `distributed_aggregation_memory_efficient` was enabled, and the distributed query read aggregated data with different levels from different shards (mixed single-level and two-level aggregation). [#10063](https://github.com/ClickHouse/ClickHouse/pull/10063) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix a segmentation fault that could occur in GROUP BY over string keys containing trailing zero bytes ([#8636](https://github.com/ClickHouse/ClickHouse/issues/8636), [#8925](https://github.com/ClickHouse/ClickHouse/issues/8925)). [#10025](https://github.com/ClickHouse/ClickHouse/pull/10025) ([Alexander Kuzmenkov](https://github.com/akuzm)). +* Fix parallel distributed INSERT SELECT for remote table. This PR fixes the solution provided in [#9759](https://github.com/ClickHouse/ClickHouse/pull/9759). [#9999](https://github.com/ClickHouse/ClickHouse/pull/9999) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix the number of threads used for remote query execution (performance regression, since 20.3). This happened when query from `Distributed` table was executed simultaneously on local and remote shards. Fixes [#9965](https://github.com/ClickHouse/ClickHouse/issues/9965). [#9971](https://github.com/ClickHouse/ClickHouse/pull/9971) ([Nikolai Kochetov](https://github.com/KochetovNicolai)). +* Fix bug in which the necessary tables weren't retrieved at one of the processing stages of queries to some databases. Fixes [#9699](https://github.com/ClickHouse/ClickHouse/issues/9699). [#9949](https://github.com/ClickHouse/ClickHouse/pull/9949) ([achulkov2](https://github.com/achulkov2)). +* Fix 'Not found column in block' error when `JOIN` appears with `TOTALS`. Fixes [#9839](https://github.com/ClickHouse/ClickHouse/issues/9839). [#9939](https://github.com/ClickHouse/ClickHouse/pull/9939) ([Artem Zuikov](https://github.com/4ertus2)). +* Fix a bug with `ON CLUSTER` DDL queries freezing on server startup. [#9927](https://github.com/ClickHouse/ClickHouse/pull/9927) ([Gagan Arneja](https://github.com/garneja)). +* Fix parsing multiple hosts set in the CREATE USER command, e.g. `CREATE USER user6 HOST NAME REGEXP 'lo.?*host', NAME REGEXP 'lo*host'`. [#9924](https://github.com/ClickHouse/ClickHouse/pull/9924) ([Vitaly Baranov](https://github.com/vitlibar)). +* Fix `TRUNCATE` for Join table engine ([#9917](https://github.com/ClickHouse/ClickHouse/issues/9917)). [#9920](https://github.com/ClickHouse/ClickHouse/pull/9920) ([Amos Bird](https://github.com/amosbird)). +* Fix "scalar doesn't exist" error in ALTERs ([#9878](https://github.com/ClickHouse/ClickHouse/issues/9878)). [#9904](https://github.com/ClickHouse/ClickHouse/pull/9904) ([Amos Bird](https://github.com/amosbird)). +* Fix race condition between drop and optimize in `ReplicatedMergeTree`. [#9901](https://github.com/ClickHouse/ClickHouse/pull/9901) ([alesapin](https://github.com/alesapin)). +* Fix error with qualified names in `distributed_product_mode='local'`. Fixes [#4756](https://github.com/ClickHouse/ClickHouse/issues/4756). [#9891](https://github.com/ClickHouse/ClickHouse/pull/9891) ([Artem Zuikov](https://github.com/4ertus2)). +* Fix calculating grants for introspection functions from the setting 'allow_introspection_functions'. [#9840](https://github.com/ClickHouse/ClickHouse/pull/9840) ([Vitaly Baranov](https://github.com/vitlibar)). + +#### Build/Testing/Packaging Improvement + +* Fix integration test `test_settings_constraints`. [#9962](https://github.com/ClickHouse/ClickHouse/pull/9962) ([Vitaly Baranov](https://github.com/vitlibar)). +* Removed dependency on `clock_getres`. [#9833](https://github.com/ClickHouse/ClickHouse/pull/9833) ([alexey-milovidov](https://github.com/alexey-milovidov)). 
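+
+A minimal usage sketch for the `isConstant` function added in this release (illustrative only; the queries below are not taken from the release notes):
+
+```sql
+SELECT isConstant(1 + 2);                               -- returns 1: the argument folds to a constant
+SELECT isConstant(number) FROM system.numbers LIMIT 1;  -- returns 0: the argument is a real column
+```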
+
+
 ### ClickHouse release v20.3.5.21, 2020-03-27
 
 #### Bug Fix
 
 * Fix 'Different expressions with the same alias' error when query has PREWHERE and WHERE on distributed table and `SET distributed_product_mode = 'local'`. [#9871](https://github.com/ClickHouse/ClickHouse/pull/9871) ([Artem Zuikov](https://github.com/4ertus2)).
 * Fix mutations excessive memory consumption for tables with a composite primary key. This fixes [#9850](https://github.com/ClickHouse/ClickHouse/issues/9850). [#9860](https://github.com/ClickHouse/ClickHouse/pull/9860) ([alesapin](https://github.com/alesapin)).
+* For INSERT queries, the shard now clamps the settings received from the initiator to the shard's constraints instead of throwing an exception. This fix allows sending INSERT queries to a shard with different constraints. This change improves fix [#9447](https://github.com/ClickHouse/ClickHouse/issues/9447). [#9852](https://github.com/ClickHouse/ClickHouse/pull/9852) ([Vitaly Baranov](https://github.com/vitlibar)).
 * Fix 'COMMA to CROSS JOIN rewriter is not enabled or cannot rewrite query' error in case of subqueries with COMMA JOIN out of tables lists (i.e. in WHERE). Fixes [#9782](https://github.com/ClickHouse/ClickHouse/issues/9782). [#9830](https://github.com/ClickHouse/ClickHouse/pull/9830) ([Artem Zuikov](https://github.com/4ertus2)).
 * Fix possible exception `Got 0 in totals chunk, expected 1` on client. It happened for queries with `JOIN` in case if right joined table had zero rows. Example: `select * from system.one t1 join system.one t2 on t1.dummy = t2.dummy limit 0 FORMAT TabSeparated;`. Fixes [#9777](https://github.com/ClickHouse/ClickHouse/issues/9777). [#9823](https://github.com/ClickHouse/ClickHouse/pull/9823) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
 * Fix SIGSEGV with optimize_skip_unused_shards when type cannot be converted. [#9804](https://github.com/ClickHouse/ClickHouse/pull/9804) ([Azat Khuzhin](https://github.com/azat)).
@@ -273,6 +330,55 @@
 ## ClickHouse release v20.1
+### ClickHouse release v20.1.10.70, 2020-04-17
+
+#### Bug Fix
+
+* Fix rare possible exception `Cannot drain connections: cancel first`. [#10239](https://github.com/ClickHouse/ClickHouse/pull/10239) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fixed a bug where ClickHouse would throw the `'Unknown function lambda.'` error message when a user tries to run `ALTER UPDATE/DELETE` on tables with `ENGINE = Replicated*`. The check for nondeterministic functions now handles lambda expressions correctly. [#10237](https://github.com/ClickHouse/ClickHouse/pull/10237) ([Alexander Kazakov](https://github.com/Akazz)).
+* Fix `parseDateTimeBestEffort` for strings in RFC-2822 when the day of week is Tuesday or Thursday. This fixes [#10082](https://github.com/ClickHouse/ClickHouse/issues/10082). [#10214](https://github.com/ClickHouse/ClickHouse/pull/10214) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix column names of constants inside `JOIN` that may clash with names of constants outside of `JOIN`. [#10207](https://github.com/ClickHouse/ClickHouse/pull/10207) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix possible infinite query execution when the query actually should stop on LIMIT, while reading from an infinite source like `system.numbers` or `system.zeros`. [#10206](https://github.com/ClickHouse/ClickHouse/pull/10206) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix move-to-prewhere optimization in presence of `arrayJoin` functions (in certain cases). This fixes [#10092](https://github.com/ClickHouse/ClickHouse/issues/10092). [#10195](https://github.com/ClickHouse/ClickHouse/pull/10195) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add the ability to relax the restriction on non-deterministic functions usage in mutations with the `allow_nondeterministic_mutations` setting (a usage sketch follows the v20.1.9.54 notes below). [#10186](https://github.com/ClickHouse/ClickHouse/pull/10186) ([filimonov](https://github.com/filimonov)).
+* Convert blocks if structure does not match on `INSERT` into a table with the `Distributed` engine. [#10135](https://github.com/ClickHouse/ClickHouse/pull/10135) ([Azat Khuzhin](https://github.com/azat)).
+* Fix `SIGSEGV` on `INSERT` into a `Distributed` table when its structure differs from the underlying tables. [#10105](https://github.com/ClickHouse/ClickHouse/pull/10105) ([Azat Khuzhin](https://github.com/azat)).
+* Fix possible row loss for queries with `JOIN` and `UNION ALL`. Fixes [#9826](https://github.com/ClickHouse/ClickHouse/issues/9826), [#10113](https://github.com/ClickHouse/ClickHouse/issues/10113). [#10099](https://github.com/ClickHouse/ClickHouse/pull/10099) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Add argument checks and support for identifier arguments for the MySQL Database Engine. [#10077](https://github.com/ClickHouse/ClickHouse/pull/10077) ([Winter Zhang](https://github.com/zhang2014)).
+* Fix a bug in the clickhouse dictionary source from a localhost clickhouse server. The bug may lead to memory corruption if types in the dictionary and the source are not compatible. [#10071](https://github.com/ClickHouse/ClickHouse/pull/10071) ([alesapin](https://github.com/alesapin)).
+* Fix error `Cannot clone block with columns because block has 0 columns ... While executing GroupingAggregatedTransform`. It happened when the setting `distributed_aggregation_memory_efficient` was enabled, and the distributed query read aggregated data with different levels from different shards (mixed single-level and two-level aggregation). [#10063](https://github.com/ClickHouse/ClickHouse/pull/10063) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix a segmentation fault that could occur in `GROUP BY` over string keys containing trailing zero bytes ([#8636](https://github.com/ClickHouse/ClickHouse/issues/8636), [#8925](https://github.com/ClickHouse/ClickHouse/issues/8925)). [#10025](https://github.com/ClickHouse/ClickHouse/pull/10025) ([Alexander Kuzmenkov](https://github.com/akuzm)).
+* Fix a bug in which the necessary tables weren't retrieved at one of the processing stages of queries to some databases. Fixes [#9699](https://github.com/ClickHouse/ClickHouse/issues/9699). [#9949](https://github.com/ClickHouse/ClickHouse/pull/9949) ([achulkov2](https://github.com/achulkov2)).
+* Fix `'Not found column in block'` error when `JOIN` appears with `TOTALS`. Fixes [#9839](https://github.com/ClickHouse/ClickHouse/issues/9839). [#9939](https://github.com/ClickHouse/ClickHouse/pull/9939) ([Artem Zuikov](https://github.com/4ertus2)).
+* Fix a bug with `ON CLUSTER` DDL queries freezing on server startup. [#9927](https://github.com/ClickHouse/ClickHouse/pull/9927) ([Gagan Arneja](https://github.com/garneja)).
+* Fix `TRUNCATE` for the Join table engine ([#9917](https://github.com/ClickHouse/ClickHouse/issues/9917)). [#9920](https://github.com/ClickHouse/ClickHouse/pull/9920) ([Amos Bird](https://github.com/amosbird)).
+* Fix `'scalar doesn't exist'` error in ALTER queries ([#9878](https://github.com/ClickHouse/ClickHouse/issues/9878)). [#9904](https://github.com/ClickHouse/ClickHouse/pull/9904) ([Amos Bird](https://github.com/amosbird)).
+* Fix race condition between drop and optimize in `ReplicatedMergeTree`. [#9901](https://github.com/ClickHouse/ClickHouse/pull/9901) ([alesapin](https://github.com/alesapin)).
+* Fixed `DeleteOnDestroy` logic in `ATTACH PART` which could lead to automatic removal of the attached part; also added a few tests. [#9410](https://github.com/ClickHouse/ClickHouse/pull/9410) ([Vladimir Chebotarev](https://github.com/excitoon)).
+
+#### Build/Testing/Packaging Improvement
+
+* Fix unit test `collapsing_sorted_stream`. [#9367](https://github.com/ClickHouse/ClickHouse/pull/9367) ([Deleted user](https://github.com/ghost)).
+
+### ClickHouse release v20.1.9.54, 2020-03-28
+
+#### Bug Fix
+
+* Fix `'Different expressions with the same alias'` error when a query has `PREWHERE` and `WHERE` on a distributed table and `SET distributed_product_mode = 'local'`. [#9871](https://github.com/ClickHouse/ClickHouse/pull/9871) ([Artem Zuikov](https://github.com/4ertus2)).
+* Fix excessive memory consumption of mutations for tables with a composite primary key. This fixes [#9850](https://github.com/ClickHouse/ClickHouse/issues/9850). [#9860](https://github.com/ClickHouse/ClickHouse/pull/9860) ([alesapin](https://github.com/alesapin)).
+* For INSERT queries, the shard now clamps the settings received from the initiator to the shard's constraints instead of throwing an exception. This fix allows sending `INSERT` queries to a shard with different constraints. This change improves fix [#9447](https://github.com/ClickHouse/ClickHouse/issues/9447). [#9852](https://github.com/ClickHouse/ClickHouse/pull/9852) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix possible exception `Got 0 in totals chunk, expected 1` on the client. It happened for queries with `JOIN` in case the right joined table had zero rows. Example: `select * from system.one t1 join system.one t2 on t1.dummy = t2.dummy limit 0 FORMAT TabSeparated;`. Fixes [#9777](https://github.com/ClickHouse/ClickHouse/issues/9777). [#9823](https://github.com/ClickHouse/ClickHouse/pull/9823) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix `SIGSEGV` with `optimize_skip_unused_shards` when the type cannot be converted. [#9804](https://github.com/ClickHouse/ClickHouse/pull/9804) ([Azat Khuzhin](https://github.com/azat)).
+* Fixed a few cases when the timezone of the function argument wasn't used properly. [#9574](https://github.com/ClickHouse/ClickHouse/pull/9574) ([Vasily Nemkov](https://github.com/Enmk)).
+
+#### Improvement
+
+* Remove the `ORDER BY` stage from mutations because we read from a single ordered part in a single thread. Also add a check that the rows in a mutation are ordered by sorting key and that this order is not violated. [#9886](https://github.com/ClickHouse/ClickHouse/pull/9886) ([alesapin](https://github.com/alesapin)).
+
+#### Build/Testing/Packaging Improvement
+
+* Clean up duplicated linker flags. Make sure the linker won't look up an unexpected symbol. [#9433](https://github.com/ClickHouse/ClickHouse/pull/9433) ([Amos Bird](https://github.com/amosbird)).
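+
+A minimal sketch of the `allow_nondeterministic_mutations` setting referenced in the v20.1.10.70 notes above. The table name and the mutation are hypothetical; the point is only that a non-deterministic function such as `rand()` is allowed in a mutation once the setting is enabled:
+
+```sql
+SET allow_nondeterministic_mutations = 1;
+ALTER TABLE t UPDATE value = rand() WHERE 1;  -- without the setting, non-deterministic functions in mutations on replicated tables are rejected
+```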
+ ### ClickHouse release v20.1.8.41, 2020-03-20 #### Bug Fix diff --git a/README.md b/README.md index 955f9d1a5d1..a7a4cb97b2c 100644 --- a/README.md +++ b/README.md @@ -15,5 +15,7 @@ ClickHouse is an open-source column-oriented database management system that all ## Upcoming Events +* [ClickHouse Online Meetup West (in English)](https://www.eventbrite.com/e/clickhouse-online-meetup-registration-102886791162) on April 24, 2020. +* [ClickHouse Online Meetup East (in English)](https://www.eventbrite.com/e/clickhouse-online-meetup-east-registration-102989325846) on April 28, 2020. * [ClickHouse Workshop in Novosibirsk](https://2020.codefest.ru/lecture/1628) on TBD date. * [Yandex C++ Open-Source Sprints in Moscow](https://events.yandex.ru/events/otkrytyj-kod-v-yandek-28-03-2020) on TBD date. diff --git a/base/loggers/OwnPatternFormatter.cpp b/base/loggers/OwnPatternFormatter.cpp index 1f918f01697..029d06ff949 100644 --- a/base/loggers/OwnPatternFormatter.cpp +++ b/base/loggers/OwnPatternFormatter.cpp @@ -75,7 +75,11 @@ void OwnPatternFormatter::formatExtended(const DB::ExtendedLogMessage & msg_ext, if (color) writeCString(resetColor(), wb); writeCString("> ", wb); + if (color) + writeString(setColor(std::hash()(msg.getSource())), wb); DB::writeString(msg.getSource(), wb); + if (color) + writeCString(resetColor(), wb); writeCString(": ", wb); DB::writeString(msg.getText(), wb); } diff --git a/cmake/freebsd/default_libs.cmake b/cmake/freebsd/default_libs.cmake index 2bb76c6a761..d60df52bc6d 100644 --- a/cmake/freebsd/default_libs.cmake +++ b/cmake/freebsd/default_libs.cmake @@ -4,7 +4,11 @@ if (NOT COMPILER_CLANG) message (FATAL_ERROR "FreeBSD build is supported only for Clang") endif () -execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE) +if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "amd64") + execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-x86_64.a OUTPUT_VARIABLE BUILTINS_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE) +else () + execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE) +endif () set (DEFAULT_LIBS "${DEFAULT_LIBS} ${BUILTINS_LIBRARY} ${COVERAGE_OPTION} -lc -lm -lrt -lpthread") diff --git a/contrib/poco b/contrib/poco index ddca76ba495..7d605a1ae5d 160000 --- a/contrib/poco +++ b/contrib/poco @@ -1 +1 @@ -Subproject commit ddca76ba4956cb57150082394536cc43ff28f6fa +Subproject commit 7d605a1ae5d878294f91f68feb62ae51e9a04426 diff --git a/docker/test/performance-comparison/compare.sh b/docker/test/performance-comparison/compare.sh index bf48fe467ca..a7997bdc1f6 100755 --- a/docker/test/performance-comparison/compare.sh +++ b/docker/test/performance-comparison/compare.sh @@ -2,7 +2,7 @@ set -ex set -o pipefail trap "exit" INT TERM -trap "kill $(jobs -pr) ||:" EXIT +trap 'kill $(jobs -pr) ||:' EXIT stage=${stage:-} script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" @@ -18,22 +18,22 @@ function configure sed -i 's/9000/9002/g' right/config/config.xml # Start a temporary server to rename the tables - while killall clickhouse; do echo . ; sleep 1 ; done + while killall clickhouse-server; do echo . 
; sleep 1 ; done echo all killed set -m # Spawn temporary in its own process groups - left/clickhouse server --config-file=left/config/config.xml -- --path db0 &> setup-server-log.log & + left/clickhouse-server --config-file=left/config/config.xml -- --path db0 &> setup-server-log.log & left_pid=$! kill -0 $left_pid disown $left_pid set +m - while ! left/clickhouse client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done + while ! clickhouse-client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done echo server for setup started - left/clickhouse client --port 9001 --query "create database test" ||: - left/clickhouse client --port 9001 --query "rename table datasets.hits_v1 to test.hits" ||: + clickhouse-client --port 9001 --query "create database test" ||: + clickhouse-client --port 9001 --query "rename table datasets.hits_v1 to test.hits" ||: - while killall clickhouse; do echo . ; sleep 1 ; done + while killall clickhouse-server; do echo . ; sleep 1 ; done echo all killed # Remove logs etc, because they will be updated, and sharing them between @@ -42,43 +42,50 @@ function configure rm db0/metadata/system/* -rf ||: # Make copies of the original db for both servers. Use hardlinks instead - # of copying. Be careful to remove preprocessed configs or it can lead to - # weird effects. + # of copying. Be careful to remove preprocessed configs and system tables,or + # it can lead to weird effects. rm -r left/db ||: rm -r right/db ||: rm -r db0/preprocessed_configs ||: + rm -r db/{data,metadata}/system ||: cp -al db0/ left/db/ cp -al db0/ right/db/ } function restart { - while killall clickhouse; do echo . ; sleep 1 ; done + while killall clickhouse-server; do echo . ; sleep 1 ; done echo all killed set -m # Spawn servers in their own process groups - left/clickhouse server --config-file=left/config/config.xml -- --path left/db &>> left-server-log.log & + left/clickhouse-server --config-file=left/config/config.xml -- --path left/db &>> left-server-log.log & left_pid=$! kill -0 $left_pid disown $left_pid - right/clickhouse server --config-file=right/config/config.xml -- --path right/db &>> right-server-log.log & + right/clickhouse-server --config-file=right/config/config.xml -- --path right/db &>> right-server-log.log & right_pid=$! kill -0 $right_pid disown $right_pid set +m - while ! left/clickhouse client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done + while ! clickhouse-client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done echo left ok - while ! right/clickhouse client --port 9002 --query "select 1" ; do kill -0 $right_pid ; echo . ; sleep 1 ; done + while ! clickhouse-client --port 9002 --query "select 1" ; do kill -0 $right_pid ; echo . 
; sleep 1 ; done echo right ok - left/clickhouse client --port 9001 --query "select * from system.tables where database != 'system'" - left/clickhouse client --port 9001 --query "select * from system.build_options" - right/clickhouse client --port 9002 --query "select * from system.tables where database != 'system'" - right/clickhouse client --port 9002 --query "select * from system.build_options" + clickhouse-client --port 9001 --query "select * from system.tables where database != 'system'" + clickhouse-client --port 9001 --query "select * from system.build_options" + clickhouse-client --port 9002 --query "select * from system.tables where database != 'system'" + clickhouse-client --port 9002 --query "select * from system.build_options" + + # Check again that both servers we started are running -- this is important + # for running locally, when there might be some other servers started and we + # will connect to them instead. + kill -0 $left_pid + kill -0 $right_pid } function run_tests @@ -129,7 +136,7 @@ function run_tests # FIXME remove some broken long tests for test_name in {IPv4,IPv6,modulo,parse_engine_file,number_formatting_formats,select_format,arithmetic,cryptographic_hashes,logical_functions_{medium,small}} do - printf "$test_name\tMarked as broken (see compare.sh)\n" >> skipped-tests.tsv + printf "%s\tMarked as broken (see compare.sh)\n" "$test_name">> skipped-tests.tsv rm "$test_prefix/$test_name.xml" ||: done test_files=$(ls "$test_prefix"/*.xml) @@ -140,9 +147,9 @@ function run_tests for test in $test_files do # Check that both servers are alive, to fail faster if they die. - left/clickhouse client --port 9001 --query "select 1 format Null" \ + clickhouse-client --port 9001 --query "select 1 format Null" \ || { echo $test_name >> left-server-died.log ; restart ; continue ; } - right/clickhouse client --port 9002 --query "select 1 format Null" \ + clickhouse-client --port 9002 --query "select 1 format Null" \ || { echo $test_name >> right-server-died.log ; restart ; continue ; } test_name=$(basename "$test" ".xml") @@ -160,7 +167,7 @@ function run_tests skipped=$(grep ^skipped "$test_name-raw.tsv" | cut -f2-) if [ "$skipped" != "" ] then - printf "$test_name""\t""$skipped""\n" >> skipped-tests.tsv + printf "%s\t%s\n" "$test_name" "$skipped">> skipped-tests.tsv fi done @@ -172,24 +179,24 @@ function run_tests function get_profiles { # Collect the profiles - left/clickhouse client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0" - left/clickhouse client --port 9001 --query "set query_profiler_real_time_period_ns = 0" - right/clickhouse client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0" - right/clickhouse client --port 9001 --query "set query_profiler_real_time_period_ns = 0" - left/clickhouse client --port 9001 --query "system flush logs" - right/clickhouse client --port 9002 --query "system flush logs" + clickhouse-client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0" + clickhouse-client --port 9001 --query "set query_profiler_real_time_period_ns = 0" + clickhouse-client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0" + clickhouse-client --port 9001 --query "set query_profiler_real_time_period_ns = 0" + clickhouse-client --port 9001 --query "system flush logs" + clickhouse-client --port 9002 --query "system flush logs" - left/clickhouse client --port 9001 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > left-query-log.tsv ||: & - left/clickhouse client --port 9001 --query 
"select * from system.query_thread_log format TSVWithNamesAndTypes" > left-query-thread-log.tsv ||: & - left/clickhouse client --port 9001 --query "select * from system.trace_log format TSVWithNamesAndTypes" > left-trace-log.tsv ||: & - left/clickhouse client --port 9001 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > left-addresses.tsv ||: & - left/clickhouse client --port 9001 --query "select * from system.metric_log format TSVWithNamesAndTypes" > left-metric-log.tsv ||: & + clickhouse-client --port 9001 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > left-query-log.tsv ||: & + clickhouse-client --port 9001 --query "select * from system.query_thread_log format TSVWithNamesAndTypes" > left-query-thread-log.tsv ||: & + clickhouse-client --port 9001 --query "select * from system.trace_log format TSVWithNamesAndTypes" > left-trace-log.tsv ||: & + clickhouse-client --port 9001 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > left-addresses.tsv ||: & + clickhouse-client --port 9001 --query "select * from system.metric_log format TSVWithNamesAndTypes" > left-metric-log.tsv ||: & - right/clickhouse client --port 9002 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > right-query-log.tsv ||: & - right/clickhouse client --port 9002 --query "select * from system.query_thread_log format TSVWithNamesAndTypes" > right-query-thread-log.tsv ||: & - right/clickhouse client --port 9002 --query "select * from system.trace_log format TSVWithNamesAndTypes" > right-trace-log.tsv ||: & - right/clickhouse client --port 9002 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > right-addresses.tsv ||: & - right/clickhouse client --port 9002 --query "select * from system.metric_log format TSVWithNamesAndTypes" > right-metric-log.tsv ||: & + clickhouse-client --port 9002 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > right-query-log.tsv ||: & + clickhouse-client --port 9002 --query "select * from system.query_thread_log format TSVWithNamesAndTypes" > right-query-thread-log.tsv ||: & + clickhouse-client --port 9002 --query "select * from system.trace_log format TSVWithNamesAndTypes" > right-trace-log.tsv ||: & + clickhouse-client --port 9002 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > right-addresses.tsv ||: & + clickhouse-client --port 9002 --query "select * from system.metric_log format TSVWithNamesAndTypes" > right-metric-log.tsv ||: & wait } @@ -197,9 +204,9 @@ function get_profiles # Build and analyze randomization distribution for all queries. function analyze_queries { - find . -maxdepth 1 -name "*-queries.tsv" -print | \ - xargs -n1 -I% basename % -queries.tsv | \ - parallel --verbose right/clickhouse local --file "{}-queries.tsv" \ + find . 
-maxdepth 1 -name "*-queries.tsv" -print0 | \ + xargs -0 -n1 -I% basename % -queries.tsv | \ + parallel --verbose clickhouse-local --file "{}-queries.tsv" \ --structure "\"query text, run int, version UInt32, time float\"" \ --query "\"$(cat "$script_dir/eqmed.sql")\"" \ ">" {}-report.tsv @@ -221,7 +228,7 @@ done rm ./*.{rep,svg} test-times.tsv test-dump.tsv unstable.tsv unstable-query-ids.tsv unstable-query-metrics.tsv changed-perf.tsv unstable-tests.tsv unstable-queries.tsv bad-tests.tsv slow-on-client.tsv all-queries.tsv ||: -right/clickhouse local --query " +clickhouse-local --query " create table queries engine File(TSVWithNamesAndTypes, 'queries.rep') as select -- FIXME Comparison mode doesn't make sense for queries that complete @@ -230,12 +237,14 @@ create table queries engine File(TSVWithNamesAndTypes, 'queries.rep') -- but the right way to do this is not yet clear. left + right < 0.05 as short, - not short and abs(diff) < 0.10 and rd[3] > 0.10 as unstable, - - -- Do not consider changed the queries with 5% RD below 5% -- e.g., we're - -- likely to observe a difference > 5% in less than 5% cases. - -- Not sure it is correct, but empirically it filters out a lot of noise. - not short and abs(diff) > 0.15 and abs(diff) > rd[3] and rd[1] > 0.05 as changed, + -- Difference > 15% and > rd(99%) -- changed. We can't filter out flaky + -- queries by rd(5%), because it can be zero when the difference is smaller + -- than a typical distribution width. The difference is still real though. + not short and abs(diff) > 0.15 and abs(diff) > rd[4] as changed, + + -- Not changed but rd(99%) > 10% -- unstable. + not short and not changed and rd[4] > 0.10 as unstable, + left, right, diff, rd, replaceAll(_file, '-report.tsv', '') test, query @@ -293,7 +302,7 @@ create table all_tests_tsv engine File(TSV, 'all-queries.tsv') as for version in {right,left} do -right/clickhouse local --query " +clickhouse-local --query " create view queries as select * from file('queries.rep', TSVWithNamesAndTypes, 'short int, unstable int, changed int, left float, right float, @@ -411,6 +420,10 @@ unset IFS grep -H -m2 -i '\(Exception\|Error\):[^:]' ./*-err.log | sed 's/:/\t/' > run-errors.tsv ||: } +# Check that local and client are in PATH +clickhouse-local --version > /dev/null +clickhouse-client --version > /dev/null + case "$stage" in "") ;& diff --git a/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml b/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml index 863a40718d9..e41ab8eb75d 100644 --- a/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml +++ b/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml @@ -1,4 +1,9 @@ - + + + + + :: + true diff --git a/docker/test/performance-comparison/download.sh b/docker/test/performance-comparison/download.sh index fc4622fdf39..ded72d8585f 100755 --- a/docker/test/performance-comparison/download.sh +++ b/docker/test/performance-comparison/download.sh @@ -2,7 +2,7 @@ set -ex set -o pipefail trap "exit" INT TERM -trap "kill $(jobs -pr) ||:" EXIT +trap 'kill $(jobs -pr) ||:' EXIT mkdir db0 ||: diff --git a/docker/test/performance-comparison/entrypoint.sh b/docker/test/performance-comparison/entrypoint.sh index b125d624bfe..f316a659f3c 100755 --- a/docker/test/performance-comparison/entrypoint.sh +++ b/docker/test/performance-comparison/entrypoint.sh @@ -98,14 +98,15 @@ fi # Even if we have some errors, try our best to save the logs. 
set +e -# Older version use 'kill 0', so put the script into a separate process group -# FIXME remove set +m in April 2020 -set +m +# Use clickhouse-client and clickhouse-local from the right server. +PATH="$(readlink -f right/)":"$PATH" +export PATH + +# Start the main comparison script. { \ time ../download.sh "$REF_PR" "$REF_SHA" "$PR_TO_TEST" "$SHA_TO_TEST" && \ time stage=configure "$script_path"/compare.sh ; \ } 2>&1 | ts "$(printf '%%Y-%%m-%%d %%H:%%M:%%S\t')" | tee compare.log -set -m # Stop the servers to free memory. Normally they are restarted before getting # the profile info, so they shouldn't use much, but if the comparison script diff --git a/docker/test/performance-comparison/perf.py b/docker/test/performance-comparison/perf.py index 9be84fdc4b6..c65e4019dc5 100755 --- a/docker/test/performance-comparison/perf.py +++ b/docker/test/performance-comparison/perf.py @@ -140,9 +140,16 @@ report_stage_end('substitute2') for q in test_queries: # Prewarm: run once on both servers. Helps to bring the data into memory, # precompile the queries, etc. - for conn_index, c in enumerate(connections): - res = c.execute(q, query_id = 'prewarm {} {}'.format(0, q)) - print('prewarm\t' + tsv_escape(q) + '\t' + str(conn_index) + '\t' + str(c.last_query.elapsed)) + try: + for conn_index, c in enumerate(connections): + res = c.execute(q, query_id = 'prewarm {} {}'.format(0, q)) + print('prewarm\t' + tsv_escape(q) + '\t' + str(conn_index) + '\t' + str(c.last_query.elapsed)) + except: + # If prewarm fails for some query -- skip it, and try to test the others. + # This might happen if the new test introduces some function that the + # old server doesn't support. Still, report it as an error. + print(traceback.format_exc(), file=sys.stderr) + continue # Now, perform measured runs. # Track the time spent by the client to process this query, so that we can notice diff --git a/docker/test/performance-comparison/report.py b/docker/test/performance-comparison/report.py index 84b0239ccda..a783cc46c46 100755 --- a/docker/test/performance-comparison/report.py +++ b/docker/test/performance-comparison/report.py @@ -256,17 +256,18 @@ if args.report == 'main': print(tableStart('Test times')) print(tableHeader(columns)) - + + runs = 11 # FIXME pass this as an argument attrs = ['' for c in columns] for r in rows: - if float(r[6]) > 22: + if float(r[6]) > 3 * runs: # FIXME should be 15s max -- investigate parallel_insert slow_average_tests += 1 attrs[6] = 'style="background: #ffb0a0"' else: attrs[6] = '' - if float(r[5]) > 30: + if float(r[5]) > 4 * runs: slow_average_tests += 1 attrs[5] = 'style="background: #ffb0a0"' else: diff --git a/docs/en/development/index.md b/docs/en/development/index.md index 34329853509..bb4158554d3 100644 --- a/docs/en/development/index.md +++ b/docs/en/development/index.md @@ -1,5 +1,5 @@ --- -toc_folder_title: Разработка +toc_folder_title: Development toc_hidden: true toc_priority: 58 toc_title: hidden diff --git a/docs/en/engines/table_engines/mergetree_family/aggregatingmergetree.md b/docs/en/engines/table_engines/mergetree_family/aggregatingmergetree.md index 2103efe98dc..9e310d313b9 100644 --- a/docs/en/engines/table_engines/mergetree_family/aggregatingmergetree.md +++ b/docs/en/engines/table_engines/mergetree_family/aggregatingmergetree.md @@ -9,7 +9,10 @@ The engine inherits from [MergeTree](mergetree.md#table_engines-mergetree), alte You can use `AggregatingMergeTree` tables for incremental data aggregation, including for aggregated materialized views. 
-The engine processes all columns with [AggregateFunction](../../../sql_reference/data_types/aggregatefunction.md) type.
+The engine processes all columns with the following types:
+
+- [AggregateFunction](../../../sql_reference/data_types/aggregatefunction.md)
+- [SimpleAggregateFunction](../../../sql_reference/data_types/simpleaggregatefunction.md)
 
 It is appropriate to use `AggregatingMergeTree` if it reduces the number of rows by orders.
 
diff --git a/docs/en/sql_reference/data_types/nested_data_structures/simpleaggregatefunction.md b/docs/en/sql_reference/data_types/nested_data_structures/simpleaggregatefunction.md
new file mode 100644
index 00000000000..4e086541053
--- /dev/null
+++ b/docs/en/sql_reference/data_types/nested_data_structures/simpleaggregatefunction.md
@@ -0,0 +1,36 @@
+# SimpleAggregateFunction(name, types\_of\_arguments…) {#data-type-simpleaggregatefunction}
+
+The `SimpleAggregateFunction` data type stores the current value of the aggregate function, and does not store its full state as [`AggregateFunction`](../aggregatefunction.md) does. This optimization can be applied to functions for which the following property holds: the result of applying a function `f` to a row set `S1 UNION ALL S2` can be obtained by applying `f` to parts of the row set separately, and then again applying `f` to the results: `f(S1 UNION ALL S2) = f(f(S1) UNION ALL f(S2))`. This property guarantees that partial aggregation results are enough to compute the combined one, so we do not have to store and process any extra data.
+
+Currently, the following aggregate functions are supported:
+
+ - [`any`](../../query_language/agg_functions/reference.md#agg_function-any)
+ - [`anyLast`](../../query_language/agg_functions/reference.md#anylastx)
+ - [`min`](../../query_language/agg_functions/reference.md#agg_function-min)
+ - [`max`](../../query_language/agg_functions/reference.md#agg_function-max)
+ - [`sum`](../../query_language/agg_functions/reference.md#agg_function-sum)
+ - [`groupBitAnd`](../../query_language/agg_functions/reference.md#groupbitand)
+ - [`groupBitOr`](../../query_language/agg_functions/reference.md#groupbitor)
+ - [`groupBitXor`](../../query_language/agg_functions/reference.md#groupbitxor)
+
+- Values of the `SimpleAggregateFunction(func, Type)` type look and are stored the same way as `Type`, so you do not need to apply functions with the `-Merge`/`-State` suffixes.
+- `SimpleAggregateFunction` has better performance than `AggregateFunction` with the same aggregation function.
+
+**Parameters**
+
+- Name of the aggregate function.
+- Types of the aggregate function arguments.
+
+**Example**
+
+``` sql
+CREATE TABLE t
+(
+    column1 SimpleAggregateFunction(sum, UInt64),
+    column2 SimpleAggregateFunction(any, String)
+) ENGINE = ...
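+
+-- A hypothetical usage sketch: it assumes the elided ENGINE above is completed,
+-- e.g. with AggregatingMergeTree and an ORDER BY key.
+-- Values are written and read as plain UInt64 / String, and the ordinary sum() / any()
+-- functions are applied directly: no -State / -Merge combinators are needed.
+INSERT INTO t VALUES (1, 'a'), (2, 'b');
+SELECT sum(column1), any(column2) FROM t;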
+``` + +[Original article](https://clickhouse.tech/docs/en/data_types/nested_data_structures/simpleaggregatefunction/) diff --git a/docs/en/sql_reference/statements/create.md b/docs/en/sql_reference/statements/create.md index 36dd3aced8d..430bcacbc34 100644 --- a/docs/en/sql_reference/statements/create.md +++ b/docs/en/sql_reference/statements/create.md @@ -179,7 +179,7 @@ CREATE TABLE codec_example ENGINE = MergeTree() ``` -#### Common Purpose Codecs {#create-query-common-purpose-codecs} +#### General Purpose Codecs {#create-query-general-purpose-codecs} Codecs: diff --git a/docs/ru/operations/settings/merge_tree_settings.md b/docs/ru/operations/settings/merge_tree_settings.md new file mode 100644 index 00000000000..5297e359547 --- /dev/null +++ b/docs/ru/operations/settings/merge_tree_settings.md @@ -0,0 +1,96 @@ +# Настройки MergeTree таблиц {#merge-tree-settings} + +Значения настроек merge-tree (для всех MergeTree таблиц) можно посмотреть в таблице `system.merge_tree_settings`, их можно переопределить в `config.xml` в секции `merge_tree`, или задать в секции `SETTINGS` у каждой таблицы. + +Пример переопределения в `config.xml`: +```text + + 5 + +``` + +Пример для определения в `SETTINGS` у конкретной таблицы: +```sql +CREATE TABLE foo +( + `A` Int64 +) +ENGINE = MergeTree +ORDER BY tuple() +SETTINGS max_suspicious_broken_parts = 500; +``` + +Пример изменения настроек у конкретной таблицы командой `ALTER TABLE ... MODIFY SETTING`: +```sql +ALTER TABLE foo + MODIFY SETTING max_suspicious_broken_parts = 100; +``` + + +## parts_to_throw_insert {#parts-to-throw-insert} + +Eсли число кусков в партиции превышает значение `parts_to_throw_insert`, INSERT прерывается с исключением `Too many parts (N). Merges are processing significantly slower than inserts`. + +Возможные значения: + +- Положительное целое число. + +Значение по умолчанию: 300. + +Для достижения максимальной производительности запросов `SELECT` необходимо минимизировать количество обрабатываемых кусков, см. [Дизайн MergeTree](../../development/architecture.md#merge-tree). + +Можно установить большее значение 600 (1200), это уменьшит вероятность возникновения ошибки `Too many parts`, но в тоже время вы позже обнаружите возможную проблему со слияниями (например, из-за недостатка места на диске) и деградацию производительности `SELECT`. + + +## parts_to_delay_insert {#parts-to-delay-insert} + +Eсли число кусков в партиции превышает значение `parts_to_delay_insert`, `INSERT` искусственно замедляется. + +Возможные значения: + +- Положительное целое число. + +Значение по умолчанию: 150. + +ClickHouse искусственно выполняет `INSERT` дольше (добавляет 'sleep'), чтобы фоновый механизм слияния успевал слиять куски быстрее, чем они добавляются. + + +## max_delay_to_insert {#max-delay-to-insert} + +Величина в секундах, которая используется для расчета задержки `INSERT`, если число кусков в партиции превышает значение [parts_to_delay_insert](#parts-to-delay-insert). + +Возможные значения: + +- Положительное целое число. + +Значение по умолчанию: 1. + +Величина задержи (в миллисекундах) для `INSERT` вычисляется по формуле: + +```code +max_k = parts_to_throw_insert - parts_to_delay_insert +k = 1 + parts_count_in_partition - parts_to_delay_insert +delay_milliseconds = pow(max_delay_to_insert * 1000, k / max_k) +``` + +Т.е. если в партиции уже 299 кусков и parts_to_throw_insert = 300, parts_to_delay_insert = 150, max_delay_to_insert = 1, `INSERT` замедлится на `pow( 1 * 1000, (1 + 299 - 150) / (300 - 150) ) = 1000` миллисекунд. 
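+
+Набросок примера: текущие значения настроек, использованных в формуле выше, можно посмотреть в таблице `system.merge_tree_settings`:
+
+```sql
+SELECT name, value
+FROM system.merge_tree_settings
+WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert', 'max_delay_to_insert');
+```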
+ +## old_parts_lifetime {#old-parts-lifetime} + +Время (в секундах) хранения неактивных кусков, для защиты от потери данных при спонтанной перезагрузке сервера или О.С. + +Возможные значения: + +- Положительное целое число. + +Значение по умолчанию: 480. + +После слияния нескольких кусков в новый кусок, ClickHouse помечает исходные куски как неактивные и удаляет их после `old_parts_lifetime` секунд. +Неактивные куски удаляются, если они не используются в текущих запросах, т.е. если счетчик ссылок куска -- `refcount` равен нулю. + +Неактивные куски удаляются не сразу, потому что при записи нового куска не вызывается `fsync`, т.е. некоторое время новый кусок находится только в оперативной памяти сервера (кеше О.С.). Т.о. при спонтанной перезагрузке сервера новый (смерженный) кусок может быть потерян или испорчен. В этом случае ClickHouse в процессе старта при проверке целостности кусков обнаружит проблему, вернет неактивные куски в список активных и позже заново их смержит. Сломанный кусок в этом случае переименовывается (добавляется префикс broken_) и перемещается в папку detached. Если проверка целостности не обнаруживает проблем в смерженном куске, то исходные неактивные куски переименовываются (добавляется префикс ignored_) и перемещаются в папку detached. + +Стандартное значение Linux dirty_expire_centisecs - 30 секунд (максимальное время, которое записанные данные хранятся только в оперативной памяти), но при больших нагрузках на дисковую систему, данные могут быть записаны намного позже. Экспериментально было найдено время - 480 секунд, за которое гарантированно новый кусок будет записан на диск. + + +[Оригинальная статья](https://clickhouse.tech/docs/ru/operations/settings/merge_tree_settings/) diff --git a/docs/tools/requirements.txt b/docs/tools/requirements.txt index 8414ae2c533..e6265b5e9e2 100644 --- a/docs/tools/requirements.txt +++ b/docs/tools/requirements.txt @@ -35,4 +35,4 @@ soupsieve==2.0 termcolor==1.1.0 tornado==5.1.1 Unidecode==1.1.1 -urllib3==1.25.8 +urllib3==1.25.9 diff --git a/docs/tools/translate/requirements.txt b/docs/tools/translate/requirements.txt index b0ea9603555..3c212ee8bc2 100644 --- a/docs/tools/translate/requirements.txt +++ b/docs/tools/translate/requirements.txt @@ -9,4 +9,4 @@ python-slugify==4.0.0 PyYAML==5.3.1 requests==2.23.0 text-unidecode==1.3 -urllib3==1.25.8 +urllib3==1.25.9 diff --git a/src/Columns/IColumn.h b/src/Columns/IColumn.h index 090537d6770..4af593bb658 100644 --- a/src/Columns/IColumn.h +++ b/src/Columns/IColumn.h @@ -44,7 +44,7 @@ public: /// Name of a Column kind, without parameters (example: FixedString, Array). virtual const char * getFamilyName() const = 0; - /** If column isn't constant, returns nullptr (or itself). + /** If column isn't constant, returns itself. * If column is constant, transforms constant to full column (if column type allows such transform) and return it. */ virtual Ptr convertToFullColumnIfConst() const { return getPtr(); } diff --git a/src/Common/Arena.h b/src/Common/Arena.h index 32c0f4c12d1..5febed8ddf6 100644 --- a/src/Common/Arena.h +++ b/src/Common/Arena.h @@ -220,7 +220,8 @@ public: // This method only works for extending the last allocation. For lack of // original size, check a weaker condition: that 'begin' is at least in // the current Chunk. 
- assert(range_start >= head->begin && range_start < head->end); + assert(range_start >= head->begin); + assert(range_start < head->end); if (head->pos + additional_bytes <= head->end) { diff --git a/src/Common/ErrorCodes.cpp b/src/Common/ErrorCodes.cpp index 27381811a8d..1f10893b08e 100644 --- a/src/Common/ErrorCodes.cpp +++ b/src/Common/ErrorCodes.cpp @@ -491,6 +491,7 @@ namespace ErrorCodes extern const int CANNOT_ASSIGN_ALTER = 517; extern const int CANNOT_COMMIT_OFFSET = 518; extern const int NO_REMOTE_SHARD_AVAILABLE = 519; + extern const int CANNOT_DETACH_DICTIONARY_AS_TABLE = 520; extern const int KEEPER_EXCEPTION = 999; extern const int POCO_EXCEPTION = 1000; diff --git a/src/Core/Settings.h b/src/Core/Settings.h index 325abc16f3f..cffdd4a66e4 100644 --- a/src/Core/Settings.h +++ b/src/Core/Settings.h @@ -78,9 +78,11 @@ struct Settings : public SettingsCollection M(SettingBool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.", IMPORTANT) \ M(SettingBool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.", 0) \ M(SettingBool, replace_running_query, false, "Whether the running request should be canceled with the same id as the new one.", 0) \ + M(SettingUInt64, background_buffer_flush_schedule_pool_size, 16, "Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup.", 0) \ M(SettingUInt64, background_pool_size, 16, "Number of threads performing background work for tables (for example, merging in merge tree). Only has meaning at server startup.", 0) \ M(SettingUInt64, background_move_pool_size, 8, "Number of threads performing background moves for tables. Only has meaning at server startup.", 0) \ - M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables. Only has meaning at server startup.", 0) \ + M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables, kafka streaming, dns cache updates. Only has meaning at server startup.", 0) \ + M(SettingUInt64, background_distributed_schedule_pool_size, 16, "Number of threads performing background tasks for distributed sends. 
Only has meaning at server startup.", 0) \ \ M(SettingMilliseconds, distributed_directory_monitor_sleep_time_ms, 100, "Sleep time for StorageDistributed DirectoryMonitors, in case of any errors delay grows exponentially.", 0) \ M(SettingMilliseconds, distributed_directory_monitor_max_sleep_time_ms, 30000, "Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.", 0) \ diff --git a/src/Databases/DatabaseDictionary.cpp b/src/Databases/DatabaseDictionary.cpp index 9e7788bf846..a0f52a8f39a 100644 --- a/src/Databases/DatabaseDictionary.cpp +++ b/src/Databases/DatabaseDictionary.cpp @@ -1,6 +1,7 @@ #include #include #include +#include #include #include #include @@ -15,6 +16,18 @@ namespace DB namespace ErrorCodes { extern const int SYNTAX_ERROR; + extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY; +} + +namespace +{ + StoragePtr createStorageDictionary(const String & database_name, const ExternalLoader::LoadResult & load_result) + { + if (!load_result.config) + return nullptr; + DictionaryStructure dictionary_structure = ExternalDictionariesLoader::getDictionaryStructure(*load_result.config); + return StorageDictionary::create(StorageID(database_name, load_result.name), load_result.name, dictionary_structure); + } } DatabaseDictionary::DatabaseDictionary(const String & name_) @@ -26,29 +39,12 @@ DatabaseDictionary::DatabaseDictionary(const String & name_) Tables DatabaseDictionary::listTables(const Context & context, const FilterByNameFunction & filter_by_name) { Tables tables; - ExternalLoader::LoadResults load_results; - if (filter_by_name) + auto load_results = context.getExternalDictionariesLoader().getLoadResults(filter_by_name); + for (auto & load_result : load_results) { - /// If `filter_by_name` is set, we iterate through all dictionaries with such names. That's why we need to load all of them. - load_results = context.getExternalDictionariesLoader().tryLoad(filter_by_name); - } - else - { - /// If `filter_by_name` isn't set, we iterate through only already loaded dictionaries. We don't try to load all dictionaries in this case. 
- load_results = context.getExternalDictionariesLoader().getCurrentLoadResults(); - } - - for (const auto & load_result: load_results) - { - /// Load tables only from XML dictionaries, don't touch other - if (load_result.object && load_result.repository_name.empty()) - { - auto dict_ptr = std::static_pointer_cast(load_result.object); - auto dict_name = dict_ptr->getName(); - const DictionaryStructure & dictionary_structure = dict_ptr->getStructure(); - auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure); - tables[dict_name] = StorageDictionary::create(StorageID(getDatabaseName(), dict_name), ColumnsDescription{columns}, context, true, dict_name); - } + auto storage = createStorageDictionary(getDatabaseName(), load_result); + if (storage) + tables.emplace(storage->getStorageID().table_name, storage); } return tables; } @@ -64,15 +60,8 @@ StoragePtr DatabaseDictionary::tryGetTable( const Context & context, const String & table_name) const { - auto dict_ptr = context.getExternalDictionariesLoader().tryGetDictionary(table_name, true /*load*/); - if (dict_ptr) - { - const DictionaryStructure & dictionary_structure = dict_ptr->getStructure(); - auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure); - return StorageDictionary::create(StorageID(getDatabaseName(), table_name), ColumnsDescription{columns}, context, true, table_name); - } - - return {}; + auto load_result = context.getExternalDictionariesLoader().getLoadResult(table_name); + return createStorageDictionary(getDatabaseName(), load_result); } DatabaseTablesIteratorPtr DatabaseDictionary::getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name) @@ -82,7 +71,7 @@ DatabaseTablesIteratorPtr DatabaseDictionary::getTablesIterator(const Context & bool DatabaseDictionary::empty(const Context & context) const { - return !context.getExternalDictionariesLoader().hasCurrentlyLoadedObjects(); + return !context.getExternalDictionariesLoader().hasObjects(); } ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context, @@ -92,15 +81,17 @@ ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context, { WriteBufferFromString buffer(query); - const auto & dictionaries = context.getExternalDictionariesLoader(); - auto dictionary = throw_on_error ? dictionaries.getDictionary(table_name) - : dictionaries.tryGetDictionary(table_name, true /*load*/); - if (!dictionary) + auto load_result = context.getExternalDictionariesLoader().getLoadResult(table_name); + if (!load_result.config) + { + if (throw_on_error) + throw Exception{"Dictionary " + backQuote(table_name) + " doesn't exist", ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY}; return {}; + } - auto names_and_types = StorageDictionary::getNamesAndTypes(dictionary->getStructure()); + auto names_and_types = StorageDictionary::getNamesAndTypes(ExternalDictionariesLoader::getDictionaryStructure(*load_result.config)); buffer << "CREATE TABLE " << backQuoteIfNeed(database_name) << '.' 
<< backQuoteIfNeed(table_name) << " ("; - buffer << StorageDictionary::generateNamesAndTypesDescription(names_and_types.begin(), names_and_types.end()); + buffer << StorageDictionary::generateNamesAndTypesDescription(names_and_types); buffer << ") Engine = Dictionary(" << backQuoteIfNeed(table_name) << ")"; } diff --git a/src/Databases/DatabaseDictionary.h b/src/Databases/DatabaseDictionary.h index b586cb1403f..0001a59efa6 100644 --- a/src/Databases/DatabaseDictionary.h +++ b/src/Databases/DatabaseDictionary.h @@ -43,6 +43,8 @@ public: ASTPtr getCreateDatabaseQuery(const Context & context) const override; + bool shouldBeEmptyOnDetach() const override { return false; } + void shutdown() override; protected: diff --git a/src/Databases/DatabaseOrdinary.cpp b/src/Databases/DatabaseOrdinary.cpp index 11c4a4400cd..ff375a26b13 100644 --- a/src/Databases/DatabaseOrdinary.cpp +++ b/src/Databases/DatabaseOrdinary.cpp @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -74,18 +75,24 @@ namespace void tryAttachDictionary( - Context & context, - const ASTCreateQuery & query, - DatabaseOrdinary & database) + const ASTPtr & query, + DatabaseOrdinary & database, + const String & metadata_path) { - assert(query.is_dictionary); + auto & create_query = query->as(); + assert(create_query.is_dictionary); try { - database.attachDictionary(query.table, context); + Poco::File meta_file(metadata_path); + auto config = getDictionaryConfigurationFromAST(create_query); + time_t modification_time = meta_file.getLastModified().epochTime(); + database.attachDictionary(create_query.table, DictionaryAttachInfo{query, config, modification_time}); } catch (Exception & e) { - e.addMessage("Cannot attach table '" + backQuote(query.table) + "' from query " + serializeAST(query)); + e.addMessage("Cannot attach dictionary " + backQuote(database.getDatabaseName()) + "." + backQuote(create_query.table) + + " from metadata file " + metadata_path + + " from query " + serializeAST(*query)); throw; } } @@ -173,12 +180,12 @@ void DatabaseOrdinary::loadStoredObjects( /// Attach dictionaries. attachToExternalDictionariesLoader(context); - for (const auto & name_with_query : file_names) + for (const auto & [name, query] : file_names) { - auto create_query = name_with_query.second->as(); + auto create_query = query->as(); if (create_query.is_dictionary) { - tryAttachDictionary(context, create_query, *this); + tryAttachDictionary(query, *this, getMetadataPath() + name); /// Messages, so that it's not boring to wait for the server to load for a long time. 
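// The tryAttachDictionary() hunk above attaches extra context (database, dictionary name, metadata
// file, original query) to the exception via Exception::addMessage() before rethrowing. A minimal
// standalone sketch of the same "catch, enrich, rethrow" idea follows; ClickHouse's Exception class
// is not available here, so std::throw_with_nested is used instead, and all names and paths below
// are illustrative only.
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for parsing a dictionary definition from a metadata file.
void parseMetadata(const std::string & path)
{
    throw std::runtime_error("syntax error in " + path);
}

void tryAttachDictionaryExample(const std::string & database, const std::string & table, const std::string & metadata_path)
{
    try
    {
        parseMetadata(metadata_path);
    }
    catch (...)
    {
        // Same intent as e.addMessage(...): keep the low-level error but say which object failed to attach.
        std::throw_with_nested(std::runtime_error(
            "Cannot attach dictionary `" + database + "`.`" + table + "` from metadata file " + metadata_path));
    }
}

void printNested(const std::exception & e, int level = 0)
{
    std::cerr << std::string(level * 2, ' ') << e.what() << '\n';
    try { std::rethrow_if_nested(e); }
    catch (const std::exception & nested) { printNested(nested, level + 1); }
}

int main()
{
    try { tryAttachDictionaryExample("db", "dict", "/var/lib/clickhouse/metadata/db/dict.sql"); }
    catch (const std::exception & e) { printNested(e); }
}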
logAboutProgress(log, ++dictionaries_processed, total_dictionaries, watch); diff --git a/src/Databases/DatabaseWithDictionaries.cpp b/src/Databases/DatabaseWithDictionaries.cpp index 6673fdf8075..aec8b66e572 100644 --- a/src/Databases/DatabaseWithDictionaries.cpp +++ b/src/Databases/DatabaseWithDictionaries.cpp @@ -5,6 +5,8 @@ #include #include #include +#include +#include #include #include #include @@ -24,46 +26,80 @@ namespace ErrorCodes { extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY; extern const int TABLE_ALREADY_EXISTS; - extern const int UNKNOWN_TABLE; + extern const int UNKNOWN_DICTIONARY; extern const int DICTIONARY_ALREADY_EXISTS; extern const int FILE_DOESNT_EXIST; - extern const int CANNOT_GET_CREATE_TABLE_QUERY; } -void DatabaseWithDictionaries::attachDictionary(const String & dictionary_name, const Context & context) +void DatabaseWithDictionaries::attachDictionary(const String & dictionary_name, const DictionaryAttachInfo & attach_info) { String full_name = getDatabaseName() + "." + dictionary_name; { std::lock_guard lock(mutex); - if (!dictionaries.emplace(dictionary_name).second) + auto [it, inserted] = dictionaries.emplace(dictionary_name, attach_info); + if (!inserted) throw Exception("Dictionary " + full_name + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS); + + /// Attach the dictionary as table too. + try + { + attachTableUnlocked( + dictionary_name, + StorageDictionary::create( + StorageID(getDatabaseName(), dictionary_name), + full_name, + ExternalDictionariesLoader::getDictionaryStructure(*attach_info.config))); + } + catch (...) + { + dictionaries.erase(it); + throw; + } } CurrentStatusInfo::set(CurrentStatusInfo::DictionaryStatus, full_name, static_cast(ExternalLoaderStatus::NOT_LOADED)); + /// ExternalLoader::reloadConfig() will find out that the dictionary's config has been added /// and in case `dictionaries_lazy_load == false` it will load the dictionary. - const auto & external_loader = context.getExternalDictionariesLoader(); - external_loader.reloadConfig(getDatabaseName(), full_name); + external_loader->reloadConfig(getDatabaseName(), full_name); } -void DatabaseWithDictionaries::detachDictionary(const String & dictionary_name, const Context & context) +void DatabaseWithDictionaries::detachDictionary(const String & dictionary_name) +{ + DictionaryAttachInfo attach_info; + detachDictionaryImpl(dictionary_name, attach_info); +} + +void DatabaseWithDictionaries::detachDictionaryImpl(const String & dictionary_name, DictionaryAttachInfo & attach_info) { String full_name = getDatabaseName() + "." + dictionary_name; + { std::lock_guard lock(mutex); auto it = dictionaries.find(dictionary_name); if (it == dictionaries.end()) - throw Exception("Dictionary " + full_name + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); + throw Exception("Dictionary " + full_name + " doesn't exist.", ErrorCodes::UNKNOWN_DICTIONARY); + attach_info = std::move(it->second); dictionaries.erase(it); + + /// Detach the dictionary as table too. + try + { + detachTableUnlocked(dictionary_name); + } + catch (...) + { + dictionaries.emplace(dictionary_name, std::move(attach_info)); + throw; + } } CurrentStatusInfo::unset(CurrentStatusInfo::DictionaryStatus, full_name); + /// ExternalLoader::reloadConfig() will find out that the dictionary's config has been removed /// and therefore it will unload the dictionary. 
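// attachDictionary()/detachDictionaryImpl() above keep two registrations in sync: the `dictionaries`
// map (DictionaryAttachInfo) and the ordinary table list (a StorageDictionary view of the dictionary),
// rolling the first step back if the second one throws. A minimal standalone sketch of that rollback
// pattern, with std::map and int as stand-ins for the real attach-info and storage types:
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

struct TwoRegistries
{
    std::mutex mutex;
    std::map<std::string, int> dictionaries;   // stand-in for name -> DictionaryAttachInfo
    std::map<std::string, int> tables;         // stand-in for name -> StoragePtr

    void attach(const std::string & name, int info, int storage)
    {
        std::lock_guard lock(mutex);
        auto [it, inserted] = dictionaries.emplace(name, info);
        if (!inserted)
            throw std::runtime_error("Dictionary " + name + " already exists.");
        try
        {
            if (!tables.emplace(name, storage).second)
                throw std::runtime_error("Table " + name + " already exists.");
        }
        catch (...)
        {
            dictionaries.erase(it);            // roll back the first registration
            throw;
        }
    }

    void detach(const std::string & name)
    {
        std::lock_guard lock(mutex);
        auto it = dictionaries.find(name);
        if (it == dictionaries.end())
            throw std::runtime_error("Dictionary " + name + " doesn't exist.");
        auto saved = *it;
        dictionaries.erase(it);
        try
        {
            tables.erase(name);                // detach the table view of the dictionary as well
        }
        catch (...)
        {
            dictionaries.insert(saved);        // restore the entry so both registries stay consistent
            throw;
        }
    }
};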
- const auto & external_loader = context.getExternalDictionariesLoader(); - external_loader.reloadConfig(getDatabaseName(), full_name); - + external_loader->reloadConfig(getDatabaseName(), full_name); } void DatabaseWithDictionaries::createDictionary(const Context & context, const String & dictionary_name, const ASTPtr & query) @@ -85,8 +121,7 @@ void DatabaseWithDictionaries::createDictionary(const Context & context, const S /// A dictionary with the same full name could be defined in *.xml config files. String full_name = getDatabaseName() + "." + dictionary_name; - const auto & external_loader = context.getExternalDictionariesLoader(); - if (external_loader.getCurrentStatus(full_name) != ExternalLoader::Status::NOT_EXIST) + if (external_loader->getCurrentStatus(full_name) != ExternalLoader::Status::NOT_EXIST) throw Exception( "Dictionary " + backQuote(getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS); @@ -117,23 +152,22 @@ void DatabaseWithDictionaries::createDictionary(const Context & context, const S /// Add a temporary repository containing the dictionary. /// We need this temp repository to try loading the dictionary before actually attaching it to the database. - auto temp_repository - = const_cast(external_loader) /// the change of ExternalDictionariesLoader is temporary - .addConfigRepository(std::make_unique( - getDatabaseName(), dictionary_metadata_tmp_path, getDictionaryConfigurationFromAST(query->as()))); + auto temp_repository = external_loader->addConfigRepository(std::make_unique( + getDatabaseName(), dictionary_metadata_tmp_path, getDictionaryConfigurationFromAST(query->as()))); bool lazy_load = context.getConfigRef().getBool("dictionaries_lazy_load", true); if (!lazy_load) { /// load() is called here to force loading the dictionary, wait until the loading is finished, /// and throw an exception if the loading is failed. - external_loader.load(full_name); + external_loader->load(full_name); } - attachDictionary(dictionary_name, context); + auto config = getDictionaryConfigurationFromAST(query->as()); + attachDictionary(dictionary_name, DictionaryAttachInfo{query, config, time(nullptr)}); SCOPE_EXIT({ if (!succeeded) - detachDictionary(dictionary_name, context); + detachDictionary(dictionary_name); }); /// If it was ATTACH query and file with dictionary metadata already exist @@ -142,94 +176,31 @@ void DatabaseWithDictionaries::createDictionary(const Context & context, const S /// ExternalDictionariesLoader doesn't know we renamed the metadata path. /// So we have to manually call reloadConfig() here. - external_loader.reloadConfig(getDatabaseName(), full_name); + external_loader->reloadConfig(getDatabaseName(), full_name); /// Everything's ok. succeeded = true; } -void DatabaseWithDictionaries::removeDictionary(const Context & context, const String & dictionary_name) +void DatabaseWithDictionaries::removeDictionary(const Context &, const String & dictionary_name) { - detachDictionary(dictionary_name, context); - - String dictionary_metadata_path = getObjectMetadataPath(dictionary_name); + DictionaryAttachInfo attach_info; + detachDictionaryImpl(dictionary_name, attach_info); try { + String dictionary_metadata_path = getObjectMetadataPath(dictionary_name); Poco::File(dictionary_metadata_path).remove(); CurrentStatusInfo::unset(CurrentStatusInfo::DictionaryStatus, getDatabaseName() + "." + dictionary_name); } catch (...) 
{ /// If remove was not possible for some reason - attachDictionary(dictionary_name, context); + attachDictionary(dictionary_name, attach_info); throw; } } -StoragePtr DatabaseWithDictionaries::tryGetTableImpl(const Context & context, const String & table_name, bool load) const -{ - if (auto table_ptr = DatabaseWithOwnTablesBase::tryGetTable(context, table_name)) - return table_ptr; - - if (isDictionaryExist(context, table_name)) - /// We don't need lock database here, because database doesn't store dictionary itself - /// just metadata - return getDictionaryStorage(context, table_name, load); - - return {}; -} -StoragePtr DatabaseWithDictionaries::tryGetTable(const Context & context, const String & table_name) const -{ - return tryGetTableImpl(context, table_name, true /*load*/); -} - -ASTPtr DatabaseWithDictionaries::getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const -{ - ASTPtr ast; - bool has_table = tryGetTableImpl(context, table_name, false /*load*/) != nullptr; - auto table_metadata_path = getObjectMetadataPath(table_name); - try - { - ast = getCreateQueryFromMetadata(context, table_metadata_path, throw_on_error); - } - catch (const Exception & e) - { - if (!has_table && e.code() == ErrorCodes::FILE_DOESNT_EXIST && throw_on_error) - throw Exception{"Table " + backQuote(table_name) + " doesn't exist", - ErrorCodes::CANNOT_GET_CREATE_TABLE_QUERY}; - else if (throw_on_error) - throw; - } - return ast; -} - -DatabaseTablesIteratorPtr DatabaseWithDictionaries::getTablesWithDictionaryTablesIterator( - const Context & context, const FilterByNameFunction & filter_by_dictionary_name) -{ - /// NOTE: it's not atomic - auto tables_it = getTablesIterator(context, filter_by_dictionary_name); - auto dictionaries_it = getDictionariesIterator(context, filter_by_dictionary_name); - - Tables result; - while (tables_it && tables_it->isValid()) - { - result.emplace(tables_it->name(), tables_it->table()); - tables_it->next(); - } - - while (dictionaries_it && dictionaries_it->isValid()) - { - auto table_name = dictionaries_it->name(); - auto table_ptr = getDictionaryStorage(context, table_name, false /*load*/); - if (table_ptr) - result.emplace(table_name, table_ptr); - dictionaries_it->next(); - } - - return std::make_unique(result); -} - DatabaseDictionariesIteratorPtr DatabaseWithDictionaries::getDictionariesIterator(const Context & /*context*/, const FilterByNameFunction & filter_by_dictionary_name) { std::lock_guard lock(mutex); @@ -237,9 +208,9 @@ DatabaseDictionariesIteratorPtr DatabaseWithDictionaries::getDictionariesIterato return std::make_unique(dictionaries); Dictionaries filtered_dictionaries; - for (const auto & dictionary_name : dictionaries) + for (const auto & dictionary_name : dictionaries | boost::adaptors::map_keys) if (filter_by_dictionary_name(dictionary_name)) - filtered_dictionaries.emplace(dictionary_name); + filtered_dictionaries.emplace_back(dictionary_name); return std::make_unique(std::move(filtered_dictionaries)); } @@ -249,44 +220,84 @@ bool DatabaseWithDictionaries::isDictionaryExist(const Context & /*context*/, co return dictionaries.find(dictionary_name) != dictionaries.end(); } -StoragePtr DatabaseWithDictionaries::getDictionaryStorage(const Context & context, const String & table_name, bool load) const -{ - auto dict_name = database_name + "." 
+ table_name; - const auto & external_loader = context.getExternalDictionariesLoader(); - auto dict_ptr = external_loader.tryGetDictionary(dict_name, load); - if (dict_ptr) - { - const DictionaryStructure & dictionary_structure = dict_ptr->getStructure(); - auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure); - return StorageDictionary::create(StorageID(database_name, table_name), ColumnsDescription{columns}, context, true, dict_name); - } - return nullptr; -} - ASTPtr DatabaseWithDictionaries::getCreateDictionaryQueryImpl( const Context & context, const String & dictionary_name, bool throw_on_error) const { - ASTPtr ast; - - auto dictionary_metadata_path = getObjectMetadataPath(dictionary_name); - ast = getCreateQueryFromMetadata(context, dictionary_metadata_path, throw_on_error); - if (!ast && throw_on_error) { - /// Handle system.* tables for which there are no table.sql files. - bool has_dictionary = isDictionaryExist(context, dictionary_name); - - auto msg = has_dictionary ? "There is no CREATE DICTIONARY query for table " : "There is no metadata file for dictionary "; - - throw Exception(msg + backQuote(dictionary_name), ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY); + /// Try to get create query ifg for an attached dictionary. + std::lock_guard lock{mutex}; + auto it = dictionaries.find(dictionary_name); + if (it != dictionaries.end()) + { + ASTPtr ast = it->second.create_query->clone(); + auto & create_query = ast->as(); + create_query.attach = false; + create_query.database = getDatabaseName(); + return ast; + } } - return ast; + /// Try to get create query for non-attached dictionary. + ASTPtr ast; + try + { + auto dictionary_metadata_path = getObjectMetadataPath(dictionary_name); + ast = getCreateQueryFromMetadata(context, dictionary_metadata_path, throw_on_error); + } + catch (const Exception & e) + { + if (throw_on_error && (e.code() != ErrorCodes::FILE_DOESNT_EXIST)) + throw; + } + + if (ast) + { + const auto * create_query = ast->as(); + if (create_query && create_query->is_dictionary) + return ast; + } + if (throw_on_error) + throw Exception{"Dictionary " + backQuote(dictionary_name) + " doesn't exist", + ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY}; + return nullptr; +} + + +Poco::AutoPtr DatabaseWithDictionaries::getDictionaryConfiguration(const String & dictionary_name) const +{ + std::lock_guard lock(mutex); + auto it = dictionaries.find(dictionary_name); + if (it != dictionaries.end()) + return it->second.config; + throw Exception("Dictionary " + backQuote(dictionary_name) + " doesn't exist", ErrorCodes::UNKNOWN_DICTIONARY); +} + +time_t DatabaseWithDictionaries::getObjectMetadataModificationTime(const String & object_name) const +{ + { + std::lock_guard lock(mutex); + auto it = dictionaries.find(object_name); + if (it != dictionaries.end()) + return it->second.modification_time; + } + return DatabaseOnDisk::getObjectMetadataModificationTime(object_name); +} + + +bool DatabaseWithDictionaries::empty(const Context &) const +{ + std::lock_guard lock{mutex}; + return tables.empty() && dictionaries.empty(); } void DatabaseWithDictionaries::shutdown() { + { + std::lock_guard lock(mutex); + dictionaries.clear(); + } detachFromExternalDictionariesLoader(); DatabaseOnDisk::shutdown(); } @@ -295,8 +306,9 @@ DatabaseWithDictionaries::~DatabaseWithDictionaries() = default; void DatabaseWithDictionaries::attachToExternalDictionariesLoader(Context & context) { - database_as_config_repo_for_external_loader = 
context.getExternalDictionariesLoader().addConfigRepository( - std::make_unique(*this, context)); + external_loader = &context.getExternalDictionariesLoader(); + database_as_config_repo_for_external_loader + = external_loader->addConfigRepository(std::make_unique(*this, context)); } void DatabaseWithDictionaries::detachFromExternalDictionariesLoader() diff --git a/src/Databases/DatabaseWithDictionaries.h b/src/Databases/DatabaseWithDictionaries.h index 50e4dca671f..fe78aa98a19 100644 --- a/src/Databases/DatabaseWithDictionaries.h +++ b/src/Databases/DatabaseWithDictionaries.h @@ -8,9 +8,9 @@ namespace DB class DatabaseWithDictionaries : public DatabaseOnDisk { public: - void attachDictionary(const String & name, const Context & context) override; + void attachDictionary(const String & dictionary_name, const DictionaryAttachInfo & attach_info) override; - void detachDictionary(const String & name, const Context & context) override; + void detachDictionary(const String & dictionary_name) override; void createDictionary(const Context & context, const String & dictionary_name, @@ -18,15 +18,15 @@ public: void removeDictionary(const Context & context, const String & dictionary_name) override; - StoragePtr tryGetTable(const Context & context, const String & table_name) const override; - - ASTPtr getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const override; - - DatabaseTablesIteratorPtr getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name) override; + bool isDictionaryExist(const Context & context, const String & dictionary_name) const override; DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name) override; - bool isDictionaryExist(const Context & context, const String & dictionary_name) const override; + Poco::AutoPtr getDictionaryConfiguration(const String & /*name*/) const override; + + time_t getObjectMetadataModificationTime(const String & object_name) const override; + + bool empty(const Context & context) const override; void shutdown() override; @@ -39,16 +39,17 @@ protected: void attachToExternalDictionariesLoader(Context & context); void detachFromExternalDictionariesLoader(); - StoragePtr getDictionaryStorage(const Context & context, const String & table_name, bool load) const; + void detachDictionaryImpl(const String & dictionary_name, DictionaryAttachInfo & attach_info); ASTPtr getCreateDictionaryQueryImpl(const Context & context, const String & dictionary_name, bool throw_on_error) const override; -private: - ext::scope_guard database_as_config_repo_for_external_loader; + std::unordered_map dictionaries; - StoragePtr tryGetTableImpl(const Context & context, const String & table_name, bool load) const; +private: + ExternalDictionariesLoader * external_loader = nullptr; + ext::scope_guard database_as_config_repo_for_external_loader; }; } diff --git a/src/Databases/DatabasesCommon.cpp b/src/Databases/DatabasesCommon.cpp index 9d9c0707a7c..1b54424b484 100644 --- a/src/Databases/DatabasesCommon.cpp +++ b/src/Databases/DatabasesCommon.cpp @@ -27,7 +27,7 @@ bool DatabaseWithOwnTablesBase::isTableExist( const String & table_name) const { std::lock_guard lock(mutex); - return tables.find(table_name) != tables.end() || dictionaries.find(table_name) != dictionaries.end(); + return tables.find(table_name) != tables.end(); } StoragePtr DatabaseWithOwnTablesBase::tryGetTable( @@ -58,7 +58,7 @@ 
DatabaseTablesIteratorPtr DatabaseWithOwnTablesBase::getTablesIterator(const Con bool DatabaseWithOwnTablesBase::empty(const Context & /*context*/) const { std::lock_guard lock(mutex); - return tables.empty() && dictionaries.empty(); + return tables.empty(); } StoragePtr DatabaseWithOwnTablesBase::detachTable(const String & table_name) @@ -125,7 +125,6 @@ void DatabaseWithOwnTablesBase::shutdown() std::lock_guard lock(mutex); tables.clear(); - dictionaries.clear(); } DatabaseWithOwnTablesBase::~DatabaseWithOwnTablesBase() diff --git a/src/Databases/DatabasesCommon.h b/src/Databases/DatabasesCommon.h index 3bf7460da01..0515c16deaf 100644 --- a/src/Databases/DatabasesCommon.h +++ b/src/Databases/DatabasesCommon.h @@ -42,7 +42,6 @@ public: protected: mutable std::mutex mutex; Tables tables; - Dictionaries dictionaries; Poco::Logger * log; DatabaseWithOwnTablesBase(const String & name_, const String & logger); diff --git a/src/Databases/DictionaryAttachInfo.h b/src/Databases/DictionaryAttachInfo.h new file mode 100644 index 00000000000..b2214d26f3c --- /dev/null +++ b/src/Databases/DictionaryAttachInfo.h @@ -0,0 +1,18 @@ +#pragma once + +#include +#include +#include + + +namespace DB +{ + +struct DictionaryAttachInfo +{ + ASTPtr create_query; + Poco::AutoPtr config; + time_t modification_time; +}; + +} diff --git a/src/Databases/IDatabase.h b/src/Databases/IDatabase.h index 1dc1f9eb36b..9ef649dff58 100644 --- a/src/Databases/IDatabase.h +++ b/src/Databases/IDatabase.h @@ -5,8 +5,11 @@ #include #include #include +#include #include +#include +#include #include #include #include @@ -18,11 +21,10 @@ namespace DB class Context; struct Settings; struct ConstraintsDescription; -class ColumnsDescription; struct IndicesDescription; struct TableStructureWriteLockHolder; class ASTCreateQuery; -using Dictionaries = std::set; +using Dictionaries = std::vector; namespace ErrorCodes { @@ -74,9 +76,14 @@ private: public: DatabaseDictionariesSnapshotIterator() = default; DatabaseDictionariesSnapshotIterator(Dictionaries & dictionaries_) : dictionaries(dictionaries_), it(dictionaries.begin()) {} - DatabaseDictionariesSnapshotIterator(Dictionaries && dictionaries_) : dictionaries(dictionaries_), it(dictionaries.begin()) {} + DatabaseDictionariesSnapshotIterator(const std::unordered_map & dictionaries_) + { + boost::range::copy(dictionaries_ | boost::adaptors::map_keys, std::back_inserter(dictionaries)); + it = dictionaries.begin(); + } + void next() { ++it; } bool isValid() const { return !dictionaries.empty() && it != dictionaries.end(); } @@ -140,12 +147,6 @@ public: return std::make_unique(); } - /// Get an iterator to pass through all the tables and dictionary tables. - virtual DatabaseTablesIteratorPtr getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_name = {}) - { - return getTablesIterator(context, filter_by_name); - } - /// Is the database empty. virtual bool empty(const Context & context) const = 0; @@ -192,7 +193,7 @@ public: /// Add dictionary to the database, but do not add it to the metadata. The database may not support this method. /// If dictionaries_lazy_load is false it also starts loading the dictionary asynchronously. 
- virtual void attachDictionary(const String & /*name*/, const Context & /*context*/) + virtual void attachDictionary(const String & /* dictionary_name */, const DictionaryAttachInfo & /* attach_info */) { throw Exception("There is no ATTACH DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); } @@ -204,7 +205,7 @@ public: } /// Forget about the dictionary without deleting it. The database may not support this method. - virtual void detachDictionary(const String & /*name*/, const Context & /*context*/) + virtual void detachDictionary(const String & /*name*/) { throw Exception("There is no DETACH DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED); } @@ -260,6 +261,11 @@ public: return getCreateDictionaryQueryImpl(context, name, true); } + virtual Poco::AutoPtr getDictionaryConfiguration(const String & /*name*/) const + { + throw Exception(getEngineName() + ": getDictionaryConfiguration() is not supported", ErrorCodes::NOT_IMPLEMENTED); + } + /// Get the CREATE DATABASE query for current database. virtual ASTPtr getCreateDatabaseQuery(const Context & /*context*/) const = 0; @@ -276,6 +282,9 @@ public: /// Returns metadata path of a concrete table if the database supports it, empty string otherwise virtual String getObjectMetadataPath(const String & /*table_name*/) const { return {}; } + /// All tables and dictionaries should be detached before detaching the database. + virtual bool shouldBeEmptyOnDetach() const { return true; } + /// Ask all tables to complete the background threads they are using and delete all table objects. virtual void shutdown() = 0; diff --git a/src/Dictionaries/CacheDictionary.cpp b/src/Dictionaries/CacheDictionary.cpp index 36a8c704f4f..30bd521c0bb 100644 --- a/src/Dictionaries/CacheDictionary.cpp +++ b/src/Dictionaries/CacheDictionary.cpp @@ -46,6 +46,7 @@ namespace ErrorCodes extern const int BAD_ARGUMENTS; extern const int UNSUPPORTED_METHOD; extern const int TOO_SMALL_BUFFER_SIZE; + extern const int TIMEOUT_EXCEEDED; } @@ -63,10 +64,12 @@ CacheDictionary::CacheDictionary( const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, DictionaryLifetime dict_lifetime_, + size_t strict_max_lifetime_seconds_, size_t size_, bool allow_read_expired_keys_, size_t max_update_queue_size_, size_t update_queue_push_timeout_milliseconds_, + size_t query_wait_timeout_milliseconds_, size_t max_threads_for_updates_) : database(database_) , name(name_) @@ -74,9 +77,11 @@ CacheDictionary::CacheDictionary( , dict_struct(dict_struct_) , source_ptr{std::move(source_ptr_)} , dict_lifetime(dict_lifetime_) + , strict_max_lifetime_seconds(strict_max_lifetime_seconds_) , allow_read_expired_keys(allow_read_expired_keys_) , max_update_queue_size(max_update_queue_size_) , update_queue_push_timeout_milliseconds(update_queue_push_timeout_milliseconds_) + , query_wait_timeout_milliseconds(query_wait_timeout_milliseconds_) , max_threads_for_updates(max_threads_for_updates_) , log(&Logger::get("ExternalDictionaries")) , size{roundUpToPowerOfTwoOrZero(std::max(size_, size_t(max_collision_length)))} @@ -332,6 +337,13 @@ void CacheDictionary::has(const PaddedPODArray & ids, PaddedPODArray { if (find_result.outdated) { + /// Protection of reading very expired keys. 
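// The `strict_max` checks being added to CacheDictionary::has(), getItemsNumberImpl() and
// getItemsString() put a hard upper bound on how stale a cached key may be served: an outdated cell
// within strict_max is still handled as "expired" (an async update is scheduled, and the old value may
// be returned when allow_read_expired_keys is on), while a cell past strict_max is treated as a missing
// key and must be fetched from the source again. A simplified standalone model of that decision
// (names are illustrative, not the real CacheDictionary types):
#include <chrono>

using TimePoint = std::chrono::system_clock::time_point;

enum class CellAction
{
    UseAsIs,     // not outdated yet
    Expired,     // schedule an update; the stale value may still be served if allow_read_expired_keys
    NotFound     // "very expired": past strict_max, handled like a key that was never cached
};

struct Cell
{
    TimePoint expires_at;   // randomized between lifetime.min_sec and lifetime.max_sec
    TimePoint strict_max;   // update time + strict_max_lifetime_seconds, set when the cell is refreshed
};

CellAction classify(const Cell & cell, TimePoint now)
{
    if (now <= cell.expires_at)
        return CellAction::UseAsIs;
    if (now > cell.strict_max)
        return CellAction::NotFound;
    return CellAction::Expired;
}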
+ if (now > cells[find_result.cell_idx].strict_max) + { + cache_not_found_ids[id].push_back(row); + continue; + } + cache_expired_ids[id].push_back(row); if (allow_read_expired_keys) @@ -693,6 +705,9 @@ void registerDictionaryCache(DictionaryFactory & factory) const String name = config.getString(config_prefix + ".name"); const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"}; + const size_t strict_max_lifetime_seconds = + config.getUInt64(layout_prefix + ".cache.strict_max_lifetime_seconds", static_cast(dict_lifetime.max_sec)); + const size_t max_update_queue_size = config.getUInt64(layout_prefix + ".cache.max_update_queue_size", 100000); if (max_update_queue_size == 0) @@ -708,6 +723,9 @@ void registerDictionaryCache(DictionaryFactory & factory) throw Exception{name + ": dictionary of layout 'cache' have too little update_queue_push_timeout", ErrorCodes::BAD_ARGUMENTS}; + const size_t query_wait_timeout_milliseconds = + config.getUInt64(layout_prefix + ".cache.query_wait_timeout_milliseconds", 60000); + const size_t max_threads_for_updates = config.getUInt64(layout_prefix + ".max_threads_for_updates", 4); if (max_threads_for_updates == 0) @@ -715,8 +733,17 @@ void registerDictionaryCache(DictionaryFactory & factory) ErrorCodes::BAD_ARGUMENTS}; return std::make_unique( - database, name, dict_struct, std::move(source_ptr), dict_lifetime, size, - allow_read_expired_keys, max_update_queue_size, update_queue_push_timeout_milliseconds, + database, + name, + dict_struct, + std::move(source_ptr), + dict_lifetime, + strict_max_lifetime_seconds, + size, + allow_read_expired_keys, + max_update_queue_size, + update_queue_push_timeout_milliseconds, + query_wait_timeout_milliseconds, max_threads_for_updates); }; factory.registerLayout("cache", create_layout, false); @@ -782,20 +809,32 @@ void CacheDictionary::updateThreadFunction() void CacheDictionary::waitForCurrentUpdateFinish(UpdateUnitPtr & update_unit_ptr) const { - std::unique_lock lock(update_mutex); + std::unique_lock update_lock(update_mutex); - /* - * We wait here without any timeout to avoid SEGFAULT's. - * Consider timeout for wait had expired and main query's thread ended with exception - * or some other error. But the UpdateUnit with callbacks is left in the queue. - * It has these callback that capture god knows what from the current thread - * (most of the variables lies on the stack of finished thread) that - * intended to do a synchronous update. AsyncUpdate thread can touch deallocated memory and explode. - * */ - is_update_finished.wait( - lock, + size_t timeout_for_wait = 100000; + bool result = is_update_finished.wait_for( + update_lock, + std::chrono::milliseconds(timeout_for_wait), [&] {return update_unit_ptr->is_done || update_unit_ptr->current_exception; }); + if (!result) + { + std::lock_guard callback_lock(update_unit_ptr->callback_mutex); + /* + * We acquire a lock here and store false to special variable to avoid SEGFAULT's. + * Consider timeout for wait had expired and main query's thread ended with exception + * or some other error. But the UpdateUnit with callbacks is left in the queue. + * It has these callback that capture god knows what from the current thread + * (most of the variables lies on the stack of finished thread) that + * intended to do a synchronous update. AsyncUpdate thread can touch deallocated memory and explode. 
+ * */ + update_unit_ptr->can_use_callback = false; + throw DB::Exception( + "Dictionary " + getName() + " source seems unavailable, because " + + toString(timeout_for_wait) + " timeout exceeded.", ErrorCodes::TIMEOUT_EXCEEDED); + } + + if (update_unit_ptr->current_exception) std::rethrow_exception(update_unit_ptr->current_exception); } @@ -968,9 +1007,14 @@ void CacheDictionary::update(BunchUpdateUnit & bunch_update_unit) const { std::uniform_int_distribution distribution{dict_lifetime.min_sec, dict_lifetime.max_sec}; cell.setExpiresAt(now + std::chrono::seconds{distribution(rnd_engine)}); + cell.strict_max = now + std::chrono::seconds{strict_max_lifetime_seconds}; } else + { cell.setExpiresAt(std::chrono::time_point::max()); + cell.strict_max = now + std::chrono::seconds{strict_max_lifetime_seconds}; + } + /// Set null_value for each attribute cell.setDefault(); diff --git a/src/Dictionaries/CacheDictionary.h b/src/Dictionaries/CacheDictionary.h index e425b9c391a..bb103c61107 100644 --- a/src/Dictionaries/CacheDictionary.h +++ b/src/Dictionaries/CacheDictionary.h @@ -55,10 +55,12 @@ public: const DictionaryStructure & dict_struct_, DictionarySourcePtr source_ptr_, DictionaryLifetime dict_lifetime_, + size_t strict_max_lifetime_seconds, size_t size_, bool allow_read_expired_keys_, size_t max_update_queue_size_, size_t update_queue_push_timeout_milliseconds_, + size_t query_wait_timeout_milliseconds, size_t max_threads_for_updates); ~CacheDictionary() override; @@ -87,9 +89,18 @@ public: std::shared_ptr clone() const override { return std::make_shared( - database, name, dict_struct, source_ptr->clone(), dict_lifetime, size, - allow_read_expired_keys, max_update_queue_size, - update_queue_push_timeout_milliseconds, max_threads_for_updates); + database, + name, + dict_struct, + source_ptr->clone(), + dict_lifetime, + strict_max_lifetime_seconds, + size, + allow_read_expired_keys, + max_update_queue_size, + update_queue_push_timeout_milliseconds, + query_wait_timeout_milliseconds, + max_threads_for_updates); } const IDictionarySource * getSource() const override { return source_ptr.get(); } @@ -206,6 +217,8 @@ private: /// Stores both expiration time and `is_default` flag in the most significant bit time_point_urep_t data; + time_point_t strict_max; + /// Sets expiration time, resets `is_default` flag to false time_point_t expiresAt() const { return ext::safe_bit_cast(data & EXPIRES_AT_MASK); } void setExpiresAt(const time_point_t & t) { data = ext::safe_bit_cast(t); } @@ -294,9 +307,11 @@ private: const DictionaryStructure dict_struct; mutable DictionarySourcePtr source_ptr; const DictionaryLifetime dict_lifetime; + const size_t strict_max_lifetime_seconds; const bool allow_read_expired_keys; const size_t max_update_queue_size; const size_t update_queue_push_timeout_milliseconds; + const size_t query_wait_timeout_milliseconds; const size_t max_threads_for_updates; Logger * const log; @@ -366,6 +381,12 @@ private: alive_keys(CurrentMetrics::CacheDictionaryUpdateQueueKeys, requested_ids.size()){} std::vector requested_ids; + + /// It might seem that it is a leak of performance. + /// But aquiring a mutex without contention is rather cheap. 
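// waitForCurrentUpdateFinish() above replaces the unbounded wait with a bounded wait_for() plus a
// per-unit callback_mutex/can_use_callback pair: if the querying thread gives up, it disables the
// callbacks under the lock, so the background update thread can never invoke handlers that capture
// the now-destroyed stack of the waiter. A standalone sketch of that scheme (simplified to one
// std::function callback and separate mutexes; names are illustrative, not the real members):
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <stdexcept>

struct UpdateUnit
{
    std::mutex callback_mutex;
    bool can_use_callback = true;          // cleared by the waiter on timeout
    std::function<void()> on_done;         // may capture state living on the waiting thread's stack

    std::mutex done_mutex;
    std::condition_variable done_cv;
    bool is_done = false;
};

void waitForUpdateFinish(const std::shared_ptr<UpdateUnit> & unit, std::chrono::milliseconds timeout)
{
    std::unique_lock lock(unit->done_mutex);
    if (!unit->done_cv.wait_for(lock, timeout, [&] { return unit->is_done; }))
    {
        std::lock_guard callback_lock(unit->callback_mutex);
        unit->can_use_callback = false;    // the waiter is about to unwind; forbid calling its callback
        throw std::runtime_error("dictionary source seems unavailable: wait timeout exceeded");
    }
}

void onUpdateFinished(const std::shared_ptr<UpdateUnit> & unit)
{
    {
        std::lock_guard callback_lock(unit->callback_mutex);
        if (unit->can_use_callback)        // only touch the callback while the waiter still exists
            unit->on_done();
    }
    {
        std::lock_guard lock(unit->done_mutex);
        unit->is_done = true;
    }
    unit->done_cv.notify_all();
}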
+ std::mutex callback_mutex; + bool can_use_callback{true}; + PresentIdHandler present_id_handler; AbsentIdHandler absent_id_handler; @@ -412,6 +433,7 @@ private: helper.push_back(unit_ptr->requested_ids.size() + helper.back()); present_id_handlers.emplace_back(unit_ptr->present_id_handler); absent_id_handlers.emplace_back(unit_ptr->absent_id_handler); + update_units.emplace_back(unit_ptr); } concatenated_requested_ids.reserve(total_requested_keys_count); @@ -428,31 +450,51 @@ private: void informCallersAboutPresentId(Key id, size_t cell_idx) { - for (size_t i = 0; i < concatenated_requested_ids.size(); ++i) + for (size_t position = 0; position < concatenated_requested_ids.size(); ++position) { - auto & curr = concatenated_requested_ids[i]; - if (curr == id) - getPresentIdHandlerForPosition(i)(id, cell_idx); + if (concatenated_requested_ids[position] == id) + { + auto unit_number = getUpdateUnitNumberForRequestedIdPosition(position); + auto lock = getLockToCurrentUnit(unit_number); + if (canUseCallback(unit_number)) + getPresentIdHandlerForPosition(unit_number)(id, cell_idx); + } } } void informCallersAboutAbsentId(Key id, size_t cell_idx) { - for (size_t i = 0; i < concatenated_requested_ids.size(); ++i) - if (concatenated_requested_ids[i] == id) - getAbsentIdHandlerForPosition(i)(id, cell_idx); + for (size_t position = 0; position < concatenated_requested_ids.size(); ++position) + if (concatenated_requested_ids[position] == id) + { + auto unit_number = getUpdateUnitNumberForRequestedIdPosition(position); + auto lock = getLockToCurrentUnit(unit_number); + if (canUseCallback(unit_number)) + getAbsentIdHandlerForPosition(unit_number)(id, cell_idx); + } } private: - PresentIdHandler & getPresentIdHandlerForPosition(size_t position) + /// Needed for control the usage of callback to avoid SEGFAULTs. + bool canUseCallback(size_t unit_number) { - return present_id_handlers[getUpdateUnitNumberForRequestedIdPosition(position)]; + return update_units[unit_number].get()->can_use_callback; } - AbsentIdHandler & getAbsentIdHandlerForPosition(size_t position) + std::unique_lock getLockToCurrentUnit(size_t unit_number) { - return absent_id_handlers[getUpdateUnitNumberForRequestedIdPosition((position))]; + return std::unique_lock(update_units[unit_number].get()->callback_mutex); + } + + PresentIdHandler & getPresentIdHandlerForPosition(size_t unit_number) + { + return update_units[unit_number].get()->present_id_handler; + } + + AbsentIdHandler & getAbsentIdHandlerForPosition(size_t unit_number) + { + return update_units[unit_number].get()->absent_id_handler; } size_t getUpdateUnitNumberForRequestedIdPosition(size_t position) @@ -464,6 +506,8 @@ private: std::vector present_id_handlers; std::vector absent_id_handlers; + std::vector> update_units; + std::vector helper; }; diff --git a/src/Dictionaries/CacheDictionary.inc.h b/src/Dictionaries/CacheDictionary.inc.h index 7b108438f76..746b2609a36 100644 --- a/src/Dictionaries/CacheDictionary.inc.h +++ b/src/Dictionaries/CacheDictionary.inc.h @@ -75,6 +75,13 @@ void CacheDictionary::getItemsNumberImpl( if (find_result.outdated) { + /// Protection of reading very expired keys. + if (now > cells[find_result.cell_idx].strict_max) + { + cache_not_found_ids[id].push_back(row); + continue; + } + cache_expired_ids[id].push_back(row); if (allow_read_expired_keys) update_routine(); @@ -249,6 +256,13 @@ void CacheDictionary::getItemsString( { if (find_result.outdated) { + /// Protection of reading very expired keys. 
+ if (now > cells[find_result.cell_idx].strict_max) + { + cache_not_found_ids[id].push_back(row); + continue; + } + cache_expired_ids[id].push_back(row); if (allow_read_expired_keys) diff --git a/src/Dictionaries/getDictionaryConfigurationFromAST.h b/src/Dictionaries/getDictionaryConfigurationFromAST.h index bb48765c492..7dd672ccf4b 100644 --- a/src/Dictionaries/getDictionaryConfigurationFromAST.h +++ b/src/Dictionaries/getDictionaryConfigurationFromAST.h @@ -11,5 +11,4 @@ using DictionaryConfigurationPtr = Poco::AutoPtr; } + bool isInjective(const Block &) const override { return std::is_same_v; } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { @@ -1268,7 +1268,7 @@ public: } size_t getNumberOfArguments() const override { return 2; } - bool isInjective(const Block &) override { return true; } + bool isInjective(const Block &) const override { return true; } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { diff --git a/src/Functions/FunctionsEmbeddedDictionaries.h b/src/Functions/FunctionsEmbeddedDictionaries.h index 44fdca9736d..a8078bc47b7 100644 --- a/src/Functions/FunctionsEmbeddedDictionaries.h +++ b/src/Functions/FunctionsEmbeddedDictionaries.h @@ -592,7 +592,7 @@ public: /// For the purpose of query optimization, we assume this function to be injective /// even in face of fact that there are many different cities named Moscow. - bool isInjective(const Block &) override { return true; } + bool isInjective(const Block &) const override { return true; } DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override { diff --git a/src/Functions/FunctionsExternalDictionaries.h b/src/Functions/FunctionsExternalDictionaries.h index 61ba555be92..93ed1b75029 100644 --- a/src/Functions/FunctionsExternalDictionaries.h +++ b/src/Functions/FunctionsExternalDictionaries.h @@ -243,7 +243,7 @@ private: bool useDefaultImplementationForConstants() const final { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; } - bool isInjective(const Block & sample_block) override + bool isInjective(const Block & sample_block) const override { return isDictGetFunctionInjective(dictionaries_loader, sample_block); } @@ -769,7 +769,7 @@ private: bool useDefaultImplementationForConstants() const final { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; } - bool isInjective(const Block & sample_block) override + bool isInjective(const Block & sample_block) const override { return isDictGetFunctionInjective(dictionaries_loader, sample_block); } @@ -1338,7 +1338,7 @@ private: bool useDefaultImplementationForConstants() const final { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; } - bool isInjective(const Block & sample_block) override + bool isInjective(const Block & sample_block) const override { return isDictGetFunctionInjective(dictionaries_loader, sample_block); } @@ -1486,7 +1486,7 @@ private: bool useDefaultImplementationForConstants() const final { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; } - bool isInjective(const Block & sample_block) override + bool isInjective(const Block & sample_block) const override { return isDictGetFunctionInjective(dictionaries_loader, sample_block); } @@ -1627,7 +1627,7 @@ public: private: size_t getNumberOfArguments() const override { return 2; } - bool isInjective(const Block & /*sample_block*/) override { 
return true; } + bool isInjective(const Block & /*sample_block*/) const override { return true; } bool useDefaultImplementationForConstants() const final { return true; } ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0}; } diff --git a/src/Functions/FunctionsFormatting.h b/src/Functions/FunctionsFormatting.h index 0a789b1223a..a5c5635e0ad 100644 --- a/src/Functions/FunctionsFormatting.h +++ b/src/Functions/FunctionsFormatting.h @@ -42,7 +42,7 @@ public: } size_t getNumberOfArguments() const override { return 1; } - bool isInjective(const Block &) override { return true; } + bool isInjective(const Block &) const override { return true; } DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override { diff --git a/src/Functions/IFunction.h b/src/Functions/IFunction.h index 5ef6e5aaf22..b8873ea2671 100644 --- a/src/Functions/IFunction.h +++ b/src/Functions/IFunction.h @@ -134,7 +134,7 @@ public: * * sample_block should contain data types of arguments and values of constants, if relevant. */ - virtual bool isInjective(const Block & /*sample_block*/) { return false; } + virtual bool isInjective(const Block & /*sample_block*/) const { return false; } /** Function is called "deterministic", if it returns same result for same values of arguments. * Most of functions are deterministic. Notable counterexample is rand(). @@ -189,6 +189,7 @@ public: /// See the comment for the same method in IFunctionBase virtual bool isDeterministic() const = 0; virtual bool isDeterministicInScopeOfQuery() const = 0; + virtual bool isInjective(const Block &) const = 0; /// Override and return true if function needs to depend on the state of the data. virtual bool isStateful() const = 0; diff --git a/src/Functions/IFunctionAdaptors.h b/src/Functions/IFunctionAdaptors.h index 123faa859e9..82afaad4c27 100644 --- a/src/Functions/IFunctionAdaptors.h +++ b/src/Functions/IFunctionAdaptors.h @@ -68,7 +68,7 @@ public: return impl->getResultIfAlwaysReturnsConstantAndHasArguments(block, arguments); } - bool isInjective(const Block & sample_block) final { return impl->isInjective(sample_block); } + bool isInjective(const Block & sample_block) const final { return impl->isInjective(sample_block); } bool isDeterministic() const final { return impl->isDeterministic(); } bool isDeterministicInScopeOfQuery() const final { return impl->isDeterministicInScopeOfQuery(); } bool hasInformationAboutMonotonicity() const final { return impl->hasInformationAboutMonotonicity(); } @@ -96,6 +96,8 @@ public: bool isDeterministicInScopeOfQuery() const final { return impl->isDeterministicInScopeOfQuery(); } + bool isInjective(const Block & block) const final { return impl->isInjective(block); } + bool isStateful() const final { return impl->isStateful(); } bool isVariadic() const final { return impl->isVariadic(); } @@ -195,7 +197,7 @@ public: bool isStateful() const override { return function->isStateful(); } - bool isInjective(const Block & sample_block) override { return function->isInjective(sample_block); } + bool isInjective(const Block & sample_block) const override { return function->isInjective(sample_block); } bool isDeterministic() const override { return function->isDeterministic(); } @@ -226,6 +228,7 @@ public: bool isDeterministic() const override { return function->isDeterministic(); } bool isDeterministicInScopeOfQuery() const override { return function->isDeterministicInScopeOfQuery(); } + bool isInjective(const Block &block) const override { return function->isInjective(block); } String 
getName() const override { return function->getName(); } bool isStateful() const override { return function->isStateful(); } diff --git a/src/Functions/IFunctionImpl.h b/src/Functions/IFunctionImpl.h index 66196070afe..116363705de 100644 --- a/src/Functions/IFunctionImpl.h +++ b/src/Functions/IFunctionImpl.h @@ -107,7 +107,7 @@ public: virtual bool isSuitableForConstantFolding() const { return true; } virtual ColumnPtr getResultIfAlwaysReturnsConstantAndHasArguments(const Block & /*block*/, const ColumnNumbers & /*arguments*/) const { return nullptr; } - virtual bool isInjective(const Block & /*sample_block*/) { return false; } + virtual bool isInjective(const Block & /*sample_block*/) const { return false; } virtual bool isDeterministic() const { return true; } virtual bool isDeterministicInScopeOfQuery() const { return true; } virtual bool hasInformationAboutMonotonicity() const { return false; } @@ -152,6 +152,7 @@ public: /// Properties from IFunctionOverloadResolver. See comments in IFunction.h virtual bool isDeterministic() const { return true; } virtual bool isDeterministicInScopeOfQuery() const { return true; } + virtual bool isInjective(const Block &) const { return false; } virtual bool isStateful() const { return false; } virtual bool isVariadic() const { return false; } @@ -256,7 +257,7 @@ public: /// Properties from IFunctionBase (see IFunction.h) virtual bool isSuitableForConstantFolding() const { return true; } virtual ColumnPtr getResultIfAlwaysReturnsConstantAndHasArguments(const Block & /*block*/, const ColumnNumbers & /*arguments*/) const { return nullptr; } - virtual bool isInjective(const Block & /*sample_block*/) { return false; } + virtual bool isInjective(const Block & /*sample_block*/) const { return false; } virtual bool isDeterministic() const { return true; } virtual bool isDeterministicInScopeOfQuery() const { return true; } virtual bool isStateful() const { return false; } diff --git a/src/Functions/LeastGreatestGeneric.h b/src/Functions/LeastGreatestGeneric.h new file mode 100644 index 00000000000..2d6d71b20c7 --- /dev/null +++ b/src/Functions/LeastGreatestGeneric.h @@ -0,0 +1,140 @@ +#pragma once + +#include +#include +#include +#include +#include +#include +#include + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; +} + + +enum class LeastGreatest +{ + Least, + Greatest +}; + + +template +class FunctionLeastGreatestGeneric : public IFunction +{ +public: + static constexpr auto name = kind == LeastGreatest::Least ? 
"least" : "greatest"; + static FunctionPtr create(const Context &) { return std::make_shared>(); } + +private: + String getName() const override { return name; } + size_t getNumberOfArguments() const override { return 0; } + bool isVariadic() const override { return true; } + bool useDefaultImplementationForConstants() const override { return true; } + + DataTypePtr getReturnTypeImpl(const DataTypes & types) const override + { + if (types.empty()) + throw Exception("Function " + getName() + " cannot be called without arguments", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + return getLeastSupertype(types); + } + + void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override + { + size_t num_arguments = arguments.size(); + if (1 == num_arguments) + { + block.getByPosition(result).column = block.getByPosition(arguments[0]).column; + return; + } + + auto result_type = block.getByPosition(result).type; + + Columns converted_columns(num_arguments); + for (size_t arg = 0; arg < num_arguments; ++arg) + converted_columns[arg] = castColumn(block.getByPosition(arguments[arg]), result_type)->convertToFullColumnIfConst(); + + auto result_column = result_type->createColumn(); + result_column->reserve(input_rows_count); + + for (size_t row_num = 0; row_num < input_rows_count; ++row_num) + { + size_t best_arg = 0; + for (size_t arg = 1; arg < num_arguments; ++arg) + { + auto cmp_result = converted_columns[arg]->compareAt(row_num, row_num, *converted_columns[best_arg], 1); + + if constexpr (kind == LeastGreatest::Least) + { + if (cmp_result < 0) + best_arg = arg; + } + else + { + if (cmp_result > 0) + best_arg = arg; + } + } + + result_column->insertFrom(*converted_columns[best_arg], row_num); + } + + block.getByPosition(result).column = std::move(result_column); + } +}; + + +template +class LeastGreatestOverloadResolver : public IFunctionOverloadResolverImpl +{ +public: + static constexpr auto name = kind == LeastGreatest::Least ? "least" : "greatest"; + + static FunctionOverloadResolverImplPtr create(const Context & context) + { + return std::make_unique>(context); + } + + explicit LeastGreatestOverloadResolver(const Context & context_) : context(context_) {} + + String getName() const override { return name; } + size_t getNumberOfArguments() const override { return 0; } + bool isVariadic() const override { return true; } + + FunctionBaseImplPtr build(const ColumnsWithTypeAndName & arguments, const DataTypePtr & return_type) const override + { + DataTypes argument_types; + + /// More efficient specialization for two numeric arguments. 
+ if (arguments.size() == 2 && isNumber(arguments[0].type) && isNumber(arguments[1].type)) + return std::make_unique(SpecializedFunction::create(context), argument_types, return_type); + + return std::make_unique( + FunctionLeastGreatestGeneric::create(context), argument_types, return_type); + } + + DataTypePtr getReturnType(const DataTypes & types) const override + { + if (types.empty()) + throw Exception("Function " + getName() + " cannot be called without arguments", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH); + + if (types.size() == 2 && isNumber(types[0]) && isNumber(types[1])) + return SpecializedFunction::create(context)->getReturnTypeImpl(types); + + return getLeastSupertype(types); + } + +private: + const Context & context; +}; + +} + + diff --git a/src/Functions/concat.cpp b/src/Functions/concat.cpp index 7cf0b2ce891..3c204f098c1 100644 --- a/src/Functions/concat.cpp +++ b/src/Functions/concat.cpp @@ -41,7 +41,7 @@ public: size_t getNumberOfArguments() const override { return 0; } - bool isInjective(const Block &) override { return is_injective; } + bool isInjective(const Block &) const override { return is_injective; } bool useDefaultImplementationForConstants() const override { return true; } diff --git a/src/Functions/greatest.cpp b/src/Functions/greatest.cpp index 9abf85e751b..63f08d0affe 100644 --- a/src/Functions/greatest.cpp +++ b/src/Functions/greatest.cpp @@ -1,6 +1,8 @@ #include #include #include +#include + namespace DB { @@ -57,7 +59,7 @@ using FunctionGreatest = FunctionBinaryArithmetic; void registerFunctionGreatest(FunctionFactory & factory) { - factory.registerFunction(FunctionFactory::CaseInsensitive); + factory.registerFunction>(FunctionFactory::CaseInsensitive); } } diff --git a/src/Functions/least.cpp b/src/Functions/least.cpp index f2e7c1f15d2..ba87e4bd7e4 100644 --- a/src/Functions/least.cpp +++ b/src/Functions/least.cpp @@ -1,6 +1,8 @@ #include #include #include +#include + namespace DB { @@ -57,7 +59,7 @@ using FunctionLeast = FunctionBinaryArithmetic; void registerFunctionLeast(FunctionFactory & factory) { - factory.registerFunction(FunctionFactory::CaseInsensitive); + factory.registerFunction>(FunctionFactory::CaseInsensitive); } } diff --git a/src/Functions/reverse.cpp b/src/Functions/reverse.cpp index 2c135cf3d7d..bc1237fc457 100644 --- a/src/Functions/reverse.cpp +++ b/src/Functions/reverse.cpp @@ -71,7 +71,7 @@ public: return 1; } - bool isInjective(const Block &) override + bool isInjective(const Block &) const override { return true; } diff --git a/src/Functions/tuple.cpp b/src/Functions/tuple.cpp index 451f732c869..772cb4e3c07 100644 --- a/src/Functions/tuple.cpp +++ b/src/Functions/tuple.cpp @@ -43,7 +43,7 @@ public: return 0; } - bool isInjective(const Block &) override + bool isInjective(const Block &) const override { return true; } diff --git a/src/Interpreters/Context.cpp b/src/Interpreters/Context.cpp index 6e30792277f..647c3fb8020 100644 --- a/src/Interpreters/Context.cpp +++ b/src/Interpreters/Context.cpp @@ -317,9 +317,11 @@ struct ContextShared MergeList merge_list; /// The list of executable merge (for (Replicated)?MergeTree) ConfigurationPtr users_config; /// Config with the users, profiles and quotas sections. InterserverIOHandler interserver_io_handler; /// Handler for interserver communication. + std::optional buffer_flush_schedule_pool; /// A thread pool that can do background flush for Buffer tables. std::optional background_pool; /// The thread pool for the background work performed by the tables. 
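// FunctionLeastGreatestGeneric above converts every argument to the least common supertype and then
// picks the per-row extremum with compareAt(), keeping the old specialized implementation for the
// two-numeric-argument case. A standalone, plain-C++ illustration of the same idea, using
// std::common_type as a rough analogue of getLeastSupertype() (greatest() is identical with the
// comparison flipped):
#include <type_traits>

template <typename... Args>
constexpr auto leastGeneric(Args... args)
{
    using Common = std::common_type_t<Args...>;          // analogue of getLeastSupertype()
    Common values[] = {static_cast<Common>(args)...};    // analogue of castColumn() per argument
    Common best = values[0];
    for (Common v : values)
        if (v < best)                                    // analogue of compareAt(); use > for greatest()
            best = v;
    return best;
}

// Mixing types compares by value after promotion, matching SELECT least(1, 2.5, -3) -> -3.
static_assert(leastGeneric(1, 2.5, -3) == -3.0);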
std::optional background_move_pool; /// The thread pool for the background moves performed by the tables. std::optional schedule_pool; /// A thread pool that can run different jobs in background (used in replicated tables) + std::optional distributed_schedule_pool; /// A thread pool that can run different jobs in background (used for distributed sends) MultiVersion macros; /// Substitutions extracted from config. std::unique_ptr ddl_worker; /// Process ddl commands from zk. /// Rules for selecting the compression settings, depending on the size of the part. @@ -413,9 +415,11 @@ struct ContextShared embedded_dictionaries.reset(); external_dictionaries_loader.reset(); external_models_loader.reset(); + buffer_flush_schedule_pool.reset(); background_pool.reset(); background_move_pool.reset(); schedule_pool.reset(); + distributed_schedule_pool.reset(); ddl_worker.reset(); /// Stop trace collector if any @@ -1330,6 +1334,14 @@ BackgroundProcessingPool & Context::getBackgroundMovePool() return *shared->background_move_pool; } +BackgroundSchedulePool & Context::getBufferFlushSchedulePool() +{ + auto lock = getLock(); + if (!shared->buffer_flush_schedule_pool) + shared->buffer_flush_schedule_pool.emplace(settings.background_buffer_flush_schedule_pool_size); + return *shared->buffer_flush_schedule_pool; +} + BackgroundSchedulePool & Context::getSchedulePool() { auto lock = getLock(); @@ -1338,6 +1350,14 @@ BackgroundSchedulePool & Context::getSchedulePool() return *shared->schedule_pool; } +BackgroundSchedulePool & Context::getDistributedSchedulePool() +{ + auto lock = getLock(); + if (!shared->distributed_schedule_pool) + shared->distributed_schedule_pool.emplace(settings.background_distributed_schedule_pool_size); + return *shared->distributed_schedule_pool; +} + void Context::setDDLWorker(std::unique_ptr ddl_worker) { auto lock = getLock(); diff --git a/src/Interpreters/Context.h b/src/Interpreters/Context.h index 6cf1a066b18..1f81cdbc58b 100644 --- a/src/Interpreters/Context.h +++ b/src/Interpreters/Context.h @@ -471,9 +471,11 @@ public: */ void dropCaches() const; + BackgroundSchedulePool & getBufferFlushSchedulePool(); BackgroundProcessingPool & getBackgroundPool(); BackgroundProcessingPool & getBackgroundMovePool(); BackgroundSchedulePool & getSchedulePool(); + BackgroundSchedulePool & getDistributedSchedulePool(); void setDDLWorker(std::unique_ptr ddl_worker); DDLWorker & getDDLWorker() const; diff --git a/src/Interpreters/DictionaryReader.cpp b/src/Interpreters/DictionaryReader.cpp new file mode 100644 index 00000000000..301fe9d57c6 --- /dev/null +++ b/src/Interpreters/DictionaryReader.cpp @@ -0,0 +1,167 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int NUMBER_OF_COLUMNS_DOESNT_MATCH; + extern const int TYPE_MISMATCH; +} + + +DictionaryReader::FunctionWrapper::FunctionWrapper(FunctionOverloadResolverPtr resolver, const ColumnsWithTypeAndName & arguments, + Block & block, const ColumnNumbers & arg_positions_, const String & column_name, + TypeIndex expected_type) + : arg_positions(arg_positions_) + , result_pos(block.columns()) +{ + FunctionBasePtr prepared_function = resolver->build(arguments); + + ColumnWithTypeAndName result; + result.name = "get_" + column_name; + result.type = prepared_function->getReturnType(); + if (result.type->getTypeId() != expected_type) + throw Exception("Type mismatch in dictionary reader for: " + column_name, ErrorCodes::TYPE_MISMATCH); + 
block.insert(result); + + function = prepared_function->prepare(block, arg_positions, result_pos); +} + +static constexpr const size_t key_size = 1; + +DictionaryReader::DictionaryReader(const String & dictionary_name, const Names & src_column_names, const NamesAndTypesList & result_columns, + const Context & context) + : result_header(makeResultBlock(result_columns)) + , key_position(key_size + result_header.columns()) +{ + if (src_column_names.size() != result_columns.size()) + throw Exception("Columns number mismatch in dictionary reader", ErrorCodes::NUMBER_OF_COLUMNS_DOESNT_MATCH); + + ColumnWithTypeAndName dict_name; + ColumnWithTypeAndName key; + ColumnWithTypeAndName column_name; + + { + dict_name.name = "dict"; + dict_name.type = std::make_shared(); + dict_name.column = dict_name.type->createColumnConst(1, dictionary_name); + + /// TODO: composite key (key_size > 1) + key.name = "key"; + key.type = std::make_shared(); + + column_name.name = "column"; + column_name.type = std::make_shared(); + } + + /// dictHas('dict_name', id) + ColumnsWithTypeAndName arguments_has; + arguments_has.push_back(dict_name); + arguments_has.push_back(key); + + /// dictGet('dict_name', 'attr_name', id) + ColumnsWithTypeAndName arguments_get; + arguments_get.push_back(dict_name); + arguments_get.push_back(column_name); + arguments_get.push_back(key); + + sample_block.insert(dict_name); + + for (auto & columns_name : src_column_names) + { + ColumnWithTypeAndName name; + name.name = "col_" + columns_name; + name.type = std::make_shared(); + name.column = name.type->createColumnConst(1, columns_name); + + sample_block.insert(name); + } + + sample_block.insert(key); + + ColumnNumbers positions_has{0, key_position}; + function_has = std::make_unique(FunctionFactory::instance().get("dictHas", context), + arguments_has, sample_block, positions_has, "has", DataTypeUInt8().getTypeId()); + functions_get.reserve(result_header.columns()); + + for (size_t i = 0; i < result_header.columns(); ++i) + { + size_t column_name_pos = key_size + i; + auto & column = result_header.getByPosition(i); + arguments_get[1].column = DataTypeString().createColumnConst(1, src_column_names[i]); + ColumnNumbers positions_get{0, column_name_pos, key_position}; + functions_get.emplace_back( + FunctionWrapper(FunctionFactory::instance().get("dictGet", context), + arguments_get, sample_block, positions_get, column.name, column.type->getTypeId())); + } +} + +void DictionaryReader::readKeys(const IColumn & keys, Block & out_block, ColumnVector::Container & found, + std::vector & positions) const +{ + Block working_block = sample_block; + size_t has_position = key_position + 1; + size_t size = keys.size(); + + /// set keys for dictHas() + ColumnWithTypeAndName & key_column = working_block.getByPosition(key_position); + key_column.column = keys.cloneResized(size); /// just a copy we cannot avoid + + /// calculate and extract dictHas() + function_has->execute(working_block, size); + ColumnWithTypeAndName & has_column = working_block.getByPosition(has_position); + auto mutable_has = (*std::move(has_column.column)).mutate(); + found.swap(typeid_cast &>(*mutable_has).getData()); + has_column.column = nullptr; + + /// set mapping form source keys to resulting rows in output block + positions.clear(); + positions.resize(size, 0); + size_t pos = 0; + for (size_t i = 0; i < size; ++i) + if (found[i]) + positions[i] = pos++; + + /// set keys for dictGet(): remove not found keys + key_column.column = key_column.column->filter(found, -1); + size_t rows = 
key_column.column->size(); + + /// calculate dictGet() + for (auto & func : functions_get) + func.execute(working_block, rows); + + /// make result: copy header block with correct names and move data columns + out_block = result_header.cloneEmpty(); + size_t first_get_position = has_position + 1; + for (size_t i = 0; i < out_block.columns(); ++i) + { + auto & src_column = working_block.getByPosition(first_get_position + i); + auto & dst_column = out_block.getByPosition(i); + dst_column.column = src_column.column; + src_column.column = nullptr; + } +} + +Block DictionaryReader::makeResultBlock(const NamesAndTypesList & names) +{ + Block block; + for (auto & nm : names) + { + ColumnWithTypeAndName column{nullptr, nm.type, nm.name}; + if (column.type->isNullable()) + column.type = typeid_cast(*column.type).getNestedType(); + block.insert(std::move(column)); + } + return block; +} + +} diff --git a/src/Interpreters/DictionaryReader.h b/src/Interpreters/DictionaryReader.h new file mode 100644 index 00000000000..92e4924ae80 --- /dev/null +++ b/src/Interpreters/DictionaryReader.h @@ -0,0 +1,46 @@ +#pragma once + +#include +#include +#include + +namespace DB +{ + +class Context; + +/// Read block of required columns from Dictionary by UInt64 key column. Rename columns if needed. +/// Current implementation uses dictHas() + N * dictGet() functions. +class DictionaryReader +{ +public: + struct FunctionWrapper + { + ExecutableFunctionPtr function; + ColumnNumbers arg_positions; + size_t result_pos = 0; + + FunctionWrapper(FunctionOverloadResolverPtr resolver, const ColumnsWithTypeAndName & arguments, Block & block, + const ColumnNumbers & arg_positions_, const String & column_name, TypeIndex expected_type); + + void execute(Block & block, size_t rows) const + { + function->execute(block, arg_positions, result_pos, rows, false); + } + }; + + DictionaryReader(const String & dictionary_name, const Names & src_column_names, const NamesAndTypesList & result_columns, + const Context & context); + void readKeys(const IColumn & keys, Block & out_block, ColumnVector::Container & found, std::vector & positions) const; + +private: + Block result_header; + Block sample_block; /// dictionary name, column names, key, dictHas() result, dictGet() results + size_t key_position; + std::unique_ptr function_has; + std::vector functions_get; + + static Block makeResultBlock(const NamesAndTypesList & names); +}; + +} diff --git a/src/Interpreters/ExpressionAnalyzer.cpp b/src/Interpreters/ExpressionAnalyzer.cpp index bb2553c76a4..8d63faed5cb 100644 --- a/src/Interpreters/ExpressionAnalyzer.cpp +++ b/src/Interpreters/ExpressionAnalyzer.cpp @@ -31,17 +31,20 @@ #include #include #include +#include #include #include #include +#include #include #include #include #include +#include #include #include @@ -502,25 +505,11 @@ bool SelectQueryExpressionAnalyzer::appendJoin(ExpressionActionsChain & chain, b return true; } -static JoinPtr tryGetStorageJoin(const ASTTablesInSelectQueryElement & join_element, std::shared_ptr analyzed_join, - const Context & context) +static JoinPtr tryGetStorageJoin(std::shared_ptr analyzed_join) { - const auto & table_to_join = join_element.table_expression->as(); - - /// TODO This syntax does not support specifying a database name. 
- if (table_to_join.database_and_table_name) - { - auto table_id = context.resolveStorageID(table_to_join.database_and_table_name); - StoragePtr table = DatabaseCatalog::instance().tryGetTable(table_id); - - if (table) - { - auto * storage_join = dynamic_cast(table.get()); - if (storage_join) - return storage_join->getJoin(analyzed_join); - } - } - + if (auto * table = analyzed_join->joined_storage.get()) + if (auto * storage_join = dynamic_cast(table)) + return storage_join->getJoin(analyzed_join); return {}; } @@ -531,10 +520,44 @@ static ExpressionActionsPtr createJoinedBlockActions(const Context & context, co return ExpressionAnalyzer(expression_list, syntax_result, context).getActions(true, false); } -static std::shared_ptr makeJoin(std::shared_ptr analyzed_join, const Block & sample_block) +static bool allowDictJoin(StoragePtr joined_storage, const Context & context, String & dict_name, String & key_name) +{ + auto * dict = dynamic_cast(joined_storage.get()); + if (!dict) + return false; + + dict_name = dict->dictionaryName(); + auto dictionary = context.getExternalDictionariesLoader().getDictionary(dict_name); + if (!dictionary) + return false; + + const DictionaryStructure & structure = dictionary->getStructure(); + if (structure.id) + { + key_name = structure.id->name; + return true; + } + return false; +} + +static std::shared_ptr makeJoin(std::shared_ptr analyzed_join, const Block & sample_block, const Context & context) { bool allow_merge_join = analyzed_join->allowMergeJoin(); + /// HashJoin with Dictionary optimisation + String dict_name; + String key_name; + if (analyzed_join->joined_storage && allowDictJoin(analyzed_join->joined_storage, context, dict_name, key_name)) + { + Names original_names; + NamesAndTypesList result_columns; + if (analyzed_join->allowDictJoin(key_name, sample_block, original_names, result_columns)) + { + analyzed_join->dictionary_reader = std::make_shared(dict_name, original_names, result_columns, context); + return std::make_shared(analyzed_join, sample_block); + } + } + if (analyzed_join->forceHashJoin() || (analyzed_join->preferMergeJoin() && !allow_merge_join)) return std::make_shared(analyzed_join, sample_block); else if (analyzed_join->forceMergeJoin() || (analyzed_join->preferMergeJoin() && allow_merge_join)) @@ -550,48 +573,49 @@ JoinPtr SelectQueryExpressionAnalyzer::makeTableJoin(const ASTTablesInSelectQuer SubqueryForSet & subquery_for_join = subqueries_for_sets[join_subquery_id]; - /// Special case - if table name is specified on the right of JOIN, then the table has the type Join (the previously prepared mapping). + /// Use StorageJoin if any. if (!subquery_for_join.join) - subquery_for_join.join = tryGetStorageJoin(join_element, syntax->analyzed_join, context); + subquery_for_join.join = tryGetStorageJoin(syntax->analyzed_join); if (!subquery_for_join.join) { /// Actions which need to be calculated on joined block. 
ExpressionActionsPtr joined_block_actions = createJoinedBlockActions(context, analyzedJoin()); + Names original_right_columns; if (!subquery_for_join.source) { - NamesWithAliases required_columns_with_aliases = - analyzedJoin().getRequiredColumns(joined_block_actions->getSampleBlock(), joined_block_actions->getRequiredColumns()); - makeSubqueryForJoin(join_element, std::move(required_columns_with_aliases), subquery_for_join); + NamesWithAliases required_columns_with_aliases = analyzedJoin().getRequiredColumns( + joined_block_actions->getSampleBlock(), joined_block_actions->getRequiredColumns()); + for (auto & pr : required_columns_with_aliases) + original_right_columns.push_back(pr.first); + + /** For GLOBAL JOINs (in the case, for example, of the push method for executing GLOBAL subqueries), the following occurs + * - in the addExternalStorage function, the JOIN (SELECT ...) subquery is replaced with JOIN _data1, + * in the subquery_for_set object this subquery is exposed as source and the temporary table _data1 as the `table`. + * - this function shows the expression JOIN _data1. + */ + auto interpreter = interpretSubquery(join_element.table_expression, context, original_right_columns, query_options); + + subquery_for_join.makeSource(interpreter, std::move(required_columns_with_aliases)); } /// TODO You do not need to set this up when JOIN is only needed on remote servers. subquery_for_join.setJoinActions(joined_block_actions); /// changes subquery_for_join.sample_block inside - subquery_for_join.join = makeJoin(syntax->analyzed_join, subquery_for_join.sample_block); + subquery_for_join.join = makeJoin(syntax->analyzed_join, subquery_for_join.sample_block, context); + + /// Do not make subquery for join over dictionary. + if (syntax->analyzed_join->dictionary_reader) + { + JoinPtr join = subquery_for_join.join; + subqueries_for_sets.erase(join_subquery_id); + return join; + } } return subquery_for_join.join; } -void SelectQueryExpressionAnalyzer::makeSubqueryForJoin(const ASTTablesInSelectQueryElement & join_element, - NamesWithAliases && required_columns_with_aliases, - SubqueryForSet & subquery_for_set) const -{ - /** For GLOBAL JOINs (in the case, for example, of the push method for executing GLOBAL subqueries), the following occurs - * - in the addExternalStorage function, the JOIN (SELECT ...) subquery is replaced with JOIN _data1, - * in the subquery_for_set object this subquery is exposed as source and the temporary table _data1 as the `table`. - * - this function shows the expression JOIN _data1. 
- */ - Names original_columns; - for (auto & pr : required_columns_with_aliases) - original_columns.push_back(pr.first); - - auto interpreter = interpretSubquery(join_element.table_expression, context, original_columns, query_options); - - subquery_for_set.makeSource(interpreter, std::move(required_columns_with_aliases)); -} - bool SelectQueryExpressionAnalyzer::appendPrewhere( ExpressionActionsChain & chain, bool only_types, const Names & additional_required_columns) { diff --git a/src/Interpreters/ExpressionAnalyzer.h b/src/Interpreters/ExpressionAnalyzer.h index 4322a897378..b7fda92e33f 100644 --- a/src/Interpreters/ExpressionAnalyzer.h +++ b/src/Interpreters/ExpressionAnalyzer.h @@ -276,8 +276,6 @@ private: SetPtr isPlainStorageSetInSubquery(const ASTPtr & subquery_or_table_name); JoinPtr makeTableJoin(const ASTTablesInSelectQueryElement & join_element); - void makeSubqueryForJoin(const ASTTablesInSelectQueryElement & join_element, NamesWithAliases && required_columns_with_aliases, - SubqueryForSet & subquery_for_set) const; const ASTSelectQuery * getAggregatingQuery() const; diff --git a/src/Interpreters/ExpressionJIT.cpp b/src/Interpreters/ExpressionJIT.cpp index 1f6ac0a9926..d47335eb1ee 100644 --- a/src/Interpreters/ExpressionJIT.cpp +++ b/src/Interpreters/ExpressionJIT.cpp @@ -510,7 +510,7 @@ bool LLVMFunction::isSuitableForConstantFolding() const return true; } -bool LLVMFunction::isInjective(const Block & sample_block) +bool LLVMFunction::isInjective(const Block & sample_block) const { for (const auto & f : originals) if (!f->isInjective(sample_block)) diff --git a/src/Interpreters/ExpressionJIT.h b/src/Interpreters/ExpressionJIT.h index 0a103aab378..2053bbd222a 100644 --- a/src/Interpreters/ExpressionJIT.h +++ b/src/Interpreters/ExpressionJIT.h @@ -53,7 +53,7 @@ public: bool isSuitableForConstantFolding() const override; - bool isInjective(const Block & sample_block) override; + bool isInjective(const Block & sample_block) const override; bool hasInformationAboutMonotonicity() const override; diff --git a/src/Interpreters/ExternalDictionariesLoader.cpp b/src/Interpreters/ExternalDictionariesLoader.cpp index 889b1c58b55..4e958a8c12b 100644 --- a/src/Interpreters/ExternalDictionariesLoader.cpp +++ b/src/Interpreters/ExternalDictionariesLoader.cpp @@ -1,5 +1,6 @@ #include #include +#include #if !defined(ARCADIA_BUILD) # include "config_core.h" @@ -33,6 +34,19 @@ ExternalLoader::LoadablePtr ExternalDictionariesLoader::create( return DictionaryFactory::instance().create(name, config, key_in_config, context, dictionary_from_database); } + +DictionaryStructure +ExternalDictionariesLoader::getDictionaryStructure(const Poco::Util::AbstractConfiguration & config, const std::string & key_in_config) +{ + return {config, key_in_config + ".structure"}; +} + +DictionaryStructure ExternalDictionariesLoader::getDictionaryStructure(const ObjectConfig & config) +{ + return getDictionaryStructure(*config.config, config.key_in_config); +} + + void ExternalDictionariesLoader::resetAll() { #if USE_MYSQL diff --git a/src/Interpreters/ExternalDictionariesLoader.h b/src/Interpreters/ExternalDictionariesLoader.h index 4a54a9963e7..e69046706a3 100644 --- a/src/Interpreters/ExternalDictionariesLoader.h +++ b/src/Interpreters/ExternalDictionariesLoader.h @@ -23,14 +23,14 @@ public: return std::static_pointer_cast(load(name)); } - DictPtr tryGetDictionary(const std::string & name, bool load) const + DictPtr tryGetDictionary(const std::string & name) const { - if (load) - return 
std::static_pointer_cast(tryLoad(name)); - else - return std::static_pointer_cast(getCurrentLoadResult(name).object); + return std::static_pointer_cast(tryLoad(name)); } + static DictionaryStructure getDictionaryStructure(const Poco::Util::AbstractConfiguration & config, const std::string & key_in_config = "dictionary"); + static DictionaryStructure getDictionaryStructure(const ObjectConfig & config); + static void resetAll(); protected: diff --git a/src/Interpreters/ExternalLoader.cpp b/src/Interpreters/ExternalLoader.cpp index 893d9aa61f9..4b83616fe50 100644 --- a/src/Interpreters/ExternalLoader.cpp +++ b/src/Interpreters/ExternalLoader.cpp @@ -94,15 +94,6 @@ namespace }; } -struct ExternalLoader::ObjectConfig -{ - Poco::AutoPtr config; - String key_in_config; - String repository_name; - bool from_temp_repository = false; - String path; -}; - /** Reads configurations from configuration repository and parses it. */ @@ -141,7 +132,7 @@ public: settings = settings_; } - using ObjectConfigsPtr = std::shared_ptr>; + using ObjectConfigsPtr = std::shared_ptr>>; /// Reads all repositories. ObjectConfigsPtr read() @@ -176,8 +167,9 @@ private: struct FileInfo { Poco::Timestamp last_update_time = 0; - std::vector> objects; // Parsed contents of the file. bool in_use = true; // Whether the `FileInfo` should be destroyed because the correspondent file is deleted. + Poco::AutoPtr file_contents; // Parsed contents of the file. + std::unordered_map objects; }; struct RepositoryInfo @@ -280,14 +272,15 @@ private: } LOG_TRACE(log, "Loading config file '" << path << "'."); - auto file_contents = repository.load(path); + file_info.file_contents = repository.load(path); + auto & file_contents = *file_info.file_contents; /// get all objects' definitions Poco::Util::AbstractConfiguration::Keys keys; - file_contents->keys(keys); + file_contents.keys(keys); /// for each object defined in repositories - std::vector> object_configs_from_file; + std::unordered_map objects; for (const auto & key : keys) { if (!startsWith(key, settings.external_config)) @@ -297,7 +290,7 @@ private: continue; } - String object_name = file_contents->getString(key + "." + settings.external_name); + String object_name = file_contents.getString(key + "." + settings.external_name); if (object_name.empty()) { LOG_WARNING(log, path << ": node '" << key << "' defines " << type_name << " with an empty name. It's not allowed"); @@ -306,14 +299,14 @@ private: String database; if (!settings.external_database.empty()) - database = file_contents->getString(key + "." + settings.external_database, ""); + database = file_contents.getString(key + "." + settings.external_database, ""); if (!database.empty()) object_name = database + "." + object_name; - object_configs_from_file.emplace_back(object_name, ObjectConfig{file_contents, key, {}, {}, {}}); + objects.emplace(object_name, key); } - file_info.objects = std::move(object_configs_from_file); + file_info.objects = std::move(objects); file_info.last_update_time = update_time_from_repository; file_info.in_use = true; return true; @@ -333,33 +326,36 @@ private: need_collect_object_configs = false; // Generate new result. 
- auto new_configs = std::make_shared>(); + auto new_configs = std::make_shared>>(); for (const auto & [repository, repository_info] : repositories) { for (const auto & [path, file_info] : repository_info.files) { - for (const auto & [object_name, object_config] : file_info.objects) + for (const auto & [object_name, key_in_config] : file_info.objects) { auto already_added_it = new_configs->find(object_name); if (already_added_it == new_configs->end()) { - auto & new_config = new_configs->emplace(object_name, object_config).first->second; - new_config.from_temp_repository = repository->isTemporary(); - new_config.repository_name = repository->getName(); - new_config.path = path; + auto new_config = std::make_shared(); + new_config->config = file_info.file_contents; + new_config->key_in_config = key_in_config; + new_config->repository_name = repository->getName(); + new_config->from_temp_repository = repository->isTemporary(); + new_config->path = path; + new_configs->emplace(object_name, std::move(new_config)); } else { const auto & already_added = already_added_it->second; - if (!already_added.from_temp_repository && !repository->isTemporary()) + if (!already_added->from_temp_repository && !repository->isTemporary()) { LOG_WARNING( log, type_name << " '" << object_name << "' is found " - << (((path == already_added.path) && (repository->getName() == already_added.repository_name)) + << (((path == already_added->path) && (repository->getName() == already_added->repository_name)) ? ("twice in the same file '" + path + "'") - : ("both in file '" + already_added.path + "' and '" + path + "'"))); + : ("both in file '" + already_added->path + "' and '" + path + "'"))); } } } @@ -440,13 +436,10 @@ public: else { const auto & new_config = new_config_it->second; - bool config_is_same = isSameConfiguration(*info.object_config.config, info.object_config.key_in_config, *new_config.config, new_config.key_in_config); - info.object_config = new_config; + bool config_is_same = isSameConfiguration(*info.config->config, info.config->key_in_config, *new_config->config, new_config->key_in_config); + info.config = new_config; if (!config_is_same) { - /// Configuration has been changed. - info.object_config = new_config; - if (info.triedToLoad()) { /// The object has been tried to load before, so it is currently in use or was in use @@ -531,7 +524,7 @@ public: /// Returns the load result of the object. template - ReturnType getCurrentLoadResult(const String & name) const + ReturnType getLoadResult(const String & name) const { std::lock_guard lock{mutex}; const Info * info = getInfo(name); @@ -543,13 +536,13 @@ public: /// Returns all the load results as a map. /// The function doesn't load anything, it just returns the current load results as is. template - ReturnType getCurrentLoadResults(const FilterByNameFunction & filter) const + ReturnType getLoadResults(const FilterByNameFunction & filter) const { std::lock_guard lock{mutex}; return collectLoadResults(filter); } - size_t getNumberOfCurrentlyLoadedObjects() const + size_t getNumberOfLoadedObjects() const { std::lock_guard lock{mutex}; size_t count = 0; @@ -562,7 +555,7 @@ public: return count; } - bool hasCurrentlyLoadedObjects() const + bool hasLoadedObjects() const { std::lock_guard lock{mutex}; for (auto & name_info : infos) @@ -581,6 +574,12 @@ public: return names; } + size_t getNumberOfObjects() const + { + std::lock_guard lock{mutex}; + return infos.size(); + } + /// Tries to load a specified object during the timeout. 
template ReturnType tryLoad(const String & name, Duration timeout) @@ -698,7 +697,7 @@ public: private: struct Info { - Info(const String & name_, const ObjectConfig & object_config_) : name(name_), object_config(object_config_) {} + Info(const String & name_, const std::shared_ptr & config_) : name(name_), config(config_) {} bool loaded() const { return object != nullptr; } bool failed() const { return !object && exception; } @@ -737,8 +736,7 @@ private: result.loading_start_time = loading_start_time; result.last_successful_update_time = last_successful_update_time; result.loading_duration = loadingDuration(); - result.origin = object_config.path; - result.repository_name = object_config.repository_name; + result.config = config; return result; } else @@ -750,7 +748,7 @@ private: String name; LoadablePtr object; - ObjectConfig object_config; + std::shared_ptr config; TimePoint loading_start_time; TimePoint loading_end_time; TimePoint last_successful_update_time; @@ -784,7 +782,7 @@ private: results.reserve(infos.size()); for (const auto & [name, info] : infos) { - if (filter(name)) + if (!filter || filter(name)) { auto result = info.template getLoadResult(); if constexpr (std::is_same_v) @@ -838,7 +836,7 @@ private: bool all_ready = true; for (auto & [name, info] : infos) { - if (!filter(name)) + if (filter && !filter(name)) continue; if (info.state_id >= min_id) @@ -955,7 +953,7 @@ private: previous_version_as_base_for_loading = nullptr; /// Need complete reloading, cannot use the previous version. /// Loading. - auto [new_object, new_exception] = loadSingleObject(name, info->object_config, previous_version_as_base_for_loading); + auto [new_object, new_exception] = loadSingleObject(name, *info->config, previous_version_as_base_for_loading); if (!new_object && !new_exception) throw Exception("No object created and no exception raised for " + type_name, ErrorCodes::LOGICAL_ERROR); @@ -1296,9 +1294,9 @@ void ExternalLoader::enablePeriodicUpdates(bool enable_) periodic_updater->enable(enable_); } -bool ExternalLoader::hasCurrentlyLoadedObjects() const +bool ExternalLoader::hasLoadedObjects() const { - return loading_dispatcher->hasCurrentlyLoadedObjects(); + return loading_dispatcher->hasLoadedObjects(); } ExternalLoader::Status ExternalLoader::getCurrentStatus(const String & name) const @@ -1307,30 +1305,35 @@ ExternalLoader::Status ExternalLoader::getCurrentStatus(const String & name) con } template -ReturnType ExternalLoader::getCurrentLoadResult(const String & name) const +ReturnType ExternalLoader::getLoadResult(const String & name) const { - return loading_dispatcher->getCurrentLoadResult(name); + return loading_dispatcher->getLoadResult(name); } template -ReturnType ExternalLoader::getCurrentLoadResults(const FilterByNameFunction & filter) const +ReturnType ExternalLoader::getLoadResults(const FilterByNameFunction & filter) const { - return loading_dispatcher->getCurrentLoadResults(filter); + return loading_dispatcher->getLoadResults(filter); } -ExternalLoader::Loadables ExternalLoader::getCurrentlyLoadedObjects() const +ExternalLoader::Loadables ExternalLoader::getLoadedObjects() const { - return getCurrentLoadResults(); + return getLoadResults(); } -ExternalLoader::Loadables ExternalLoader::getCurrentlyLoadedObjects(const FilterByNameFunction & filter) const +ExternalLoader::Loadables ExternalLoader::getLoadedObjects(const FilterByNameFunction & filter) const { - return getCurrentLoadResults(filter); + return getLoadResults(filter); } -size_t 
ExternalLoader::getNumberOfCurrentlyLoadedObjects() const +size_t ExternalLoader::getNumberOfLoadedObjects() const { - return loading_dispatcher->getNumberOfCurrentlyLoadedObjects(); + return loading_dispatcher->getNumberOfLoadedObjects(); +} + +size_t ExternalLoader::getNumberOfObjects() const +{ + return loading_dispatcher->getNumberOfObjects(); } template @@ -1456,10 +1459,10 @@ ExternalLoader::LoadablePtr ExternalLoader::createObject( return create(name, *config.config, config.key_in_config, config.repository_name); } -template ExternalLoader::LoadablePtr ExternalLoader::getCurrentLoadResult(const String &) const; -template ExternalLoader::LoadResult ExternalLoader::getCurrentLoadResult(const String &) const; -template ExternalLoader::Loadables ExternalLoader::getCurrentLoadResults(const FilterByNameFunction &) const; -template ExternalLoader::LoadResults ExternalLoader::getCurrentLoadResults(const FilterByNameFunction &) const; +template ExternalLoader::LoadablePtr ExternalLoader::getLoadResult(const String &) const; +template ExternalLoader::LoadResult ExternalLoader::getLoadResult(const String &) const; +template ExternalLoader::Loadables ExternalLoader::getLoadResults(const FilterByNameFunction &) const; +template ExternalLoader::LoadResults ExternalLoader::getLoadResults(const FilterByNameFunction &) const; template ExternalLoader::LoadablePtr ExternalLoader::tryLoad(const String &, Duration) const; template ExternalLoader::LoadResult ExternalLoader::tryLoad(const String &, Duration) const; diff --git a/src/Interpreters/ExternalLoader.h b/src/Interpreters/ExternalLoader.h index a9a94ca615e..bcf01eb6625 100644 --- a/src/Interpreters/ExternalLoader.h +++ b/src/Interpreters/ExternalLoader.h @@ -53,17 +53,25 @@ public: using Duration = std::chrono::milliseconds; using TimePoint = std::chrono::system_clock::time_point; + struct ObjectConfig + { + Poco::AutoPtr config; + String key_in_config; + String repository_name; + bool from_temp_repository = false; + String path; + }; + struct LoadResult { Status status = Status::NOT_EXIST; String name; LoadablePtr object; - String origin; TimePoint loading_start_time; TimePoint last_successful_update_time; Duration loading_duration; std::exception_ptr exception; - std::string repository_name; + std::shared_ptr config; }; using LoadResults = std::vector; @@ -99,26 +107,32 @@ public: /// Returns the result of loading the object. /// The function doesn't load anything, it just returns the current load result as is. template , void>> - ReturnType getCurrentLoadResult(const String & name) const; + ReturnType getLoadResult(const String & name) const; using FilterByNameFunction = std::function; /// Returns all the load results as a map. /// The function doesn't load anything, it just returns the current load results as is. template , void>> - ReturnType getCurrentLoadResults() const { return getCurrentLoadResults(alwaysTrue); } + ReturnType getLoadResults() const { return getLoadResults(FilterByNameFunction{}); } template , void>> - ReturnType getCurrentLoadResults(const FilterByNameFunction & filter) const; + ReturnType getLoadResults(const FilterByNameFunction & filter) const; /// Returns all loaded objects as a map. /// The function doesn't load anything, it just returns the current load results as is. 
- Loadables getCurrentlyLoadedObjects() const; - Loadables getCurrentlyLoadedObjects(const FilterByNameFunction & filter) const; + Loadables getLoadedObjects() const; + Loadables getLoadedObjects(const FilterByNameFunction & filter) const; /// Returns true if any object was loaded. - bool hasCurrentlyLoadedObjects() const; - size_t getNumberOfCurrentlyLoadedObjects() const; + bool hasLoadedObjects() const; + size_t getNumberOfLoadedObjects() const; + + /// Returns true if there is no object. + bool hasObjects() const { return getNumberOfObjects() == 0; } + + /// Returns number of objects. + size_t getNumberOfObjects() const; static constexpr Duration NO_WAIT = Duration::zero(); static constexpr Duration WAIT = Duration::max(); @@ -139,7 +153,7 @@ public: /// The function does nothing for already loaded objects, it just returns them. /// The function doesn't throw an exception if it's failed to load something. template , void>> - ReturnType tryLoadAll(Duration timeout = WAIT) const { return tryLoad(alwaysTrue, timeout); } + ReturnType tryLoadAll(Duration timeout = WAIT) const { return tryLoad(FilterByNameFunction{}, timeout); } /// Loads a specified object. /// The function does nothing if it's already loaded. @@ -157,7 +171,7 @@ public: /// The function does nothing for already loaded objects, it just returns them. /// The function throws an exception if it's failed to load something. template , void>> - ReturnType loadAll() const { return load(alwaysTrue); } + ReturnType loadAll() const { return load(FilterByNameFunction{}); } /// Loads or reloads a specified object. /// The function reloads the object if it's already loaded. @@ -174,7 +188,7 @@ public: /// Load or reloads all objects. Not recommended to use. /// The function throws an exception if it's failed to load or reload something. template , void>> - ReturnType loadOrReloadAll() const { return loadOrReload(alwaysTrue); } + ReturnType loadOrReloadAll() const { return loadOrReload(FilterByNameFunction{}); } /// Reloads objects by filter which were tried to load before (successfully or not). /// The function throws an exception if it's failed to load or reload something. 
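The ExternalLoader hunks above replace the per-result `origin` and `repository_name` strings with a `std::shared_ptr<ObjectConfig>` held by `LoadResult`, so every object parsed from one file can share the same parsed configuration. A minimal standalone sketch of that ownership change (plain structs standing in for the real classes, not ClickHouse code):

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

/// Stand-in for ExternalLoader::ObjectConfig from this patch.
struct ObjectConfig
{
    std::string key_in_config;
    std::string repository_name;
    std::string path;
};

/// Stand-in for ExternalLoader::LoadResult: the removed `origin`/`repository_name`
/// fields are now reachable through the shared config.
struct LoadResult
{
    std::string name;
    std::shared_ptr<ObjectConfig> config;
};

int main()
{
    auto file_config = std::make_shared<ObjectConfig>();
    file_config->key_in_config = "dictionary";
    file_config->repository_name = "repo";
    file_config->path = "/etc/clickhouse-server/dict.xml";

    /// Two objects defined in the same file share one parsed config instead of copying strings per result.
    std::vector<LoadResult> results{{"dict_a", file_config}, {"dict_b", file_config}};
    for (const auto & r : results)
        std::cout << r.name << " <- " << r.config->path << " (" << r.config->repository_name << ")\n";
}
```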
@@ -197,10 +211,8 @@ private: void checkLoaded(const LoadResult & result, bool check_no_errors) const; void checkLoaded(const LoadResults & results, bool check_no_errors) const; - static bool alwaysTrue(const String &) { return true; } Strings getAllTriedToLoadNames() const; - struct ObjectConfig; LoadablePtr createObject(const String & name, const ObjectConfig & config, const LoadablePtr & previous_version) const; class LoadablesConfigReader; diff --git a/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp b/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp index 10f99262da7..7b3d57b192a 100644 --- a/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp +++ b/src/Interpreters/ExternalLoaderDatabaseConfigRepository.cpp @@ -1,30 +1,30 @@ #include -#include -#include #include + namespace DB { - namespace ErrorCodes { extern const int UNKNOWN_DICTIONARY; } + namespace { -String trimDatabaseName(const std::string & loadable_definition_name, const IDatabase & database) -{ - const auto & dbname = database.getDatabaseName(); - if (!startsWith(loadable_definition_name, dbname)) - throw Exception( - "Loadable '" + loadable_definition_name + "' is not from database '" + database.getDatabaseName(), ErrorCodes::UNKNOWN_DICTIONARY); - /// dbname.loadable_name - ///--> remove <--- - return loadable_definition_name.substr(dbname.length() + 1); -} + String trimDatabaseName(const std::string & loadable_definition_name, const IDatabase & database) + { + const auto & dbname = database.getDatabaseName(); + if (!startsWith(loadable_definition_name, dbname)) + throw Exception( + "Loadable '" + loadable_definition_name + "' is not from database '" + database.getDatabaseName(), ErrorCodes::UNKNOWN_DICTIONARY); + /// dbname.loadable_name + ///--> remove <--- + return loadable_definition_name.substr(dbname.length() + 1); + } } + ExternalLoaderDatabaseConfigRepository::ExternalLoaderDatabaseConfigRepository(IDatabase & database_, const Context & context_) : name(database_.getDatabaseName()) , database(database_) @@ -34,8 +34,7 @@ ExternalLoaderDatabaseConfigRepository::ExternalLoaderDatabaseConfigRepository(I LoadablesConfigurationPtr ExternalLoaderDatabaseConfigRepository::load(const std::string & loadable_definition_name) { - String dictname = trimDatabaseName(loadable_definition_name, database); - return getDictionaryConfigurationFromAST(database.getCreateDictionaryQuery(context, dictname)->as()); + return database.getDictionaryConfiguration(trimDatabaseName(loadable_definition_name, database)); } bool ExternalLoaderDatabaseConfigRepository::exists(const std::string & loadable_definition_name) diff --git a/src/Interpreters/HashJoin.cpp b/src/Interpreters/HashJoin.cpp index b8da03acb8b..456573c6c8e 100644 --- a/src/Interpreters/HashJoin.cpp +++ b/src/Interpreters/HashJoin.cpp @@ -4,16 +4,21 @@ #include #include +#include #include #include #include +#include #include #include #include #include #include +#include + +#include #include #include @@ -21,8 +26,6 @@ #include #include #include -#include - namespace DB { @@ -282,6 +285,42 @@ static KeyGetter createKeyGetter(const ColumnRawPtrs & key_columns, const Sizes return KeyGetter(key_columns, key_sizes, nullptr); } +class KeyGetterForDict +{ +public: + using Mapped = JoinStuff::MappedOne; + using FindResult = ColumnsHashing::columns_hashing_impl::FindResultImpl; + + KeyGetterForDict(const ColumnRawPtrs & key_columns_, const Sizes &, void *) + : key_columns(key_columns_) + {} + + FindResult findKey(const TableJoin & table_join, size_t row, 
const Arena &) + { + const DictionaryReader & reader = *table_join.dictionary_reader; + if (!read_result) + { + reader.readKeys(*key_columns[0], read_result, found, positions); + result.block = &read_result; + + if (table_join.forceNullableRight()) + for (auto & column : read_result) + if (table_join.rightBecomeNullable(column.type)) + JoinCommon::convertColumnToNullable(column); + } + + result.row_num = positions[row]; + return FindResult(&result, found[row]); + } + +private: + const ColumnRawPtrs & key_columns; + Block read_result; + Mapped result; + ColumnVector::Container found; + std::vector positions; +}; + template struct KeyGetterForTypeImpl; @@ -351,7 +390,7 @@ size_t HashJoin::getTotalRowCount() const for (const auto & block : data->blocks) res += block.rows(); } - else + else if (data->type != Type::DICT) { joinDispatch(kind, strictness, data->maps, [&](auto, auto, auto & map) { res += map.getTotalRowCount(data->type); }); } @@ -368,7 +407,7 @@ size_t HashJoin::getTotalByteCount() const for (const auto & block : data->blocks) res += block.bytes(); } - else + else if (data->type != Type::DICT) { joinDispatch(kind, strictness, data->maps, [&](auto, auto, auto & map) { res += map.getTotalByteCountImpl(data->type); }); res += data->pool.size(); @@ -400,7 +439,13 @@ void HashJoin::setSampleBlock(const Block & block) if (nullable_right_side) JoinCommon::convertColumnsToNullable(sample_block_with_columns_to_add); - if (strictness == ASTTableJoin::Strictness::Asof) + if (table_join->dictionary_reader) + { + data->type = Type::DICT; + std::get(data->maps).create(Type::DICT); + chooseMethod(key_columns, key_sizes); /// init key_sizes + } + else if (strictness == ASTTableJoin::Strictness::Asof) { if (kind != ASTTableJoin::Kind::Left and kind != ASTTableJoin::Kind::Inner) throw Exception("ASOF only supports LEFT and INNER as base joins", ErrorCodes::NOT_IMPLEMENTED); @@ -526,7 +571,8 @@ namespace switch (type) { case HashJoin::Type::EMPTY: break; - case HashJoin::Type::CROSS: break; /// Do nothing. We have already saved block, and it is enough. + case HashJoin::Type::CROSS: break; /// Do nothing. We have already saved block, and it is enough. + case HashJoin::Type::DICT: break; /// No one should call it with Type::DICT. #define M(TYPE) \ case HashJoin::Type::TYPE: \ @@ -598,6 +644,8 @@ bool HashJoin::addJoinedBlock(const Block & source_block, bool check_limits) { if (empty()) throw Exception("Logical error: HashJoin was not initialized", ErrorCodes::LOGICAL_ERROR); + if (overDictionary()) + throw Exception("Logical error: insert into hash-map in HashJoin over dictionary", ErrorCodes::LOGICAL_ERROR); /// There's no optimization for right side const columns. Remove constness if any. 
Block block = materializeBlock(source_block); @@ -930,8 +978,7 @@ IColumn::Filter switchJoinRightColumns(const Maps & maps_, AddedColumns & added_ case HashJoin::Type::TYPE: \ return joinRightColumnsSwitchNullability>::Type>(\ - *maps_.TYPE, added_columns, null_map);\ - break; + *maps_.TYPE, added_columns, null_map); APPLY_FOR_JOIN_VARIANTS(M) #undef M @@ -940,6 +987,20 @@ IColumn::Filter switchJoinRightColumns(const Maps & maps_, AddedColumns & added_ } } +template +IColumn::Filter dictionaryJoinRightColumns(const TableJoin & table_join, AddedColumns & added_columns, const ConstNullMapPtr & null_map) +{ + if constexpr (KIND == ASTTableJoin::Kind::Left && + (STRICTNESS == ASTTableJoin::Strictness::Any || + STRICTNESS == ASTTableJoin::Strictness::Semi || + STRICTNESS == ASTTableJoin::Strictness::Anti)) + { + return joinRightColumnsSwitchNullability(table_join, added_columns, null_map); + } + + throw Exception("Logical error: wrong JOIN combination", ErrorCodes::LOGICAL_ERROR); +} + } /// nameless @@ -1000,7 +1061,9 @@ void HashJoin::joinBlockImpl( bool has_required_right_keys = (required_right_keys.columns() != 0); added_columns.need_filter = need_filter || has_required_right_keys; - IColumn::Filter row_filter = switchJoinRightColumns(maps_, added_columns, data->type, null_map); + IColumn::Filter row_filter = overDictionary() ? + dictionaryJoinRightColumns(*table_join, added_columns, null_map) : + switchJoinRightColumns(maps_, added_columns, data->type, null_map); for (size_t i = 0; i < added_columns.size(); ++i) block.insert(added_columns.moveColumn(i)); @@ -1211,7 +1274,36 @@ void HashJoin::joinBlock(Block & block, ExtraBlockPtr & not_processed) const Names & key_names_left = table_join->keyNamesLeft(); JoinCommon::checkTypesOfKeys(block, key_names_left, right_table_keys, key_names_right); - if (joinDispatch(kind, strictness, data->maps, [&](auto kind_, auto strictness_, auto & map) + if (overDictionary()) + { + using Kind = ASTTableJoin::Kind; + using Strictness = ASTTableJoin::Strictness; + + auto & map = std::get(data->maps); + if (kind == Kind::Left) + { + switch (strictness) + { + case Strictness::Any: + case Strictness::All: + joinBlockImpl(block, key_names_left, sample_block_with_columns_to_add, map); + break; + case Strictness::Semi: + joinBlockImpl(block, key_names_left, sample_block_with_columns_to_add, map); + break; + case Strictness::Anti: + joinBlockImpl(block, key_names_left, sample_block_with_columns_to_add, map); + break; + default: + throw Exception("Logical error: wrong JOIN combination", ErrorCodes::LOGICAL_ERROR); + } + } + else if (kind == Kind::Inner && strictness == Strictness::All) + joinBlockImpl(block, key_names_left, sample_block_with_columns_to_add, map); + else + throw Exception("Logical error: wrong JOIN combination", ErrorCodes::LOGICAL_ERROR); + } + else if (joinDispatch(kind, strictness, data->maps, [&](auto kind_, auto strictness_, auto & map) { joinBlockImpl(block, key_names_left, sample_block_with_columns_to_add, map); })) diff --git a/src/Interpreters/HashJoin.h b/src/Interpreters/HashJoin.h index b769cfc61c5..9d4e0907f66 100644 --- a/src/Interpreters/HashJoin.h +++ b/src/Interpreters/HashJoin.h @@ -27,6 +27,7 @@ namespace DB { class TableJoin; +class DictionaryReader; namespace JoinStuff { @@ -148,7 +149,8 @@ class HashJoin : public IJoin public: HashJoin(std::shared_ptr table_join_, const Block & right_sample_block, bool any_take_last_row_ = false); - bool empty() { return data->type == Type::EMPTY; } + bool empty() const { return data->type == 
Type::EMPTY; } + bool overDictionary() const { return data->type == Type::DICT; } /** Add block of data from right hand of JOIN to the map. * Returns false, if some limit was exceeded and you should not insert more data. @@ -186,7 +188,7 @@ public: /// Sum size in bytes of all buffers, used for JOIN maps and for all memory pools. size_t getTotalByteCount() const final; - bool alwaysReturnsEmptySet() const final { return isInnerOrRight(getKind()) && data->empty; } + bool alwaysReturnsEmptySet() const final { return isInnerOrRight(getKind()) && data->empty && !overDictionary(); } ASTTableJoin::Kind getKind() const { return kind; } ASTTableJoin::Strictness getStrictness() const { return strictness; } @@ -220,12 +222,12 @@ public: { EMPTY, CROSS, + DICT, #define M(NAME) NAME, APPLY_FOR_JOIN_VARIANTS(M) #undef M }; - /** Different data structures, that are used to perform JOIN. */ template @@ -247,6 +249,7 @@ public: { case Type::EMPTY: break; case Type::CROSS: break; + case Type::DICT: break; #define M(NAME) \ case Type::NAME: NAME = std::make_unique(); break; @@ -261,6 +264,7 @@ public: { case Type::EMPTY: return 0; case Type::CROSS: return 0; + case Type::DICT: return 0; #define M(NAME) \ case Type::NAME: return NAME ? NAME->size() : 0; @@ -277,6 +281,7 @@ public: { case Type::EMPTY: return 0; case Type::CROSS: return 0; + case Type::DICT: return 0; #define M(NAME) \ case Type::NAME: return NAME ? NAME->getBufferSizeInBytes() : 0; diff --git a/src/Interpreters/InterpreterCreateQuery.cpp b/src/Interpreters/InterpreterCreateQuery.cpp index 37e2c8c5945..be7bd238025 100644 --- a/src/Interpreters/InterpreterCreateQuery.cpp +++ b/src/Interpreters/InterpreterCreateQuery.cpp @@ -45,6 +45,8 @@ #include #include +#include + #include #include @@ -703,7 +705,11 @@ BlockIO InterpreterCreateQuery::createDictionary(ASTCreateQuery & create) } if (create.attach) - database->attachDictionary(dictionary_name, context); + { + auto config = getDictionaryConfigurationFromAST(create); + auto modification_time = database->getObjectMetadataModificationTime(dictionary_name); + database->attachDictionary(dictionary_name, DictionaryAttachInfo{query_ptr, config, modification_time}); + } else database->createDictionary(context, dictionary_name, query_ptr); diff --git a/src/Interpreters/InterpreterDropQuery.cpp b/src/Interpreters/InterpreterDropQuery.cpp index 91783352842..71730c23788 100644 --- a/src/Interpreters/InterpreterDropQuery.cpp +++ b/src/Interpreters/InterpreterDropQuery.cpp @@ -188,7 +188,7 @@ BlockIO InterpreterDropQuery::executeToDictionary( { /// Drop dictionary from memory, don't touch data and metadata context.checkAccess(AccessType::DROP_DICTIONARY, database_name, dictionary_name); - database->detachDictionary(dictionary_name, context); + database->detachDictionary(dictionary_name); } else if (kind == ASTDropQuery::Kind::Truncate) { @@ -254,21 +254,26 @@ BlockIO InterpreterDropQuery::executeToDatabase(const String & database_name, AS bool drop = kind == ASTDropQuery::Kind::Drop; context.checkAccess(AccessType::DROP_DATABASE, database_name); - /// DETACH or DROP all tables and dictionaries inside database - for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next()) + if (database->shouldBeEmptyOnDetach()) { - String current_table_name = iterator->name(); - executeToTable(database_name, current_table_name, kind, false, false, false); - } + /// DETACH or DROP all tables and dictionaries inside database. 
+ /// First we should DETACH or DROP dictionaries because StorageDictionary + /// must be detached only by detaching corresponding dictionary. + for (auto iterator = database->getDictionariesIterator(context); iterator->isValid(); iterator->next()) + { + String current_dictionary = iterator->name(); + executeToDictionary(database_name, current_dictionary, kind, false, false, false); + } - for (auto iterator = database->getDictionariesIterator(context); iterator->isValid(); iterator->next()) - { - String current_dictionary = iterator->name(); - executeToDictionary(database_name, current_dictionary, kind, false, false, false); + for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next()) + { + String current_table_name = iterator->name(); + executeToTable(database_name, current_table_name, kind, false, false, false); + } } /// DETACH or DROP database itself - DatabaseCatalog::instance().detachDatabase(database_name, drop); + DatabaseCatalog::instance().detachDatabase(database_name, drop, database->shouldBeEmptyOnDetach()); } } diff --git a/src/Interpreters/InterpreterSelectQuery.cpp b/src/Interpreters/InterpreterSelectQuery.cpp index 691b3c1045b..48ae00ffe1f 100644 --- a/src/Interpreters/InterpreterSelectQuery.cpp +++ b/src/Interpreters/InterpreterSelectQuery.cpp @@ -305,12 +305,13 @@ InterpreterSelectQuery::InterpreterSelectQuery( max_streams = settings.max_threads; ASTSelectQuery & query = getSelectQuery(); + std::shared_ptr table_join = joined_tables.makeTableJoin(query); auto analyze = [&] (bool try_move_to_prewhere = true) { syntax_analyzer_result = SyntaxAnalyzer(*context).analyzeSelect( query_ptr, SyntaxAnalyzerResult(source_header.getNamesAndTypesList(), storage), - options, joined_tables.tablesWithColumns(), required_result_column_names); + options, joined_tables.tablesWithColumns(), required_result_column_names, table_join); /// Save scalar sub queries's results in the query context if (context->hasQueryContext()) diff --git a/src/Interpreters/JoinedTables.cpp b/src/Interpreters/JoinedTables.cpp index cedf95bea06..44bd83458c4 100644 --- a/src/Interpreters/JoinedTables.cpp +++ b/src/Interpreters/JoinedTables.cpp @@ -1,18 +1,26 @@ #include +#include #include #include #include #include #include + #include #include #include +#include +#include + #include +#include #include #include #include #include #include +#include +#include namespace DB { @@ -26,6 +34,34 @@ namespace ErrorCodes namespace { +void replaceJoinedTable(const ASTSelectQuery & select_query) +{ + const ASTTablesInSelectQueryElement * join = select_query.join(); + if (!join || !join->table_expression) + return; + + /// TODO: Push down for CROSS JOIN is not OK [disabled] + const auto & table_join = join->table_join->as(); + if (table_join.kind == ASTTableJoin::Kind::Cross) + return; + + auto & table_expr = join->table_expression->as(); + if (table_expr.database_and_table_name) + { + const auto & table_id = table_expr.database_and_table_name->as(); + String expr = "(select * from " + table_id.name + ") as " + table_id.shortName(); + + // FIXME: since the expression "a as b" exposes both "a" and "b" names, which is not equivalent to "(select * from a) as b", + // we can't replace aliased tables. + // FIXME: long table names include database name, which we can't save within alias. 
+ if (table_id.alias.empty() && table_id.isShort()) + { + ParserTableExpression parser; + table_expr = parseQuery(parser, expr, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH)->as(); + } + } +} + template void checkTablesWithColumns(const std::vector & tables_with_columns, const Context & context) { @@ -209,4 +245,35 @@ void JoinedTables::rewriteDistributedInAndJoins(ASTPtr & query) } } +std::shared_ptr JoinedTables::makeTableJoin(const ASTSelectQuery & select_query) +{ + if (tables_with_columns.size() < 2) + return {}; + + auto settings = context.getSettingsRef(); + auto table_join = std::make_shared(settings, context.getTemporaryVolume()); + + const ASTTablesInSelectQueryElement * ast_join = select_query.join(); + const auto & table_to_join = ast_join->table_expression->as(); + + /// TODO This syntax does not support specifying a database name. + if (table_to_join.database_and_table_name) + { + auto joined_table_id = context.resolveStorageID(table_to_join.database_and_table_name); + StoragePtr table = DatabaseCatalog::instance().tryGetTable(joined_table_id); + if (table) + { + if (dynamic_cast(table.get()) || + dynamic_cast(table.get())) + table_join->joined_storage = table; + } + } + + if (!table_join->joined_storage && + settings.enable_optimize_predicate_expression) + replaceJoinedTable(select_query); + + return table_join; +} + } diff --git a/src/Interpreters/JoinedTables.h b/src/Interpreters/JoinedTables.h index 3bcec883f30..399acdc0768 100644 --- a/src/Interpreters/JoinedTables.h +++ b/src/Interpreters/JoinedTables.h @@ -10,6 +10,7 @@ namespace DB class ASTSelectQuery; class Context; +class TableJoin; struct SelectQueryOptions; /// Joined tables' columns resolver. @@ -30,6 +31,7 @@ public: /// Make fake tables_with_columns[0] in case we have predefined input in InterpreterSelectQuery void makeFakeTable(StoragePtr storage, const Block & source_header); + std::shared_ptr makeTableJoin(const ASTSelectQuery & select_query); const std::vector & tablesWithColumns() const { return tables_with_columns; } diff --git a/src/Interpreters/SyntaxAnalyzer.cpp b/src/Interpreters/SyntaxAnalyzer.cpp index e19961e7a7c..de5bcca1044 100644 --- a/src/Interpreters/SyntaxAnalyzer.cpp +++ b/src/Interpreters/SyntaxAnalyzer.cpp @@ -29,10 +29,10 @@ #include #include #include -#include -#include #include +#include + #include #include @@ -216,28 +216,6 @@ void executeScalarSubqueries(ASTPtr & query, const Context & context, size_t sub ExecuteScalarSubqueriesVisitor(visitor_data, log.stream()).visit(query); } -/** Calls to these functions in the GROUP BY statement would be - * replaced by their immediate argument. - */ -const std::unordered_set injective_function_names -{ - "negate", - "bitNot", - "reverse", - "reverseUTF8", - "toString", - "toFixedString", - "IPv4NumToString", - "IPv4StringToNum", - "hex", - "unhex", - "bitmaskToList", - "bitmaskToArray", - "tuple", - "regionToName", - "concatAssumeInjective", -}; - const std::unordered_set possibly_injective_function_names { "dictGetString", @@ -278,6 +256,8 @@ void appendUnusedGroupByColumn(ASTSelectQuery * select_query, const NameSet & so /// Eliminates injective function calls and constant expressions from group by statement. void optimizeGroupBy(ASTSelectQuery * select_query, const NameSet & source_columns, const Context & context) { + const FunctionFactory & function_factory = FunctionFactory::instance(); + if (!select_query->groupBy()) { // If there is a HAVING clause without GROUP BY, make sure we have some aggregation happen. 
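The hunk that follows switches `optimizeGroupBy` from the removed hard-coded `injective_function_names` list to asking the function itself via `IFunction::isInjective`. As a standalone illustration (plain C++, not ClickHouse code) of why an injective call may be replaced by its argument in GROUP BY:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

int main()
{
    std::vector<int> keys{1, 2, 2, 3};

    /// An injective function maps distinct keys to distinct values, like toString().
    auto injective_f = [](int x) { return std::to_string(x); };

    std::map<int, int> groups_by_key;
    std::map<std::string, int> groups_by_f;
    for (int k : keys)
    {
        ++groups_by_key[k];
        ++groups_by_f[injective_f(k)];
    }

    /// Same number of groups either way, so the GROUP BY key can be simplified to the argument.
    assert(groups_by_key.size() == groups_by_f.size());
}
```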
@@ -327,7 +307,7 @@ void optimizeGroupBy(ASTSelectQuery * select_query, const NameSet & source_colum continue; } } - else if (!injective_function_names.count(function->name)) + else if (!function_factory.get(function->name, context)->isInjective(Block{})) { ++i; continue; @@ -565,34 +545,6 @@ void collectJoinedColumns(TableJoin & analyzed_join, const ASTSelectQuery & sele } } -void replaceJoinedTable(const ASTSelectQuery & select_query) -{ - const ASTTablesInSelectQueryElement * join = select_query.join(); - if (!join || !join->table_expression) - return; - - /// TODO: Push down for CROSS JOIN is not OK [disabled] - const auto & table_join = join->table_join->as(); - if (table_join.kind == ASTTableJoin::Kind::Cross) - return; - - auto & table_expr = join->table_expression->as(); - if (table_expr.database_and_table_name) - { - const auto & table_id = table_expr.database_and_table_name->as(); - String expr = "(select * from " + table_id.name + ") as " + table_id.shortName(); - - // FIXME: since the expression "a as b" exposes both "a" and "b" names, which is not equivalent to "(select * from a) as b", - // we can't replace aliased tables. - // FIXME: long table names include database name, which we can't save within alias. - if (table_id.alias.empty() && table_id.isShort()) - { - ParserTableExpression parser; - table_expr = parseQuery(parser, expr, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH)->as(); - } - } -} - std::vector getAggregates(ASTPtr & query, const ASTSelectQuery & select_query) { /// There can not be aggregate functions inside the WHERE and PREWHERE. @@ -799,7 +751,8 @@ SyntaxAnalyzerResultPtr SyntaxAnalyzer::analyzeSelect( SyntaxAnalyzerResult && result, const SelectQueryOptions & select_options, const std::vector & tables_with_columns, - const Names & required_result_columns) const + const Names & required_result_columns, + std::shared_ptr table_join) const { auto * select_query = query->as(); if (!select_query) @@ -811,14 +764,13 @@ SyntaxAnalyzerResultPtr SyntaxAnalyzer::analyzeSelect( const auto & settings = context.getSettingsRef(); const NameSet & source_columns_set = result.source_columns_set; - result.analyzed_join = std::make_shared(settings, context.getTemporaryVolume()); + result.analyzed_join = table_join; + if (!result.analyzed_join) /// ExpressionAnalyzer expects some not empty object here + result.analyzed_join = std::make_shared(); if (remove_duplicates) renameDuplicatedColumns(select_query); - if (settings.enable_optimize_predicate_expression) - replaceJoinedTable(*select_query); - /// TODO: Remove unneeded conversion std::vector tables_with_column_names; tables_with_column_names.reserve(tables_with_columns.size()); diff --git a/src/Interpreters/SyntaxAnalyzer.h b/src/Interpreters/SyntaxAnalyzer.h index 23e8a4b79aa..08afd14b83c 100644 --- a/src/Interpreters/SyntaxAnalyzer.h +++ b/src/Interpreters/SyntaxAnalyzer.h @@ -94,7 +94,8 @@ public: SyntaxAnalyzerResult && result, const SelectQueryOptions & select_options = {}, const std::vector & tables_with_columns = {}, - const Names & required_result_columns = {}) const; + const Names & required_result_columns = {}, + std::shared_ptr table_join = {}) const; private: const Context & context; diff --git a/src/Interpreters/TableJoin.cpp b/src/Interpreters/TableJoin.cpp index 339fe2dceb3..246c2129297 100644 --- a/src/Interpreters/TableJoin.cpp +++ b/src/Interpreters/TableJoin.cpp @@ -159,22 +159,26 @@ NamesWithAliases TableJoin::getRequiredColumns(const Block & sample, const Names return getNamesWithAliases(required_columns); } +bool 
TableJoin::leftBecomeNullable(const DataTypePtr & column_type) const +{ + return forceNullableLeft() && column_type->canBeInsideNullable(); +} + +bool TableJoin::rightBecomeNullable(const DataTypePtr & column_type) const +{ + return forceNullableRight() && column_type->canBeInsideNullable(); +} + void TableJoin::addJoinedColumn(const NameAndTypePair & joined_column) { - if (join_use_nulls && isLeftOrFull(table_join.kind)) - { - auto type = joined_column.type->canBeInsideNullable() ? makeNullable(joined_column.type) : joined_column.type; - columns_added_by_join.emplace_back(NameAndTypePair(joined_column.name, std::move(type))); - } + if (rightBecomeNullable(joined_column.type)) + columns_added_by_join.emplace_back(NameAndTypePair(joined_column.name, makeNullable(joined_column.type))); else columns_added_by_join.push_back(joined_column); } void TableJoin::addJoinedColumnsAndCorrectNullability(Block & sample_block) const { - bool right_or_full_join = isRightOrFull(table_join.kind); - bool left_or_full_join = isLeftOrFull(table_join.kind); - for (auto & col : sample_block) { /// Materialize column. @@ -183,9 +187,7 @@ void TableJoin::addJoinedColumnsAndCorrectNullability(Block & sample_block) cons if (col.column) col.column = nullptr; - bool make_nullable = join_use_nulls && right_or_full_join; - - if (make_nullable && col.type->canBeInsideNullable()) + if (leftBecomeNullable(col.type)) col.type = makeNullable(col.type); } @@ -193,9 +195,7 @@ void TableJoin::addJoinedColumnsAndCorrectNullability(Block & sample_block) cons { auto res_type = col.type; - bool make_nullable = join_use_nulls && left_or_full_join; - - if (make_nullable && res_type->canBeInsideNullable()) + if (rightBecomeNullable(res_type)) res_type = makeNullable(res_type); sample_block.insert(ColumnWithTypeAndName(nullptr, res_type, col.name)); @@ -242,4 +242,31 @@ bool TableJoin::allowMergeJoin() const return allow_merge_join; } +bool TableJoin::allowDictJoin(const String & dict_key, const Block & sample_block, Names & names, NamesAndTypesList & result_columns) const +{ + /// Support ALL INNER, [ANY | ALL | SEMI | ANTI] LEFT + if (!isLeft(kind()) && !(isInner(kind()) && strictness() == ASTTableJoin::Strictness::All)) + return false; + + const Names & right_keys = keyNamesRight(); + if (right_keys.size() != 1) + return false; + + for (auto & col : sample_block) + { + String original = original_names.find(col.name)->second; + if (col.name == right_keys[0]) + { + if (original != dict_key) + return false; /// JOIN key != Dictionary key + continue; /// do not extract key column + } + + names.push_back(original); + result_columns.push_back({col.name, col.type}); + } + + return true; +} + } diff --git a/src/Interpreters/TableJoin.h b/src/Interpreters/TableJoin.h index 0b5ed82411a..2047f935966 100644 --- a/src/Interpreters/TableJoin.h +++ b/src/Interpreters/TableJoin.h @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -19,6 +20,7 @@ class Context; class ASTSelectQuery; struct DatabaseAndTableWithAlias; class Block; +class DictionaryReader; struct Settings; @@ -42,10 +44,10 @@ class TableJoin friend class SyntaxAnalyzer; const SizeLimits size_limits; - const size_t default_max_bytes; - const bool join_use_nulls; + const size_t default_max_bytes = 0; + const bool join_use_nulls = false; const size_t max_joined_block_rows = 0; - JoinAlgorithm join_algorithm; + JoinAlgorithm join_algorithm = JoinAlgorithm::AUTO; const bool partial_merge_join_optimizations = false; const size_t partial_merge_join_rows_in_right_blocks = 
0; @@ -69,6 +71,7 @@ class TableJoin VolumePtr tmp_volume; public: + TableJoin() = default; TableJoin(const Settings &, VolumePtr tmp_volume); /// for StorageJoin @@ -84,12 +87,16 @@ public: table_join.strictness = strictness; } + StoragePtr joined_storage; + std::shared_ptr dictionary_reader; + ASTTableJoin::Kind kind() const { return table_join.kind; } ASTTableJoin::Strictness strictness() const { return table_join.strictness; } bool sameStrictnessAndKind(ASTTableJoin::Strictness, ASTTableJoin::Kind) const; const SizeLimits & sizeLimits() const { return size_limits; } VolumePtr getTemporaryVolume() { return tmp_volume; } bool allowMergeJoin() const; + bool allowDictJoin(const String & dict_key, const Block & sample_block, Names &, NamesAndTypesList &) const; bool preferMergeJoin() const { return join_algorithm == JoinAlgorithm::PREFER_PARTIAL_MERGE; } bool forceMergeJoin() const { return join_algorithm == JoinAlgorithm::PARTIAL_MERGE; } bool forceHashJoin() const { return join_algorithm == JoinAlgorithm::HASH; } @@ -115,6 +122,8 @@ public: size_t rightKeyInclusion(const String & name) const; NameSet requiredRightKeys() const; + bool leftBecomeNullable(const DataTypePtr & column_type) const; + bool rightBecomeNullable(const DataTypePtr & column_type) const; void addJoinedColumn(const NameAndTypePair & joined_column); void addJoinedColumnsAndCorrectNullability(Block & sample_block) const; diff --git a/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp b/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp index b7da335f0c5..3e112fb1ce6 100644 --- a/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp +++ b/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp @@ -24,14 +24,124 @@ namespace ErrorCodes } MsgPackRowInputFormat::MsgPackRowInputFormat(const Block & header_, ReadBuffer & in_, Params params_) - : IRowInputFormat(header_, in_, std::move(params_)), buf(in), ctx(&reference_func, nullptr, msgpack::unpack_limit()), data_types(header_.getDataTypes()) {} + : IRowInputFormat(header_, in_, std::move(params_)), buf(in), parser(visitor), data_types(header_.getDataTypes()) {} -int MsgPackRowInputFormat::unpack(msgpack::zone & zone, size_t & offset) +void MsgPackVisitor::set_info(IColumn & column, DataTypePtr type) // NOLINT { - offset = 0; - ctx.init(); - ctx.user().set_zone(zone); - return ctx.execute(buf.position(), buf.buffer().end() - buf.position(), offset); + while (!info_stack.empty()) + { + info_stack.pop(); + } + info_stack.push(Info{column, type}); +} + +void MsgPackVisitor::insert_integer(UInt64 value) // NOLINT +{ + Info & info = info_stack.top(); + switch (info.type->getTypeId()) + { + case TypeIndex::UInt8: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::Date: [[fallthrough]]; + case TypeIndex::UInt16: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::DateTime: [[fallthrough]]; + case TypeIndex::UInt32: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::UInt64: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::Int8: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::Int16: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::Int32: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::Int64: + { + assert_cast(info.column).insertValue(value); + break; + } + case TypeIndex::DateTime64: + { + assert_cast(info.column).insertValue(value); + break; + } + default: + 
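+            /// None of the remaining column types can be filled from a msgpack integer.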
throw Exception("Type " + info.type->getName() + " is not supported for MsgPack input format", ErrorCodes::ILLEGAL_COLUMN); + } +} + +bool MsgPackVisitor::visit_positive_integer(UInt64 value) // NOLINT +{ + insert_integer(value); + return true; +} + +bool MsgPackVisitor::visit_negative_integer(Int64 value) // NOLINT +{ + insert_integer(value); + return true; +} + +bool MsgPackVisitor::visit_str(const char* value, size_t size) // NOLINT +{ + info_stack.top().column.insertData(value, size); + return true; +} + +bool MsgPackVisitor::visit_float32(Float32 value) // NOLINT +{ + assert_cast(info_stack.top().column).insertValue(value); + return true; +} + +bool MsgPackVisitor::visit_float64(Float64 value) // NOLINT +{ + assert_cast(info_stack.top().column).insertValue(value); + return true; +} + +bool MsgPackVisitor::start_array(size_t size) // NOLINT +{ + auto nested_type = assert_cast(*info_stack.top().type).getNestedType(); + ColumnArray & column_array = assert_cast(info_stack.top().column); + ColumnArray::Offsets & offsets = column_array.getOffsets(); + IColumn & nested_column = column_array.getData(); + offsets.push_back(offsets.back() + size); + info_stack.push(Info{nested_column, nested_type}); + return true; +} + +bool MsgPackVisitor::end_array() // NOLINT +{ + info_stack.pop(); + return true; +} + +void MsgPackVisitor::parse_error(size_t, size_t) // NOLINT +{ + throw Exception("Error occurred while parsing msgpack data.", ErrorCodes::INCORRECT_DATA); } bool MsgPackRowInputFormat::readObject() @@ -40,9 +150,8 @@ bool MsgPackRowInputFormat::readObject() return false; PeekableReadBufferCheckpoint checkpoint{buf}; - std::unique_ptr zone(new msgpack::zone); - size_t offset; - while (!unpack(*zone, offset)) + size_t offset = 0; + while (!parser.execute(buf.position(), buf.available(), offset)) { buf.position() = buf.buffer().end(); if (buf.eof()) @@ -52,123 +161,19 @@ bool MsgPackRowInputFormat::readObject() buf.rollbackToCheckpoint(); } buf.position() += offset; - object_handle = msgpack::object_handle(ctx.data(), std::move(zone)); return true; } -void MsgPackRowInputFormat::insertObject(IColumn & column, DataTypePtr data_type, const msgpack::object & object) -{ - switch (data_type->getTypeId()) - { - case TypeIndex::UInt8: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::Date: [[fallthrough]]; - case TypeIndex::UInt16: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::DateTime: [[fallthrough]]; - case TypeIndex::UInt32: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::UInt64: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::Int8: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::Int16: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::Int32: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::Int64: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::Float32: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::Float64: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::DateTime64: - { - assert_cast(column).insertValue(object.as()); - return; - } - case TypeIndex::FixedString: [[fallthrough]]; - case TypeIndex::String: - { - msgpack::object_str obj_str = object.via.str; - column.insertData(obj_str.ptr, obj_str.size); - return; - } - case TypeIndex::Array: - { - 
msgpack::object_array object_array = object.via.array; - auto nested_type = assert_cast(*data_type).getNestedType(); - ColumnArray & column_array = assert_cast(column); - ColumnArray::Offsets & offsets = column_array.getOffsets(); - IColumn & nested_column = column_array.getData(); - for (size_t i = 0; i != object_array.size; ++i) - { - insertObject(nested_column, nested_type, object_array.ptr[i]); - } - offsets.push_back(offsets.back() + object_array.size); - return; - } - case TypeIndex::Nullable: - { - auto nested_type = removeNullable(data_type); - ColumnNullable & column_nullable = assert_cast(column); - if (object.type == msgpack::type::NIL) - column_nullable.insertDefault(); - else - insertObject(column_nullable.getNestedColumn(), nested_type, object); - return; - } - case TypeIndex::Nothing: - { - // Nothing to insert, MsgPack object is nil. - return; - } - default: - break; - } - throw Exception("Type " + data_type->getName() + " is not supported for MsgPack input format", ErrorCodes::ILLEGAL_COLUMN); -} - bool MsgPackRowInputFormat::readRow(MutableColumns & columns, RowReadExtension &) { size_t column_index = 0; bool has_more_data = true; for (; column_index != columns.size(); ++column_index) { + visitor.set_info(*columns[column_index], data_types[column_index]); has_more_data = readObject(); if (!has_more_data) break; - insertObject(*columns[column_index], data_types[column_index], object_handle.get()); } if (!has_more_data) { diff --git a/src/Processors/Formats/Impl/MsgPackRowInputFormat.h b/src/Processors/Formats/Impl/MsgPackRowInputFormat.h index a426dc4950c..92e4f5d0bd7 100644 --- a/src/Processors/Formats/Impl/MsgPackRowInputFormat.h +++ b/src/Processors/Formats/Impl/MsgPackRowInputFormat.h @@ -4,12 +4,44 @@ #include #include #include +#include namespace DB { class ReadBuffer; +class MsgPackVisitor : public msgpack::null_visitor +{ +public: + struct Info + { + IColumn & column; + DataTypePtr type; + }; + + /// These functions are called when parser meets corresponding object in parsed data + bool visit_positive_integer(UInt64 value); + bool visit_negative_integer(Int64 value); + bool visit_float32(Float32 value); + bool visit_float64(Float64 value); + bool visit_str(const char* value, size_t size); + bool start_array(size_t size); + bool end_array(); + + /// This function will be called if error occurs in parsing + [[noreturn]] void parse_error(size_t parsed_offset, size_t error_offset); + + /// Update info_stack + void set_info(IColumn & column, DataTypePtr type); + + void insert_integer(UInt64 value); + +private: + /// Stack is needed to process nested arrays + std::stack info_stack; +}; + class MsgPackRowInputFormat : public IRowInputFormat { public: @@ -19,15 +51,10 @@ public: String getName() const override { return "MagPackRowInputFormat"; } private: bool readObject(); - void insertObject(IColumn & column, DataTypePtr type, const msgpack::object & object); - int unpack(msgpack::zone & zone, size_t & offset); - - // msgpack makes a copy of object by default, this function tells unpacker not to copy. 
- static bool reference_func(msgpack::type::object_type, size_t, void *) { return true; } PeekableReadBuffer buf; - msgpack::object_handle object_handle; - msgpack::v1::detail::context ctx; + MsgPackVisitor visitor; + msgpack::detail::parse_helper parser; DataTypes data_types; }; diff --git a/src/Storages/Distributed/DirectoryMonitor.cpp b/src/Storages/Distributed/DirectoryMonitor.cpp index 01bf0798a63..e937d5e8a90 100644 --- a/src/Storages/Distributed/DirectoryMonitor.cpp +++ b/src/Storages/Distributed/DirectoryMonitor.cpp @@ -1,7 +1,6 @@ #include #include #include -#include #include #include #include @@ -78,7 +77,7 @@ namespace StorageDistributedDirectoryMonitor::StorageDistributedDirectoryMonitor( - StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_) + StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_, BackgroundSchedulePool & bg_pool_) /// It's important to initialize members before `thread` to avoid race. : storage(storage_) , pool(std::move(pool_)) @@ -92,7 +91,10 @@ StorageDistributedDirectoryMonitor::StorageDistributedDirectoryMonitor( , max_sleep_time{storage.global_context.getSettingsRef().distributed_directory_monitor_max_sleep_time_ms.totalMilliseconds()} , log{&Logger::get(getLoggerName())} , monitor_blocker(monitor_blocker_) + , bg_pool(bg_pool_) { + task_handle = bg_pool.createTask(getLoggerName() + "/Bg", [this]{ run(); }); + task_handle->activateAndSchedule(); } @@ -100,12 +102,9 @@ StorageDistributedDirectoryMonitor::~StorageDistributedDirectoryMonitor() { if (!quit) { - { - quit = true; - std::lock_guard lock{mutex}; - } + quit = true; cond.notify_one(); - thread.join(); + task_handle->deactivate(); } } @@ -122,12 +121,9 @@ void StorageDistributedDirectoryMonitor::shutdownAndDropAllData() { if (!quit) { - { - quit = true; - std::lock_guard lock{mutex}; - } + quit = true; cond.notify_one(); - thread.join(); + task_handle->deactivate(); } Poco::File(path).remove(true); @@ -136,16 +132,11 @@ void StorageDistributedDirectoryMonitor::shutdownAndDropAllData() void StorageDistributedDirectoryMonitor::run() { - setThreadName("DistrDirMonitor"); - std::unique_lock lock{mutex}; - const auto quit_requested = [this] { return quit.load(std::memory_order_relaxed); }; - - while (!quit_requested()) + while (!quit) { - auto do_sleep = true; - + bool do_sleep = true; if (!monitor_blocker.isCancelled()) { try @@ -167,15 +158,25 @@ void StorageDistributedDirectoryMonitor::run() LOG_DEBUG(log, "Skipping send data over distributed table."); } - if (do_sleep) - cond.wait_for(lock, sleep_time, quit_requested); - const auto now = std::chrono::system_clock::now(); if (now - last_decrease_time > decrease_error_count_period) { error_count /= 2; last_decrease_time = now; } + + if (do_sleep) + break; + } + + if (!quit) + { + /// If there is no error, then it will be scheduled by the DistributedBlockOutputStream, + /// so this is just in case, hence it is distributed_directory_monitor_max_sleep_time_ms + if (error_count) + task_handle->scheduleAfter(sleep_time.count()); + else + task_handle->scheduleAfter(max_sleep_time.count()); } } @@ -586,6 +587,13 @@ BlockInputStreamPtr StorageDistributedDirectoryMonitor::createStreamFromFile(con return std::make_shared(file_name); } +bool StorageDistributedDirectoryMonitor::scheduleAfter(size_t ms) +{ + if (quit) + return false; + return task_handle->scheduleAfter(ms); +} + void StorageDistributedDirectoryMonitor::processFilesWithBatching(const 
std::map & files) { std::unordered_set file_indices_to_skip; @@ -714,8 +722,13 @@ std::string StorageDistributedDirectoryMonitor::getLoggerName() const void StorageDistributedDirectoryMonitor::updatePath(const std::string & new_path) { std::lock_guard lock{mutex}; + + task_handle->deactivate(); + path = new_path; current_batch_file_path = path + "current_batch.txt"; + + task_handle->activateAndSchedule(); } } diff --git a/src/Storages/Distributed/DirectoryMonitor.h b/src/Storages/Distributed/DirectoryMonitor.h index 475a3bc7bc6..61d51e5acfd 100644 --- a/src/Storages/Distributed/DirectoryMonitor.h +++ b/src/Storages/Distributed/DirectoryMonitor.h @@ -1,10 +1,9 @@ #pragma once #include -#include +#include #include -#include #include #include #include @@ -20,7 +19,7 @@ class StorageDistributedDirectoryMonitor { public: StorageDistributedDirectoryMonitor( - StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_); + StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_, BackgroundSchedulePool & bg_pool_); ~StorageDistributedDirectoryMonitor(); @@ -33,6 +32,9 @@ public: void shutdownAndDropAllData(); static BlockInputStreamPtr createStreamFromFile(const String & file_name); + + /// For scheduling via DistributedBlockOutputStream + bool scheduleAfter(size_t ms); private: void run(); bool processFiles(); @@ -67,7 +69,9 @@ private: std::condition_variable cond; Logger * log; ActionBlocker & monitor_blocker; - ThreadFromGlobalPool thread{&StorageDistributedDirectoryMonitor::run, this}; + + BackgroundSchedulePool & bg_pool; + BackgroundSchedulePoolTaskHolder task_handle; /// Read insert query and insert settings for backward compatible. static void readHeader(ReadBuffer & in, Settings & insert_settings, std::string & insert_query, ClientInfo & client_info, Logger * log); diff --git a/src/Storages/Distributed/DistributedBlockOutputStream.cpp b/src/Storages/Distributed/DistributedBlockOutputStream.cpp index b0695ccad1b..3a341f9f43c 100644 --- a/src/Storages/Distributed/DistributedBlockOutputStream.cpp +++ b/src/Storages/Distributed/DistributedBlockOutputStream.cpp @@ -589,8 +589,8 @@ void DistributedBlockOutputStream::writeToShard(const Block & block, const std:: const std::string path(disk + data_path + dir_name + '/'); /// ensure shard subdirectory creation and notify storage - if (Poco::File(path).createDirectory()) - storage.requireDirectoryMonitor(disk, dir_name); + Poco::File(path).createDirectory(); + auto & directory_monitor = storage.requireDirectoryMonitor(disk, dir_name); const auto & file_name = toString(storage.file_names_increment.get()) + ".bin"; const auto & block_file_path = path + file_name; @@ -632,6 +632,9 @@ void DistributedBlockOutputStream::writeToShard(const Block & block, const std:: stream.writePrefix(); stream.write(block); stream.writeSuffix(); + + auto sleep_ms = context.getSettingsRef().distributed_directory_monitor_sleep_time_ms; + directory_monitor.scheduleAfter(sleep_ms.totalMilliseconds()); } if (link(first_file_tmp_path.data(), block_file_path.data())) diff --git a/src/Storages/MergeTree/DataPartsExchange.cpp b/src/Storages/MergeTree/DataPartsExchange.cpp index c656fbf0c58..70c06efff5c 100644 --- a/src/Storages/MergeTree/DataPartsExchange.cpp +++ b/src/Storages/MergeTree/DataPartsExchange.cpp @@ -255,23 +255,23 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart( const ReservationPtr reservation, PooledReadWriteBufferFromHTTP & in) { - size_t files; 
readBinary(files, in); + auto disk = reservation->getDisk(); + static const String TMP_PREFIX = "tmp_fetch_"; String tmp_prefix = tmp_prefix_.empty() ? TMP_PREFIX : tmp_prefix_; - String relative_part_path = String(to_detached ? "detached/" : "") + tmp_prefix + part_name; - String absolute_part_path = Poco::Path(data.getFullPathOnDisk(reservation->getDisk()) + relative_part_path + "/").absolute().toString(); - Poco::File part_file(absolute_part_path); + String part_relative_path = String(to_detached ? "detached/" : "") + tmp_prefix + part_name; + String part_download_path = data.getRelativeDataPath() + part_relative_path + "/"; - if (part_file.exists()) - throw Exception("Directory " + absolute_part_path + " already exists.", ErrorCodes::DIRECTORY_ALREADY_EXISTS); + if (disk->exists(part_download_path)) + throw Exception("Directory " + fullPath(disk, part_download_path) + " already exists.", ErrorCodes::DIRECTORY_ALREADY_EXISTS); CurrentMetrics::Increment metric_increment{CurrentMetrics::ReplicatedFetch}; - part_file.createDirectory(); + disk->createDirectories(part_download_path); MergeTreeData::DataPart::Checksums checksums; for (size_t i = 0; i < files; ++i) @@ -284,21 +284,21 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart( /// File must be inside "absolute_part_path" directory. /// Otherwise malicious ClickHouse replica may force us to write to arbitrary path. - String absolute_file_path = Poco::Path(absolute_part_path + file_name).absolute().toString(); - if (!startsWith(absolute_file_path, absolute_part_path)) - throw Exception("File path (" + absolute_file_path + ") doesn't appear to be inside part path (" + absolute_part_path + ")." + String absolute_file_path = Poco::Path(part_download_path + file_name).absolute().toString(); + if (!startsWith(absolute_file_path, Poco::Path(part_download_path).absolute().toString())) + throw Exception("File path (" + absolute_file_path + ") doesn't appear to be inside part path (" + part_download_path + ")." " This may happen if we are trying to download part from malicious replica or logical error.", ErrorCodes::INSECURE_PATH); - WriteBufferFromFile file_out(absolute_file_path); - HashingWriteBuffer hashing_out(file_out); + auto file_out = disk->writeFile(part_download_path + file_name); + HashingWriteBuffer hashing_out(*file_out); copyData(in, hashing_out, file_size, blocker.getCounter()); if (blocker.isCancelled()) { /// NOTE The is_cancelled flag also makes sense to check every time you read over the network, performing a poll with a not very large timeout. /// And now we check it only between read chunks (in the `copyData` function). 
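The fetch path above keeps its guard against malicious file names sent by a remote replica: every received file name must resolve to a path inside the part download directory, now compared through `Poco::Path(...).absolute()` on both sides. The same idea as a self-contained sketch using `std::filesystem` instead of Poco; this illustrates the check, it is not the code the patch installs.

```cpp
#include <algorithm>
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

bool isInsideDirectory(const fs::path & base_dir, const std::string & file_name)
{
    // lexically_normal() collapses "..", so "part_dir/../../etc/passwd" fails the prefix test.
    fs::path candidate = (base_dir / file_name).lexically_normal();
    fs::path base = base_dir.lexically_normal();
    return std::mismatch(base.begin(), base.end(), candidate.begin(), candidate.end()).first == base.end();
}

int main()
{
    fs::path part_dir = "/var/lib/clickhouse/data/db/table/tmp_fetch_all_1_1_0";
    std::cout << isInsideDirectory(part_dir, "data.bin") << '\n';                // 1
    std::cout << isInsideDirectory(part_dir, "../../../../etc/passwd") << '\n';  // 0
}
```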
- part_file.remove(true); + disk->removeRecursive(part_download_path); throw Exception("Fetching of part was cancelled", ErrorCodes::ABORTED); } @@ -306,7 +306,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart( readPODBinary(expected_hash, in); if (expected_hash != hashing_out.getHash()) - throw Exception("Checksum mismatch for file " + absolute_part_path + file_name + " transferred from " + replica_path, + throw Exception("Checksum mismatch for file " + fullPath(disk, part_download_path + file_name) + " transferred from " + replica_path, ErrorCodes::CHECKSUM_DOESNT_MATCH); if (file_name != "checksums.txt" && @@ -316,7 +316,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart( assertEOF(in); - MergeTreeData::MutableDataPartPtr new_data_part = data.createPart(part_name, reservation->getDisk(), relative_part_path); + MergeTreeData::MutableDataPartPtr new_data_part = data.createPart(part_name, reservation->getDisk(), part_relative_path); new_data_part->is_temp = true; new_data_part->modification_time = time(nullptr); new_data_part->loadColumnsChecksumsIndexes(true, false); diff --git a/src/Storages/MergeTree/IMergeTreeDataPart.cpp b/src/Storages/MergeTree/IMergeTreeDataPart.cpp index 5d799d257bc..a5a7f1f4494 100644 --- a/src/Storages/MergeTree/IMergeTreeDataPart.cpp +++ b/src/Storages/MergeTree/IMergeTreeDataPart.cpp @@ -731,35 +731,43 @@ void IMergeTreeDataPart::remove() const return; } - try + if (checksums.empty()) { - /// Remove each expected file in directory, then remove directory itself. - -#if !__clang__ -# pragma GCC diagnostic push -# pragma GCC diagnostic ignored "-Wunused-variable" -#endif - for (const auto & [file, _] : checksums.files) - disk->remove(to + "/" + file); -#if !__clang__ -# pragma GCC diagnostic pop -#endif - - for (const auto & file : {"checksums.txt", "columns.txt"}) - disk->remove(to + "/" + file); - disk->removeIfExists(to + "/" + DELETE_ON_DESTROY_MARKER_PATH); - - disk->remove(to); - } - catch (...) - { - /// Recursive directory removal does many excessive "stat" syscalls under the hood. - - LOG_ERROR(storage.log, "Cannot quickly remove directory " << fullPath(disk, to) << " by removing files; fallback to recursive removal. Reason: " - << getCurrentExceptionMessage(false)); - + /// If the part is not completely written, we cannot use fast path by listing files. disk->removeRecursive(to + "/"); } + else + { + try + { + /// Remove each expected file in directory, then remove directory itself. + + #if !__clang__ + # pragma GCC diagnostic push + # pragma GCC diagnostic ignored "-Wunused-variable" + #endif + for (const auto & [file, _] : checksums.files) + disk->remove(to + "/" + file); + #if !__clang__ + # pragma GCC diagnostic pop + #endif + + for (const auto & file : {"checksums.txt", "columns.txt"}) + disk->remove(to + "/" + file); + disk->removeIfExists(to + "/" + DELETE_ON_DESTROY_MARKER_PATH); + + disk->remove(to); + } + catch (...) + { + /// Recursive directory removal does many excessive "stat" syscalls under the hood. + + LOG_ERROR(storage.log, "Cannot quickly remove directory " << fullPath(disk, to) << " by removing files; fallback to recursive removal. 
Reason: " + << getCurrentExceptionMessage(false)); + + disk->removeRecursive(to + "/"); + } + } } String IMergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix) const diff --git a/src/Storages/MergeTree/IMergeTreeDataPartWriter.h b/src/Storages/MergeTree/IMergeTreeDataPartWriter.h index d18b31edc72..3e3496c88da 100644 --- a/src/Storages/MergeTree/IMergeTreeDataPartWriter.h +++ b/src/Storages/MergeTree/IMergeTreeDataPartWriter.h @@ -107,7 +107,7 @@ public: void initSkipIndices(); void initPrimaryIndex(); - virtual void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync = false) = 0; + virtual void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) = 0; void finishPrimaryIndexSerialization(MergeTreeData::DataPart::Checksums & checksums); void finishSkipIndicesSerialization(MergeTreeData::DataPart::Checksums & checksums); diff --git a/src/Storages/MergeTree/IMergedBlockOutputStream.h b/src/Storages/MergeTree/IMergedBlockOutputStream.h index ba04d7fa71b..7b808ef6784 100644 --- a/src/Storages/MergeTree/IMergedBlockOutputStream.h +++ b/src/Storages/MergeTree/IMergedBlockOutputStream.h @@ -25,7 +25,7 @@ public: protected: using SerializationState = IDataType::SerializeBinaryBulkStatePtr; - IDataType::OutputStreamGetter createStreamGetter(const String & name, WrittenOffsetColumns & offset_columns, bool skip_offsets); + IDataType::OutputStreamGetter createStreamGetter(const String & name, WrittenOffsetColumns & offset_columns); /// Remove all columns marked expired in data_part. Also, clears checksums /// and columns array. Return set of removed files names. diff --git a/src/Storages/MergeTree/MergeTreeData.h b/src/Storages/MergeTree/MergeTreeData.h index d299d39726e..b50db21f5bf 100644 --- a/src/Storages/MergeTree/MergeTreeData.h +++ b/src/Storages/MergeTree/MergeTreeData.h @@ -620,6 +620,8 @@ public: return storage_settings.get(); } + String getRelativeDataPath() const { return relative_data_path; } + /// Get table path on disk String getFullPathOnDisk(const DiskPtr & disk) const; diff --git a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp index c10a6c6dd59..8bc871476ed 100644 --- a/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp +++ b/src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp @@ -876,9 +876,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor MergedColumnOnlyOutputStream column_to( new_data_part, column_gathered_stream.getHeader(), - false, compression_codec, - false, /// we don't need to recalc indices here /// because all of them were already recalculated and written /// as key part of vertical merge @@ -1588,9 +1586,7 @@ void MergeTreeDataMergerMutator::mutateSomePartColumns( MergedColumnOnlyOutputStream out( new_data_part, mutation_header, - /* sync = */ false, compression_codec, - /* skip_offsets = */ false, std::vector(indices_to_recalc.begin(), indices_to_recalc.end()), nullptr, source_part->index_granularity, diff --git a/src/Storages/MergeTree/MergeTreeDataPartCompact.h b/src/Storages/MergeTree/MergeTreeDataPartCompact.h index 12b2e106284..fa98a2c863f 100644 --- a/src/Storages/MergeTree/MergeTreeDataPartCompact.h +++ b/src/Storages/MergeTree/MergeTreeDataPartCompact.h @@ -20,7 +20,6 @@ class MergeTreeDataPartCompact : public IMergeTreeDataPart public: static constexpr auto DATA_FILE_NAME = "data"; static constexpr auto DATA_FILE_EXTENSION = ".bin"; - static constexpr auto TEMP_FILE_SUFFIX = "_temp"; static constexpr 
auto DATA_FILE_NAME_WITH_EXTENSION = "data.bin"; MergeTreeDataPartCompact( diff --git a/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.cpp b/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.cpp index bd507866e9b..e33d4a97cac 100644 --- a/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.cpp +++ b/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.cpp @@ -22,8 +22,6 @@ MergeTreeDataPartWriterCompact::MergeTreeDataPartWriterCompact( { using DataPart = MergeTreeDataPartCompact; String data_file_name = DataPart::DATA_FILE_NAME; - if (settings.is_writing_temp_files) - data_file_name += DataPart::TEMP_FILE_SUFFIX; stream = std::make_unique( data_file_name, @@ -145,7 +143,7 @@ void MergeTreeDataPartWriterCompact::writeColumnSingleGranule(const ColumnWithTy column.type->serializeBinaryBulkStateSuffix(serialize_settings, state); } -void MergeTreeDataPartWriterCompact::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync) +void MergeTreeDataPartWriterCompact::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) { if (columns_buffer.size() != 0) writeBlock(header.cloneWithColumns(columns_buffer.releaseColumns())); @@ -161,8 +159,6 @@ void MergeTreeDataPartWriterCompact::finishDataSerialization(IMergeTreeDataPart: } stream->finalize(); - if (sync) - stream->sync(); stream->addToChecksums(checksums); stream.reset(); } diff --git a/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.h b/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.h index 598a4dd47fb..0aff55588aa 100644 --- a/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.h +++ b/src/Storages/MergeTree/MergeTreeDataPartWriterCompact.h @@ -21,7 +21,7 @@ public: void write(const Block & block, const IColumn::Permutation * permutation, const Block & primary_key_block, const Block & skip_indexes_block) override; - void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync) override; + void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) override; private: /// Write single granule of one column (rows between 2 marks) diff --git a/src/Storages/MergeTree/MergeTreeDataPartWriterWide.cpp b/src/Storages/MergeTree/MergeTreeDataPartWriterWide.cpp index 5c1c6e35a20..1e5640b4e23 100644 --- a/src/Storages/MergeTree/MergeTreeDataPartWriterWide.cpp +++ b/src/Storages/MergeTree/MergeTreeDataPartWriterWide.cpp @@ -39,9 +39,6 @@ void MergeTreeDataPartWriterWide::addStreams( { IDataType::StreamCallback callback = [&] (const IDataType::SubstreamPath & substream_path) { - if (settings.skip_offsets && !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes) - return; - String stream_name = IDataType::getFileNameForStream(name, substream_path); /// Shared offsets for Nested type. 
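The `skip_offsets` flag being removed in this hunk, and the `WrittenOffsetColumns` bookkeeping that stays, both exist because subcolumns of one `Nested` column share a single `ArraySizes` stream, so the offsets file must be written exactly once per part. A simplified model of that stream-name mapping; the real `IDataType::getFileNameForStream` also escapes names and handles deeper nesting levels.

```cpp
#include <iostream>
#include <string>

// "n.a" and "n.b" (subcolumns of Nested column "n") both map to the shared offsets
// stream "n.size0"; a plain Array column "arr" gets its own "arr.size0".
std::string offsetsStreamName(const std::string & column_name)
{
    auto dot = column_name.find('.');
    std::string base = (dot == std::string::npos) ? column_name : column_name.substr(0, dot);
    return base + ".size0";
}

int main()
{
    std::cout << offsetsStreamName("n.a") << '\n';   // n.size0
    std::cout << offsetsStreamName("n.b") << '\n';   // n.size0 -- same file, write it once
    std::cout << offsetsStreamName("arr") << '\n';   // arr.size0
}
```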
if (column_streams.count(stream_name)) @@ -69,8 +66,6 @@ IDataType::OutputStreamGetter MergeTreeDataPartWriterWide::createStreamGetter( return [&, this] (const IDataType::SubstreamPath & substream_path) -> WriteBuffer * { bool is_offsets = !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes; - if (is_offsets && settings.skip_offsets) - return nullptr; String stream_name = IDataType::getFileNameForStream(name, substream_path); @@ -135,8 +130,6 @@ void MergeTreeDataPartWriterWide::writeSingleMark( type.enumerateStreams([&] (const IDataType::SubstreamPath & substream_path) { bool is_offsets = !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes; - if (is_offsets && settings.skip_offsets) - return; String stream_name = IDataType::getFileNameForStream(name, substream_path); @@ -177,8 +170,6 @@ size_t MergeTreeDataPartWriterWide::writeSingleGranule( type.enumerateStreams([&] (const IDataType::SubstreamPath & substream_path) { bool is_offsets = !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes; - if (is_offsets && settings.skip_offsets) - return; String stream_name = IDataType::getFileNameForStream(name, substream_path); @@ -270,7 +261,7 @@ void MergeTreeDataPartWriterWide::writeColumn( next_index_offset = current_row - total_rows; } -void MergeTreeDataPartWriterWide::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync) +void MergeTreeDataPartWriterWide::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) { const auto & global_settings = storage.global_context.getSettingsRef(); IDataType::SerializeBinaryBulkSettings serialize_settings; @@ -300,8 +291,6 @@ void MergeTreeDataPartWriterWide::finishDataSerialization(IMergeTreeDataPart::Ch for (auto & stream : column_streams) { stream.second->finalize(); - if (sync) - stream.second->sync(); stream.second->addToChecksums(checksums); } diff --git a/src/Storages/MergeTree/MergeTreeDataPartWriterWide.h b/src/Storages/MergeTree/MergeTreeDataPartWriterWide.h index 95e43cd31af..4e4f4806d53 100644 --- a/src/Storages/MergeTree/MergeTreeDataPartWriterWide.h +++ b/src/Storages/MergeTree/MergeTreeDataPartWriterWide.h @@ -24,7 +24,7 @@ public: void write(const Block & block, const IColumn::Permutation * permutation, const Block & primary_key_block, const Block & skip_indexes_block) override; - void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync) override; + void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) override; IDataType::OutputStreamGetter createStreamGetter(const String & name, WrittenOffsetColumns & offset_columns); diff --git a/src/Storages/MergeTree/MergeTreeIOSettings.h b/src/Storages/MergeTree/MergeTreeIOSettings.h index 5d3b2945d47..f5c57659052 100644 --- a/src/Storages/MergeTree/MergeTreeIOSettings.h +++ b/src/Storages/MergeTree/MergeTreeIOSettings.h @@ -30,10 +30,7 @@ struct MergeTreeWriterSettings size_t aio_threshold; bool can_use_adaptive_granularity; bool blocks_are_granules_size; - /// true if we write temporary files during alter. - bool is_writing_temp_files = false; + size_t estimated_size = 0; - /// used when ALTERing columns if we know that array offsets are not altered. 
- bool skip_offsets = false; }; } diff --git a/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp b/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp index 77bff8e4f02..892b4eccfbc 100644 --- a/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp +++ b/src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp @@ -8,15 +8,14 @@ namespace ErrorCodes } MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream( - const MergeTreeDataPartPtr & data_part, const Block & header_, bool sync_, - CompressionCodecPtr default_codec, bool skip_offsets_, + const MergeTreeDataPartPtr & data_part, + const Block & header_, + CompressionCodecPtr default_codec, const std::vector & indices_to_recalc, WrittenOffsetColumns * offset_columns_, const MergeTreeIndexGranularity & index_granularity, - const MergeTreeIndexGranularityInfo * index_granularity_info, - bool is_writing_temp_files) - : IMergedBlockOutputStream(data_part), - header(header_), sync(sync_) + const MergeTreeIndexGranularityInfo * index_granularity_info) + : IMergedBlockOutputStream(data_part), header(header_) { const auto & global_settings = data_part->storage.global_context.getSettings(); MergeTreeWriterSettings writer_settings( @@ -24,11 +23,13 @@ MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream( index_granularity_info ? index_granularity_info->is_adaptive : data_part->storage.canUseAdaptiveGranularity(), global_settings.min_bytes_to_use_direct_io); - writer_settings.is_writing_temp_files = is_writing_temp_files; - writer_settings.skip_offsets = skip_offsets_; + writer = data_part->getWriter( + header.getNamesAndTypesList(), + indices_to_recalc, + default_codec, + std::move(writer_settings), + index_granularity); - writer = data_part->getWriter(header.getNamesAndTypesList(), indices_to_recalc, - default_codec,std::move(writer_settings), index_granularity); writer->setWrittenOffsetColumns(offset_columns_); writer->initSkipIndices(); } @@ -62,7 +63,7 @@ MergedColumnOnlyOutputStream::writeSuffixAndGetChecksums(MergeTreeData::MutableD { /// Finish columns serialization. MergeTreeData::DataPart::Checksums checksums; - writer->finishDataSerialization(checksums, sync); + writer->finishDataSerialization(checksums); writer->finishSkipIndicesSerialization(checksums); auto columns = new_part->getColumns(); diff --git a/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h b/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h index c785bbaf6d0..2c5024bbcfe 100644 --- a/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h +++ b/src/Storages/MergeTree/MergedColumnOnlyOutputStream.h @@ -11,17 +11,16 @@ class MergeTreeDataPartWriterWide; class MergedColumnOnlyOutputStream final : public IMergedBlockOutputStream { public: - /// skip_offsets: used when ALTERing columns if we know that array offsets are not altered. /// Pass empty 'already_written_offset_columns' first time then and pass the same object to subsequent instances of MergedColumnOnlyOutputStream /// if you want to serialize elements of Nested data structure in different instances of MergedColumnOnlyOutputStream. 
MergedColumnOnlyOutputStream( - const MergeTreeDataPartPtr & data_part, const Block & header_, bool sync_, - CompressionCodecPtr default_codec_, bool skip_offsets_, + const MergeTreeDataPartPtr & data_part, + const Block & header_, + CompressionCodecPtr default_codec_, const std::vector & indices_to_recalc_, WrittenOffsetColumns * offset_columns_ = nullptr, const MergeTreeIndexGranularity & index_granularity = {}, - const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr, - bool is_writing_temp_files = false); + const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr); Block getHeader() const override { return header; } void write(const Block & block) override; @@ -31,7 +30,6 @@ public: private: Block header; - bool sync; }; diff --git a/src/Storages/StorageBuffer.cpp b/src/Storages/StorageBuffer.cpp index 1765e663902..4f098b46ff5 100644 --- a/src/Storages/StorageBuffer.cpp +++ b/src/Storages/StorageBuffer.cpp @@ -13,7 +13,6 @@ #include #include #include -#include #include #include #include @@ -76,6 +75,7 @@ StorageBuffer::StorageBuffer( , destination_id(destination_id_) , allow_materialized(allow_materialized_) , log(&Logger::get("StorageBuffer (" + table_id_.getFullTableName() + ")")) + , bg_pool(global_context.getBufferFlushSchedulePool()) { setColumns(columns_); setConstraints(constraints_); @@ -83,12 +83,7 @@ StorageBuffer::StorageBuffer( StorageBuffer::~StorageBuffer() { - // Should not happen if shutdown was called - if (flush_thread.joinable()) - { - shutdown_event.set(); - flush_thread.join(); - } + flush_handle->deactivate(); } @@ -397,6 +392,9 @@ public: least_busy_lock = std::unique_lock(least_busy_buffer->mutex); } insertIntoBuffer(block, *least_busy_buffer); + least_busy_lock.unlock(); + + storage.reschedule(); } private: StorageBuffer & storage; @@ -458,16 +456,15 @@ void StorageBuffer::startup() << " Set appropriate system_profile to fix this."); } - flush_thread = ThreadFromGlobalPool(&StorageBuffer::flushThread, this); + + flush_handle = bg_pool.createTask(log->name() + "/Bg", [this]{ flushBack(); }); + flush_handle->activateAndSchedule(); } void StorageBuffer::shutdown() { - shutdown_event.set(); - - if (flush_thread.joinable()) - flush_thread.join(); + flush_handle->deactivate(); try { @@ -595,7 +592,7 @@ void StorageBuffer::flushBuffer(Buffer & buffer, bool check_thresholds, bool loc ProfileEvents::increment(ProfileEvents::StorageBufferFlush); - LOG_TRACE(log, "Flushing buffer with " << rows << " rows, " << bytes << " bytes, age " << time_passed << " seconds."); + LOG_TRACE(log, "Flushing buffer with " << rows << " rows, " << bytes << " bytes, age " << time_passed << " seconds " << (check_thresholds ? "(bg)" : "(direct)") << "."); if (!destination_id) return; @@ -697,21 +694,42 @@ void StorageBuffer::writeBlockToDestination(const Block & block, StoragePtr tabl } -void StorageBuffer::flushThread() +void StorageBuffer::flushBack() { - setThreadName("BufferFlush"); - - do + try { - try - { - flushAllBuffers(true); - } - catch (...) - { - tryLogCurrentException(__PRETTY_FUNCTION__); - } - } while (!shutdown_event.tryWait(1000)); + flushAllBuffers(true); + } + catch (...) 
+ { + tryLogCurrentException(__PRETTY_FUNCTION__); + } + + reschedule(); +} + +void StorageBuffer::reschedule() +{ + time_t min_first_write_time = std::numeric_limits::max(); + time_t rows = 0; + + for (auto & buffer : buffers) + { + std::lock_guard lock(buffer.mutex); + min_first_write_time = buffer.first_write_time; + rows += buffer.data.rows(); + } + + /// will be rescheduled via INSERT + if (!rows) + return; + + time_t current_time = time(nullptr); + time_t time_passed = current_time - min_first_write_time; + + size_t min = std::max(min_thresholds.time - time_passed, 1); + size_t max = std::max(max_thresholds.time - time_passed, 1); + flush_handle->scheduleAfter(std::min(min, max) * 1000); } void StorageBuffer::checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */) diff --git a/src/Storages/StorageBuffer.h b/src/Storages/StorageBuffer.h index 93f95692b18..4c6c911e339 100644 --- a/src/Storages/StorageBuffer.h +++ b/src/Storages/StorageBuffer.h @@ -4,7 +4,7 @@ #include #include #include -#include +#include #include #include #include @@ -118,10 +118,6 @@ private: Poco::Logger * log; - Poco::Event shutdown_event; - /// Resets data by timeout. - ThreadFromGlobalPool flush_thread; - void flushAllBuffers(bool check_thresholds = true); /// Reset the buffer. If check_thresholds is set - resets only if thresholds are exceeded. void flushBuffer(Buffer & buffer, bool check_thresholds, bool locked = false); @@ -131,7 +127,11 @@ private: /// `table` argument is passed, as it is sometimes evaluated beforehand. It must match the `destination`. void writeBlockToDestination(const Block & block, StoragePtr table); - void flushThread(); + void flushBack(); + void reschedule(); + + BackgroundSchedulePool & bg_pool; + BackgroundSchedulePoolTaskHolder flush_handle; protected: /** num_shards - the level of internal parallelism (the number of independent buffers) diff --git a/src/Storages/StorageDictionary.cpp b/src/Storages/StorageDictionary.cpp index 86831593d54..9b2c5784d85 100644 --- a/src/Storages/StorageDictionary.cpp +++ b/src/Storages/StorageDictionary.cpp @@ -1,73 +1,48 @@ -#include -#include -#include -#include -#include #include #include +#include +#include #include #include #include #include -#include -#include #include #include #include +#include namespace DB { - namespace ErrorCodes { extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; extern const int THERE_IS_NO_COLUMN; - extern const int UNKNOWN_TABLE; + extern const int CANNOT_DETACH_DICTIONARY_AS_TABLE; } - -StorageDictionary::StorageDictionary( - const StorageID & table_id_, - const ColumnsDescription & columns_, - const Context & context, - bool attach, - const String & dictionary_name_) - : IStorage(table_id_) - , dictionary_name(dictionary_name_) - , logger(&Poco::Logger::get("StorageDictionary")) +namespace { - setColumns(columns_); - - if (!attach) + void checkNamesAndTypesCompatibleWithDictionary(const String & dictionary_name, const ColumnsDescription & columns, const DictionaryStructure & dictionary_structure) { - const auto & dictionary = context.getExternalDictionariesLoader().getDictionary(dictionary_name); - const DictionaryStructure & dictionary_structure = dictionary->getStructure(); - checkNamesAndTypesCompatibleWithDictionary(dictionary_structure); + auto dictionary_names_and_types = StorageDictionary::getNamesAndTypes(dictionary_structure); + std::set names_and_types_set(dictionary_names_and_types.begin(), dictionary_names_and_types.end()); + + for (const auto & column : columns.getOrdinary()) 
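`StorageBuffer::reschedule()` above re-arms the background flush from the age of the oldest buffered block: wake up when either the min or the max time threshold would be crossed, but never sooner than one second, and let an INSERT reschedule it earlier if new data arrives first. A tiny model of that computation; the threshold values in `main` are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <ctime>
#include <iostream>

size_t nextFlushDelayMs(time_t min_time_threshold, time_t max_time_threshold, time_t seconds_since_first_write)
{
    size_t until_min = std::max<time_t>(min_time_threshold - seconds_since_first_write, 1);
    size_t until_max = std::max<time_t>(max_time_threshold - seconds_since_first_write, 1);
    return std::min(until_min, until_max) * 1000;
}

int main()
{
    // min_time = 10 s, max_time = 100 s, oldest buffered data is 3 s old -> check again in ~7 s.
    std::cout << nextFlushDelayMs(10, 100, 3) << " ms\n";   // 7000
    // Thresholds already exceeded -> re-check almost immediately.
    std::cout << nextFlushDelayMs(10, 100, 200) << " ms\n"; // 1000
}
```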
+ { + if (names_and_types_set.find(column) == names_and_types_set.end()) + { + std::string message = "Not found column "; + message += column.name + " " + column.type->getName(); + message += " in dictionary " + backQuote(dictionary_name) + ". "; + message += "There are only columns "; + message += StorageDictionary::generateNamesAndTypesDescription(dictionary_names_and_types); + throw Exception(message, ErrorCodes::THERE_IS_NO_COLUMN); + } + } } } -void StorageDictionary::checkTableCanBeDropped() const -{ - throw Exception("Cannot detach dictionary " + backQuoteIfNeed(dictionary_name) + " as table, use DETACH DICTIONARY query.", ErrorCodes::UNKNOWN_TABLE); -} - -Pipes StorageDictionary::read( - const Names & column_names, - const SelectQueryInfo & /*query_info*/, - const Context & context, - QueryProcessingStage::Enum /*processed_stage*/, - const size_t max_block_size, - const unsigned /*threads*/) -{ - auto dictionary = context.getExternalDictionariesLoader().getDictionary(dictionary_name); - auto stream = dictionary->getBlockInputStream(column_names, max_block_size); - auto source = std::make_shared(stream); - /// TODO: update dictionary interface for processors. - Pipes pipes; - pipes.emplace_back(std::move(source)); - return pipes; -} NamesAndTypesList StorageDictionary::getNamesAndTypes(const DictionaryStructure & dictionary_structure) { @@ -103,25 +78,55 @@ NamesAndTypesList StorageDictionary::getNamesAndTypes(const DictionaryStructure return dictionary_names_and_types; } -void StorageDictionary::checkNamesAndTypesCompatibleWithDictionary(const DictionaryStructure & dictionary_structure) const -{ - auto dictionary_names_and_types = getNamesAndTypes(dictionary_structure); - std::set names_and_types_set(dictionary_names_and_types.begin(), dictionary_names_and_types.end()); - for (const auto & column : getColumns().getOrdinary()) +String StorageDictionary::generateNamesAndTypesDescription(const NamesAndTypesList & list) +{ + std::stringstream ss; + bool first = true; + for (const auto & name_and_type : list) { - if (names_and_types_set.find(column) == names_and_types_set.end()) - { - std::string message = "Not found column "; - message += column.name + " " + column.type->getName(); - message += " in dictionary " + dictionary_name + ". 
"; - message += "There are only columns "; - message += generateNamesAndTypesDescription(dictionary_names_and_types.begin(), dictionary_names_and_types.end()); - throw Exception(message, ErrorCodes::THERE_IS_NO_COLUMN); - } + if (!std::exchange(first, false)) + ss << ", "; + ss << name_and_type.name << ' ' << name_and_type.type->getName(); } + return ss.str(); } + +StorageDictionary::StorageDictionary( + const StorageID & table_id_, + const String & dictionary_name_, + const DictionaryStructure & dictionary_structure_) + : IStorage(table_id_) + , dictionary_name(dictionary_name_) +{ + setColumns(ColumnsDescription{getNamesAndTypes(dictionary_structure_)}); +} + + +void StorageDictionary::checkTableCanBeDropped() const +{ + throw Exception("Cannot detach dictionary " + backQuote(dictionary_name) + " as table, use DETACH DICTIONARY query.", ErrorCodes::CANNOT_DETACH_DICTIONARY_AS_TABLE); +} + +Pipes StorageDictionary::read( + const Names & column_names, + const SelectQueryInfo & /*query_info*/, + const Context & context, + QueryProcessingStage::Enum /*processed_stage*/, + const size_t max_block_size, + const unsigned /*threads*/) +{ + auto dictionary = context.getExternalDictionariesLoader().getDictionary(dictionary_name); + auto stream = dictionary->getBlockInputStream(column_names, max_block_size); + auto source = std::make_shared(stream); + /// TODO: update dictionary interface for processors. + Pipes pipes; + pipes.emplace_back(std::move(source)); + return pipes; +} + + void registerStorageDictionary(StorageFactory & factory) { factory.registerStorage("Dictionary", [](const StorageFactory::Arguments & args) @@ -133,8 +138,11 @@ void registerStorageDictionary(StorageFactory & factory) args.engine_args[0] = evaluateConstantExpressionOrIdentifierAsLiteral(args.engine_args[0], args.local_context); String dictionary_name = args.engine_args[0]->as().value.safeGet(); - return StorageDictionary::create( - args.table_id, args.columns, args.context, args.attach, dictionary_name); + const auto & dictionary = args.context.getExternalDictionariesLoader().getDictionary(dictionary_name); + const DictionaryStructure & dictionary_structure = dictionary->getStructure(); + checkNamesAndTypesCompatibleWithDictionary(dictionary_name, args.columns, dictionary_structure); + + return StorageDictionary::create(args.table_id, dictionary_name, dictionary_structure); }); } diff --git a/src/Storages/StorageDictionary.h b/src/Storages/StorageDictionary.h index 87826304166..7bb6fc22480 100644 --- a/src/Storages/StorageDictionary.h +++ b/src/Storages/StorageDictionary.h @@ -1,23 +1,12 @@ #pragma once #include -#include -#include #include -#include -#include -namespace Poco -{ -class Logger; -} - namespace DB { struct DictionaryStructure; -struct IDictionaryBase; -class ExternalDictionaries; class StorageDictionary final : public ext::shared_ptr_helper, public IStorage { @@ -35,42 +24,18 @@ public: unsigned threads) override; static NamesAndTypesList getNamesAndTypes(const DictionaryStructure & dictionary_structure); + static String generateNamesAndTypesDescription(const NamesAndTypesList & list); - template - static std::string generateNamesAndTypesDescription(ForwardIterator begin, ForwardIterator end) - { - std::string description; - { - WriteBufferFromString buffer(description); - bool first = true; - for (; begin != end; ++begin) - { - if (!first) - buffer << ", "; - first = false; - - buffer << begin->name << ' ' << begin->type->getName(); - } - } - - return description; - } + const String & dictionaryName() const 
{ return dictionary_name; } private: - using Ptr = MultiVersion::Version; - String dictionary_name; - Poco::Logger * logger; - - void checkNamesAndTypesCompatibleWithDictionary(const DictionaryStructure & dictionary_structure) const; protected: StorageDictionary( const StorageID & table_id_, - const ColumnsDescription & columns_, - const Context & context, - bool attach, - const String & dictionary_name_); + const String & dictionary_name_, + const DictionaryStructure & dictionary_structure); }; } diff --git a/src/Storages/StorageDistributed.cpp b/src/Storages/StorageDistributed.cpp index b453b73c4cb..14e7eea5c96 100644 --- a/src/Storages/StorageDistributed.cpp +++ b/src/Storages/StorageDistributed.cpp @@ -577,15 +577,20 @@ void StorageDistributed::createDirectoryMonitors(const std::string & disk) } -void StorageDistributed::requireDirectoryMonitor(const std::string & disk, const std::string & name) +StorageDistributedDirectoryMonitor& StorageDistributed::requireDirectoryMonitor(const std::string & disk, const std::string & name) { const std::string path(disk + relative_data_path + name); const std::string key(disk + name); std::lock_guard lock(cluster_nodes_mutex); auto & node_data = cluster_nodes_data[key]; - node_data.conneciton_pool = StorageDistributedDirectoryMonitor::createPool(name, *this); - node_data.directory_monitor = std::make_unique(*this, path, node_data.conneciton_pool, monitors_blocker); + if (!node_data.directory_monitor) + { + node_data.conneciton_pool = StorageDistributedDirectoryMonitor::createPool(name, *this); + node_data.directory_monitor = std::make_unique( + *this, path, node_data.conneciton_pool, monitors_blocker, global_context.getDistributedSchedulePool()); + } + return *node_data.directory_monitor; } size_t StorageDistributed::getShardCount() const diff --git a/src/Storages/StorageDistributed.h b/src/Storages/StorageDistributed.h index 81c6b54a63e..2c5e321fc5f 100644 --- a/src/Storages/StorageDistributed.h +++ b/src/Storages/StorageDistributed.h @@ -109,7 +109,7 @@ public: /// create directory monitors for each existing subdirectory void createDirectoryMonitors(const std::string & disk); /// ensure directory monitor thread and connectoin pool creation by disk and subdirectory name - void requireDirectoryMonitor(const std::string & disk, const std::string & name); + StorageDistributedDirectoryMonitor & requireDirectoryMonitor(const std::string & disk, const std::string & name); void flushClusterNodesAllData(); diff --git a/src/Storages/System/StorageSystemColumns.cpp b/src/Storages/System/StorageSystemColumns.cpp index 26e2376c3f7..43594f5355a 100644 --- a/src/Storages/System/StorageSystemColumns.cpp +++ b/src/Storages/System/StorageSystemColumns.cpp @@ -303,7 +303,7 @@ Pipes StorageSystemColumns::read( const DatabasePtr database = databases.at(database_name); offsets[i] = i ? 
offsets[i - 1] : 0; - for (auto iterator = database->getTablesWithDictionaryTablesIterator(context); iterator->isValid(); iterator->next()) + for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next()) { const String & table_name = iterator->name(); storages.emplace(std::piecewise_construct, diff --git a/src/Storages/System/StorageSystemDictionaries.cpp b/src/Storages/System/StorageSystemDictionaries.cpp index a3b32324700..4c54353c44d 100644 --- a/src/Storages/System/StorageSystemDictionaries.cpp +++ b/src/Storages/System/StorageSystemDictionaries.cpp @@ -53,7 +53,7 @@ void StorageSystemDictionaries::fillData(MutableColumns & res_columns, const Con const bool check_access_for_dictionaries = !access->isGranted(AccessType::SHOW_DICTIONARIES); const auto & external_dictionaries = context.getExternalDictionariesLoader(); - for (const auto & load_result : external_dictionaries.getCurrentLoadResults()) + for (const auto & load_result : external_dictionaries.getLoadResults()) { const auto dict_ptr = std::dynamic_pointer_cast(load_result.object); @@ -66,9 +66,10 @@ void StorageSystemDictionaries::fillData(MutableColumns & res_columns, const Con else { short_name = load_result.name; - if (!load_result.repository_name.empty() && startsWith(short_name, load_result.repository_name + ".")) + String repository_name = load_result.config ? load_result.config->repository_name : ""; + if (!repository_name.empty() && startsWith(short_name, repository_name + ".")) { - database = load_result.repository_name; + database = repository_name; short_name = short_name.substr(database.length() + 1); } } @@ -81,7 +82,7 @@ void StorageSystemDictionaries::fillData(MutableColumns & res_columns, const Con res_columns[i++]->insert(database); res_columns[i++]->insert(short_name); res_columns[i++]->insert(static_cast(load_result.status)); - res_columns[i++]->insert(load_result.origin); + res_columns[i++]->insert(load_result.config ? load_result.config->path : ""); std::exception_ptr last_exception = load_result.exception; diff --git a/src/Storages/System/StorageSystemModels.cpp b/src/Storages/System/StorageSystemModels.cpp index 306829cf6de..9fae9803b96 100644 --- a/src/Storages/System/StorageSystemModels.cpp +++ b/src/Storages/System/StorageSystemModels.cpp @@ -28,13 +28,13 @@ NamesAndTypesList StorageSystemModels::getNamesAndTypes() void StorageSystemModels::fillData(MutableColumns & res_columns, const Context & context, const SelectQueryInfo &) const { const auto & external_models_loader = context.getExternalModelsLoader(); - auto load_results = external_models_loader.getCurrentLoadResults(); + auto load_results = external_models_loader.getLoadResults(); for (const auto & load_result : load_results) { res_columns[0]->insert(load_result.name); res_columns[1]->insert(static_cast(load_result.status)); - res_columns[2]->insert(load_result.origin); + res_columns[2]->insert(load_result.config ? 
load_result.config->path : ""); if (load_result.object) { diff --git a/src/Storages/System/StorageSystemTables.cpp b/src/Storages/System/StorageSystemTables.cpp index 81ff6a03e12..d50a5fe4185 100644 --- a/src/Storages/System/StorageSystemTables.cpp +++ b/src/Storages/System/StorageSystemTables.cpp @@ -226,7 +226,7 @@ protected: const bool check_access_for_tables = check_access_for_databases && !access->isGranted(AccessType::SHOW_TABLES, database_name); if (!tables_it || !tables_it->isValid()) - tables_it = database->getTablesWithDictionaryTablesIterator(context); + tables_it = database->getTablesIterator(context); const bool need_lock_structure = needLockStructure(database, getPort().getHeader()); diff --git a/tests/clickhouse-test b/tests/clickhouse-test index 27a71f9949c..81a3275c218 100755 --- a/tests/clickhouse-test +++ b/tests/clickhouse-test @@ -182,7 +182,8 @@ def run_tests_array(all_tests_with_params): if args.skip and any(s in name for s in args.skip): print(MSG_SKIPPED + " - skip") skipped_total += 1 - elif not args.zookeeper and 'zookeeper' in name: + elif not args.zookeeper and ('zookeeper' in name + or 'replica' in name): print(MSG_SKIPPED + " - no zookeeper") skipped_total += 1 elif not args.shard and ('shard' in name @@ -568,7 +569,7 @@ if __name__ == '__main__': if args.tmp is None: args.tmp = '/tmp/clickhouse-test' if args.queries is None: - print_err("Failed to detect path to the queries directory. Please specify it with '--queries' option.") + print("Failed to detect path to the queries directory. Please specify it with '--queries' option.", file=sys.stderr) exit(1) if args.tmp is None: args.tmp = args.queries @@ -578,7 +579,7 @@ if __name__ == '__main__': elif find_binary(args.binary): args.client = args.binary + ' client' else: - print("No 'clickhouse' binary found in PATH") + print("No 'clickhouse' binary found in PATH", file=sys.stderr) parser.print_help() exit(1) diff --git a/tests/integration/pytest.ini b/tests/integration/pytest.ini index bff275e3188..a7ca8c57da8 100644 --- a/tests/integration/pytest.ini +++ b/tests/integration/pytest.ini @@ -3,3 +3,4 @@ python_files = test*.py norecursedirs = _instances timeout = 300 junit_duration_report = call +junit_suite_name = integration diff --git a/tests/integration/test_merge_tree_s3/configs/config.d/storage_conf.xml b/tests/integration/test_merge_tree_s3/configs/config.d/storage_conf.xml index 5b292446c6b..d097675ca63 100644 --- a/tests/integration/test_merge_tree_s3/configs/config.d/storage_conf.xml +++ b/tests/integration/test_merge_tree_s3/configs/config.d/storage_conf.xml @@ -13,7 +13,7 @@ - +
s3 @@ -22,7 +22,7 @@ hdd - + diff --git a/tests/integration/test_merge_tree_s3/test.py b/tests/integration/test_merge_tree_s3/test.py index f69c09631e8..50cf532e9a4 100644 --- a/tests/integration/test_merge_tree_s3/test.py +++ b/tests/integration/test_merge_tree_s3/test.py @@ -67,7 +67,9 @@ def create_table(cluster, table_name, additional_settings=None): PARTITION BY dt ORDER BY (dt, id) SETTINGS - old_parts_lifetime=0, index_granularity=512 + storage_policy='s3', + old_parts_lifetime=0, + index_granularity=512 """.format(table_name) if additional_settings: @@ -84,7 +86,12 @@ def drop_table(cluster): minio = cluster.minio_client node.query("DROP TABLE IF EXISTS s3_test") - assert len(list(minio.list_objects(cluster.minio_bucket, 'data/'))) == 0 + try: + assert len(list(minio.list_objects(cluster.minio_bucket, 'data/'))) == 0 + finally: + # Remove extra objects to prevent tests cascade failing + for obj in list(minio.list_objects(cluster.minio_bucket, 'data/')): + minio.remove_object(cluster.minio_bucket, obj.object_name) @pytest.mark.parametrize( @@ -210,7 +217,7 @@ def test_attach_detach_partition(cluster): assert len(list(minio.list_objects(cluster.minio_bucket, 'data/'))) == FILES_OVERHEAD + FILES_OVERHEAD_PER_PART_WIDE node.query("ALTER TABLE s3_test DETACH PARTITION '2020-01-04'") - node.query("SET allow_drop_detached=1; ALTER TABLE s3_test DROP DETACHED PARTITION '2020-01-04'") + node.query("ALTER TABLE s3_test DROP DETACHED PARTITION '2020-01-04'", settings={"allow_drop_detached": 1}) assert node.query("SELECT count(*) FROM s3_test FORMAT Values") == "(0)" assert len(list(minio.list_objects(cluster.minio_bucket, 'data/'))) == FILES_OVERHEAD @@ -245,8 +252,7 @@ def test_table_manipulations(cluster): assert len(list(minio.list_objects(cluster.minio_bucket, 'data/'))) == FILES_OVERHEAD + FILES_OVERHEAD_PER_PART_WIDE*2 node.query("RENAME TABLE s3_renamed TO s3_test") - # TODO: Doesn't work with min_max index. - #assert node.query("SET check_query_single_value_result='false'; CHECK TABLE s3_test FORMAT Values") == "(1)" + assert node.query("CHECK TABLE s3_test FORMAT Values") == "(1)" node.query("DETACH TABLE s3_test") node.query("ATTACH TABLE s3_test") diff --git a/tests/integration/test_replicated_merge_tree_s3/__init__.py b/tests/integration/test_replicated_merge_tree_s3/__init__.py new file mode 100644 index 00000000000..e69de29bb2d diff --git a/tests/integration/test_replicated_merge_tree_s3/configs/config.d/storage_conf.xml b/tests/integration/test_replicated_merge_tree_s3/configs/config.d/storage_conf.xml new file mode 100644 index 00000000000..b32770095fc --- /dev/null +++ b/tests/integration/test_replicated_merge_tree_s3/configs/config.d/storage_conf.xml @@ -0,0 +1,21 @@ + + + + + s3 + http://minio1:9001/root/data/ + minio + minio123 + + + + + +
+                        <disk>s3</disk>
+                    </main>
+                </volumes>
+            </s3>
+        </policies>
+    </storage_configuration>
+</yandex>
diff --git a/tests/integration/test_replicated_merge_tree_s3/test.py b/tests/integration/test_replicated_merge_tree_s3/test.py new file mode 100644 index 00000000000..d6b6015a388 --- /dev/null +++ b/tests/integration/test_replicated_merge_tree_s3/test.py @@ -0,0 +1,106 @@ +import logging +import random +import string +import time + +import pytest +from helpers.cluster import ClickHouseCluster + +logging.getLogger().setLevel(logging.INFO) +logging.getLogger().addHandler(logging.StreamHandler()) + + +# Creates S3 bucket for tests and allows anonymous read-write access to it. +def prepare_s3_bucket(cluster): + minio_client = cluster.minio_client + + if minio_client.bucket_exists(cluster.minio_bucket): + minio_client.remove_bucket(cluster.minio_bucket) + + minio_client.make_bucket(cluster.minio_bucket) + + +@pytest.fixture(scope="module") +def cluster(): + try: + cluster = ClickHouseCluster(__file__) + + cluster.add_instance("node1", config_dir="configs", macros={'cluster': 'test1'}, with_minio=True, with_zookeeper=True) + cluster.add_instance("node2", config_dir="configs", macros={'cluster': 'test1'}, with_zookeeper=True) + cluster.add_instance("node3", config_dir="configs", macros={'cluster': 'test1'}, with_zookeeper=True) + + logging.info("Starting cluster...") + cluster.start() + logging.info("Cluster started") + + prepare_s3_bucket(cluster) + logging.info("S3 bucket created") + + yield cluster + finally: + cluster.shutdown() + + +FILES_OVERHEAD = 1 +FILES_OVERHEAD_PER_COLUMN = 2 # Data and mark files +FILES_OVERHEAD_PER_PART = FILES_OVERHEAD_PER_COLUMN * 3 + 2 + 6 + + +def random_string(length): + letters = string.ascii_letters + return ''.join(random.choice(letters) for i in range(length)) + + +def generate_values(date_str, count, sign=1): + data = [[date_str, sign*(i + 1), random_string(10)] for i in range(count)] + data.sort(key=lambda tup: tup[1]) + return ",".join(["('{}',{},'{}')".format(x, y, z) for x, y, z in data]) + + +def create_table(cluster): + create_table_statement = """ + CREATE TABLE s3_test ( + dt Date, + id Int64, + data String, + INDEX min_max (id) TYPE minmax GRANULARITY 3 + ) ENGINE=ReplicatedMergeTree('/clickhouse/{cluster}/tables/test/s3', '{instance}') + PARTITION BY dt + ORDER BY (dt, id) + SETTINGS storage_policy='s3' + """ + + for node in cluster.instances.values(): + node.query(create_table_statement) + + +@pytest.fixture(autouse=True) +def drop_table(cluster): + yield + for node in cluster.instances.values(): + node.query("DROP TABLE IF EXISTS s3_test") + + minio = cluster.minio_client + # Remove extra objects to prevent tests cascade failing + for obj in list(minio.list_objects(cluster.minio_bucket, 'data/')): + minio.remove_object(cluster.minio_bucket, obj.object_name) + + +def test_insert_select_replicated(cluster): + create_table(cluster) + + all_values = "" + for node_idx in range(1, 4): + node = cluster.instances["node" + str(node_idx)] + values = generate_values("2020-01-0" + str(node_idx), 4096) + node.query("INSERT INTO s3_test VALUES {}".format(values), settings={"insert_quorum": 3}) + if node_idx != 1: + all_values += "," + all_values += values + + for node_idx in range(1, 4): + node = cluster.instances["node" + str(node_idx)] + assert node.query("SELECT * FROM s3_test order by dt, id FORMAT Values", settings={"select_sequential_consistency": 1}) == all_values + + minio = cluster.minio_client + assert len(list(minio.list_objects(cluster.minio_bucket, 'data/'))) == 3 * (FILES_OVERHEAD + FILES_OVERHEAD_PER_PART * 3) diff --git 
a/tests/integration/test_settings_constraints_distributed/test.py b/tests/integration/test_settings_constraints_distributed/test.py index 854f101fb18..86456f8a099 100644 --- a/tests/integration/test_settings_constraints_distributed/test.py +++ b/tests/integration/test_settings_constraints_distributed/test.py @@ -103,4 +103,5 @@ def test_insert_clamps_settings(): distributed.query("INSERT INTO proxy VALUES (toDate('2020-02-20'), 2, 2)") distributed.query("INSERT INTO proxy VALUES (toDate('2020-02-21'), 2, 2)", settings={"max_memory_usage": 5000000}) - assert distributed.query("SELECT COUNT() FROM proxy") == "4\n" + distributed.query("SYSTEM FLUSH DISTRIBUTED proxy") + assert_eq_with_retry(distributed, "SELECT COUNT() FROM proxy", "4") diff --git a/tests/performance/IPv4.xml b/tests/performance/IPv4.xml index 8f5b61d70c9..b3f6cf52584 100644 --- a/tests/performance/IPv4.xml +++ b/tests/performance/IPv4.xml @@ -1,14 +1,5 @@ - - - 30000 - - - 5000 - 60000 - - CREATE TABLE IF NOT EXISTS ips_v4(ip String) ENGINE = MergeTree() PARTITION BY tuple() ORDER BY tuple() - 10000 - - diff --git a/tests/performance/bit_operations_fixed_string.xml b/tests/performance/bit_operations_fixed_string.xml index 90df91f1025..c08761ba8fc 100644 --- a/tests/performance/bit_operations_fixed_string.xml +++ b/tests/performance/bit_operations_fixed_string.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - diff --git a/tests/performance/bit_operations_fixed_string_numbers.xml b/tests/performance/bit_operations_fixed_string_numbers.xml index 779aea19cdc..e10e665ac81 100644 --- a/tests/performance/bit_operations_fixed_string_numbers.xml +++ b/tests/performance/bit_operations_fixed_string_numbers.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - SELECT count() FROM numbers(10000000) WHERE NOT ignore(bitXor(reinterpretAsFixedString(number), reinterpretAsFixedString(number + 1))) SELECT count() FROM numbers(10000000) WHERE NOT ignore(bitXor(reinterpretAsFixedString(number), reinterpretAsFixedString(0xabcd0123cdef4567))) diff --git a/tests/performance/bloom_filter.xml b/tests/performance/bloom_filter.xml index 079b7a43da3..3d9096afb03 100644 --- a/tests/performance/bloom_filter.xml +++ b/tests/performance/bloom_filter.xml @@ -1,10 +1,5 @@ - - - 30000 - - DROP TABLE IF EXISTS test_bf CREATE TABLE test_bf (`id` int, `ary` Array(String), INDEX idx_ary ary TYPE bloom_filter(0.01) GRANULARITY 8192) ENGINE = MergeTree() ORDER BY id diff --git a/tests/performance/bounding_ratio.xml b/tests/performance/bounding_ratio.xml index 0d0adfaea45..113c9c4dc14 100644 --- a/tests/performance/bounding_ratio.xml +++ b/tests/performance/bounding_ratio.xml @@ -1,11 +1,5 @@ - - - - 10000 - - SELECT boundingRatio(number, number) FROM numbers(1000000) SELECT (argMax(number, number) - argMin(number, number)) / (max(number) - min(number)) FROM numbers(1000000) diff --git a/tests/performance/cidr.xml b/tests/performance/cidr.xml index 938734e3709..a83a7e19182 100644 --- a/tests/performance/cidr.xml +++ b/tests/performance/cidr.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - diff --git a/tests/performance/codecs_float_insert.xml b/tests/performance/codecs_float_insert.xml index 706a2f3c0a0..273b5c07b67 100644 --- a/tests/performance/codecs_float_insert.xml +++ b/tests/performance/codecs_float_insert.xml @@ -1,15 +1,5 @@ - - - 10 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/codecs_float_select.xml b/tests/performance/codecs_float_select.xml index 4c2f671a90e..8dc78e5f90f 100644 --- 
a/tests/performance/codecs_float_select.xml +++ b/tests/performance/codecs_float_select.xml @@ -1,15 +1,5 @@ - - - 10 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/codecs_int_insert.xml b/tests/performance/codecs_int_insert.xml index 1226d9020a0..8efdc031b33 100644 --- a/tests/performance/codecs_int_insert.xml +++ b/tests/performance/codecs_int_insert.xml @@ -1,15 +1,5 @@ - - - 10 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/codecs_int_select.xml b/tests/performance/codecs_int_select.xml index 8054c2b2de4..a8ed2e90069 100644 --- a/tests/performance/codecs_int_select.xml +++ b/tests/performance/codecs_int_select.xml @@ -1,15 +1,5 @@ - - - 10 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/collations.xml b/tests/performance/collations.xml index 03d77fa5e27..17b2d36b7e3 100644 --- a/tests/performance/collations.xml +++ b/tests/performance/collations.xml @@ -1,15 +1,5 @@ - - - 5 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/column_column_comparison.xml b/tests/performance/column_column_comparison.xml index 7559e03e506..88ceda7bf83 100644 --- a/tests/performance/column_column_comparison.xml +++ b/tests/performance/column_column_comparison.xml @@ -8,16 +8,6 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/columns_hashing.xml b/tests/performance/columns_hashing.xml index ca330b0d435..414c85c3853 100644 --- a/tests/performance/columns_hashing.xml +++ b/tests/performance/columns_hashing.xml @@ -9,16 +9,6 @@ - - - 5 - 60000 - - - 10 - 150000 - - SELECT count() FROM numbers(1000000) WHERE NOT ignore(geohashEncode((number % 150)*1.1 - 75, (number * 3.14 % 300)*1.1 - 150)) diff --git a/tests/performance/general_purpose_hashes.xml b/tests/performance/general_purpose_hashes.xml index 458e646f3a7..d636de1ddaa 100644 --- a/tests/performance/general_purpose_hashes.xml +++ b/tests/performance/general_purpose_hashes.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - diff --git a/tests/performance/general_purpose_hashes_on_UUID.xml b/tests/performance/general_purpose_hashes_on_UUID.xml index 9b749ae79e0..020e2a3c134 100644 --- a/tests/performance/general_purpose_hashes_on_UUID.xml +++ b/tests/performance/general_purpose_hashes_on_UUID.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - diff --git a/tests/performance/generate_table_function.xml b/tests/performance/generate_table_function.xml index c53ec285bf5..8de711304ae 100644 --- a/tests/performance/generate_table_function.xml +++ b/tests/performance/generate_table_function.xml @@ -1,10 +1,5 @@ - - - 10000 - - SELECT sum(NOT ignore(*)) FROM (SELECT * FROM generateRandom('ui64 UInt64, i64 Int64, ui32 UInt32, i32 Int32, ui16 UInt16, i16 Int16, ui8 UInt8, i8 Int8') LIMIT 10000000); SELECT sum(NOT ignore(*)) FROM (SELECT * FROM generateRandom('ui64 UInt64, i64 Int64, ui32 UInt32, i32 Int32, ui16 UInt16, i16 Int16, ui8 UInt8, i8 Int8', 0, 10, 10) LIMIT 10000000); diff --git a/tests/performance/great_circle_dist.xml b/tests/performance/great_circle_dist.xml index 3b2aac65230..ff297cffc58 100644 --- a/tests/performance/great_circle_dist.xml +++ b/tests/performance/great_circle_dist.xml @@ -1,10 +1,5 @@ - - - 10000 - - SELECT count() FROM numbers(1000000) WHERE NOT ignore(greatCircleDistance((rand(1) % 360) * 1. - 180, (number % 150) * 1.2 - 90, (number % 360) + toFloat64(rand(2)) / 4294967296 - 180, (rand(3) % 180) * 1. 
- 90)) diff --git a/tests/performance/group_array_moving_sum.xml b/tests/performance/group_array_moving_sum.xml index 76b6b5bb980..465306d837d 100644 --- a/tests/performance/group_array_moving_sum.xml +++ b/tests/performance/group_array_moving_sum.xml @@ -1,12 +1,4 @@ - - - 30000 - - - 60000 - - 30000000000 diff --git a/tests/performance/h3.xml b/tests/performance/h3.xml index 104e777fcc5..ce00ebbc9ec 100644 --- a/tests/performance/h3.xml +++ b/tests/performance/h3.xml @@ -1,11 +1,5 @@ - - - - 10000 - - SELECT count() FROM zeros(100000) WHERE NOT ignore(geoToH3(37.62 + rand(1) / 0x100000000, 55.75 + rand(2) / 0x100000000, 15)) diff --git a/tests/performance/if_array_num.xml b/tests/performance/if_array_num.xml index 4ecd1e66daa..f3f418b809c 100644 --- a/tests/performance/if_array_num.xml +++ b/tests/performance/if_array_num.xml @@ -1,10 +1,5 @@ - - - 10000 - - SELECT count() FROM zeros(10000000) WHERE NOT ignore(rand() % 2 ? [1, 2, 3] : [4, 5]) diff --git a/tests/performance/if_array_string.xml b/tests/performance/if_array_string.xml index 40302131665..6713822e5a4 100644 --- a/tests/performance/if_array_string.xml +++ b/tests/performance/if_array_string.xml @@ -1,10 +1,5 @@ - - - 10000 - - SELECT count() FROM zeros(10000000) WHERE NOT ignore(rand() % 2 ? ['Hello', 'World'] : ['a', 'b', 'c']) diff --git a/tests/performance/if_string_const.xml b/tests/performance/if_string_const.xml index 7e273db36d8..69dd8f75463 100644 --- a/tests/performance/if_string_const.xml +++ b/tests/performance/if_string_const.xml @@ -1,10 +1,5 @@ - - - 10000 - - SELECT count() FROM zeros(1000000) WHERE NOT ignore(rand() % 2 ? 'hello' : 'world') SELECT count() FROM zeros(1000000) WHERE NOT ignore(rand() % 2 ? 'hello' : '') diff --git a/tests/performance/if_string_hits.xml b/tests/performance/if_string_hits.xml index ec9ea39f7cf..ca23d710185 100644 --- a/tests/performance/if_string_hits.xml +++ b/tests/performance/if_string_hits.xml @@ -1,15 +1,5 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/if_to_multiif.xml b/tests/performance/if_to_multiif.xml index 373318c316c..88fc38d48f0 100644 --- a/tests/performance/if_to_multiif.xml +++ b/tests/performance/if_to_multiif.xml @@ -1,10 +1,5 @@ - - - 10000 - - - - - 10 - 3000 - - - 100 - 60000 - - file('test_all_expr_matches.values', Values, 'd DateTime, i UInt32, s String, ni Nullable(UInt64), ns Nullable(String), ars Array(String)') diff --git a/tests/performance/inserts_arrays_lowcardinality.xml b/tests/performance/inserts_arrays_lowcardinality.xml index bca5c858576..40617fb9593 100644 --- a/tests/performance/inserts_arrays_lowcardinality.xml +++ b/tests/performance/inserts_arrays_lowcardinality.xml @@ -1,14 +1,4 @@ - - - 5 - 12000 - - - 50 - 60000 - - CREATE TABLE lot_of_string_arrays_src (`id` UInt64, `col00` Array(String), `col01` Array(String), `col02` Array(String), `col03` Array(String), `col04` Array(String), `col05` Array(String), `col06` Array(String), `col07` Array(String), `col08` Array(String), `col09` Array(String), `col10` Array(String), `col11` Array(String), `col12` Array(String), `col13` Array(String), `col14` Array(String), `col15` Array(String), `col16` Array(String), `col17` Array(String), `col18` Array(String), `col19` Array(String), `col20` Array(String), `col21` Array(String), `col22` Array(String), `col23` Array(String), `col24` Array(String), `col25` Array(String), `col26` Array(String), `col27` Array(String), `col28` Array(String), `col29` Array(String), `col30` Array(String), `col31` Array(String), `col32` Array(String), 
`col33` Array(String), `col34` Array(String), `col35` Array(String), `col36` Array(String), `col37` Array(String), `col38` Array(String), `col39` Array(String), `col40` Array(String), `col41` Array(String), `col42` Array(String), `col43` Array(String), `col44` Array(String), `col45` Array(String), `col46` Array(String), `col47` Array(String), `col48` Array(String), `col49` Array(String)) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; CREATE TABLE lot_of_string_arrays_dst_lowcardinality (`id` UInt64, `col00` Array(LowCardinality(String)), `col01` Array(LowCardinality(String)), `col02` Array(LowCardinality(String)), `col03` Array(LowCardinality(String)), `col04` Array(LowCardinality(String)), `col05` Array(LowCardinality(String)), `col06` Array(LowCardinality(String)), `col07` Array(LowCardinality(String)), `col08` Array(LowCardinality(String)), `col09` Array(LowCardinality(String)), `col10` Array(LowCardinality(String)), `col11` Array(LowCardinality(String)), `col12` Array(LowCardinality(String)), `col13` Array(LowCardinality(String)), `col14` Array(LowCardinality(String)), `col15` Array(LowCardinality(String)), `col16` Array(LowCardinality(String)), `col17` Array(LowCardinality(String)), `col18` Array(LowCardinality(String)), `col19` Array(LowCardinality(String)), `col20` Array(LowCardinality(String)), `col21` Array(LowCardinality(String)), `col22` Array(LowCardinality(String)), `col23` Array(LowCardinality(String)), `col24` Array(LowCardinality(String)), `col25` Array(LowCardinality(String)), `col26` Array(LowCardinality(String)), `col27` Array(LowCardinality(String)), `col28` Array(LowCardinality(String)), `col29` Array(LowCardinality(String)), `col30` Array(LowCardinality(String)), `col31` Array(LowCardinality(String)), `col32` Array(LowCardinality(String)), `col33` Array(LowCardinality(String)), `col34` Array(LowCardinality(String)), `col35` Array(LowCardinality(String)), `col36` Array(LowCardinality(String)), `col37` Array(LowCardinality(String)), `col38` Array(LowCardinality(String)), `col39` Array(LowCardinality(String)), `col40` Array(LowCardinality(String)), `col41` Array(LowCardinality(String)), `col42` Array(LowCardinality(String)), `col43` Array(LowCardinality(String)), `col44` Array(LowCardinality(String)), `col45` Array(LowCardinality(String)), `col46` Array(LowCardinality(String)), `col47` Array(LowCardinality(String)), `col48` Array(LowCardinality(String)), `col49` Array(LowCardinality(String))) ENGINE = MergeTree ORDER BY id SETTINGS index_granularity = 8192; diff --git a/tests/performance/int_parsing.xml b/tests/performance/int_parsing.xml index 8a6475546bf..a9258875b5e 100644 --- a/tests/performance/int_parsing.xml +++ b/tests/performance/int_parsing.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - diff --git a/tests/performance/jit_large_requests.xml b/tests/performance/jit_large_requests.xml index 4b390c36e20..805b7f2edb1 100644 --- a/tests/performance/jit_large_requests.xml +++ b/tests/performance/jit_large_requests.xml @@ -1,10 +1,5 @@ - - - 100 - - CREATE TABLE jit_test ( diff --git a/tests/performance/jit_small_requests.xml b/tests/performance/jit_small_requests.xml index 7a4bedf6832..f90415371ce 100644 --- a/tests/performance/jit_small_requests.xml +++ b/tests/performance/jit_small_requests.xml @@ -1,15 +1,5 @@ - - - 5 - 10000 - - - 5000 - 60000 - - diff --git a/tests/performance/joins_in_memory.xml b/tests/performance/joins_in_memory.xml index 67c86f49ff2..f24e7425ecd 100644 --- a/tests/performance/joins_in_memory.xml +++ 
b/tests/performance/joins_in_memory.xml @@ -1,10 +1,5 @@ - - - 10 - - CREATE TABLE ints (i64 Int64, i32 Int32, i16 Int16, i8 Int8) ENGINE = Memory diff --git a/tests/performance/joins_in_memory_pmj.xml b/tests/performance/joins_in_memory_pmj.xml index 19383467fa1..6742943151e 100644 --- a/tests/performance/joins_in_memory_pmj.xml +++ b/tests/performance/joins_in_memory_pmj.xml @@ -1,10 +1,5 @@ - - - 10 - - CREATE TABLE ints (i64 Int64, i32 Int32, i16 Int16, i8 Int8) ENGINE = Memory diff --git a/tests/performance/json_extract_rapidjson.xml b/tests/performance/json_extract_rapidjson.xml index 9818abb8581..f8d40c1e58d 100644 --- a/tests/performance/json_extract_rapidjson.xml +++ b/tests/performance/json_extract_rapidjson.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - diff --git a/tests/performance/json_extract_simdjson.xml b/tests/performance/json_extract_simdjson.xml index fa18d43df3e..f9f6df5140e 100644 --- a/tests/performance/json_extract_simdjson.xml +++ b/tests/performance/json_extract_simdjson.xml @@ -1,15 +1,5 @@ - - - 3 - 10000 - - - 5 - 60000 - - diff --git a/tests/performance/least_greatest_hits.xml b/tests/performance/least_greatest_hits.xml new file mode 100644 index 00000000000..464656b0201 --- /dev/null +++ b/tests/performance/least_greatest_hits.xml @@ -0,0 +1,9 @@ + + + test.hits + + + SELECT count() FROM test.hits WHERE NOT ignore(least(URL, Referer)) + SELECT count() FROM test.hits WHERE NOT ignore(greatest(URL, Referer, Title)) + SELECT count() FROM test.hits WHERE NOT ignore(greatest(ClientIP, RemoteIP)) + diff --git a/tests/performance/leftpad.xml b/tests/performance/leftpad.xml index eb0b09c72ed..3f747054122 100644 --- a/tests/performance/leftpad.xml +++ b/tests/performance/leftpad.xml @@ -9,16 +9,6 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/linear_regression.xml b/tests/performance/linear_regression.xml index 87fa034d851..a23924939a2 100644 --- a/tests/performance/linear_regression.xml +++ b/tests/performance/linear_regression.xml @@ -1,10 +1,5 @@ - - - 10000 - - test.hits diff --git a/tests/performance/logical_functions_large.xml b/tests/performance/logical_functions_large.xml index e199094c43c..077c88fa32c 100644 --- a/tests/performance/logical_functions_large.xml +++ b/tests/performance/logical_functions_large.xml @@ -1,16 +1,6 @@ 1 - - - 60 - 30000 - - - 120 - 60000 - - 1 diff --git a/tests/performance/logical_functions_medium.xml b/tests/performance/logical_functions_medium.xml index 1c4fd2a24dc..990fe2b6040 100644 --- a/tests/performance/logical_functions_medium.xml +++ b/tests/performance/logical_functions_medium.xml @@ -1,16 +1,6 @@ 1 - - - 60 - 30000 - - - 120 - 60000 - - 1 diff --git a/tests/performance/logical_functions_small.xml b/tests/performance/logical_functions_small.xml index d3d7a2eecca..15c1e87a558 100644 --- a/tests/performance/logical_functions_small.xml +++ b/tests/performance/logical_functions_small.xml @@ -1,16 +1,6 @@ 1 - - - 15 - 20000 - - - 120 - 60000 - - 1 diff --git a/tests/performance/materialized_view_parallel_insert.xml b/tests/performance/materialized_view_parallel_insert.xml new file mode 100644 index 00000000000..4b71354dec3 --- /dev/null +++ b/tests/performance/materialized_view_parallel_insert.xml @@ -0,0 +1,32 @@ + + + hits_10m_single + + + + CREATE MATERIALIZED VIEW hits_mv ENGINE MergeTree + PARTITION BY toYYYYMM(EventDate) + ORDER BY (CounterID, EventDate, intHash32(UserID)) + SAMPLE BY intHash32(UserID) + SETTINGS + parts_to_delay_insert = 5000, + parts_to_throw_insert = 5000 + AS + -- 
don't select all columns to keep the run time down + SELECT CounterID, EventDate, UserID, Title + FROM hits_10m_single + -- do not select anything because we only need column types + LIMIT 0 + + SET max_insert_threads=8 + SYSTEM STOP MERGES + + + INSERT INTO hits_mv + SELECT CounterID, EventDate, UserID, Title + FROM hits_10m_single + + + SYSTEM START MERGES + DROP TABLE IF EXISTS hits_mv + diff --git a/tests/performance/math.xml b/tests/performance/math.xml index 6ab497749f1..78effabaf1e 100644 --- a/tests/performance/math.xml +++ b/tests/performance/math.xml @@ -1,16 +1,6 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/merge_table_streams.xml b/tests/performance/merge_table_streams.xml index 084fa2da575..01f0444e54c 100644 --- a/tests/performance/merge_table_streams.xml +++ b/tests/performance/merge_table_streams.xml @@ -4,15 +4,6 @@ hits_100m_single - - - 60000 - 3 - - - 30 - - diff --git a/tests/performance/merge_tree_huge_pk.xml b/tests/performance/merge_tree_huge_pk.xml index 78a6cf6308e..2332769b522 100644 --- a/tests/performance/merge_tree_huge_pk.xml +++ b/tests/performance/merge_tree_huge_pk.xml @@ -1,15 +1,5 @@ - - - 10 - 12000 - - - 50 - 60000 - - CREATE TABLE huge_pk ENGINE = MergeTree ORDER BY ( diff --git a/tests/performance/merge_tree_many_partitions.xml b/tests/performance/merge_tree_many_partitions.xml index 33bb12ed22b..d3a5d204d5a 100644 --- a/tests/performance/merge_tree_many_partitions.xml +++ b/tests/performance/merge_tree_many_partitions.xml @@ -3,16 +3,6 @@ CREATE TABLE bad_partitions (x UInt64) ENGINE = MergeTree PARTITION BY x ORDER BY x INSERT INTO bad_partitions SELECT * FROM numbers(10000) - - - 5 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/merge_tree_many_partitions_2.xml b/tests/performance/merge_tree_many_partitions_2.xml index 42bb0ac29c9..6799153ed65 100644 --- a/tests/performance/merge_tree_many_partitions_2.xml +++ b/tests/performance/merge_tree_many_partitions_2.xml @@ -3,16 +3,6 @@ CREATE TABLE bad_partitions (a UInt64, b UInt64, c UInt64, d UInt64, e UInt64, f UInt64, g UInt64, h UInt64, i UInt64, j UInt64, k UInt64, l UInt64, m UInt64, n UInt64, o UInt64, p UInt64, q UInt64, r UInt64, s UInt64, t UInt64, u UInt64, v UInt64, w UInt64, x UInt64, y UInt64, z UInt64) ENGINE = MergeTree PARTITION BY x ORDER BY x INSERT INTO bad_partitions (x) SELECT * FROM numbers_mt(3000) - - - 5 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/merge_tree_simple_select.xml b/tests/performance/merge_tree_simple_select.xml index f38a5241cb5..4d449e878d3 100644 --- a/tests/performance/merge_tree_simple_select.xml +++ b/tests/performance/merge_tree_simple_select.xml @@ -1,11 +1,6 @@ - - - 10 - - CREATE TABLE simple_mergetree (EventDate Date, x UInt64) ENGINE = MergeTree ORDER BY x INSERT INTO simple_mergetree SELECT number, today() + intDiv(number, 10000000) FROM numbers_mt(100000000) diff --git a/tests/performance/mingroupby-orderbylimit1.xml b/tests/performance/mingroupby-orderbylimit1.xml index 306095a7e1c..d6ff3bedbd2 100644 --- a/tests/performance/mingroupby-orderbylimit1.xml +++ b/tests/performance/mingroupby-orderbylimit1.xml @@ -1,10 +1,5 @@ - - - 30000 - - 1 diff --git a/tests/performance/modulo.xml b/tests/performance/modulo.xml index e31de5c1701..77b544ff389 100644 --- a/tests/performance/modulo.xml +++ b/tests/performance/modulo.xml @@ -1,10 +1,5 @@ - - - 10 - - SELECT number % 128 FROM numbers(300000000) FORMAT Null diff --git a/tests/performance/ngram_distance.xml b/tests/performance/ngram_distance.xml index 
e90f49155b1..f102b90466d 100644 --- a/tests/performance/ngram_distance.xml +++ b/tests/performance/ngram_distance.xml @@ -13,16 +13,6 @@ 20000000000 - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/number_formatting_formats.xml b/tests/performance/number_formatting_formats.xml index c2a9a9c081d..9c9eb601b5a 100644 --- a/tests/performance/number_formatting_formats.xml +++ b/tests/performance/number_formatting_formats.xml @@ -2,16 +2,6 @@ CREATE TABLE IF NOT EXISTS table_{format} (x UInt64) ENGINE = File(`{format}`) - - - 5 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/nyc_taxi.xml b/tests/performance/nyc_taxi.xml index 92a1dd59441..b7c1cf58146 100644 --- a/tests/performance/nyc_taxi.xml +++ b/tests/performance/nyc_taxi.xml @@ -1,15 +1,5 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/order_by_decimals.xml b/tests/performance/order_by_decimals.xml index 5479181fb08..4889137865d 100644 --- a/tests/performance/order_by_decimals.xml +++ b/tests/performance/order_by_decimals.xml @@ -5,16 +5,6 @@ - - - 5 - 10000 - - - 50 - 60000 - - SELECT toInt32(number) AS n FROM numbers(1000000) ORDER BY n DESC FORMAT Null SELECT toDecimal32(number, 0) AS n FROM numbers(1000000) ORDER BY n FORMAT Null diff --git a/tests/performance/order_by_read_in_order.xml b/tests/performance/order_by_read_in_order.xml index e37e4df4681..5749a49a3aa 100644 --- a/tests/performance/order_by_read_in_order.xml +++ b/tests/performance/order_by_read_in_order.xml @@ -1,15 +1,5 @@ - - - 5 - 10000 - - - 500 - 60000 - - diff --git a/tests/performance/order_by_single_column.xml b/tests/performance/order_by_single_column.xml index 148b14e8959..9b708ea393c 100644 --- a/tests/performance/order_by_single_column.xml +++ b/tests/performance/order_by_single_column.xml @@ -9,16 +9,6 @@ - - - 5 - 10000 - - - 50 - 60000 - - SELECT URL as col FROM hits_100m_single ORDER BY col LIMIT 1000,1 SELECT SearchPhrase as col FROM hits_100m_single ORDER BY col LIMIT 10000,1 diff --git a/tests/performance/parallel_insert.xml b/tests/performance/parallel_insert.xml index 7af2a1418d9..34c45e08bc0 100644 --- a/tests/performance/parallel_insert.xml +++ b/tests/performance/parallel_insert.xml @@ -1,9 +1,4 @@ - - - 2 - - default.hits_10m_single diff --git a/tests/performance/parse_engine_file.xml b/tests/performance/parse_engine_file.xml index a88945125b3..f962cbf4369 100644 --- a/tests/performance/parse_engine_file.xml +++ b/tests/performance/parse_engine_file.xml @@ -4,16 +4,6 @@ INSERT INTO table_{format} SELECT * FROM test.hits LIMIT 100000 - - - 5 - 10000 - - - 100 - 60000 - - diff --git a/tests/performance/pre_limit_no_sorting.xml b/tests/performance/pre_limit_no_sorting.xml index e93aef049aa..a1e50f736b8 100644 --- a/tests/performance/pre_limit_no_sorting.xml +++ b/tests/performance/pre_limit_no_sorting.xml @@ -1,14 +1,4 @@ - - - 10 - 200 - - - 100 - 1000 - - SELECT sum(number) FROM (select number from system.numbers_mt limit 1000000000) diff --git a/tests/performance/prewhere.xml b/tests/performance/prewhere.xml index e3350d765ee..40a12a68bb9 100644 --- a/tests/performance/prewhere.xml +++ b/tests/performance/prewhere.xml @@ -1,15 +1,5 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/random_printable_ascii.xml b/tests/performance/random_printable_ascii.xml index 20868bb4c27..3eb1441a4cc 100644 --- a/tests/performance/random_printable_ascii.xml +++ b/tests/performance/random_printable_ascii.xml @@ -1,10 +1,5 @@ - - - 10000 - - SELECT count() FROM zeros(1000000) WHERE NOT 
ignore(randomPrintableASCII(10)) diff --git a/tests/performance/range.xml b/tests/performance/range.xml index 95b8455057e..c676f9124ba 100644 --- a/tests/performance/range.xml +++ b/tests/performance/range.xml @@ -1,10 +1,5 @@ - - - 10000 - - SELECT range(number % 100) FROM numbers(10000000) FORMAT Null diff --git a/tests/performance/read_hits_with_aio.xml b/tests/performance/read_hits_with_aio.xml index 850fd0fbadc..1e9a81f7693 100644 --- a/tests/performance/read_hits_with_aio.xml +++ b/tests/performance/read_hits_with_aio.xml @@ -1,10 +1,5 @@ - - - 30000 - - hits_100m_single diff --git a/tests/performance/right.xml b/tests/performance/right.xml index 73030e52f21..ac889e21e73 100644 --- a/tests/performance/right.xml +++ b/tests/performance/right.xml @@ -4,14 +4,6 @@ hits_100m_single - - - 10000 - - - 20000 - - diff --git a/tests/performance/round_down.xml b/tests/performance/round_down.xml index f453467ab2d..c309a767843 100644 --- a/tests/performance/round_down.xml +++ b/tests/performance/round_down.xml @@ -1,13 +1,5 @@ - - - 10000 - - - 20000 - - SELECT count() FROM zeros(10000000) WHERE NOT ignore(roundDuration(rand() % 65536)) diff --git a/tests/performance/round_methods.xml b/tests/performance/round_methods.xml index 0e560b2eae6..fac9c1908b0 100644 --- a/tests/performance/round_methods.xml +++ b/tests/performance/round_methods.xml @@ -1,13 +1,5 @@ - - - 10000 - - - 20000 - - SELECT count() FROM numbers(1000000) WHERE NOT ignore(round(toInt64(number), -2)) diff --git a/tests/performance/scalar.xml b/tests/performance/scalar.xml index e8e487a80da..b50aef8747c 100644 --- a/tests/performance/scalar.xml +++ b/tests/performance/scalar.xml @@ -1,14 +1,5 @@ - - - 30000 - - - 5000 - 60000 - - CREATE TABLE cdp_tags (tag_id String, mid_seqs AggregateFunction(groupBitmap, UInt32)) engine=MergeTree() ORDER BY (tag_id) SETTINGS index_granularity=1 diff --git a/tests/performance/select_format.xml b/tests/performance/select_format.xml index 2bdbde83c2d..82d5198a71a 100644 --- a/tests/performance/select_format.xml +++ b/tests/performance/select_format.xml @@ -2,16 +2,6 @@ CREATE TABLE IF NOT EXISTS table_{format} ENGINE = File({format}, '/dev/null') AS test.hits - - - 5 - 10000 - - - 100 - 60000 - - @@ -44,7 +34,7 @@ ODBCDriver2 MySQLWire Avro - + MsgPack diff --git a/tests/performance/set.xml b/tests/performance/set.xml index 345d9c05573..576a26390d1 100644 --- a/tests/performance/set.xml +++ b/tests/performance/set.xml @@ -3,14 +3,6 @@ long - - - 10000 - - - 20000 - - diff --git a/tests/performance/set_hits.xml b/tests/performance/set_hits.xml index 09860aa1cd7..8b9ae1da83b 100644 --- a/tests/performance/set_hits.xml +++ b/tests/performance/set_hits.xml @@ -5,15 +5,6 @@ hits_100m_single - - - 8000 - - - 7000 - 20000 - - SELECT count() FROM hits_100m_single WHERE UserID IN (SELECT UserID FROM hits_100m_single WHERE AdvEngineID != 0) diff --git a/tests/performance/set_index.xml b/tests/performance/set_index.xml index f158c481d93..e1ced2ba53c 100644 --- a/tests/performance/set_index.xml +++ b/tests/performance/set_index.xml @@ -3,15 +3,6 @@ CREATE TABLE test_in (`a` UInt32) ENGINE = MergeTree() ORDER BY a INSERT INTO test_in SELECT number FROM numbers(500000000) - - - 8000 - - - 7000 - 20000 - - SELECT count() FROM test_in WHERE a IN (SELECT rand(1) FROM numbers(100000)) SETTINGS max_rows_to_read = 1, read_overflow_mode = 'break' diff --git a/tests/performance/simple_join_query.xml b/tests/performance/simple_join_query.xml index 92fdfd23f93..98f2b1eaebf 100644 --- 
a/tests/performance/simple_join_query.xml +++ b/tests/performance/simple_join_query.xml @@ -1,13 +1,5 @@ - - - 30000 - - - 60000 - - CREATE TABLE join_table(A Int64, S0 String, S1 String, S2 String, S3 String) ENGINE = MergeTree ORDER BY A diff --git a/tests/performance/slices_hits.xml b/tests/performance/slices_hits.xml index 1745df3328c..4a6813579bf 100644 --- a/tests/performance/slices_hits.xml +++ b/tests/performance/slices_hits.xml @@ -1,15 +1,5 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/sort.xml b/tests/performance/sort.xml index 652dd7f4670..e5781548a37 100644 --- a/tests/performance/sort.xml +++ b/tests/performance/sort.xml @@ -1,10 +1,5 @@ - - - 10 - - CREATE TABLE rand_unlimited_10m_8 (key UInt8) Engine = Memory diff --git a/tests/performance/string_join.xml b/tests/performance/string_join.xml index 6aa2c576b4e..477f62c1327 100644 --- a/tests/performance/string_join.xml +++ b/tests/performance/string_join.xml @@ -1,10 +1,5 @@ - - - 10 - - diff --git a/tests/performance/string_set.xml b/tests/performance/string_set.xml index 7890ab11a4a..bbbfe2d3c2b 100644 --- a/tests/performance/string_set.xml +++ b/tests/performance/string_set.xml @@ -1,10 +1,5 @@ - - - 10 - - diff --git a/tests/performance/string_sort.xml b/tests/performance/string_sort.xml index 6a4e68270f9..71b56bdb9d6 100644 --- a/tests/performance/string_sort.xml +++ b/tests/performance/string_sort.xml @@ -5,16 +5,6 @@ - - - 5 - 10000 - - - 50 - 60000 - - diff --git a/tests/performance/sum_map.xml b/tests/performance/sum_map.xml index a88983fdbea..4f9ce56488c 100644 --- a/tests/performance/sum_map.xml +++ b/tests/performance/sum_map.xml @@ -1,10 +1,5 @@ - - - 30000 - - 1 diff --git a/tests/performance/synthetic_hardware_benchmark.xml b/tests/performance/synthetic_hardware_benchmark.xml index fc910077c9f..256fd623b3c 100644 --- a/tests/performance/synthetic_hardware_benchmark.xml +++ b/tests/performance/synthetic_hardware_benchmark.xml @@ -1,10 +1,5 @@ - - - 12000 - - 30000000000 diff --git a/tests/performance/trim_numbers.xml b/tests/performance/trim_numbers.xml index 62e26f8245a..35cd479d48c 100644 --- a/tests/performance/trim_numbers.xml +++ b/tests/performance/trim_numbers.xml @@ -1,13 +1,5 @@ - - - 10000 - - - 20000 - - diff --git a/tests/performance/trim_urls.xml b/tests/performance/trim_urls.xml index f29d878682f..276a12bc570 100644 --- a/tests/performance/trim_urls.xml +++ b/tests/performance/trim_urls.xml @@ -4,14 +4,6 @@ hits_100m_single - - - 10000 - - - 20000 - - diff --git a/tests/performance/trim_whitespace.xml b/tests/performance/trim_whitespace.xml index 8ec4aeaa54e..049387bbc0c 100644 --- a/tests/performance/trim_whitespace.xml +++ b/tests/performance/trim_whitespace.xml @@ -10,11 +10,6 @@ from numbers_mt(100000000); - - - 30000 - - diff --git a/tests/performance/uniq.xml b/tests/performance/uniq.xml index 0b7c8e58c86..2766c95e6a7 100644 --- a/tests/performance/uniq.xml +++ b/tests/performance/uniq.xml @@ -5,12 +5,6 @@ 30000000000 - - - 5000 - 20000 - - diff --git a/tests/performance/url_hits.xml b/tests/performance/url_hits.xml index f9383eb3910..c8cf119a7d7 100644 --- a/tests/performance/url_hits.xml +++ b/tests/performance/url_hits.xml @@ -4,14 +4,6 @@ hits_100m_single - - - 10000 - - - 20000 - - diff --git a/tests/performance/vectorize_aggregation_combinators.xml b/tests/performance/vectorize_aggregation_combinators.xml index 88870b56d1f..47ac0719bb5 100644 --- a/tests/performance/vectorize_aggregation_combinators.xml +++ 
b/tests/performance/vectorize_aggregation_combinators.xml @@ -1,14 +1,6 @@ - - - 30000 - - - 60000 - - diff --git a/tests/performance/visit_param_extract_raw.xml b/tests/performance/visit_param_extract_raw.xml index ca46c79c9b5..7be780d5d42 100644 --- a/tests/performance/visit_param_extract_raw.xml +++ b/tests/performance/visit_param_extract_raw.xml @@ -1,9 +1,4 @@ - - - 10000 - - diff --git a/tests/performance/website.xml b/tests/performance/website.xml index c21f09c57d8..6ed60c0860a 100644 --- a/tests/performance/website.xml +++ b/tests/performance/website.xml @@ -5,15 +5,6 @@ hits_100m_single - - - 60000 - 3 - - - 30 - - diff --git a/tests/queries/0_stateless/00561_storage_join.sql b/tests/queries/0_stateless/00561_storage_join.sql index 08f76815702..62ca80d31fe 100644 --- a/tests/queries/0_stateless/00561_storage_join.sql +++ b/tests/queries/0_stateless/00561_storage_join.sql @@ -1,5 +1,3 @@ -SET any_join_distinct_right_table_keys = 1; - drop table IF EXISTS joinbug; CREATE TABLE joinbug ( @@ -21,7 +19,7 @@ CREATE TABLE joinbug_join ( val UInt64, val2 Int32, created UInt64 -) ENGINE = Join(ANY, INNER, id2); +) ENGINE = Join(SEMI, LEFT, id2); insert into joinbug_join (id, id2, val, val2, created) select id, id2, val, val2, created @@ -36,7 +34,7 @@ select id, id2, val, val2, created from ( SELECT toUInt64(arrayJoin(range(50))) AS id2 ) js1 -ANY INNER JOIN joinbug_join using id2; +SEMI LEFT JOIN joinbug_join using id2; DROP TABLE joinbug; DROP TABLE joinbug_join; diff --git a/tests/queries/0_stateless/01018_dictionaries_from_dictionaries.reference b/tests/queries/0_stateless/01018_dictionaries_from_dictionaries.reference index 87dc6a5b6bf..4a22b3a52cf 100644 --- a/tests/queries/0_stateless/01018_dictionaries_from_dictionaries.reference +++ b/tests/queries/0_stateless/01018_dictionaries_from_dictionaries.reference @@ -9,6 +9,7 @@ dict1 dict2 dict3 +dict4 table_for_dict dict1 dict2 diff --git a/tests/queries/0_stateless/01040_distributed_directory_monitor_batch_inserts.sql b/tests/queries/0_stateless/01040_distributed_directory_monitor_batch_inserts.sql index ffc33ce6949..dbec319ab76 100644 --- a/tests/queries/0_stateless/01040_distributed_directory_monitor_batch_inserts.sql +++ b/tests/queries/0_stateless/01040_distributed_directory_monitor_batch_inserts.sql @@ -2,8 +2,11 @@ SET distributed_directory_monitor_batch_inserts=1; SET distributed_directory_monitor_sleep_time_ms=10; SET distributed_directory_monitor_max_sleep_time_ms=100; -CREATE TABLE test (key UInt64) ENGINE=TinyLog(); -CREATE TABLE dist_test AS test Engine=Distributed(test_cluster_two_shards, currentDatabase(), test, key); -INSERT INTO dist_test SELECT toUInt64(number) FROM numbers(2); -SYSTEM FLUSH DISTRIBUTED dist_test; -SELECT * FROM dist_test; +DROP TABLE IF EXISTS test_01040; +DROP TABLE IF EXISTS dist_test_01040; + +CREATE TABLE test_01040 (key UInt64) ENGINE=TinyLog(); +CREATE TABLE dist_test_01040 AS test_01040 Engine=Distributed(test_cluster_two_shards, currentDatabase(), test_01040, key); +INSERT INTO dist_test_01040 SELECT toUInt64(number) FROM numbers(2); +SYSTEM FLUSH DISTRIBUTED dist_test_01040; +SELECT * FROM dist_test_01040; diff --git a/tests/queries/0_stateless/01048_exists_query.sql b/tests/queries/0_stateless/01048_exists_query.sql index 9a4c0558b60..700b4f5983d 100644 --- a/tests/queries/0_stateless/01048_exists_query.sql +++ b/tests/queries/0_stateless/01048_exists_query.sql @@ -32,7 +32,7 @@ EXISTS TABLE db_01048.t_01048; -- Dictionaries are tables as well. 
But not all t EXISTS DICTIONARY db_01048.t_01048; -- But dictionary-tables cannot be dropped as usual tables. -DROP TABLE db_01048.t_01048; -- { serverError 60 } +DROP TABLE db_01048.t_01048; -- { serverError 520 } DROP DICTIONARY db_01048.t_01048; EXISTS db_01048.t_01048; EXISTS TABLE db_01048.t_01048; diff --git a/tests/queries/0_stateless/01062_alter_on_mutataion.reference b/tests/queries/0_stateless/01062_alter_on_mutataion_zookeeper.reference similarity index 100% rename from tests/queries/0_stateless/01062_alter_on_mutataion.reference rename to tests/queries/0_stateless/01062_alter_on_mutataion_zookeeper.reference diff --git a/tests/queries/0_stateless/01062_alter_on_mutataion.sql b/tests/queries/0_stateless/01062_alter_on_mutataion_zookeeper.sql similarity index 100% rename from tests/queries/0_stateless/01062_alter_on_mutataion.sql rename to tests/queries/0_stateless/01062_alter_on_mutataion_zookeeper.sql diff --git a/tests/queries/0_stateless/01115_join_with_dictionary.reference b/tests/queries/0_stateless/01115_join_with_dictionary.reference new file mode 100644 index 00000000000..f909a3d61f5 --- /dev/null +++ b/tests/queries/0_stateless/01115_join_with_dictionary.reference @@ -0,0 +1,103 @@ +flat: left on +0 0 0 0 0 +1 1 1 1 1 +2 2 2 2 2 +3 3 3 3 3 +4 0 0 0 +flat: left +0 0 0 0 +1 1 1 1 +2 2 2 2 +3 3 3 3 +4 0 0 +flat: any left +0 0 0 0 +1 1 1 1 +2 2 2 2 +3 3 3 3 +4 0 0 +flat: semi left +0 0 0 0 +1 1 1 1 +2 2 2 2 +3 3 3 3 +flat: anti left +4 0 0 +flat: inner +0 0 0 0 +1 1 1 1 +flat: inner on +0 0 0 0 0 +1 1 1 1 1 +2 2 2 2 2 +3 3 3 3 3 +hashed: left on +0 0 0 0 0 +1 1 1 1 1 +2 2 2 2 2 +3 3 3 3 3 +4 \N \N \N \N +hashed: left +0 0 0 0 +1 1 1 1 +2 2 2 2 +3 3 3 3 +4 \N \N \N +hashed: any left +0 0 0 0 +1 1 1 1 +2 2 2 2 +3 3 3 3 +4 \N \N \N +hashed: semi left +0 0 0 0 +1 1 1 1 +2 2 2 2 +3 3 3 3 +hashed: anti left +4 \N \N \N +hashed: inner +0 0 0 0 +1 1 1 1 +hashed: inner on +0 0 0 0 0 +1 1 1 1 1 +2 2 2 2 2 +3 3 3 3 3 +complex_cache (smoke) +0 \N \N \N \N +1 \N \N \N \N +2 \N \N \N \N +3 \N \N \N \N +4 \N \N \N \N +not optimized (smoke) +0 0 0 0 +1 1 1 1 +2 2 2 2 +3 3 3 3 +- +0 0 0 0 0 +1 1 1 1 1 +\N 2 2 2 2 +\N 3 3 3 3 +- +2 2 2 2 +3 3 3 3 +4 \N \N \N +5 \N \N \N +\N 0 0 0 +\N 1 1 1 +- +0 0 0 0 +1 1 1 1 +- +0 0 0 0 +1 1 1 1 +3 3 3 3 +2 2 2 2 +- +0 0 0 0 +1 1 1 1 +- +3 3 3 3 +2 2 2 2 diff --git a/tests/queries/0_stateless/01115_join_with_dictionary.sql b/tests/queries/0_stateless/01115_join_with_dictionary.sql new file mode 100644 index 00000000000..65704f2b3eb --- /dev/null +++ b/tests/queries/0_stateless/01115_join_with_dictionary.sql @@ -0,0 +1,90 @@ +SET send_logs_level = 'none'; + +DROP DATABASE IF EXISTS db_01115; +CREATE DATABASE db_01115 Engine = Ordinary; + +USE db_01115; + +DROP DICTIONARY IF EXISTS dict_flat; +DROP DICTIONARY IF EXISTS dict_hashed; +DROP DICTIONARY IF EXISTS dict_complex_cache; + +CREATE TABLE t1 (key UInt64, a UInt8, b String, c Float64) ENGINE = MergeTree() ORDER BY key; +INSERT INTO t1 SELECT number, number, toString(number), number from numbers(4); + +CREATE DICTIONARY dict_flat (key UInt64 DEFAULT 0, a UInt8 DEFAULT 42, b String DEFAULT 'x', c Float64 DEFAULT 42.0) +PRIMARY KEY key +SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 't1' PASSWORD '' DB 'db_01115')) +LIFETIME(MIN 1 MAX 10) +LAYOUT(FLAT()); + +CREATE DICTIONARY db_01115.dict_hashed (key UInt64 DEFAULT 0, a UInt8 DEFAULT 42, b String DEFAULT 'x', c Float64 DEFAULT 42.0) +PRIMARY KEY key +SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 't1' DB 'db_01115')) 
+LIFETIME(MIN 1 MAX 10) +LAYOUT(HASHED()); + +CREATE DICTIONARY dict_complex_cache (key UInt64 DEFAULT 0, a UInt8 DEFAULT 42, b String DEFAULT 'x', c Float64 DEFAULT 42.0) +PRIMARY KEY key, b +SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 't1' DB 'db_01115')) +LIFETIME(MIN 1 MAX 10) +LAYOUT(COMPLEX_KEY_CACHE(SIZE_IN_CELLS 1)); + +SET join_use_nulls = 0; + +SELECT 'flat: left on'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 LEFT JOIN dict_flat d ON s1.key = d.key ORDER BY s1.key; +SELECT 'flat: left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 LEFT JOIN dict_flat d USING(key) ORDER BY key; +SELECT 'flat: any left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 ANY LEFT JOIN dict_flat d USING(key) ORDER BY key; +SELECT 'flat: semi left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 SEMI JOIN dict_flat d USING(key) ORDER BY key; +SELECT 'flat: anti left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 ANTI JOIN dict_flat d USING(key) ORDER BY key; +SELECT 'flat: inner'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 JOIN dict_flat d USING(key); +SELECT 'flat: inner on'; +SELECT * FROM (SELECT number AS k FROM numbers(100)) s1 JOIN dict_flat d ON k = key ORDER BY k; + +SET join_use_nulls = 1; + +SELECT 'hashed: left on'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 LEFT JOIN dict_hashed d ON s1.key = d.key ORDER BY s1.key; +SELECT 'hashed: left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 LEFT JOIN dict_hashed d USING(key) ORDER BY key; +SELECT 'hashed: any left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 ANY LEFT JOIN dict_hashed d USING(key) ORDER BY key; +SELECT 'hashed: semi left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 SEMI JOIN dict_hashed d USING(key) ORDER BY key; +SELECT 'hashed: anti left'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 ANTI JOIN dict_hashed d USING(key) ORDER BY key; +SELECT 'hashed: inner'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 JOIN dict_hashed d USING(key); +SELECT 'hashed: inner on'; +SELECT * FROM (SELECT number AS k FROM numbers(100)) s1 JOIN dict_hashed d ON k = key ORDER BY k; + +SELECT 'complex_cache (smoke)'; +SELECT * FROM (SELECT number AS key FROM numbers(5)) s1 LEFT JOIN dict_complex_cache d ON s1.key = d.key ORDER BY s1.key; + +SELECT 'not optimized (smoke)'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 RIGHT JOIN dict_flat d USING(key) ORDER BY key; +SELECT '-'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 RIGHT JOIN dict_flat d ON s1.key = d.key ORDER BY d.key; +SELECT '-'; +SELECT * FROM (SELECT number + 2 AS key FROM numbers(4)) s1 FULL JOIN dict_flat d USING(key) ORDER BY s1.key, d.key; +SELECT '-'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 ANY INNER JOIN dict_flat d USING(key); +SELECT '-'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 ANY RIGHT JOIN dict_flat d USING(key); +SELECT '-'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 SEMI RIGHT JOIN dict_flat d USING(key); +SELECT '-'; +SELECT * FROM (SELECT number AS key FROM numbers(2)) s1 ANTI RIGHT JOIN dict_flat d USING(key); + +DROP DICTIONARY dict_flat; +DROP DICTIONARY dict_hashed; +DROP DICTIONARY dict_complex_cache; + +DROP TABLE t1; +DROP DATABASE IF EXISTS db_01115; diff --git a/tests/queries/0_stateless/01224_no_superfluous_dict_reload.reference b/tests/queries/0_stateless/01224_no_superfluous_dict_reload.reference index 5321624de02..524fbdd26fc 
100644 --- a/tests/queries/0_stateless/01224_no_superfluous_dict_reload.reference +++ b/tests/queries/0_stateless/01224_no_superfluous_dict_reload.reference @@ -16,4 +16,9 @@ CREATE TABLE dict_db_01224_dictionary.`dict_db_01224.dict` `val` UInt64 ) ENGINE = Dictionary(`dict_db_01224.dict`) -LOADED +NOT_LOADED +Dictionary 1 CREATE DICTIONARY dict_db_01224.dict (`key` UInt64 DEFAULT 0, `val` UInt64 DEFAULT 10) PRIMARY KEY key SOURCE(CLICKHOUSE(HOST \'localhost\' PORT 9000 USER \'default\' TABLE \'dict_data\' PASSWORD \'\' DB \'dict_db_01224\')) LIFETIME(MIN 0 MAX 0) LAYOUT(FLAT()) +NOT_LOADED +key UInt64 +val UInt64 +NOT_LOADED diff --git a/tests/queries/0_stateless/01224_no_superfluous_dict_reload.sql b/tests/queries/0_stateless/01224_no_superfluous_dict_reload.sql index a6eed6f072c..cf8b2a471c4 100644 --- a/tests/queries/0_stateless/01224_no_superfluous_dict_reload.sql +++ b/tests/queries/0_stateless/01224_no_superfluous_dict_reload.sql @@ -1,6 +1,7 @@ DROP DATABASE IF EXISTS dict_db_01224; DROP DATABASE IF EXISTS dict_db_01224_dictionary; CREATE DATABASE dict_db_01224; +CREATE DATABASE dict_db_01224_dictionary Engine=Dictionary; CREATE TABLE dict_db_01224.dict_data (key UInt64, val UInt64) Engine=Memory(); CREATE DICTIONARY dict_db_01224.dict @@ -21,10 +22,15 @@ SELECT status FROM system.dictionaries WHERE database = 'dict_db_01224' AND name SHOW CREATE TABLE dict_db_01224.dict FORMAT TSVRaw; SELECT status FROM system.dictionaries WHERE database = 'dict_db_01224' AND name = 'dict'; -CREATE DATABASE dict_db_01224_dictionary Engine=Dictionary; SHOW CREATE TABLE dict_db_01224_dictionary.`dict_db_01224.dict` FORMAT TSVRaw; SELECT status FROM system.dictionaries WHERE database = 'dict_db_01224' AND name = 'dict'; +SELECT engine, metadata_path LIKE '%/metadata/dict\_db\_01224/dict.sql', create_table_query FROM system.tables WHERE database = 'dict_db_01224' AND name = 'dict'; +SELECT status FROM system.dictionaries WHERE database = 'dict_db_01224' AND name = 'dict'; + +SELECT name, type FROM system.columns WHERE database = 'dict_db_01224' AND table = 'dict'; +SELECT status FROM system.dictionaries WHERE database = 'dict_db_01224' AND name = 'dict'; + DROP DICTIONARY dict_db_01224.dict; SELECT status FROM system.dictionaries WHERE database = 'dict_db_01224' AND name = 'dict'; diff --git a/tests/queries/0_stateless/01225_drop_dictionary_as_table.sql b/tests/queries/0_stateless/01225_drop_dictionary_as_table.sql index 045775aec2b..866f2dff56b 100644 --- a/tests/queries/0_stateless/01225_drop_dictionary_as_table.sql +++ b/tests/queries/0_stateless/01225_drop_dictionary_as_table.sql @@ -13,8 +13,8 @@ LIFETIME(MIN 0 MAX 0) LAYOUT(FLAT()); SYSTEM RELOAD DICTIONARY dict_db_01225.dict; -DROP TABLE dict_db_01225.dict; -- { serverError 60; } --- Regression: --- Code: 1000. DB::Exception: Received from localhost:9000. DB::Exception: File not found: ./metadata/dict_db_01225/dict.sql. 
+ +DROP TABLE dict_db_01225.dict; -- { serverError 520; } DROP DICTIONARY dict_db_01225.dict; + DROP DATABASE dict_db_01225; diff --git a/tests/queries/0_stateless/01225_show_create_table_from_dictionary.sql b/tests/queries/0_stateless/01225_show_create_table_from_dictionary.sql index 7550d5292d0..a494511ebd8 100644 --- a/tests/queries/0_stateless/01225_show_create_table_from_dictionary.sql +++ b/tests/queries/0_stateless/01225_show_create_table_from_dictionary.sql @@ -15,7 +15,7 @@ LIFETIME(MIN 0 MAX 0) LAYOUT(FLAT()); SHOW CREATE TABLE dict_db_01225_dictionary.`dict_db_01225.dict` FORMAT TSVRaw; -SHOW CREATE TABLE dict_db_01225_dictionary.`dict_db_01225.no_such_dict`; -- { serverError 36; } +SHOW CREATE TABLE dict_db_01225_dictionary.`dict_db_01225.no_such_dict`; -- { serverError 487; } DROP DATABASE dict_db_01225; DROP DATABASE dict_db_01225_dictionary; diff --git a/tests/queries/0_stateless/01246_buffer_flush.reference b/tests/queries/0_stateless/01246_buffer_flush.reference new file mode 100644 index 00000000000..a877e94b919 --- /dev/null +++ b/tests/queries/0_stateless/01246_buffer_flush.reference @@ -0,0 +1,10 @@ +min +0 +5 +max +5 +10 +direct +20 +drop +30 diff --git a/tests/queries/0_stateless/01246_buffer_flush.sql b/tests/queries/0_stateless/01246_buffer_flush.sql new file mode 100644 index 00000000000..efe0adf703a --- /dev/null +++ b/tests/queries/0_stateless/01246_buffer_flush.sql @@ -0,0 +1,44 @@ +drop table if exists data_01256; +drop table if exists buffer_01256; + +create table data_01256 as system.numbers Engine=Memory(); + +select 'min'; +create table buffer_01256 as system.numbers Engine=Buffer(currentDatabase(), data_01256, 1, + 2, 100, /* time */ + 4, 100, /* rows */ + 1, 1e6 /* bytes */ +); +insert into buffer_01256 select * from system.numbers limit 5; +select count() from data_01256; +-- sleep 2 (min time) + 1 (round up) + bias (1) = 4 +select sleepEachRow(2) from numbers(2) FORMAT Null; +select count() from data_01256; +drop table buffer_01256; + +select 'max'; +create table buffer_01256 as system.numbers Engine=Buffer(currentDatabase(), data_01256, 1, + 100, 2, /* time */ + 0, 100, /* rows */ + 0, 1e6 /* bytes */ +); +insert into buffer_01256 select * from system.numbers limit 5; +select count() from data_01256; +-- sleep 2 (min time) + 1 (round up) + bias (1) = 4 +select sleepEachRow(2) from numbers(2) FORMAT Null; +select count() from data_01256; +drop table buffer_01256; + +select 'direct'; +create table buffer_01256 as system.numbers Engine=Buffer(currentDatabase(), data_01256, 1, + 100, 100, /* time */ + 0, 9, /* rows */ + 0, 1e6 /* bytes */ +); +insert into buffer_01256 select * from system.numbers limit 10; +select count() from data_01256; + +select 'drop'; +insert into buffer_01256 select * from system.numbers limit 10; +drop table if exists buffer_01256; +select count() from data_01256; diff --git a/tests/queries/0_stateless/01246_least_greatest_generic.reference b/tests/queries/0_stateless/01246_least_greatest_generic.reference new file mode 100644 index 00000000000..24c2233eed2 --- /dev/null +++ b/tests/queries/0_stateless/01246_least_greatest_generic.reference @@ -0,0 +1,22 @@ +hello +world + +z +hello +world +1 +\N +\N +nan +inf +-0 +123 +-1 +4294967295 +['world'] +[[[]]] +[[[],[]]] +[] +[NULL] +[0] +[NULL] diff --git a/tests/queries/0_stateless/01246_least_greatest_generic.sql b/tests/queries/0_stateless/01246_least_greatest_generic.sql new file mode 100644 index 00000000000..f0dceabfcb5 --- /dev/null +++ 
b/tests/queries/0_stateless/01246_least_greatest_generic.sql @@ -0,0 +1,36 @@ +SELECT least('hello', 'world'); +SELECT greatest('hello', 'world'); +SELECT least('hello', 'world', ''); +SELECT greatest('hello', 'world', 'z'); + +SELECT least('hello'); +SELECT greatest('world'); + +SELECT least(1, inf, nan); +SELECT least(1, inf, nan, NULL); +SELECT greatest(1, inf, nan, NULL); +SELECT greatest(1, inf, nan); +SELECT greatest(1, inf); + +SELECT least(0., -0.); +SELECT least(toNullable(123), 456); + +-- This can be improved +SELECT LEAST(-1, 18446744073709551615); -- { serverError 43 } +SELECT LEAST(-1., 18446744073709551615); -- { serverError 43 } + +SELECT LEAST(-1., 18446744073709551615.); +SELECT greatest(-1, 1, 4294967295); + +SELECT greatest([], ['hello'], ['world']); + +SELECT least([[[], []]], [[[]]], [[[]], [[]]]); +SELECT greatest([[[], []]], [[[]]], [[[]], [[]]]); + +SELECT least([], [NULL]); +SELECT greatest([], [NULL]); + +SELECT LEAST([NULL], [0]); +SELECT GREATEST([NULL], [0]); + +SELECT Greatest(); -- { serverError 42 } diff --git a/tests/queries/0_stateless/01247_least_greatest_filimonov.reference b/tests/queries/0_stateless/01247_least_greatest_filimonov.reference new file mode 100644 index 00000000000..5b3b2abada8 --- /dev/null +++ b/tests/queries/0_stateless/01247_least_greatest_filimonov.reference @@ -0,0 +1,3 @@ +2 +767 +C diff --git a/tests/queries/0_stateless/01247_least_greatest_filimonov.sql b/tests/queries/0_stateless/01247_least_greatest_filimonov.sql new file mode 100644 index 00000000000..b845d65dcb9 --- /dev/null +++ b/tests/queries/0_stateless/01247_least_greatest_filimonov.sql @@ -0,0 +1,3 @@ +SELECT GREATEST(2,0); +SELECT GREATEST(34.0,3.0,5.0,767.0); +SELECT GREATEST('B','A','C'); diff --git a/tests/queries/0_stateless/01248_least_greatest_mixed_const.reference b/tests/queries/0_stateless/01248_least_greatest_mixed_const.reference new file mode 100644 index 00000000000..bbdc93bd5ee --- /dev/null +++ b/tests/queries/0_stateless/01248_least_greatest_mixed_const.reference @@ -0,0 +1,10 @@ +0 6 +1 6 +2 6 +3 6 +4 6 +4 6 +4 6 +4 7 +4 8 +4 9 diff --git a/tests/queries/0_stateless/01248_least_greatest_mixed_const.sql b/tests/queries/0_stateless/01248_least_greatest_mixed_const.sql new file mode 100644 index 00000000000..3fcf20623d6 --- /dev/null +++ b/tests/queries/0_stateless/01248_least_greatest_mixed_const.sql @@ -0,0 +1 @@ +SELECT least(4, number, 6), greatest(4, number, 6) FROM numbers(10); diff --git a/tests/queries/1_stateful/00065_loyalty_with_storage_join.sql b/tests/queries/1_stateful/00065_loyalty_with_storage_join.sql index d3e73faa7be..15a2a75cf58 100644 --- a/tests/queries/1_stateful/00065_loyalty_with_storage_join.sql +++ b/tests/queries/1_stateful/00065_loyalty_with_storage_join.sql @@ -1,16 +1,14 @@ -SET any_join_distinct_right_table_keys = 1; - USE test; DROP TABLE IF EXISTS join; -CREATE TABLE join (UserID UInt64, loyalty Int8) ENGINE = Join(ANY, INNER, UserID); +CREATE TABLE join (UserID UInt64, loyalty Int8) ENGINE = Join(SEMI, LEFT, UserID); INSERT INTO join SELECT UserID, toInt8(if((sum(SearchEngineID = 2) AS yandex) > (sum(SearchEngineID = 3) AS google), - yandex / (yandex + google), - -google / (yandex + google)) * 10) AS loyalty + yandex / (yandex + google), + -google / (yandex + google)) * 10) AS loyalty FROM hits WHERE (SearchEngineID = 2) OR (SearchEngineID = 3) GROUP BY UserID @@ -19,17 +17,17 @@ HAVING (yandex + google) > 10; SELECT loyalty, count() -FROM hits ANY INNER JOIN join USING UserID +FROM hits SEMI LEFT JOIN join USING UserID 
GROUP BY loyalty ORDER BY loyalty ASC; DETACH TABLE join; -ATTACH TABLE join (UserID UInt64, loyalty Int8) ENGINE = Join(ANY, INNER, UserID); +ATTACH TABLE join (UserID UInt64, loyalty Int8) ENGINE = Join(SEMI, LEFT, UserID); SELECT loyalty, count() -FROM hits ANY INNER JOIN join USING UserID +FROM hits SEMI LEFT JOIN join USING UserID GROUP BY loyalty ORDER BY loyalty ASC; diff --git a/utils/junit_to_html/junit_to_html b/utils/junit_to_html/junit_to_html index d6bebccbf9f..cf50e7df00a 100755 --- a/utils/junit_to_html/junit_to_html +++ b/utils/junit_to_html/junit_to_html @@ -1,24 +1,86 @@ #!/usr/bin/env python # -*- coding: utf-8 -*- import os -import sys import lxml.etree as etree +import json +import argparse -def _convert_junit_to_html(junit_path, html_path): +def export_testcases_json(report, path): + with open(os.path.join(path, "cases.jer"), "w") as testcases_file: + for testsuite in report.getroot(): + for testcase in testsuite: + row = {} + row["hostname"] = testsuite.get("hostname") + row["suite"] = testsuite.get("name") + row["suite_duration"] = testsuite.get("time") + row["timestamp"] = testsuite.get("timestamp") + row["testname"] = testcase.get("name") + row["classname"] = testcase.get("classname") + row["file"] = testcase.get("file") + row["line"] = testcase.get("line") + row["duration"] = testcase.get("time") + for el in testcase: + if el.tag == "system-err": + row["stderr"] = el.text + else: + row["stderr"] = "" + + if el.tag == "system-out": + row["stdout"] = el.text + else: + row["stdout"] = "" + + json.dump(row, testcases_file) + testcases_file.write("\n") + +def export_testsuites_json(report, path): + with open(os.path.join(path, "suites.jer"), "w") as testsuites_file: + for testsuite in report.getroot(): + row = {} + row["suite"] = testsuite.get("name") + row["errors"] = testsuite.get("errors") + row["failures"] = testsuite.get("failures") + row["hostname"] = testsuite.get("hostname") + row["skipped"] = testsuite.get("skipped") + row["duration"] = testsuite.get("time") + row["timestamp"] = testsuite.get("timestamp") + json.dump(row, testsuites_file) + testsuites_file.write("\n") + + +def _convert_junit_to_html(junit_path, result_path, export_cases, export_suites): with open(os.path.join(os.path.dirname(__file__), "junit-noframes.xsl")) as xslt_file: junit_to_html_xslt = etree.parse(xslt_file) + if not os.path.exists(result_path): + os.makedirs(result_path) + with open(junit_path) as junit_file: junit_xml = etree.parse(junit_file) + + if export_suites: + export_testsuites_json(junit_xml, result_path) + if export_cases: + export_testcases_json(junit_xml, result_path) transform = etree.XSLT(junit_to_html_xslt) html = etree.tostring(transform(junit_xml), encoding="utf-8") - html_dir = os.path.dirname(html_path) - if not os.path.exists(html_dir): - os.makedirs(html_dir) - with open(html_path, "w") as html_file: + + with open(os.path.join(result_path, "result.html"), "w") as html_file: html_file.write(html) if __name__ == "__main__": - if len(sys.argv) < 3: - raise "Insufficient arguments: junit.xml result.html", level - junit_path, html_path = sys.argv[1] , sys.argv[2] - _convert_junit_to_html(junit_path, html_path) + + parser = argparse.ArgumentParser(description='Convert JUnit XML.') + parser.add_argument('junit', help='path to junit.xml report') + parser.add_argument('result_dir', nargs='?', help='directory for result files. 
Defaults to the junit.xml directory') + parser.add_argument('--export-cases', help='Export JSONEachRow result for testcases to upload in CI', action='store_true') + parser.add_argument('--export-suites', help='Export JSONEachRow result for testsuites to upload in CI', action='store_true') + + args = parser.parse_args() + + junit_path = args.junit + if args.result_dir: + result_path = args.result_dir + else: + result_path = os.path.dirname(junit_path) + print "junit_path: {}, result_path: {}, export cases:{}, export suites: {}".format(junit_path, result_path, args.export_cases, args.export_suites) + _convert_junit_to_html(junit_path, result_path, args.export_cases, args.export_suites) diff --git a/utils/upload_test_results/README.md b/utils/upload_test_results/README.md new file mode 100644 index 00000000000..e6b361081a2 --- /dev/null +++ b/utils/upload_test_results/README.md @@ -0,0 +1,34 @@ +## Tool to upload results to CI ClickHouse + +Currently allows uploading results from the `junit_to_html` tool to ClickHouse CI + +``` +usage: upload_test_results [-h] --sha SHA --pr PR --file FILE --type + {suites,cases} [--user USER] --password PASSWORD + [--ca-cert CA_CERT] [--host HOST] [--db DB] + +Upload test result to CI ClickHouse. + +optional arguments: + -h, --help show this help message and exit + --sha SHA sha of current commit + --pr PR pr of current commit. 0 for master + --file FILE file to upload + --type {suites,cases} + Export type + --user USER user name + --password PASSWORD password + --ca-cert CA_CERT CA certificate path + --host HOST CI ClickHouse host + --db DB CI ClickHouse database name +``` + +$ ./upload_test_results --sha "cf7eaee3301d4634acdacbfa308ddbe0cc6a061d" --pr "0" --file xyz/cases.jer --type cases --password $PASSWD + +Each CI check has a single commit sha and pr identifier. +When uploading your local results for testing purposes, try to use the correct sha and pr.
diff --git a/utils/upload_test_results/upload_test_results b/utils/upload_test_results/upload_test_results
new file mode 100755
index 00000000000..058a73d8081
--- /dev/null
+++ b/utils/upload_test_results/upload_test_results
@@ -0,0 +1,127 @@
+#!/usr/bin/env python
+import requests
+import argparse
+
+# CREATE TABLE test_suites
+# (
+#     sha String,
+#     pr UInt16,
+#     suite String,
+#     errors UInt16,
+#     failures UInt16,
+#     hostname String,
+#     skipped UInt16,
+#     duration Double,
+#     timestamp DateTime
+# ) ENGINE = MergeTree ORDER BY tuple(timestamp, suite);
+
+QUERY_SUITES="INSERT INTO test_suites "\
+    "SELECT '{sha}' AS sha, "\
+    "{pr} AS pr, "\
+    "suite, "\
+    "errors, "\
+    "failures, "\
+    "hostname, "\
+    "skipped, "\
+    "duration, "\
+    "timestamp "\
+    "FROM input('"\
+    "suite String, "\
+    "errors UInt16, "\
+    "failures UInt16, "\
+    "hostname String, "\
+    "skipped UInt16, "\
+    "duration Double, "\
+    "timestamp DateTime"\
+    "') FORMAT JSONEachRow"
+
+# CREATE TABLE test_cases
+# (
+#     sha String,
+#     pr UInt16,
+#     hostname String,
+#     suite String,
+#     timestamp DateTime,
+#     testname String,
+#     classname String,
+#     file String,
+#     line UInt16,
+#     duration Double,
+#     suite_duration Double,
+#     stderr String,
+#     stdout String
+# ) ENGINE = MergeTree ORDER BY tuple(timestamp, testname);
+
+QUERY_CASES="INSERT INTO test_cases "\
+    "SELECT '{sha}' AS sha, "\
+    "{pr} AS pr, "\
+    "hostname, "\
+    "suite, "\
+    "timestamp, "\
+    "testname, "\
+    "classname, "\
+    "file, "\
+    "line, "\
+    "duration, "\
+    "suite_duration, "\
+    "stderr,"\
+    "stdout "\
+    "FROM input('"\
+    "hostname String, "\
+    "suite String, "\
+    "timestamp DateTime, "\
+    "testname String, "\
+    "classname String, "\
+    "file String, "\
+    "line UInt16, "\
+    "duration Double, "\
+    "suite_duration Double, "\
+    "stderr String, "\
+    "stdout String"\
+    "') FORMAT JSONEachRow"
+
+
+def upload_request(sha, pr, file, q_type, user, password, ca_cert, host, db):
+    with open(file) as upload_f:
+        query = QUERY_SUITES if q_type=="suites" else QUERY_CASES
+        query = query.format(sha=sha, pr=pr)
+        url = 'https://{host}:8443/?database={db}&query={query}&date_time_input_format=best_effort'.format(
+            host=host,
+            db=db,
+            query=query
+        )
+        data=upload_f
+        auth = {
+            'X-ClickHouse-User': user,
+            'X-ClickHouse-Key': password,
+        }
+
+        print query;
+
+        res = requests.post(
+            url,
+            data=data,
+            headers=auth,
+            verify=ca_cert)
+        res.raise_for_status()
+        return res.text
+
+if __name__ == "__main__":
+
+    parser = argparse.ArgumentParser(description='Upload test result to CI ClickHouse.')
+    parser.add_argument('--sha', help='sha of current commit', type=str, required=True)
+    parser.add_argument('--pr', help='pr of current commit. 0 for master', type=int, required=True)
+    parser.add_argument('--file', help='file to upload', required=True)
+    parser.add_argument('--type', help='Export type', choices=['suites', 'cases'] , required=True)
+    parser.add_argument('--user', help='user name', type=str, default="clickhouse-ci")
+    parser.add_argument('--password', help='password', type=str, required=True)
+    parser.add_argument('--ca-cert', help='CA certificate path', type=str, default="/usr/local/share/ca-certificates/YandexInternalRootCA.crt")
+    parser.add_argument('--host', help='CI ClickHouse host', type=str, default="c1a-ity5agjmuhyu6nu9.mdb.yandexcloud.net")
+    parser.add_argument('--db', help='CI ClickHouse database name', type=str, default="clickhouse-ci")
+
+    args = parser.parse_args()
+
+    print(upload_request(args.sha, args.pr, args.file, args.type, args.user, args.password, args.ca_cert, args.host, args.db))
+
+
+
diff --git a/website/images/clickhouse-3069x1531.png b/website/images/clickhouse-3069x1531.png
new file mode 100644
index 00000000000..f143c1d1518
Binary files /dev/null and b/website/images/clickhouse-3069x1531.png differ
diff --git a/website/images/flags/tr.svg b/website/images/flags/tr.svg
index 30524de46d8..7f9f90a079d 100644
--- a/website/images/flags/tr.svg
+++ b/website/images/flags/tr.svg
@@ -1,8 +1 @@
-
-
-
-
-
-
-
+
\ No newline at end of file
diff --git a/website/js/base.js b/website/js/base.js
index 4e43a44d63a..fa0e8431839 100644
--- a/website/js/base.js
+++ b/website/js/base.js
@@ -73,7 +73,7 @@
         f = function () { n.parentNode.insertBefore(s, n); };
         s.type = "text/javascript";
         s.async = true;
-        s.src = "https://mc.yandex.ru/metrika/tag.js";
+        s.src = "/js/metrika.js";
 
         if (w.opera == "[object Opera]") {
             d.addEventListener("DOMContentLoaded", f, false);
diff --git a/website/templates/common_meta.html b/website/templates/common_meta.html
index 2aca17f93a2..0fd1f474b17 100644
--- a/website/templates/common_meta.html
+++ b/website/templates/common_meta.html
@@ -1,24 +1,27 @@
+{% set description = description or _('ClickHouse is a fast open-source column-oriented database management system that allows generating analytical data reports in real-time using SQL queries') %}
-{% if title %}{{ title }}{% else %}{{ _('ClickHouse DBMS') }}{% endif %}
+{% if title %}{{ title }}{% else %}{{ _('ClickHouse - fast open-source OLAP DBMS') }}{% endif %}
-
-
-
-
+
+
+
+{% if page %}
+
+{% else %}
+
+{% endif %}
-
+
+      content="ClickHouse, DBMS, OLAP, SQL, open-source, relational, analytics, analytical, Big Data, web-analytics" />
 {% for prefetch_item in prefetch_items %}
diff --git a/website/templates/index/success.html b/website/templates/index/success.html
index 961dc859535..83b5c1427c9 100644
--- a/website/templates/index/success.html
+++ b/website/templates/index/success.html
@@ -13,7 +13,7 @@
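Returning to the `upload_test_results` script above: it streams a JSONEachRow file as the HTTP POST body, while the `INSERT ... SELECT ... FROM input(...)` query adds the constant `sha` and `pr` columns on the server side. A reduced, hypothetical sketch of that upload pattern (placeholder host, credentials and a shortened column list; Python 3 with the `requests` library assumed; a real insert would have to list every column of the target table):

```python
import requests

# Hypothetical shortened query; the actual script selects every suite column.
QUERY = (
    "INSERT INTO test_suites "
    "SELECT '{sha}' AS sha, {pr} AS pr, suite, errors, failures "
    "FROM input('suite String, errors UInt16, failures UInt16') "
    "FORMAT JSONEachRow"
)

def upload(path, sha, pr, host="localhost", db="default", user="default", password=""):
    # Stream the JSONEachRow file as the request body; ClickHouse parses it
    # through input() and inserts the enriched rows.
    with open(path, "rb") as body:
        resp = requests.post(
            "https://{}:8443/".format(host),
            params={
                "database": db,
                "query": QUERY.format(sha=sha, pr=pr),
                "date_time_input_format": "best_effort",
            },
            data=body,
            headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        )
    resp.raise_for_status()
    return resp.text

# Example with placeholder values:
# upload("suites.jer", sha="abc123", pr=0, host="ci.example.com", password="...")
```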