Merge branch 'master' into async-metric-memory-usage

commit 25f3184dbe

106  CHANGELOG.md
@@ -1,11 +1,68 @@

## ClickHouse release v20.3

### ClickHouse release v20.3.7.46, 2020-04-17

#### Bug Fix

* Fix `Logical error: CROSS JOIN has expressions` error for queries with comma and names joins mix. [#10311](https://github.com/ClickHouse/ClickHouse/pull/10311) ([Artem Zuikov](https://github.com/4ertus2)).
* Fix queries with `max_bytes_before_external_group_by`. [#10302](https://github.com/ClickHouse/ClickHouse/pull/10302) ([Artem Zuikov](https://github.com/4ertus2)).
* Fix move-to-prewhere optimization in presence of `arrayJoin` functions (in certain cases). This fixes [#10092](https://github.com/ClickHouse/ClickHouse/issues/10092). [#10195](https://github.com/ClickHouse/ClickHouse/pull/10195) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add the ability to relax the restriction on non-deterministic functions usage in mutations with the `allow_nondeterministic_mutations` setting. [#10186](https://github.com/ClickHouse/ClickHouse/pull/10186) ([filimonov](https://github.com/filimonov)).
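A minimal sketch of how the `allow_nondeterministic_mutations` setting from the entry above might be exercised; the table definition and the predicate are hypothetical and only illustrate the shape of such a mutation.

```sql
-- Hypothetical Replicated* table, shown only to give the mutation below some context.
CREATE TABLE visits
(
    id UInt64,
    updated_at DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/visits', '{replica}')
ORDER BY id;

-- Without the setting, this mutation is rejected because now() is non-deterministic
-- and could produce different values on different replicas. Relaxing the check is opt-in.
SET allow_nondeterministic_mutations = 1;
ALTER TABLE visits UPDATE updated_at = now() WHERE id = 42;
```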

### ClickHouse release v20.3.6.40, 2020-04-16

#### New Feature

* Added function `isConstant`. This function checks whether its argument is a constant expression and returns 1 or 0. It is intended for development, debugging and demonstration purposes. [#10198](https://github.com/ClickHouse/ClickHouse/pull/10198) ([alexey-milovidov](https://github.com/alexey-milovidov)).
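An illustrative query for the new function; the expression choices are arbitrary:

```sql
SELECT
    isConstant(1 + 2)  AS const_literal,   -- 1: the argument is a constant expression
    isConstant(pi())   AS const_function,  -- 1: the result does not depend on any row
    isConstant(number) AS not_constant     -- 0: depends on the column value
FROM numbers(1);
```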

#### Bug Fix

* Fix error `Pipeline stuck` with `max_rows_to_group_by` and `group_by_overflow_mode = 'break'` (an illustrative query using these settings follows after this list). [#10279](https://github.com/ClickHouse/ClickHouse/pull/10279) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix rare possible exception `Cannot drain connections: cancel first`. [#10239](https://github.com/ClickHouse/ClickHouse/pull/10239) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed bug where ClickHouse would throw `'Unknown function lambda.'` error message when a user tries to run `ALTER UPDATE`/`DELETE` on tables with `ENGINE = Replicated*`. The check for nondeterministic functions now handles lambda expressions correctly. [#10237](https://github.com/ClickHouse/ClickHouse/pull/10237) ([Alexander Kazakov](https://github.com/Akazz)).
* Fixed the `generateRandom` function for the Date type. This fixes [#9973](https://github.com/ClickHouse/ClickHouse/issues/9973). Fix an edge case when dates with year 2106 are inserted to MergeTree tables with old-style partitioning but partitions are named with year 1970. [#10218](https://github.com/ClickHouse/ClickHouse/pull/10218) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Convert types if the table definition of a View does not correspond to the SELECT query. This fixes [#10180](https://github.com/ClickHouse/ClickHouse/issues/10180) and [#10022](https://github.com/ClickHouse/ClickHouse/issues/10022). [#10217](https://github.com/ClickHouse/ClickHouse/pull/10217) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix `parseDateTimeBestEffort` for strings in RFC-2822 format when the day of week is Tuesday or Thursday (see the example after this list). This fixes [#10082](https://github.com/ClickHouse/ClickHouse/issues/10082). [#10214](https://github.com/ClickHouse/ClickHouse/pull/10214) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix column names of constants inside `JOIN` that may clash with names of constants outside of `JOIN`. [#10207](https://github.com/ClickHouse/ClickHouse/pull/10207) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix possible infinite query execution when the query actually should stop on LIMIT, while reading from an infinite source like `system.numbers` or `system.zeros`. [#10206](https://github.com/ClickHouse/ClickHouse/pull/10206) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix using the current database for access checking when the database isn't specified. [#10192](https://github.com/ClickHouse/ClickHouse/pull/10192) ([Vitaly Baranov](https://github.com/vitlibar)).
* Convert blocks if structure does not match on INSERT into Distributed(). [#10135](https://github.com/ClickHouse/ClickHouse/pull/10135) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible incorrect result for extremes in processors pipeline. [#10131](https://github.com/ClickHouse/ClickHouse/pull/10131) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix some kinds of alters with compact parts. [#10130](https://github.com/ClickHouse/ClickHouse/pull/10130) ([Anton Popov](https://github.com/CurtizJ)).
* Fix incorrect `index_granularity_bytes` check while creating new replica. Fixes [#10098](https://github.com/ClickHouse/ClickHouse/issues/10098). [#10121](https://github.com/ClickHouse/ClickHouse/pull/10121) ([alesapin](https://github.com/alesapin)).
* Fix SIGSEGV on INSERT into Distributed table when its structure differs from the underlying tables. [#10105](https://github.com/ClickHouse/ClickHouse/pull/10105) ([Azat Khuzhin](https://github.com/azat)).
* Fix possible rows loss for queries with `JOIN` and `UNION ALL`. Fixes [#9826](https://github.com/ClickHouse/ClickHouse/issues/9826), [#10113](https://github.com/ClickHouse/ClickHouse/issues/10113). [#10099](https://github.com/ClickHouse/ClickHouse/pull/10099) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fixed replicated tables startup when updating from an old ClickHouse version where the `/table/replicas/replica_name/metadata` node doesn't exist. Fixes [#10037](https://github.com/ClickHouse/ClickHouse/issues/10037). [#10095](https://github.com/ClickHouse/ClickHouse/pull/10095) ([alesapin](https://github.com/alesapin)).
* Add some argument checks and support identifier arguments for the MySQL Database Engine. [#10077](https://github.com/ClickHouse/ClickHouse/pull/10077) ([Winter Zhang](https://github.com/zhang2014)).
* Fix bug in the clickhouse dictionary source from a localhost clickhouse server. The bug may lead to memory corruption if types in the dictionary and source are not compatible. [#10071](https://github.com/ClickHouse/ClickHouse/pull/10071) ([alesapin](https://github.com/alesapin)).
* Fix bug in `CHECK TABLE` query when the table contains skip indices. [#10068](https://github.com/ClickHouse/ClickHouse/pull/10068) ([alesapin](https://github.com/alesapin)).
* Fix error `Cannot clone block with columns because block has 0 columns ... While executing GroupingAggregatedTransform`. It happened when the setting `distributed_aggregation_memory_efficient` was enabled, and the distributed query read aggregated data with different levels of aggregation from different shards (mixed single and two level aggregation). [#10063](https://github.com/ClickHouse/ClickHouse/pull/10063) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix a segmentation fault that could occur in GROUP BY over string keys containing trailing zero bytes ([#8636](https://github.com/ClickHouse/ClickHouse/issues/8636), [#8925](https://github.com/ClickHouse/ClickHouse/issues/8925)). [#10025](https://github.com/ClickHouse/ClickHouse/pull/10025) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Fix parallel distributed INSERT SELECT for remote table. This PR fixes the solution provided in [#9759](https://github.com/ClickHouse/ClickHouse/pull/9759). [#9999](https://github.com/ClickHouse/ClickHouse/pull/9999) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix the number of threads used for remote query execution (performance regression, since 20.3). This happened when a query from a `Distributed` table was executed simultaneously on local and remote shards. Fixes [#9965](https://github.com/ClickHouse/ClickHouse/issues/9965). [#9971](https://github.com/ClickHouse/ClickHouse/pull/9971) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix bug in which the necessary tables weren't retrieved at one of the processing stages of queries to some databases. Fixes [#9699](https://github.com/ClickHouse/ClickHouse/issues/9699). [#9949](https://github.com/ClickHouse/ClickHouse/pull/9949) ([achulkov2](https://github.com/achulkov2)).
* Fix 'Not found column in block' error when `JOIN` appears with `TOTALS`. Fixes [#9839](https://github.com/ClickHouse/ClickHouse/issues/9839). [#9939](https://github.com/ClickHouse/ClickHouse/pull/9939) ([Artem Zuikov](https://github.com/4ertus2)).
* Fix a bug with `ON CLUSTER` DDL queries freezing on server startup. [#9927](https://github.com/ClickHouse/ClickHouse/pull/9927) ([Gagan Arneja](https://github.com/garneja)).
* Fix parsing multiple hosts set in the CREATE USER command, e.g. `CREATE USER user6 HOST NAME REGEXP 'lo.?*host', NAME REGEXP 'lo*host'`. [#9924](https://github.com/ClickHouse/ClickHouse/pull/9924) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix `TRUNCATE` for Join table engine ([#9917](https://github.com/ClickHouse/ClickHouse/issues/9917)). [#9920](https://github.com/ClickHouse/ClickHouse/pull/9920) ([Amos Bird](https://github.com/amosbird)).
* Fix "scalar doesn't exist" error in ALTERs ([#9878](https://github.com/ClickHouse/ClickHouse/issues/9878)). [#9904](https://github.com/ClickHouse/ClickHouse/pull/9904) ([Amos Bird](https://github.com/amosbird)).
* Fix race condition between drop and optimize in `ReplicatedMergeTree`. [#9901](https://github.com/ClickHouse/ClickHouse/pull/9901) ([alesapin](https://github.com/alesapin)).
* Fix error with qualified names in `distributed_product_mode='local'`. Fixes [#4756](https://github.com/ClickHouse/ClickHouse/issues/4756). [#9891](https://github.com/ClickHouse/ClickHouse/pull/9891) ([Artem Zuikov](https://github.com/4ertus2)).
* Fix calculating grants for introspection functions from the setting 'allow_introspection_functions'. [#9840](https://github.com/ClickHouse/ClickHouse/pull/9840) ([Vitaly Baranov](https://github.com/vitlibar)).
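As referenced in the `Pipeline stuck` entry above, a minimal illustration of those settings; the data source and the threshold values are arbitrary and chosen only for demonstration:

```sql
-- With group_by_overflow_mode = 'break', once aggregation reaches max_rows_to_group_by
-- unique keys it stops processing further data and returns a partial result
-- instead of throwing an exception.
SELECT number % 100 AS k, count() AS c
FROM numbers(1000000)
GROUP BY k
SETTINGS max_rows_to_group_by = 10, group_by_overflow_mode = 'break';
```

And the RFC-2822 case from the `parseDateTimeBestEffort` entry; the timestamp is an arbitrary Tuesday used only to show the input format:

```sql
SELECT parseDateTimeBestEffort('Tue, 14 Apr 2020 10:37:00 +0200') AS parsed;
```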

#### Build/Testing/Packaging Improvement

* Fix integration test `test_settings_constraints`. [#9962](https://github.com/ClickHouse/ClickHouse/pull/9962) ([Vitaly Baranov](https://github.com/vitlibar)).
* Removed dependency on `clock_getres`. [#9833](https://github.com/ClickHouse/ClickHouse/pull/9833) ([alexey-milovidov](https://github.com/alexey-milovidov)).


### ClickHouse release v20.3.5.21, 2020-03-27

#### Bug Fix

* Fix 'Different expressions with the same alias' error when query has PREWHERE and WHERE on distributed table and `SET distributed_product_mode = 'local'`. [#9871](https://github.com/ClickHouse/ClickHouse/pull/9871) ([Artem Zuikov](https://github.com/4ertus2)).
* Fix excessive memory consumption of mutations for tables with a composite primary key. This fixes [#9850](https://github.com/ClickHouse/ClickHouse/issues/9850). [#9860](https://github.com/ClickHouse/ClickHouse/pull/9860) ([alesapin](https://github.com/alesapin)).
* For INSERT queries, a shard now clamps the settings received from the initiator to the shard's constraints instead of throwing an exception. This fix allows sending INSERT queries to a shard with other constraints. This change improves fix [#9447](https://github.com/ClickHouse/ClickHouse/issues/9447). [#9852](https://github.com/ClickHouse/ClickHouse/pull/9852) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix 'COMMA to CROSS JOIN rewriter is not enabled or cannot rewrite query' error in case of subqueries with COMMA JOIN outside of tables lists (i.e. in WHERE). Fixes [#9782](https://github.com/ClickHouse/ClickHouse/issues/9782). [#9830](https://github.com/ClickHouse/ClickHouse/pull/9830) ([Artem Zuikov](https://github.com/4ertus2)).
* Fix possible exception `Got 0 in totals chunk, expected 1` on the client. It happened for queries with `JOIN` in case the right joined table had zero rows. Example: `select * from system.one t1 join system.one t2 on t1.dummy = t2.dummy limit 0 FORMAT TabSeparated;`. Fixes [#9777](https://github.com/ClickHouse/ClickHouse/issues/9777). [#9823](https://github.com/ClickHouse/ClickHouse/pull/9823) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix SIGSEGV with `optimize_skip_unused_shards` when the type cannot be converted. [#9804](https://github.com/ClickHouse/ClickHouse/pull/9804) ([Azat Khuzhin](https://github.com/azat)).
@ -273,6 +330,55 @@
|
||||
|
||||
## ClickHouse release v20.1
|
||||
|
||||
### ClickHouse release v20.1.10.70, 2020-04-17
|
||||
|
||||
#### Bug Fix
|
||||
|
||||
* Fix rare possible exception `Cannot drain connections: cancel first`. [#10239](https://github.com/ClickHouse/ClickHouse/pull/10239) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fixed bug where ClickHouse would throw `'Unknown function lambda.'` error message when user tries to run `ALTER UPDATE/DELETE` on tables with `ENGINE = Replicated*`. Check for nondeterministic functions now handles lambda expressions correctly. [#10237](https://github.com/ClickHouse/ClickHouse/pull/10237) ([Alexander Kazakov](https://github.com/Akazz)).
|
||||
* Fix `parseDateTimeBestEffort` for strings in RFC-2822 when day of week is Tuesday or Thursday. This fixes [#10082](https://github.com/ClickHouse/ClickHouse/issues/10082). [#10214](https://github.com/ClickHouse/ClickHouse/pull/10214) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix column names of constants inside `JOIN` that may clash with names of constants outside of `JOIN`. [#10207](https://github.com/ClickHouse/ClickHouse/pull/10207) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Fix possible infinite query execution when the query actually should stop on LIMIT, while reading from an infinite source like `system.numbers` or `system.zeros`. [#10206](https://github.com/ClickHouse/ClickHouse/pull/10206) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix move-to-prewhere optimization in presence of `arrayJoin` functions (in certain cases). This fixes [#10092](https://github.com/ClickHouse/ClickHouse/issues/10092). [#10195](https://github.com/ClickHouse/ClickHouse/pull/10195) ([alexey-milovidov](https://github.com/alexey-milovidov)).
|
||||
* Add the ability to relax the restriction on non-deterministic functions usage in mutations with `allow_nondeterministic_mutations` setting. [#10186](https://github.com/ClickHouse/ClickHouse/pull/10186) ([filimonov](https://github.com/filimonov)).
|
||||
* Convert blocks if structure does not match on `INSERT` into table with `Distributed` engine. [#10135](https://github.com/ClickHouse/ClickHouse/pull/10135) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix `SIGSEGV` on `INSERT` into `Distributed` table when its structure differs from the underlying tables. [#10105](https://github.com/ClickHouse/ClickHouse/pull/10105) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fix possible rows loss for queries with `JOIN` and `UNION ALL`. Fixes [#9826](https://github.com/ClickHouse/ClickHouse/issues/9826), [#10113](https://github.com/ClickHouse/ClickHouse/issues/10113). [#10099](https://github.com/ClickHouse/ClickHouse/pull/10099) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Add arguments check and support identifier arguments for MySQL Database Engine. [#10077](https://github.com/ClickHouse/ClickHouse/pull/10077) ([Winter Zhang](https://github.com/zhang2014)).
|
||||
* Fix bug in clickhouse dictionary source from localhost clickhouse server. The bug may lead to memory corruption if types in dictionary and source are not compatible. [#10071](https://github.com/ClickHouse/ClickHouse/pull/10071) ([alesapin](https://github.com/alesapin)).
|
||||
* Fix error `Cannot clone block with columns because block has 0 columns ... While executing GroupingAggregatedTransform`. It happened when setting `distributed_aggregation_memory_efficient` was enabled, and distributed query read aggregating data with different level from different shards (mixed single and two level aggregation). [#10063](https://github.com/ClickHouse/ClickHouse/pull/10063) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix a segmentation fault that could occur in `GROUP BY` over string keys containing trailing zero bytes ([#8636](https://github.com/ClickHouse/ClickHouse/issues/8636), [#8925](https://github.com/ClickHouse/ClickHouse/issues/8925)). [#10025](https://github.com/ClickHouse/ClickHouse/pull/10025) ([Alexander Kuzmenkov](https://github.com/akuzm)).
|
||||
* Fix bug in which the necessary tables weren't retrieved at one of the processing stages of queries to some databases. Fixes [#9699](https://github.com/ClickHouse/ClickHouse/issues/9699). [#9949](https://github.com/ClickHouse/ClickHouse/pull/9949) ([achulkov2](https://github.com/achulkov2)).
|
||||
* Fix `'Not found column in block'` error when `JOIN` appears with `TOTALS`. Fixes [#9839](https://github.com/ClickHouse/ClickHouse/issues/9839). [#9939](https://github.com/ClickHouse/ClickHouse/pull/9939) ([Artem Zuikov](https://github.com/4ertus2)).
|
||||
* Fix a bug with `ON CLUSTER` DDL queries freezing on server startup. [#9927](https://github.com/ClickHouse/ClickHouse/pull/9927) ([Gagan Arneja](https://github.com/garneja)).
|
||||
* Fix `TRUNCATE` for Join table engine ([#9917](https://github.com/ClickHouse/ClickHouse/issues/9917)). [#9920](https://github.com/ClickHouse/ClickHouse/pull/9920) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Fix `'scalar doesn't exist'` error in ALTER queries ([#9878](https://github.com/ClickHouse/ClickHouse/issues/9878)). [#9904](https://github.com/ClickHouse/ClickHouse/pull/9904) ([Amos Bird](https://github.com/amosbird)).
|
||||
* Fix race condition between drop and optimize in `ReplicatedMergeTree`. [#9901](https://github.com/ClickHouse/ClickHouse/pull/9901) ([alesapin](https://github.com/alesapin)).
|
||||
* Fixed `DeleteOnDestroy` logic in `ATTACH PART` which could lead to automatic removal of attached part and added few tests. [#9410](https://github.com/ClickHouse/ClickHouse/pull/9410) ([Vladimir Chebotarev](https://github.com/excitoon)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* Fix unit test `collapsing_sorted_stream`. [#9367](https://github.com/ClickHouse/ClickHouse/pull/9367) ([Deleted user](https://github.com/ghost)).
|
||||
|
||||
### ClickHouse release v20.1.9.54, 2020-03-28
|
||||
|
||||
#### Bug Fix
|
||||
|
||||
* Fix `'Different expressions with the same alias'` error when query has `PREWHERE` and `WHERE` on distributed table and `SET distributed_product_mode = 'local'`. [#9871](https://github.com/ClickHouse/ClickHouse/pull/9871) ([Artem Zuikov](https://github.com/4ertus2)).
|
||||
* Fix mutations excessive memory consumption for tables with a composite primary key. This fixes [#9850](https://github.com/ClickHouse/ClickHouse/issues/9850). [#9860](https://github.com/ClickHouse/ClickHouse/pull/9860) ([alesapin](https://github.com/alesapin)).
|
||||
* For INSERT queries, a shard now clamps the settings received from the initiator to the shard's constraints instead of throwing an exception. This fix allows sending `INSERT` queries to a shard with other constraints. This change improves fix [#9447](https://github.com/ClickHouse/ClickHouse/issues/9447). [#9852](https://github.com/ClickHouse/ClickHouse/pull/9852) ([Vitaly Baranov](https://github.com/vitlibar)).
|
||||
* Fix possible exception `Got 0 in totals chunk, expected 1` on client. It happened for queries with `JOIN` in case if right joined table had zero rows. Example: `select * from system.one t1 join system.one t2 on t1.dummy = t2.dummy limit 0 FORMAT TabSeparated;`. Fixes [#9777](https://github.com/ClickHouse/ClickHouse/issues/9777). [#9823](https://github.com/ClickHouse/ClickHouse/pull/9823) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
|
||||
* Fix `SIGSEGV` with `optimize_skip_unused_shards` when type cannot be converted. [#9804](https://github.com/ClickHouse/ClickHouse/pull/9804) ([Azat Khuzhin](https://github.com/azat)).
|
||||
* Fixed a few cases when timezone of the function argument wasn't used properly. [#9574](https://github.com/ClickHouse/ClickHouse/pull/9574) ([Vasily Nemkov](https://github.com/Enmk)).
|
||||
|
||||
#### Improvement
|
||||
|
||||
* Remove `ORDER BY` stage from mutations because we read from a single ordered part in a single thread. Also add check that the order of rows in mutation is ordered in sorting key order and this order is not violated. [#9886](https://github.com/ClickHouse/ClickHouse/pull/9886) ([alesapin](https://github.com/alesapin)).
|
||||
|
||||
#### Build/Testing/Packaging Improvement
|
||||
|
||||
* Clean up duplicated linker flags. Make sure the linker won't look up an unexpected symbol. [#9433](https://github.com/ClickHouse/ClickHouse/pull/9433) ([Amos Bird](https://github.com/amosbird)).
|
||||
|
||||
### ClickHouse release v20.1.8.41, 2020-03-20
|
||||
|
||||
#### Bug Fix
|
||||
|
@@ -15,5 +15,7 @@ ClickHouse is an open-source column-oriented database management system that all

## Upcoming Events

* [ClickHouse Online Meetup West (in English)](https://www.eventbrite.com/e/clickhouse-online-meetup-registration-102886791162) on April 24, 2020.
* [ClickHouse Online Meetup East (in English)](https://www.eventbrite.com/e/clickhouse-online-meetup-east-registration-102989325846) on April 28, 2020.
* [ClickHouse Workshop in Novosibirsk](https://2020.codefest.ru/lecture/1628) on a TBD date.
* [Yandex C++ Open-Source Sprints in Moscow](https://events.yandex.ru/events/otkrytyj-kod-v-yandek-28-03-2020) on a TBD date.
@ -75,7 +75,11 @@ void OwnPatternFormatter::formatExtended(const DB::ExtendedLogMessage & msg_ext,
|
||||
if (color)
|
||||
writeCString(resetColor(), wb);
|
||||
writeCString("> ", wb);
|
||||
if (color)
|
||||
writeString(setColor(std::hash<std::string>()(msg.getSource())), wb);
|
||||
DB::writeString(msg.getSource(), wb);
|
||||
if (color)
|
||||
writeCString(resetColor(), wb);
|
||||
writeCString(": ", wb);
|
||||
DB::writeString(msg.getText(), wb);
|
||||
}
|
||||
|
@ -4,7 +4,11 @@ if (NOT COMPILER_CLANG)
|
||||
message (FATAL_ERROR "FreeBSD build is supported only for Clang")
|
||||
endif ()
|
||||
|
||||
execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE)
|
||||
if (${CMAKE_SYSTEM_PROCESSOR} STREQUAL "amd64")
|
||||
execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-x86_64.a OUTPUT_VARIABLE BUILTINS_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE)
|
||||
else ()
|
||||
execute_process (COMMAND ${CMAKE_CXX_COMPILER} --print-file-name=libclang_rt.builtins-${CMAKE_SYSTEM_PROCESSOR}.a OUTPUT_VARIABLE BUILTINS_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE)
|
||||
endif ()
|
||||
|
||||
set (DEFAULT_LIBS "${DEFAULT_LIBS} ${BUILTINS_LIBRARY} ${COVERAGE_OPTION} -lc -lm -lrt -lpthread")
|
||||
|
||||
|
2  contrib/poco (vendored)
@@ -1 +1 @@
Subproject commit ddca76ba4956cb57150082394536cc43ff28f6fa
Subproject commit 7d605a1ae5d878294f91f68feb62ae51e9a04426
@ -2,7 +2,7 @@
|
||||
set -ex
|
||||
set -o pipefail
|
||||
trap "exit" INT TERM
|
||||
trap "kill $(jobs -pr) ||:" EXIT
|
||||
trap 'kill $(jobs -pr) ||:' EXIT
|
||||
|
||||
stage=${stage:-}
|
||||
script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
|
||||
@ -18,22 +18,22 @@ function configure
|
||||
sed -i 's/<tcp_port>9000/<tcp_port>9002/g' right/config/config.xml
|
||||
|
||||
# Start a temporary server to rename the tables
|
||||
while killall clickhouse; do echo . ; sleep 1 ; done
|
||||
while killall clickhouse-server; do echo . ; sleep 1 ; done
|
||||
echo all killed
|
||||
|
||||
set -m # Spawn temporary in its own process groups
|
||||
left/clickhouse server --config-file=left/config/config.xml -- --path db0 &> setup-server-log.log &
|
||||
left/clickhouse-server --config-file=left/config/config.xml -- --path db0 &> setup-server-log.log &
|
||||
left_pid=$!
|
||||
kill -0 $left_pid
|
||||
disown $left_pid
|
||||
set +m
|
||||
while ! left/clickhouse client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done
|
||||
while ! clickhouse-client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done
|
||||
echo server for setup started
|
||||
|
||||
left/clickhouse client --port 9001 --query "create database test" ||:
|
||||
left/clickhouse client --port 9001 --query "rename table datasets.hits_v1 to test.hits" ||:
|
||||
clickhouse-client --port 9001 --query "create database test" ||:
|
||||
clickhouse-client --port 9001 --query "rename table datasets.hits_v1 to test.hits" ||:
|
||||
|
||||
while killall clickhouse; do echo . ; sleep 1 ; done
|
||||
while killall clickhouse-server; do echo . ; sleep 1 ; done
|
||||
echo all killed
|
||||
|
||||
# Remove logs etc, because they will be updated, and sharing them between
|
||||
@ -42,43 +42,50 @@ function configure
|
||||
rm db0/metadata/system/* -rf ||:
|
||||
|
||||
# Make copies of the original db for both servers. Use hardlinks instead
|
||||
# of copying. Be careful to remove preprocessed configs or it can lead to
|
||||
# weird effects.
|
||||
# of copying. Be careful to remove preprocessed configs and system tables,or
|
||||
# it can lead to weird effects.
|
||||
rm -r left/db ||:
|
||||
rm -r right/db ||:
|
||||
rm -r db0/preprocessed_configs ||:
|
||||
rm -r db/{data,metadata}/system ||:
|
||||
cp -al db0/ left/db/
|
||||
cp -al db0/ right/db/
|
||||
}
|
||||
|
||||
function restart
|
||||
{
|
||||
while killall clickhouse; do echo . ; sleep 1 ; done
|
||||
while killall clickhouse-server; do echo . ; sleep 1 ; done
|
||||
echo all killed
|
||||
|
||||
set -m # Spawn servers in their own process groups
|
||||
|
||||
left/clickhouse server --config-file=left/config/config.xml -- --path left/db &>> left-server-log.log &
|
||||
left/clickhouse-server --config-file=left/config/config.xml -- --path left/db &>> left-server-log.log &
|
||||
left_pid=$!
|
||||
kill -0 $left_pid
|
||||
disown $left_pid
|
||||
|
||||
right/clickhouse server --config-file=right/config/config.xml -- --path right/db &>> right-server-log.log &
|
||||
right/clickhouse-server --config-file=right/config/config.xml -- --path right/db &>> right-server-log.log &
|
||||
right_pid=$!
|
||||
kill -0 $right_pid
|
||||
disown $right_pid
|
||||
|
||||
set +m
|
||||
|
||||
while ! left/clickhouse client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done
|
||||
while ! clickhouse-client --port 9001 --query "select 1" ; do kill -0 $left_pid ; echo . ; sleep 1 ; done
|
||||
echo left ok
|
||||
while ! right/clickhouse client --port 9002 --query "select 1" ; do kill -0 $right_pid ; echo . ; sleep 1 ; done
|
||||
while ! clickhouse-client --port 9002 --query "select 1" ; do kill -0 $right_pid ; echo . ; sleep 1 ; done
|
||||
echo right ok
|
||||
|
||||
left/clickhouse client --port 9001 --query "select * from system.tables where database != 'system'"
|
||||
left/clickhouse client --port 9001 --query "select * from system.build_options"
|
||||
right/clickhouse client --port 9002 --query "select * from system.tables where database != 'system'"
|
||||
right/clickhouse client --port 9002 --query "select * from system.build_options"
|
||||
clickhouse-client --port 9001 --query "select * from system.tables where database != 'system'"
|
||||
clickhouse-client --port 9001 --query "select * from system.build_options"
|
||||
clickhouse-client --port 9002 --query "select * from system.tables where database != 'system'"
|
||||
clickhouse-client --port 9002 --query "select * from system.build_options"
|
||||
|
||||
# Check again that both servers we started are running -- this is important
|
||||
# for running locally, when there might be some other servers started and we
|
||||
# will connect to them instead.
|
||||
kill -0 $left_pid
|
||||
kill -0 $right_pid
|
||||
}
|
||||
|
||||
function run_tests
|
||||
@ -129,7 +136,7 @@ function run_tests
|
||||
# FIXME remove some broken long tests
|
||||
for test_name in {IPv4,IPv6,modulo,parse_engine_file,number_formatting_formats,select_format,arithmetic,cryptographic_hashes,logical_functions_{medium,small}}
|
||||
do
|
||||
printf "$test_name\tMarked as broken (see compare.sh)\n" >> skipped-tests.tsv
|
||||
printf "%s\tMarked as broken (see compare.sh)\n" "$test_name">> skipped-tests.tsv
|
||||
rm "$test_prefix/$test_name.xml" ||:
|
||||
done
|
||||
test_files=$(ls "$test_prefix"/*.xml)
|
||||
@ -140,9 +147,9 @@ function run_tests
|
||||
for test in $test_files
|
||||
do
|
||||
# Check that both servers are alive, to fail faster if they die.
|
||||
left/clickhouse client --port 9001 --query "select 1 format Null" \
|
||||
clickhouse-client --port 9001 --query "select 1 format Null" \
|
||||
|| { echo $test_name >> left-server-died.log ; restart ; continue ; }
|
||||
right/clickhouse client --port 9002 --query "select 1 format Null" \
|
||||
clickhouse-client --port 9002 --query "select 1 format Null" \
|
||||
|| { echo $test_name >> right-server-died.log ; restart ; continue ; }
|
||||
|
||||
test_name=$(basename "$test" ".xml")
|
||||
@ -160,7 +167,7 @@ function run_tests
|
||||
skipped=$(grep ^skipped "$test_name-raw.tsv" | cut -f2-)
|
||||
if [ "$skipped" != "" ]
|
||||
then
|
||||
printf "$test_name""\t""$skipped""\n" >> skipped-tests.tsv
|
||||
printf "%s\t%s\n" "$test_name" "$skipped">> skipped-tests.tsv
|
||||
fi
|
||||
done
|
||||
|
||||
@ -172,24 +179,24 @@ function run_tests
|
||||
function get_profiles
|
||||
{
|
||||
# Collect the profiles
|
||||
left/clickhouse client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0"
|
||||
left/clickhouse client --port 9001 --query "set query_profiler_real_time_period_ns = 0"
|
||||
right/clickhouse client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0"
|
||||
right/clickhouse client --port 9001 --query "set query_profiler_real_time_period_ns = 0"
|
||||
left/clickhouse client --port 9001 --query "system flush logs"
|
||||
right/clickhouse client --port 9002 --query "system flush logs"
|
||||
clickhouse-client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0"
|
||||
clickhouse-client --port 9001 --query "set query_profiler_real_time_period_ns = 0"
|
||||
clickhouse-client --port 9001 --query "set query_profiler_cpu_time_period_ns = 0"
|
||||
clickhouse-client --port 9001 --query "set query_profiler_real_time_period_ns = 0"
|
||||
clickhouse-client --port 9001 --query "system flush logs"
|
||||
clickhouse-client --port 9002 --query "system flush logs"
|
||||
|
||||
left/clickhouse client --port 9001 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > left-query-log.tsv ||: &
|
||||
left/clickhouse client --port 9001 --query "select * from system.query_thread_log format TSVWithNamesAndTypes" > left-query-thread-log.tsv ||: &
|
||||
left/clickhouse client --port 9001 --query "select * from system.trace_log format TSVWithNamesAndTypes" > left-trace-log.tsv ||: &
|
||||
left/clickhouse client --port 9001 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > left-addresses.tsv ||: &
|
||||
left/clickhouse client --port 9001 --query "select * from system.metric_log format TSVWithNamesAndTypes" > left-metric-log.tsv ||: &
|
||||
clickhouse-client --port 9001 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > left-query-log.tsv ||: &
|
||||
clickhouse-client --port 9001 --query "select * from system.query_thread_log format TSVWithNamesAndTypes" > left-query-thread-log.tsv ||: &
|
||||
clickhouse-client --port 9001 --query "select * from system.trace_log format TSVWithNamesAndTypes" > left-trace-log.tsv ||: &
|
||||
clickhouse-client --port 9001 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > left-addresses.tsv ||: &
|
||||
clickhouse-client --port 9001 --query "select * from system.metric_log format TSVWithNamesAndTypes" > left-metric-log.tsv ||: &
|
||||
|
||||
right/clickhouse client --port 9002 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > right-query-log.tsv ||: &
|
||||
right/clickhouse client --port 9002 --query "select * from system.query_thread_log format TSVWithNamesAndTypes" > right-query-thread-log.tsv ||: &
|
||||
right/clickhouse client --port 9002 --query "select * from system.trace_log format TSVWithNamesAndTypes" > right-trace-log.tsv ||: &
|
||||
right/clickhouse client --port 9002 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > right-addresses.tsv ||: &
|
||||
right/clickhouse client --port 9002 --query "select * from system.metric_log format TSVWithNamesAndTypes" > right-metric-log.tsv ||: &
|
||||
clickhouse-client --port 9002 --query "select * from system.query_log where type = 2 format TSVWithNamesAndTypes" > right-query-log.tsv ||: &
|
||||
clickhouse-client --port 9002 --query "select * from system.query_thread_log format TSVWithNamesAndTypes" > right-query-thread-log.tsv ||: &
|
||||
clickhouse-client --port 9002 --query "select * from system.trace_log format TSVWithNamesAndTypes" > right-trace-log.tsv ||: &
|
||||
clickhouse-client --port 9002 --query "select arrayJoin(trace) addr, concat(splitByChar('/', addressToLine(addr))[-1], '#', demangle(addressToSymbol(addr)) ) name from system.trace_log group by addr format TSVWithNamesAndTypes" > right-addresses.tsv ||: &
|
||||
clickhouse-client --port 9002 --query "select * from system.metric_log format TSVWithNamesAndTypes" > right-metric-log.tsv ||: &
|
||||
|
||||
wait
|
||||
}
|
||||
@ -197,9 +204,9 @@ function get_profiles
|
||||
# Build and analyze randomization distribution for all queries.
|
||||
function analyze_queries
|
||||
{
|
||||
find . -maxdepth 1 -name "*-queries.tsv" -print | \
|
||||
xargs -n1 -I% basename % -queries.tsv | \
|
||||
parallel --verbose right/clickhouse local --file "{}-queries.tsv" \
|
||||
find . -maxdepth 1 -name "*-queries.tsv" -print0 | \
|
||||
xargs -0 -n1 -I% basename % -queries.tsv | \
|
||||
parallel --verbose clickhouse-local --file "{}-queries.tsv" \
|
||||
--structure "\"query text, run int, version UInt32, time float\"" \
|
||||
--query "\"$(cat "$script_dir/eqmed.sql")\"" \
|
||||
">" {}-report.tsv
|
||||
@ -221,7 +228,7 @@ done
|
||||
|
||||
rm ./*.{rep,svg} test-times.tsv test-dump.tsv unstable.tsv unstable-query-ids.tsv unstable-query-metrics.tsv changed-perf.tsv unstable-tests.tsv unstable-queries.tsv bad-tests.tsv slow-on-client.tsv all-queries.tsv ||:
|
||||
|
||||
right/clickhouse local --query "
|
||||
clickhouse-local --query "
|
||||
create table queries engine File(TSVWithNamesAndTypes, 'queries.rep')
|
||||
as select
|
||||
-- FIXME Comparison mode doesn't make sense for queries that complete
|
||||
@ -230,12 +237,14 @@ create table queries engine File(TSVWithNamesAndTypes, 'queries.rep')
|
||||
-- but the right way to do this is not yet clear.
|
||||
left + right < 0.05 as short,
|
||||
|
||||
not short and abs(diff) < 0.10 and rd[3] > 0.10 as unstable,
|
||||
|
||||
-- Do not consider changed the queries with 5% RD below 5% -- e.g., we're
|
||||
-- likely to observe a difference > 5% in less than 5% cases.
|
||||
-- Not sure it is correct, but empirically it filters out a lot of noise.
|
||||
not short and abs(diff) > 0.15 and abs(diff) > rd[3] and rd[1] > 0.05 as changed,
|
||||
-- Difference > 15% and > rd(99%) -- changed. We can't filter out flaky
|
||||
-- queries by rd(5%), because it can be zero when the difference is smaller
|
||||
-- than a typical distribution width. The difference is still real though.
|
||||
not short and abs(diff) > 0.15 and abs(diff) > rd[4] as changed,
|
||||
|
||||
-- Not changed but rd(99%) > 10% -- unstable.
|
||||
not short and not changed and rd[4] > 0.10 as unstable,
|
||||
|
||||
left, right, diff, rd,
|
||||
replaceAll(_file, '-report.tsv', '') test,
|
||||
query
|
||||
@ -293,7 +302,7 @@ create table all_tests_tsv engine File(TSV, 'all-queries.tsv') as
|
||||
|
||||
for version in {right,left}
|
||||
do
|
||||
right/clickhouse local --query "
|
||||
clickhouse-local --query "
|
||||
create view queries as
|
||||
select * from file('queries.rep', TSVWithNamesAndTypes,
|
||||
'short int, unstable int, changed int, left float, right float,
|
||||
@ -411,6 +420,10 @@ unset IFS
|
||||
grep -H -m2 -i '\(Exception\|Error\):[^:]' ./*-err.log | sed 's/:/\t/' > run-errors.tsv ||:
|
||||
}
|
||||
|
||||
# Check that local and client are in PATH
|
||||
clickhouse-local --version > /dev/null
|
||||
clickhouse-client --version > /dev/null
|
||||
|
||||
case "$stage" in
|
||||
"")
|
||||
;&
|
||||
|
@ -1,4 +1,9 @@
|
||||
<yandex>
|
||||
<yandex>
|
||||
<http_port remove="remove"/>
|
||||
<mysql_port remove="remove"/>
|
||||
<interserver_http_port remove="remove"/>
|
||||
<listen_host>::</listen_host>
|
||||
|
||||
<logger>
|
||||
<console>true</console>
|
||||
</logger>
|
||||
|
@ -2,7 +2,7 @@
|
||||
set -ex
|
||||
set -o pipefail
|
||||
trap "exit" INT TERM
|
||||
trap "kill $(jobs -pr) ||:" EXIT
|
||||
trap 'kill $(jobs -pr) ||:' EXIT
|
||||
|
||||
mkdir db0 ||:
|
||||
|
||||
|
@ -98,14 +98,15 @@ fi
|
||||
# Even if we have some errors, try our best to save the logs.
|
||||
set +e
|
||||
|
||||
# Older version use 'kill 0', so put the script into a separate process group
|
||||
# FIXME remove set +m in April 2020
|
||||
set +m
|
||||
# Use clickhouse-client and clickhouse-local from the right server.
|
||||
PATH="$(readlink -f right/)":"$PATH"
|
||||
export PATH
|
||||
|
||||
# Start the main comparison script.
|
||||
{ \
|
||||
time ../download.sh "$REF_PR" "$REF_SHA" "$PR_TO_TEST" "$SHA_TO_TEST" && \
|
||||
time stage=configure "$script_path"/compare.sh ; \
|
||||
} 2>&1 | ts "$(printf '%%Y-%%m-%%d %%H:%%M:%%S\t')" | tee compare.log
|
||||
set -m
|
||||
|
||||
# Stop the servers to free memory. Normally they are restarted before getting
|
||||
# the profile info, so they shouldn't use much, but if the comparison script
|
||||
|
@ -140,9 +140,16 @@ report_stage_end('substitute2')
|
||||
for q in test_queries:
|
||||
# Prewarm: run once on both servers. Helps to bring the data into memory,
|
||||
# precompile the queries, etc.
|
||||
for conn_index, c in enumerate(connections):
|
||||
res = c.execute(q, query_id = 'prewarm {} {}'.format(0, q))
|
||||
print('prewarm\t' + tsv_escape(q) + '\t' + str(conn_index) + '\t' + str(c.last_query.elapsed))
|
||||
try:
|
||||
for conn_index, c in enumerate(connections):
|
||||
res = c.execute(q, query_id = 'prewarm {} {}'.format(0, q))
|
||||
print('prewarm\t' + tsv_escape(q) + '\t' + str(conn_index) + '\t' + str(c.last_query.elapsed))
|
||||
except:
|
||||
# If prewarm fails for some query -- skip it, and try to test the others.
|
||||
# This might happen if the new test introduces some function that the
|
||||
# old server doesn't support. Still, report it as an error.
|
||||
print(traceback.format_exc(), file=sys.stderr)
|
||||
continue
|
||||
|
||||
# Now, perform measured runs.
|
||||
# Track the time spent by the client to process this query, so that we can notice
|
||||
|
@ -256,17 +256,18 @@ if args.report == 'main':
|
||||
|
||||
print(tableStart('Test times'))
|
||||
print(tableHeader(columns))
|
||||
|
||||
|
||||
runs = 11 # FIXME pass this as an argument
|
||||
attrs = ['' for c in columns]
|
||||
for r in rows:
|
||||
if float(r[6]) > 22:
|
||||
if float(r[6]) > 3 * runs:
|
||||
# FIXME should be 15s max -- investigate parallel_insert
|
||||
slow_average_tests += 1
|
||||
attrs[6] = 'style="background: #ffb0a0"'
|
||||
else:
|
||||
attrs[6] = ''
|
||||
|
||||
if float(r[5]) > 30:
|
||||
if float(r[5]) > 4 * runs:
|
||||
slow_average_tests += 1
|
||||
attrs[5] = 'style="background: #ffb0a0"'
|
||||
else:
|
||||
|
@ -1,5 +1,5 @@
|
||||
---
|
||||
toc_folder_title: Разработка
|
||||
toc_folder_title: Development
|
||||
toc_hidden: true
|
||||
toc_priority: 58
|
||||
toc_title: hidden
|
||||
|
@@ -9,7 +9,10 @@ The engine inherits from [MergeTree](mergetree.md#table_engines-mergetree), alte

You can use `AggregatingMergeTree` tables for incremental data aggregation, including for aggregated materialized views.

The engine processes all columns with [AggregateFunction](../../../sql_reference/data_types/aggregatefunction.md) type.
The engine processes all columns with the following types:

- [AggregateFunction](../../../sql_reference/data_types/aggregatefunction.md)
- [SimpleAggregateFunction](../../../sql_reference/data_types/simpleaggregatefunction.md)

It is appropriate to use `AggregatingMergeTree` if it reduces the number of rows by orders of magnitude.
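For readers of this diff, a brief sketch of the incremental-aggregation pattern the paragraph above refers to; every table, column, and function choice here is illustrative, not part of this change:

```sql
-- Raw events land in a plain MergeTree table (hypothetical schema).
CREATE TABLE events (ts DateTime, user_id UInt64, cost Float64)
ENGINE = MergeTree ORDER BY ts;

-- The materialized view keeps partially aggregated states per day.
CREATE MATERIALIZED VIEW events_daily
ENGINE = AggregatingMergeTree() ORDER BY day AS
SELECT
    toDate(ts)         AS day,
    uniqState(user_id) AS users,
    sumState(cost)     AS total_cost
FROM events
GROUP BY day;

-- Reading requires the -Merge combinators to finalize the stored states.
SELECT day, uniqMerge(users) AS users, sumMerge(total_cost) AS total_cost
FROM events_daily
GROUP BY day
ORDER BY day;
```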
@@ -0,0 +1,36 @@

# SimpleAggregateFunction(name, types\_of\_arguments…) {#data-type-simpleaggregatefunction}

Unlike [`AggregateFunction`](../aggregatefunction.md), which stores not the value of the aggregate function but its state:

- `SimpleAggregateFunction` data type stores the current value of the aggregate function, and does not store its full state as [`AggregateFunction`](../aggregatefunction.md) does. This optimization can be applied to functions for which the following property holds: the result of applying a function `f` to a row set `S1 UNION ALL S2` can be obtained by applying `f` to parts of the row set separately, and then again applying `f` to the results: `f(S1 UNION ALL S2) = f(f(S1) UNION ALL f(S2))`. This property guarantees that partial aggregation results are enough to compute the combined one, so we don't have to store and process any extra data.

Currently, the following aggregate functions are supported:

- [`any`](../../query_language/agg_functions/reference.md#agg_function-any)
- [`anyLast`](../../query_language/agg_functions/reference.md#anylastx)
- [`min`](../../query_language/agg_functions/reference.md#agg_function-min)
- [`max`](../../query_language/agg_functions/reference.md#agg_function-max)
- [`sum`](../../query_language/agg_functions/reference.md#agg_function-sum)
- [`groupBitAnd`](../../query_language/agg_functions/reference.md#groupbitand)
- [`groupBitOr`](../../query_language/agg_functions/reference.md#groupbitor)
- [`groupBitXor`](../../query_language/agg_functions/reference.md#groupbitxor)

- Values of the `SimpleAggregateFunction(func, Type)` look and are stored the same way as `Type`, so you do not need to apply functions with `-Merge`/`-State` suffixes.
- `SimpleAggregateFunction` has better performance than `AggregateFunction` with the same aggregation function.

**Parameters**

- Name of the aggregate function.
- Types of the aggregate function arguments.

**Example**

``` sql
CREATE TABLE t
(
    column1 SimpleAggregateFunction(sum, UInt64),
    column2 SimpleAggregateFunction(any, String)
) ENGINE = ...
```
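A possible follow-up to the example above showing how such a table behaves in practice; the engine choice, data, and query are illustrative only:

```sql
CREATE TABLE simple_agg
(
    key   UInt32,
    total SimpleAggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY key;

INSERT INTO simple_agg VALUES (1, 10), (1, 32), (2, 5);

-- Rows with equal keys are combined with sum() during background merges; until then
-- the same aggregation can be applied at query time. No -Merge/-State suffixes are
-- needed, and the column reads back as a plain UInt64.
SELECT key, sum(total) AS total
FROM simple_agg
GROUP BY key
ORDER BY key;
```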

[Original article](https://clickhouse.tech/docs/en/data_types/nested_data_structures/simpleaggregatefunction/) <!--hide-->
@ -179,7 +179,7 @@ CREATE TABLE codec_example
|
||||
ENGINE = MergeTree()
|
||||
```
|
||||
|
||||
#### Common Purpose Codecs {#create-query-common-purpose-codecs}
|
||||
#### General Purpose Codecs {#create-query-general-purpose-codecs}
|
||||
|
||||
Codecs:
|
||||
|
||||
|
96  docs/ru/operations/settings/merge_tree_settings.md (new file)
@@ -0,0 +1,96 @@

# MergeTree table settings {#merge-tree-settings}

The values of merge-tree settings (for all MergeTree tables) can be viewed in the `system.merge_tree_settings` table; they can be overridden in `config.xml` in the `merge_tree` section, or set in the `SETTINGS` section of each table.

Example of overriding in `config.xml`:
```text
<merge_tree>
    <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
</merge_tree>
```

Example of setting in `SETTINGS` for a particular table:
```sql
CREATE TABLE foo
(
    `A` Int64
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS max_suspicious_broken_parts = 500;
```

Example of changing the settings for a particular table with the `ALTER TABLE ... MODIFY SETTING` command:
```sql
ALTER TABLE foo
    MODIFY SETTING max_suspicious_broken_parts = 100;
```


## parts_to_throw_insert {#parts-to-throw-insert}

If the number of parts in a partition exceeds the `parts_to_throw_insert` value, `INSERT` is interrupted with the exception `Too many parts (N). Merges are processing significantly slower than inserts`.

Possible values:

- A positive integer.

Default value: 300.

To achieve maximum performance of `SELECT` queries, the number of processed parts must be minimized, see [MergeTree design](../../development/architecture.md#merge-tree).

You can set a larger value of 600 (1200); this reduces the probability of the `Too many parts` error, but at the same time you will discover a possible problem with merges (for example, due to a lack of disk space) and the degradation of `SELECT` performance later.


## parts_to_delay_insert {#parts-to-delay-insert}

If the number of parts in a partition exceeds the `parts_to_delay_insert` value, `INSERT` is artificially slowed down.

Possible values:

- A positive integer.

Default value: 150.

ClickHouse artificially makes `INSERT` take longer (adds a 'sleep') so that the background merge process can merge parts faster than they are added.


## max_delay_to_insert {#max-delay-to-insert}

The value in seconds that is used to calculate the `INSERT` delay if the number of parts in a partition exceeds the [parts_to_delay_insert](#parts-to-delay-insert) value.

Possible values:

- A positive integer.

Default value: 1.

The delay (in milliseconds) for `INSERT` is calculated by the formula:

```code
max_k = parts_to_throw_insert - parts_to_delay_insert
k = 1 + parts_count_in_partition - parts_to_delay_insert
delay_milliseconds = pow(max_delay_to_insert * 1000, k / max_k)
```

That is, if a partition already has 299 parts and parts_to_throw_insert = 300, parts_to_delay_insert = 150, max_delay_to_insert = 1, `INSERT` is delayed by `pow( 1 * 1000, (1 + 299 - 150) / (300 - 150) ) = 1000` milliseconds.
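A couple of illustrative queries for relating these thresholds to a running server; the table name `foo` comes from the examples above and the filters are arbitrary:

```sql
-- Current values of the part-count thresholds.
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert', 'max_delay_to_insert');

-- Active parts per partition for a given table; compare with the thresholds above.
SELECT partition, count() AS active_parts
FROM system.parts
WHERE active AND database = currentDatabase() AND table = 'foo'
GROUP BY partition
ORDER BY active_parts DESC;
```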

## old_parts_lifetime {#old-parts-lifetime}

The time (in seconds) to keep inactive parts, to protect against data loss during a spontaneous server or OS restart.

Possible values:

- A positive integer.

Default value: 480.

After merging several parts into a new part, ClickHouse marks the source parts as inactive and deletes them after `old_parts_lifetime` seconds.
Inactive parts are deleted if they are not used by current queries, i.e. if the part's reference counter `refcount` equals zero.

Inactive parts are not deleted right away, because writing a new part does not call `fsync`, i.e. for some time the new part exists only in the server's RAM (OS cache). So if the server restarts spontaneously, the new (merged) part may be lost or damaged. In that case ClickHouse, while checking part integrity during startup, will detect the problem, return the inactive parts to the active list, and later merge them again. A broken part in this case is renamed (the broken_ prefix is added) and moved to the detached folder. If the integrity check finds no problems in the merged part, the source inactive parts are renamed (the ignored_ prefix is added) and moved to the detached folder.

The standard Linux dirty_expire_centisecs value is 30 seconds (the maximum time that written data is kept only in RAM), but under heavy load on the disk subsystem the data may be written much later. Experimentally, a value of 480 seconds was chosen, after which a new part is guaranteed to have been written to disk.


[Original article](https://clickhouse.tech/docs/ru/operations/settings/merge_tree_settings/) <!--hide-->
@ -35,4 +35,4 @@ soupsieve==2.0
|
||||
termcolor==1.1.0
|
||||
tornado==5.1.1
|
||||
Unidecode==1.1.1
|
||||
urllib3==1.25.8
|
||||
urllib3==1.25.9
|
||||
|
@ -9,4 +9,4 @@ python-slugify==4.0.0
|
||||
PyYAML==5.3.1
|
||||
requests==2.23.0
|
||||
text-unidecode==1.3
|
||||
urllib3==1.25.8
|
||||
urllib3==1.25.9
|
||||
|
@ -44,7 +44,7 @@ public:
|
||||
/// Name of a Column kind, without parameters (example: FixedString, Array).
|
||||
virtual const char * getFamilyName() const = 0;
|
||||
|
||||
/** If column isn't constant, returns nullptr (or itself).
|
||||
/** If column isn't constant, returns itself.
|
||||
* If column is constant, transforms constant to full column (if column type allows such transform) and return it.
|
||||
*/
|
||||
virtual Ptr convertToFullColumnIfConst() const { return getPtr(); }
|
||||
|
@ -220,7 +220,8 @@ public:
|
||||
// This method only works for extending the last allocation. For lack of
|
||||
// original size, check a weaker condition: that 'begin' is at least in
|
||||
// the current Chunk.
|
||||
assert(range_start >= head->begin && range_start < head->end);
|
||||
assert(range_start >= head->begin);
|
||||
assert(range_start < head->end);
|
||||
|
||||
if (head->pos + additional_bytes <= head->end)
|
||||
{
|
||||
|
@ -491,6 +491,7 @@ namespace ErrorCodes
|
||||
extern const int CANNOT_ASSIGN_ALTER = 517;
|
||||
extern const int CANNOT_COMMIT_OFFSET = 518;
|
||||
extern const int NO_REMOTE_SHARD_AVAILABLE = 519;
|
||||
extern const int CANNOT_DETACH_DICTIONARY_AS_TABLE = 520;
|
||||
|
||||
extern const int KEEPER_EXCEPTION = 999;
|
||||
extern const int POCO_EXCEPTION = 1000;
|
||||
|
@ -78,9 +78,11 @@ struct Settings : public SettingsCollection<Settings>
|
||||
M(SettingBool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.", IMPORTANT) \
|
||||
M(SettingBool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.", 0) \
|
||||
M(SettingBool, replace_running_query, false, "Whether the running request should be canceled with the same id as the new one.", 0) \
|
||||
M(SettingUInt64, background_buffer_flush_schedule_pool_size, 16, "Number of threads performing background flush for tables with Buffer engine. Only has meaning at server startup.", 0) \
|
||||
M(SettingUInt64, background_pool_size, 16, "Number of threads performing background work for tables (for example, merging in merge tree). Only has meaning at server startup.", 0) \
|
||||
M(SettingUInt64, background_move_pool_size, 8, "Number of threads performing background moves for tables. Only has meaning at server startup.", 0) \
|
||||
M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables. Only has meaning at server startup.", 0) \
|
||||
M(SettingUInt64, background_schedule_pool_size, 16, "Number of threads performing background tasks for replicated tables, kafka streaming, dns cache updates. Only has meaning at server startup.", 0) \
|
||||
M(SettingUInt64, background_distributed_schedule_pool_size, 16, "Number of threads performing background tasks for distributed sends. Only has meaning at server startup.", 0) \
|
||||
\
|
||||
M(SettingMilliseconds, distributed_directory_monitor_sleep_time_ms, 100, "Sleep time for StorageDistributed DirectoryMonitors, in case of any errors delay grows exponentially.", 0) \
|
||||
M(SettingMilliseconds, distributed_directory_monitor_max_sleep_time_ms, 30000, "Maximum sleep time for StorageDistributed DirectoryMonitors, it limits exponential growth too.", 0) \
|
||||
|
@ -1,6 +1,7 @@
|
||||
#include <Databases/DatabaseDictionary.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/ExternalDictionariesLoader.h>
|
||||
#include <Dictionaries/DictionaryStructure.h>
|
||||
#include <Storages/StorageDictionary.h>
|
||||
#include <common/logger_useful.h>
|
||||
#include <IO/WriteBufferFromString.h>
|
||||
@ -15,6 +16,18 @@ namespace DB
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int SYNTAX_ERROR;
|
||||
extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY;
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
StoragePtr createStorageDictionary(const String & database_name, const ExternalLoader::LoadResult & load_result)
|
||||
{
|
||||
if (!load_result.config)
|
||||
return nullptr;
|
||||
DictionaryStructure dictionary_structure = ExternalDictionariesLoader::getDictionaryStructure(*load_result.config);
|
||||
return StorageDictionary::create(StorageID(database_name, load_result.name), load_result.name, dictionary_structure);
|
||||
}
|
||||
}
|
||||
|
||||
DatabaseDictionary::DatabaseDictionary(const String & name_)
|
||||
@ -26,29 +39,12 @@ DatabaseDictionary::DatabaseDictionary(const String & name_)
|
||||
Tables DatabaseDictionary::listTables(const Context & context, const FilterByNameFunction & filter_by_name)
|
||||
{
|
||||
Tables tables;
|
||||
ExternalLoader::LoadResults load_results;
|
||||
if (filter_by_name)
|
||||
auto load_results = context.getExternalDictionariesLoader().getLoadResults(filter_by_name);
|
||||
for (auto & load_result : load_results)
|
||||
{
|
||||
/// If `filter_by_name` is set, we iterate through all dictionaries with such names. That's why we need to load all of them.
|
||||
load_results = context.getExternalDictionariesLoader().tryLoad<ExternalLoader::LoadResults>(filter_by_name);
|
||||
}
|
||||
else
|
||||
{
|
||||
/// If `filter_by_name` isn't set, we iterate through only already loaded dictionaries. We don't try to load all dictionaries in this case.
|
||||
load_results = context.getExternalDictionariesLoader().getCurrentLoadResults();
|
||||
}
|
||||
|
||||
for (const auto & load_result: load_results)
|
||||
{
|
||||
/// Load tables only from XML dictionaries, don't touch other
|
||||
if (load_result.object && load_result.repository_name.empty())
|
||||
{
|
||||
auto dict_ptr = std::static_pointer_cast<const IDictionaryBase>(load_result.object);
|
||||
auto dict_name = dict_ptr->getName();
|
||||
const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
|
||||
auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
|
||||
tables[dict_name] = StorageDictionary::create(StorageID(getDatabaseName(), dict_name), ColumnsDescription{columns}, context, true, dict_name);
|
||||
}
|
||||
auto storage = createStorageDictionary(getDatabaseName(), load_result);
|
||||
if (storage)
|
||||
tables.emplace(storage->getStorageID().table_name, storage);
|
||||
}
|
||||
return tables;
|
||||
}
|
||||
@@ -64,15 +60,8 @@ StoragePtr DatabaseDictionary::tryGetTable(
    const Context & context,
    const String & table_name) const
{
    auto dict_ptr = context.getExternalDictionariesLoader().tryGetDictionary(table_name, true /*load*/);
    if (dict_ptr)
    {
        const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
        auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
        return StorageDictionary::create(StorageID(getDatabaseName(), table_name), ColumnsDescription{columns}, context, true, table_name);
    }

    return {};
    auto load_result = context.getExternalDictionariesLoader().getLoadResult(table_name);
    return createStorageDictionary(getDatabaseName(), load_result);
}

DatabaseTablesIteratorPtr DatabaseDictionary::getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name)
@@ -82,7 +71,7 @@ DatabaseTablesIteratorPtr DatabaseDictionary::getTablesIterator(const Context &

bool DatabaseDictionary::empty(const Context & context) const
{
    return !context.getExternalDictionariesLoader().hasCurrentlyLoadedObjects();
    return !context.getExternalDictionariesLoader().hasObjects();
}

ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context,
@@ -92,15 +81,17 @@ ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context,
{
    WriteBufferFromString buffer(query);

    const auto & dictionaries = context.getExternalDictionariesLoader();
    auto dictionary = throw_on_error ? dictionaries.getDictionary(table_name)
                                     : dictionaries.tryGetDictionary(table_name, true /*load*/);
    if (!dictionary)
    auto load_result = context.getExternalDictionariesLoader().getLoadResult(table_name);
    if (!load_result.config)
    {
        if (throw_on_error)
            throw Exception{"Dictionary " + backQuote(table_name) + " doesn't exist", ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY};
        return {};
    }

    auto names_and_types = StorageDictionary::getNamesAndTypes(dictionary->getStructure());
    auto names_and_types = StorageDictionary::getNamesAndTypes(ExternalDictionariesLoader::getDictionaryStructure(*load_result.config));
    buffer << "CREATE TABLE " << backQuoteIfNeed(database_name) << '.' << backQuoteIfNeed(table_name) << " (";
    buffer << StorageDictionary::generateNamesAndTypesDescription(names_and_types.begin(), names_and_types.end());
    buffer << StorageDictionary::generateNamesAndTypesDescription(names_and_types);
    buffer << ") Engine = Dictionary(" << backQuoteIfNeed(table_name) << ")";
}

@@ -43,6 +43,8 @@ public:

    ASTPtr getCreateDatabaseQuery(const Context & context) const override;

    bool shouldBeEmptyOnDetach() const override { return false; }

    void shutdown() override;

protected:

@@ -16,6 +16,7 @@
#include <Parsers/parseQuery.h>
#include <Parsers/formatAST.h>
#include <Parsers/ASTSetQuery.h>
#include <Dictionaries/getDictionaryConfigurationFromAST.h>
#include <TableFunctions/TableFunctionFactory.h>

#include <Parsers/queryToString.h>
@@ -74,18 +75,24 @@ namespace


void tryAttachDictionary(
    Context & context,
    const ASTCreateQuery & query,
    DatabaseOrdinary & database)
    const ASTPtr & query,
    DatabaseOrdinary & database,
    const String & metadata_path)
{
    assert(query.is_dictionary);
    auto & create_query = query->as<ASTCreateQuery &>();
    assert(create_query.is_dictionary);
    try
    {
        database.attachDictionary(query.table, context);
        Poco::File meta_file(metadata_path);
        auto config = getDictionaryConfigurationFromAST(create_query);
        time_t modification_time = meta_file.getLastModified().epochTime();
        database.attachDictionary(create_query.table, DictionaryAttachInfo{query, config, modification_time});
    }
    catch (Exception & e)
    {
        e.addMessage("Cannot attach table '" + backQuote(query.table) + "' from query " + serializeAST(query));
        e.addMessage("Cannot attach dictionary " + backQuote(database.getDatabaseName()) + "." + backQuote(create_query.table) +
                     " from metadata file " + metadata_path +
                     " from query " + serializeAST(*query));
        throw;
    }
}
@@ -173,12 +180,12 @@ void DatabaseOrdinary::loadStoredObjects(

    /// Attach dictionaries.
    attachToExternalDictionariesLoader(context);
    for (const auto & name_with_query : file_names)
    for (const auto & [name, query] : file_names)
    {
        auto create_query = name_with_query.second->as<const ASTCreateQuery &>();
        auto create_query = query->as<const ASTCreateQuery &>();
        if (create_query.is_dictionary)
        {
            tryAttachDictionary(context, create_query, *this);
            tryAttachDictionary(query, *this, getMetadataPath() + name);

            /// Messages, so that it's not boring to wait for the server to load for a long time.
            logAboutProgress(log, ++dictionaries_processed, total_dictionaries, watch);

@ -5,6 +5,8 @@
|
||||
#include <Interpreters/ExternalLoaderTempConfigRepository.h>
|
||||
#include <Interpreters/ExternalLoaderDatabaseConfigRepository.h>
|
||||
#include <Dictionaries/getDictionaryConfigurationFromAST.h>
|
||||
#include <Dictionaries/DictionaryStructure.h>
|
||||
#include <Parsers/ASTCreateQuery.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Storages/StorageDictionary.h>
|
||||
#include <IO/WriteBufferFromFile.h>
|
||||
@ -24,46 +26,80 @@ namespace ErrorCodes
|
||||
{
|
||||
extern const int CANNOT_GET_CREATE_DICTIONARY_QUERY;
|
||||
extern const int TABLE_ALREADY_EXISTS;
|
||||
extern const int UNKNOWN_TABLE;
|
||||
extern const int UNKNOWN_DICTIONARY;
|
||||
extern const int DICTIONARY_ALREADY_EXISTS;
|
||||
extern const int FILE_DOESNT_EXIST;
|
||||
extern const int CANNOT_GET_CREATE_TABLE_QUERY;
|
||||
}
|
||||
|
||||
|
||||
void DatabaseWithDictionaries::attachDictionary(const String & dictionary_name, const Context & context)
|
||||
void DatabaseWithDictionaries::attachDictionary(const String & dictionary_name, const DictionaryAttachInfo & attach_info)
|
||||
{
|
||||
String full_name = getDatabaseName() + "." + dictionary_name;
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
if (!dictionaries.emplace(dictionary_name).second)
|
||||
auto [it, inserted] = dictionaries.emplace(dictionary_name, attach_info);
|
||||
if (!inserted)
|
||||
throw Exception("Dictionary " + full_name + " already exists.", ErrorCodes::DICTIONARY_ALREADY_EXISTS);
|
||||
|
||||
/// Attach the dictionary as table too.
|
||||
try
|
||||
{
|
||||
attachTableUnlocked(
|
||||
dictionary_name,
|
||||
StorageDictionary::create(
|
||||
StorageID(getDatabaseName(), dictionary_name),
|
||||
full_name,
|
||||
ExternalDictionariesLoader::getDictionaryStructure(*attach_info.config)));
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
dictionaries.erase(it);
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
CurrentStatusInfo::set(CurrentStatusInfo::DictionaryStatus, full_name, static_cast<Int8>(ExternalLoaderStatus::NOT_LOADED));
|
||||
|
||||
/// ExternalLoader::reloadConfig() will find out that the dictionary's config has been added
|
||||
/// and in case `dictionaries_lazy_load == false` it will load the dictionary.
|
||||
const auto & external_loader = context.getExternalDictionariesLoader();
|
||||
external_loader.reloadConfig(getDatabaseName(), full_name);
|
||||
external_loader->reloadConfig(getDatabaseName(), full_name);
|
||||
}
|
||||
|
||||
void DatabaseWithDictionaries::detachDictionary(const String & dictionary_name, const Context & context)
|
||||
void DatabaseWithDictionaries::detachDictionary(const String & dictionary_name)
|
||||
{
|
||||
DictionaryAttachInfo attach_info;
|
||||
detachDictionaryImpl(dictionary_name, attach_info);
|
||||
}
|
||||
|
||||
void DatabaseWithDictionaries::detachDictionaryImpl(const String & dictionary_name, DictionaryAttachInfo & attach_info)
|
||||
{
|
||||
String full_name = getDatabaseName() + "." + dictionary_name;
|
||||
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
auto it = dictionaries.find(dictionary_name);
|
||||
if (it == dictionaries.end())
|
||||
throw Exception("Dictionary " + full_name + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE);
|
||||
throw Exception("Dictionary " + full_name + " doesn't exist.", ErrorCodes::UNKNOWN_DICTIONARY);
|
||||
attach_info = std::move(it->second);
|
||||
dictionaries.erase(it);
|
||||
|
||||
/// Detach the dictionary as table too.
|
||||
try
|
||||
{
|
||||
detachTableUnlocked(dictionary_name);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
dictionaries.emplace(dictionary_name, std::move(attach_info));
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
CurrentStatusInfo::unset(CurrentStatusInfo::DictionaryStatus, full_name);
|
||||
|
||||
/// ExternalLoader::reloadConfig() will find out that the dictionary's config has been removed
|
||||
/// and therefore it will unload the dictionary.
|
||||
const auto & external_loader = context.getExternalDictionariesLoader();
|
||||
external_loader.reloadConfig(getDatabaseName(), full_name);
|
||||
|
||||
external_loader->reloadConfig(getDatabaseName(), full_name);
|
||||
}
|
||||
|
||||
void DatabaseWithDictionaries::createDictionary(const Context & context, const String & dictionary_name, const ASTPtr & query)
|
||||
@ -85,8 +121,7 @@ void DatabaseWithDictionaries::createDictionary(const Context & context, const S
|
||||
|
||||
/// A dictionary with the same full name could be defined in *.xml config files.
|
||||
String full_name = getDatabaseName() + "." + dictionary_name;
|
||||
const auto & external_loader = context.getExternalDictionariesLoader();
|
||||
if (external_loader.getCurrentStatus(full_name) != ExternalLoader::Status::NOT_EXIST)
|
||||
if (external_loader->getCurrentStatus(full_name) != ExternalLoader::Status::NOT_EXIST)
|
||||
throw Exception(
|
||||
"Dictionary " + backQuote(getDatabaseName()) + "." + backQuote(dictionary_name) + " already exists.",
|
||||
ErrorCodes::DICTIONARY_ALREADY_EXISTS);
|
||||
@ -117,23 +152,22 @@ void DatabaseWithDictionaries::createDictionary(const Context & context, const S
|
||||
|
||||
/// Add a temporary repository containing the dictionary.
|
||||
/// We need this temp repository to try loading the dictionary before actually attaching it to the database.
|
||||
auto temp_repository
|
||||
= const_cast<ExternalDictionariesLoader &>(external_loader) /// the change of ExternalDictionariesLoader is temporary
|
||||
.addConfigRepository(std::make_unique<ExternalLoaderTempConfigRepository>(
|
||||
getDatabaseName(), dictionary_metadata_tmp_path, getDictionaryConfigurationFromAST(query->as<const ASTCreateQuery &>())));
|
||||
auto temp_repository = external_loader->addConfigRepository(std::make_unique<ExternalLoaderTempConfigRepository>(
|
||||
getDatabaseName(), dictionary_metadata_tmp_path, getDictionaryConfigurationFromAST(query->as<const ASTCreateQuery &>())));
|
||||
|
||||
bool lazy_load = context.getConfigRef().getBool("dictionaries_lazy_load", true);
|
||||
if (!lazy_load)
|
||||
{
|
||||
/// load() is called here to force loading the dictionary, wait until the loading is finished,
|
||||
/// and throw an exception if the loading is failed.
|
||||
external_loader.load(full_name);
|
||||
external_loader->load(full_name);
|
||||
}
|
||||
|
||||
attachDictionary(dictionary_name, context);
|
||||
auto config = getDictionaryConfigurationFromAST(query->as<const ASTCreateQuery &>());
|
||||
attachDictionary(dictionary_name, DictionaryAttachInfo{query, config, time(nullptr)});
|
||||
SCOPE_EXIT({
|
||||
if (!succeeded)
|
||||
detachDictionary(dictionary_name, context);
|
||||
detachDictionary(dictionary_name);
|
||||
});
|
||||
|
||||
/// If it was ATTACH query and file with dictionary metadata already exist
|
||||
@ -142,94 +176,31 @@ void DatabaseWithDictionaries::createDictionary(const Context & context, const S
|
||||
|
||||
/// ExternalDictionariesLoader doesn't know we renamed the metadata path.
|
||||
/// So we have to manually call reloadConfig() here.
|
||||
external_loader.reloadConfig(getDatabaseName(), full_name);
|
||||
external_loader->reloadConfig(getDatabaseName(), full_name);
|
||||
|
||||
/// Everything's ok.
|
||||
succeeded = true;
|
||||
}
|
||||
|
||||
void DatabaseWithDictionaries::removeDictionary(const Context & context, const String & dictionary_name)
|
||||
void DatabaseWithDictionaries::removeDictionary(const Context &, const String & dictionary_name)
|
||||
{
|
||||
detachDictionary(dictionary_name, context);
|
||||
|
||||
String dictionary_metadata_path = getObjectMetadataPath(dictionary_name);
|
||||
DictionaryAttachInfo attach_info;
|
||||
detachDictionaryImpl(dictionary_name, attach_info);
|
||||
|
||||
try
|
||||
{
|
||||
String dictionary_metadata_path = getObjectMetadataPath(dictionary_name);
|
||||
Poco::File(dictionary_metadata_path).remove();
|
||||
CurrentStatusInfo::unset(CurrentStatusInfo::DictionaryStatus, getDatabaseName() + "." + dictionary_name);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
/// If remove was not possible for some reason
|
||||
attachDictionary(dictionary_name, context);
|
||||
attachDictionary(dictionary_name, attach_info);
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
StoragePtr DatabaseWithDictionaries::tryGetTableImpl(const Context & context, const String & table_name, bool load) const
|
||||
{
|
||||
if (auto table_ptr = DatabaseWithOwnTablesBase::tryGetTable(context, table_name))
|
||||
return table_ptr;
|
||||
|
||||
if (isDictionaryExist(context, table_name))
|
||||
/// We don't need lock database here, because database doesn't store dictionary itself
|
||||
/// just metadata
|
||||
return getDictionaryStorage(context, table_name, load);
|
||||
|
||||
return {};
|
||||
}
|
||||
StoragePtr DatabaseWithDictionaries::tryGetTable(const Context & context, const String & table_name) const
|
||||
{
|
||||
return tryGetTableImpl(context, table_name, true /*load*/);
|
||||
}
|
||||
|
||||
ASTPtr DatabaseWithDictionaries::getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const
|
||||
{
|
||||
ASTPtr ast;
|
||||
bool has_table = tryGetTableImpl(context, table_name, false /*load*/) != nullptr;
|
||||
auto table_metadata_path = getObjectMetadataPath(table_name);
|
||||
try
|
||||
{
|
||||
ast = getCreateQueryFromMetadata(context, table_metadata_path, throw_on_error);
|
||||
}
|
||||
catch (const Exception & e)
|
||||
{
|
||||
if (!has_table && e.code() == ErrorCodes::FILE_DOESNT_EXIST && throw_on_error)
|
||||
throw Exception{"Table " + backQuote(table_name) + " doesn't exist",
|
||||
ErrorCodes::CANNOT_GET_CREATE_TABLE_QUERY};
|
||||
else if (throw_on_error)
|
||||
throw;
|
||||
}
|
||||
return ast;
|
||||
}
|
||||
|
||||
DatabaseTablesIteratorPtr DatabaseWithDictionaries::getTablesWithDictionaryTablesIterator(
|
||||
const Context & context, const FilterByNameFunction & filter_by_dictionary_name)
|
||||
{
|
||||
/// NOTE: it's not atomic
|
||||
auto tables_it = getTablesIterator(context, filter_by_dictionary_name);
|
||||
auto dictionaries_it = getDictionariesIterator(context, filter_by_dictionary_name);
|
||||
|
||||
Tables result;
|
||||
while (tables_it && tables_it->isValid())
|
||||
{
|
||||
result.emplace(tables_it->name(), tables_it->table());
|
||||
tables_it->next();
|
||||
}
|
||||
|
||||
while (dictionaries_it && dictionaries_it->isValid())
|
||||
{
|
||||
auto table_name = dictionaries_it->name();
|
||||
auto table_ptr = getDictionaryStorage(context, table_name, false /*load*/);
|
||||
if (table_ptr)
|
||||
result.emplace(table_name, table_ptr);
|
||||
dictionaries_it->next();
|
||||
}
|
||||
|
||||
return std::make_unique<DatabaseTablesSnapshotIterator>(result);
|
||||
}
|
||||
|
||||
DatabaseDictionariesIteratorPtr DatabaseWithDictionaries::getDictionariesIterator(const Context & /*context*/, const FilterByNameFunction & filter_by_dictionary_name)
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
@ -237,9 +208,9 @@ DatabaseDictionariesIteratorPtr DatabaseWithDictionaries::getDictionariesIterato
|
||||
return std::make_unique<DatabaseDictionariesSnapshotIterator>(dictionaries);
|
||||
|
||||
Dictionaries filtered_dictionaries;
|
||||
for (const auto & dictionary_name : dictionaries)
|
||||
for (const auto & dictionary_name : dictionaries | boost::adaptors::map_keys)
|
||||
if (filter_by_dictionary_name(dictionary_name))
|
||||
filtered_dictionaries.emplace(dictionary_name);
|
||||
filtered_dictionaries.emplace_back(dictionary_name);
|
||||
return std::make_unique<DatabaseDictionariesSnapshotIterator>(std::move(filtered_dictionaries));
|
||||
}
|
||||
|
||||
@ -249,44 +220,84 @@ bool DatabaseWithDictionaries::isDictionaryExist(const Context & /*context*/, co
|
||||
return dictionaries.find(dictionary_name) != dictionaries.end();
|
||||
}
|
||||
|
||||
StoragePtr DatabaseWithDictionaries::getDictionaryStorage(const Context & context, const String & table_name, bool load) const
|
||||
{
|
||||
auto dict_name = database_name + "." + table_name;
|
||||
const auto & external_loader = context.getExternalDictionariesLoader();
|
||||
auto dict_ptr = external_loader.tryGetDictionary(dict_name, load);
|
||||
if (dict_ptr)
|
||||
{
|
||||
const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
|
||||
auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
|
||||
return StorageDictionary::create(StorageID(database_name, table_name), ColumnsDescription{columns}, context, true, dict_name);
|
||||
}
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
ASTPtr DatabaseWithDictionaries::getCreateDictionaryQueryImpl(
|
||||
const Context & context,
|
||||
const String & dictionary_name,
|
||||
bool throw_on_error) const
|
||||
{
|
||||
ASTPtr ast;
|
||||
|
||||
auto dictionary_metadata_path = getObjectMetadataPath(dictionary_name);
|
||||
ast = getCreateQueryFromMetadata(context, dictionary_metadata_path, throw_on_error);
|
||||
if (!ast && throw_on_error)
|
||||
{
|
||||
/// Handle system.* tables for which there are no table.sql files.
|
||||
bool has_dictionary = isDictionaryExist(context, dictionary_name);
|
||||
|
||||
auto msg = has_dictionary ? "There is no CREATE DICTIONARY query for table " : "There is no metadata file for dictionary ";
|
||||
|
||||
throw Exception(msg + backQuote(dictionary_name), ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY);
|
||||
/// Try to get the create query for an attached dictionary.
|
||||
std::lock_guard lock{mutex};
|
||||
auto it = dictionaries.find(dictionary_name);
|
||||
if (it != dictionaries.end())
|
||||
{
|
||||
ASTPtr ast = it->second.create_query->clone();
|
||||
auto & create_query = ast->as<ASTCreateQuery &>();
|
||||
create_query.attach = false;
|
||||
create_query.database = getDatabaseName();
|
||||
return ast;
|
||||
}
|
||||
}
|
||||
|
||||
return ast;
|
||||
/// Try to get create query for non-attached dictionary.
|
||||
ASTPtr ast;
|
||||
try
|
||||
{
|
||||
auto dictionary_metadata_path = getObjectMetadataPath(dictionary_name);
|
||||
ast = getCreateQueryFromMetadata(context, dictionary_metadata_path, throw_on_error);
|
||||
}
|
||||
catch (const Exception & e)
|
||||
{
|
||||
if (throw_on_error && (e.code() != ErrorCodes::FILE_DOESNT_EXIST))
|
||||
throw;
|
||||
}
|
||||
|
||||
if (ast)
|
||||
{
|
||||
const auto * create_query = ast->as<const ASTCreateQuery>();
|
||||
if (create_query && create_query->is_dictionary)
|
||||
return ast;
|
||||
}
|
||||
if (throw_on_error)
|
||||
throw Exception{"Dictionary " + backQuote(dictionary_name) + " doesn't exist",
|
||||
ErrorCodes::CANNOT_GET_CREATE_DICTIONARY_QUERY};
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
|
||||
Poco::AutoPtr<Poco::Util::AbstractConfiguration> DatabaseWithDictionaries::getDictionaryConfiguration(const String & dictionary_name) const
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
auto it = dictionaries.find(dictionary_name);
|
||||
if (it != dictionaries.end())
|
||||
return it->second.config;
|
||||
throw Exception("Dictionary " + backQuote(dictionary_name) + " doesn't exist", ErrorCodes::UNKNOWN_DICTIONARY);
|
||||
}
|
||||
|
||||
time_t DatabaseWithDictionaries::getObjectMetadataModificationTime(const String & object_name) const
|
||||
{
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
auto it = dictionaries.find(object_name);
|
||||
if (it != dictionaries.end())
|
||||
return it->second.modification_time;
|
||||
}
|
||||
return DatabaseOnDisk::getObjectMetadataModificationTime(object_name);
|
||||
}
|
||||
|
||||
|
||||
bool DatabaseWithDictionaries::empty(const Context &) const
|
||||
{
|
||||
std::lock_guard lock{mutex};
|
||||
return tables.empty() && dictionaries.empty();
|
||||
}
|
||||
|
||||
void DatabaseWithDictionaries::shutdown()
|
||||
{
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
dictionaries.clear();
|
||||
}
|
||||
detachFromExternalDictionariesLoader();
|
||||
DatabaseOnDisk::shutdown();
|
||||
}
|
||||
@ -295,8 +306,9 @@ DatabaseWithDictionaries::~DatabaseWithDictionaries() = default;
|
||||
|
||||
void DatabaseWithDictionaries::attachToExternalDictionariesLoader(Context & context)
|
||||
{
|
||||
database_as_config_repo_for_external_loader = context.getExternalDictionariesLoader().addConfigRepository(
|
||||
std::make_unique<ExternalLoaderDatabaseConfigRepository>(*this, context));
|
||||
external_loader = &context.getExternalDictionariesLoader();
|
||||
database_as_config_repo_for_external_loader
|
||||
= external_loader->addConfigRepository(std::make_unique<ExternalLoaderDatabaseConfigRepository>(*this, context));
|
||||
}
|
||||
|
||||
void DatabaseWithDictionaries::detachFromExternalDictionariesLoader()
|
||||
|
@ -8,9 +8,9 @@ namespace DB
|
||||
class DatabaseWithDictionaries : public DatabaseOnDisk
|
||||
{
|
||||
public:
|
||||
void attachDictionary(const String & name, const Context & context) override;
|
||||
void attachDictionary(const String & dictionary_name, const DictionaryAttachInfo & attach_info) override;
|
||||
|
||||
void detachDictionary(const String & name, const Context & context) override;
|
||||
void detachDictionary(const String & dictionary_name) override;
|
||||
|
||||
void createDictionary(const Context & context,
|
||||
const String & dictionary_name,
|
||||
@ -18,15 +18,15 @@ public:
|
||||
|
||||
void removeDictionary(const Context & context, const String & dictionary_name) override;
|
||||
|
||||
StoragePtr tryGetTable(const Context & context, const String & table_name) const override;
|
||||
|
||||
ASTPtr getCreateTableQueryImpl(const Context & context, const String & table_name, bool throw_on_error) const override;
|
||||
|
||||
DatabaseTablesIteratorPtr getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name) override;
|
||||
bool isDictionaryExist(const Context & context, const String & dictionary_name) const override;
|
||||
|
||||
DatabaseDictionariesIteratorPtr getDictionariesIterator(const Context & context, const FilterByNameFunction & filter_by_dictionary_name) override;
|
||||
|
||||
bool isDictionaryExist(const Context & context, const String & dictionary_name) const override;
|
||||
Poco::AutoPtr<Poco::Util::AbstractConfiguration> getDictionaryConfiguration(const String & /*name*/) const override;
|
||||
|
||||
time_t getObjectMetadataModificationTime(const String & object_name) const override;
|
||||
|
||||
bool empty(const Context & context) const override;
|
||||
|
||||
void shutdown() override;
|
||||
|
||||
@ -39,16 +39,17 @@ protected:
|
||||
void attachToExternalDictionariesLoader(Context & context);
|
||||
void detachFromExternalDictionariesLoader();
|
||||
|
||||
StoragePtr getDictionaryStorage(const Context & context, const String & table_name, bool load) const;
|
||||
void detachDictionaryImpl(const String & dictionary_name, DictionaryAttachInfo & attach_info);
|
||||
|
||||
ASTPtr getCreateDictionaryQueryImpl(const Context & context,
|
||||
const String & dictionary_name,
|
||||
bool throw_on_error) const override;
|
||||
|
||||
private:
|
||||
ext::scope_guard database_as_config_repo_for_external_loader;
|
||||
std::unordered_map<String, DictionaryAttachInfo> dictionaries;
|
||||
|
||||
StoragePtr tryGetTableImpl(const Context & context, const String & table_name, bool load) const;
|
||||
private:
|
||||
ExternalDictionariesLoader * external_loader = nullptr;
|
||||
ext::scope_guard database_as_config_repo_for_external_loader;
|
||||
};
|
||||
|
||||
}
|
||||
|
@ -27,7 +27,7 @@ bool DatabaseWithOwnTablesBase::isTableExist(
|
||||
const String & table_name) const
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
return tables.find(table_name) != tables.end() || dictionaries.find(table_name) != dictionaries.end();
|
||||
return tables.find(table_name) != tables.end();
|
||||
}
|
||||
|
||||
StoragePtr DatabaseWithOwnTablesBase::tryGetTable(
|
||||
@ -58,7 +58,7 @@ DatabaseTablesIteratorPtr DatabaseWithOwnTablesBase::getTablesIterator(const Con
|
||||
bool DatabaseWithOwnTablesBase::empty(const Context & /*context*/) const
|
||||
{
|
||||
std::lock_guard lock(mutex);
|
||||
return tables.empty() && dictionaries.empty();
|
||||
return tables.empty();
|
||||
}
|
||||
|
||||
StoragePtr DatabaseWithOwnTablesBase::detachTable(const String & table_name)
|
||||
@ -125,7 +125,6 @@ void DatabaseWithOwnTablesBase::shutdown()
|
||||
|
||||
std::lock_guard lock(mutex);
|
||||
tables.clear();
|
||||
dictionaries.clear();
|
||||
}
|
||||
|
||||
DatabaseWithOwnTablesBase::~DatabaseWithOwnTablesBase()
|
||||
|
@ -42,7 +42,6 @@ public:
|
||||
protected:
|
||||
mutable std::mutex mutex;
|
||||
Tables tables;
|
||||
Dictionaries dictionaries;
|
||||
Poco::Logger * log;
|
||||
|
||||
DatabaseWithOwnTablesBase(const String & name_, const String & logger);
|
||||
|
18
src/Databases/DictionaryAttachInfo.h
Normal file
@@ -0,0 +1,18 @@
#pragma once

#include <Parsers/IAST_fwd.h>
#include <Poco/AutoPtr.h>
#include <Poco/Util/AbstractConfiguration.h>


namespace DB
{

struct DictionaryAttachInfo
{
    ASTPtr create_query;
    Poco::AutoPtr<Poco::Util::AbstractConfiguration> config;
    time_t modification_time;
};

}
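For orientation, a minimal sketch of how this struct is meant to be filled, mirroring `tryAttachDictionary` in DatabaseOrdinary.cpp above; `metadata_path` and the local variable names are illustrative, not part of the diff:

    // Sketch: building a DictionaryAttachInfo from a parsed CREATE DICTIONARY query.
    // `query` is an ASTPtr holding an ASTCreateQuery with is_dictionary == true;
    // `metadata_path` is a hypothetical path to the dictionary's .sql metadata file.
    const auto & create_query = query->as<const ASTCreateQuery &>();
    DictionaryAttachInfo info
    {
        query,                                                   /// keep the original AST for SHOW CREATE
        getDictionaryConfigurationFromAST(create_query),         /// convert the AST into a Poco configuration
        Poco::File(metadata_path).getLastModified().epochTime()  /// remember when the metadata was written
    };
    database.attachDictionary(create_query.table, info);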
@ -5,8 +5,11 @@
|
||||
#include <Storages/IStorage_fwd.h>
|
||||
#include <Storages/StorageInMemoryMetadata.h>
|
||||
#include <Dictionaries/IDictionary.h>
|
||||
#include <Databases/DictionaryAttachInfo.h>
|
||||
#include <Common/Exception.h>
|
||||
|
||||
#include <boost/range/adaptor/map.hpp>
|
||||
#include <boost/range/algorithm/copy.hpp>
|
||||
#include <ctime>
|
||||
#include <functional>
|
||||
#include <memory>
|
||||
@ -18,11 +21,10 @@ namespace DB
|
||||
class Context;
|
||||
struct Settings;
|
||||
struct ConstraintsDescription;
|
||||
class ColumnsDescription;
|
||||
struct IndicesDescription;
|
||||
struct TableStructureWriteLockHolder;
|
||||
class ASTCreateQuery;
|
||||
using Dictionaries = std::set<String>;
|
||||
using Dictionaries = std::vector<String>;
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
@ -74,9 +76,14 @@ private:
|
||||
public:
|
||||
DatabaseDictionariesSnapshotIterator() = default;
|
||||
DatabaseDictionariesSnapshotIterator(Dictionaries & dictionaries_) : dictionaries(dictionaries_), it(dictionaries.begin()) {}
|
||||
|
||||
DatabaseDictionariesSnapshotIterator(Dictionaries && dictionaries_) : dictionaries(dictionaries_), it(dictionaries.begin()) {}
|
||||
|
||||
DatabaseDictionariesSnapshotIterator(const std::unordered_map<String, DictionaryAttachInfo> & dictionaries_)
|
||||
{
|
||||
boost::range::copy(dictionaries_ | boost::adaptors::map_keys, std::back_inserter(dictionaries));
|
||||
it = dictionaries.begin();
|
||||
}
|
||||
|
||||
void next() { ++it; }
|
||||
|
||||
bool isValid() const { return !dictionaries.empty() && it != dictionaries.end(); }
|
||||
@ -140,12 +147,6 @@ public:
|
||||
return std::make_unique<DatabaseDictionariesSnapshotIterator>();
|
||||
}
|
||||
|
||||
/// Get an iterator to pass through all the tables and dictionary tables.
|
||||
virtual DatabaseTablesIteratorPtr getTablesWithDictionaryTablesIterator(const Context & context, const FilterByNameFunction & filter_by_name = {})
|
||||
{
|
||||
return getTablesIterator(context, filter_by_name);
|
||||
}
|
||||
|
||||
/// Is the database empty.
|
||||
virtual bool empty(const Context & context) const = 0;
|
||||
|
||||
@ -192,7 +193,7 @@ public:
|
||||
|
||||
/// Add dictionary to the database, but do not add it to the metadata. The database may not support this method.
|
||||
/// If dictionaries_lazy_load is false it also starts loading the dictionary asynchronously.
|
||||
virtual void attachDictionary(const String & /*name*/, const Context & /*context*/)
|
||||
virtual void attachDictionary(const String & /* dictionary_name */, const DictionaryAttachInfo & /* attach_info */)
|
||||
{
|
||||
throw Exception("There is no ATTACH DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED);
|
||||
}
|
||||
@ -204,7 +205,7 @@ public:
|
||||
}
|
||||
|
||||
/// Forget about the dictionary without deleting it. The database may not support this method.
|
||||
virtual void detachDictionary(const String & /*name*/, const Context & /*context*/)
|
||||
virtual void detachDictionary(const String & /*name*/)
|
||||
{
|
||||
throw Exception("There is no DETACH DICTIONARY query for Database" + getEngineName(), ErrorCodes::NOT_IMPLEMENTED);
|
||||
}
|
||||
@ -260,6 +261,11 @@ public:
|
||||
return getCreateDictionaryQueryImpl(context, name, true);
|
||||
}
|
||||
|
||||
virtual Poco::AutoPtr<Poco::Util::AbstractConfiguration> getDictionaryConfiguration(const String & /*name*/) const
|
||||
{
|
||||
throw Exception(getEngineName() + ": getDictionaryConfiguration() is not supported", ErrorCodes::NOT_IMPLEMENTED);
|
||||
}
|
||||
|
||||
/// Get the CREATE DATABASE query for current database.
|
||||
virtual ASTPtr getCreateDatabaseQuery(const Context & /*context*/) const = 0;
|
||||
|
||||
@ -276,6 +282,9 @@ public:
|
||||
/// Returns metadata path of a concrete table if the database supports it, empty string otherwise
|
||||
virtual String getObjectMetadataPath(const String & /*table_name*/) const { return {}; }
|
||||
|
||||
/// All tables and dictionaries should be detached before detaching the database.
|
||||
virtual bool shouldBeEmptyOnDetach() const { return true; }
|
||||
|
||||
/// Ask all tables to complete the background threads they are using and delete all table objects.
|
||||
virtual void shutdown() = 0;
|
||||
|
||||
|
@ -46,6 +46,7 @@ namespace ErrorCodes
|
||||
extern const int BAD_ARGUMENTS;
|
||||
extern const int UNSUPPORTED_METHOD;
|
||||
extern const int TOO_SMALL_BUFFER_SIZE;
|
||||
extern const int TIMEOUT_EXCEEDED;
|
||||
}
|
||||
|
||||
|
||||
@ -63,10 +64,12 @@ CacheDictionary::CacheDictionary(
|
||||
const DictionaryStructure & dict_struct_,
|
||||
DictionarySourcePtr source_ptr_,
|
||||
DictionaryLifetime dict_lifetime_,
|
||||
size_t strict_max_lifetime_seconds_,
|
||||
size_t size_,
|
||||
bool allow_read_expired_keys_,
|
||||
size_t max_update_queue_size_,
|
||||
size_t update_queue_push_timeout_milliseconds_,
|
||||
size_t query_wait_timeout_milliseconds_,
|
||||
size_t max_threads_for_updates_)
|
||||
: database(database_)
|
||||
, name(name_)
|
||||
@ -74,9 +77,11 @@ CacheDictionary::CacheDictionary(
|
||||
, dict_struct(dict_struct_)
|
||||
, source_ptr{std::move(source_ptr_)}
|
||||
, dict_lifetime(dict_lifetime_)
|
||||
, strict_max_lifetime_seconds(strict_max_lifetime_seconds_)
|
||||
, allow_read_expired_keys(allow_read_expired_keys_)
|
||||
, max_update_queue_size(max_update_queue_size_)
|
||||
, update_queue_push_timeout_milliseconds(update_queue_push_timeout_milliseconds_)
|
||||
, query_wait_timeout_milliseconds(query_wait_timeout_milliseconds_)
|
||||
, max_threads_for_updates(max_threads_for_updates_)
|
||||
, log(&Logger::get("ExternalDictionaries"))
|
||||
, size{roundUpToPowerOfTwoOrZero(std::max(size_, size_t(max_collision_length)))}
|
||||
@ -332,6 +337,13 @@ void CacheDictionary::has(const PaddedPODArray<Key> & ids, PaddedPODArray<UInt8>
|
||||
{
|
||||
if (find_result.outdated)
|
||||
{
|
||||
/// Protection against reading very expired keys.
|
||||
if (now > cells[find_result.cell_idx].strict_max)
|
||||
{
|
||||
cache_not_found_ids[id].push_back(row);
|
||||
continue;
|
||||
}
|
||||
|
||||
cache_expired_ids[id].push_back(row);
|
||||
|
||||
if (allow_read_expired_keys)
|
||||
@ -693,6 +705,9 @@ void registerDictionaryCache(DictionaryFactory & factory)
|
||||
const String name = config.getString(config_prefix + ".name");
|
||||
const DictionaryLifetime dict_lifetime{config, config_prefix + ".lifetime"};
|
||||
|
||||
const size_t strict_max_lifetime_seconds =
|
||||
config.getUInt64(layout_prefix + ".cache.strict_max_lifetime_seconds", static_cast<size_t>(dict_lifetime.max_sec));
|
||||
|
||||
const size_t max_update_queue_size =
|
||||
config.getUInt64(layout_prefix + ".cache.max_update_queue_size", 100000);
|
||||
if (max_update_queue_size == 0)
|
||||
@ -708,6 +723,9 @@ void registerDictionaryCache(DictionaryFactory & factory)
|
||||
throw Exception{name + ": dictionary of layout 'cache' have too little update_queue_push_timeout",
|
||||
ErrorCodes::BAD_ARGUMENTS};
|
||||
|
||||
const size_t query_wait_timeout_milliseconds =
|
||||
config.getUInt64(layout_prefix + ".cache.query_wait_timeout_milliseconds", 60000);
|
||||
|
||||
const size_t max_threads_for_updates =
|
||||
config.getUInt64(layout_prefix + ".max_threads_for_updates", 4);
|
||||
if (max_threads_for_updates == 0)
|
||||
@ -715,8 +733,17 @@ void registerDictionaryCache(DictionaryFactory & factory)
|
||||
ErrorCodes::BAD_ARGUMENTS};
|
||||
|
||||
return std::make_unique<CacheDictionary>(
|
||||
database, name, dict_struct, std::move(source_ptr), dict_lifetime, size,
|
||||
allow_read_expired_keys, max_update_queue_size, update_queue_push_timeout_milliseconds,
|
||||
database,
|
||||
name,
|
||||
dict_struct,
|
||||
std::move(source_ptr),
|
||||
dict_lifetime,
|
||||
strict_max_lifetime_seconds,
|
||||
size,
|
||||
allow_read_expired_keys,
|
||||
max_update_queue_size,
|
||||
update_queue_push_timeout_milliseconds,
|
||||
query_wait_timeout_milliseconds,
|
||||
max_threads_for_updates);
|
||||
};
|
||||
factory.registerLayout("cache", create_layout, false);
|
||||
@@ -782,20 +809,32 @@ void CacheDictionary::updateThreadFunction()

void CacheDictionary::waitForCurrentUpdateFinish(UpdateUnitPtr & update_unit_ptr) const
{
    std::unique_lock<std::mutex> lock(update_mutex);
    std::unique_lock<std::mutex> update_lock(update_mutex);

    /*
     * We wait here without any timeout to avoid SEGFAULTs.
     * Consider the timeout for the wait had expired and the main query's thread ended with an exception
     * or some other error. But the UpdateUnit with callbacks is left in the queue.
     * It has callbacks that capture god knows what from the current thread
     * (most of the variables lie on the stack of the finished thread) that
     * intended to do a synchronous update. The AsyncUpdate thread can touch deallocated memory and explode.
     * */
    is_update_finished.wait(
        lock,
    size_t timeout_for_wait = 100000;
    bool result = is_update_finished.wait_for(
        update_lock,
        std::chrono::milliseconds(timeout_for_wait),
        [&] {return update_unit_ptr->is_done || update_unit_ptr->current_exception; });

    if (!result)
    {
        std::lock_guard<std::mutex> callback_lock(update_unit_ptr->callback_mutex);
        /*
         * We acquire a lock here and store false to a special variable to avoid SEGFAULTs.
         * Consider the timeout for the wait had expired and the main query's thread ended with an exception
         * or some other error. But the UpdateUnit with callbacks is left in the queue.
         * It has callbacks that capture god knows what from the current thread
         * (most of the variables lie on the stack of the finished thread) that
         * intended to do a synchronous update. The AsyncUpdate thread can touch deallocated memory and explode.
         * */
        update_unit_ptr->can_use_callback = false;
        throw DB::Exception(
            "Dictionary " + getName() + " source seems unavailable, because " +
            toString(timeout_for_wait) + " timeout exceeded.", ErrorCodes::TIMEOUT_EXCEEDED);
    }


    if (update_unit_ptr->current_exception)
        std::rethrow_exception(update_unit_ptr->current_exception);
}
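A standalone sketch of the wait-with-timeout pattern introduced above; names are simplified and illustrative (this is not the actual CacheDictionary class). The point is that the waiter bounds its wait, and on timeout disables the callback under the unit's own mutex so the background updater cannot call into a dead stack frame:

    // Sketch only: bounded wait plus callback-disable flag, assuming simplified types.
    #include <chrono>
    #include <condition_variable>
    #include <memory>
    #include <mutex>
    #include <stdexcept>

    struct UpdateUnit
    {
        std::mutex callback_mutex;
        bool can_use_callback = true;   /// checked by the updater thread before invoking callbacks
        bool is_done = false;
        std::exception_ptr current_exception;
    };

    void waitForUpdate(std::mutex & update_mutex, std::condition_variable & is_update_finished,
                       const std::shared_ptr<UpdateUnit> & unit, std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> update_lock(update_mutex);

        bool finished = is_update_finished.wait_for(update_lock, timeout,
            [&] { return unit->is_done || unit->current_exception; });

        if (!finished)
        {
            /// The waiter is about to leave: forbid the updater from using callbacks
            /// that capture this (soon-to-be-dead) stack frame.
            std::lock_guard<std::mutex> callback_lock(unit->callback_mutex);
            unit->can_use_callback = false;
            throw std::runtime_error("update timeout exceeded");
        }

        if (unit->current_exception)
            std::rethrow_exception(unit->current_exception);
    }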
@ -968,9 +1007,14 @@ void CacheDictionary::update(BunchUpdateUnit & bunch_update_unit) const
|
||||
{
|
||||
std::uniform_int_distribution<UInt64> distribution{dict_lifetime.min_sec, dict_lifetime.max_sec};
|
||||
cell.setExpiresAt(now + std::chrono::seconds{distribution(rnd_engine)});
|
||||
cell.strict_max = now + std::chrono::seconds{strict_max_lifetime_seconds};
|
||||
}
|
||||
else
|
||||
{
|
||||
cell.setExpiresAt(std::chrono::time_point<std::chrono::system_clock>::max());
|
||||
cell.strict_max = now + std::chrono::seconds{strict_max_lifetime_seconds};
|
||||
}
|
||||
|
||||
|
||||
/// Set null_value for each attribute
|
||||
cell.setDefault();
|
||||
|
@ -55,10 +55,12 @@ public:
|
||||
const DictionaryStructure & dict_struct_,
|
||||
DictionarySourcePtr source_ptr_,
|
||||
DictionaryLifetime dict_lifetime_,
|
||||
size_t strict_max_lifetime_seconds,
|
||||
size_t size_,
|
||||
bool allow_read_expired_keys_,
|
||||
size_t max_update_queue_size_,
|
||||
size_t update_queue_push_timeout_milliseconds_,
|
||||
size_t query_wait_timeout_milliseconds,
|
||||
size_t max_threads_for_updates);
|
||||
|
||||
~CacheDictionary() override;
|
||||
@ -87,9 +89,18 @@ public:
|
||||
std::shared_ptr<const IExternalLoadable> clone() const override
|
||||
{
|
||||
return std::make_shared<CacheDictionary>(
|
||||
database, name, dict_struct, source_ptr->clone(), dict_lifetime, size,
|
||||
allow_read_expired_keys, max_update_queue_size,
|
||||
update_queue_push_timeout_milliseconds, max_threads_for_updates);
|
||||
database,
|
||||
name,
|
||||
dict_struct,
|
||||
source_ptr->clone(),
|
||||
dict_lifetime,
|
||||
strict_max_lifetime_seconds,
|
||||
size,
|
||||
allow_read_expired_keys,
|
||||
max_update_queue_size,
|
||||
update_queue_push_timeout_milliseconds,
|
||||
query_wait_timeout_milliseconds,
|
||||
max_threads_for_updates);
|
||||
}
|
||||
|
||||
const IDictionarySource * getSource() const override { return source_ptr.get(); }
|
||||
@ -206,6 +217,8 @@ private:
|
||||
/// Stores both expiration time and `is_default` flag in the most significant bit
|
||||
time_point_urep_t data;
|
||||
|
||||
time_point_t strict_max;
|
||||
|
||||
/// Sets expiration time, resets `is_default` flag to false
|
||||
time_point_t expiresAt() const { return ext::safe_bit_cast<time_point_t>(data & EXPIRES_AT_MASK); }
|
||||
void setExpiresAt(const time_point_t & t) { data = ext::safe_bit_cast<time_point_urep_t>(t); }
|
||||
@ -294,9 +307,11 @@ private:
|
||||
const DictionaryStructure dict_struct;
|
||||
mutable DictionarySourcePtr source_ptr;
|
||||
const DictionaryLifetime dict_lifetime;
|
||||
const size_t strict_max_lifetime_seconds;
|
||||
const bool allow_read_expired_keys;
|
||||
const size_t max_update_queue_size;
|
||||
const size_t update_queue_push_timeout_milliseconds;
|
||||
const size_t query_wait_timeout_milliseconds;
|
||||
const size_t max_threads_for_updates;
|
||||
|
||||
Logger * const log;
|
||||
@ -366,6 +381,12 @@ private:
|
||||
alive_keys(CurrentMetrics::CacheDictionaryUpdateQueueKeys, requested_ids.size()){}
|
||||
|
||||
std::vector<Key> requested_ids;
|
||||
|
||||
/// It might seem that it is a leak of performance.
|
||||
/// But acquiring a mutex without contention is rather cheap.
|
||||
std::mutex callback_mutex;
|
||||
bool can_use_callback{true};
|
||||
|
||||
PresentIdHandler present_id_handler;
|
||||
AbsentIdHandler absent_id_handler;
|
||||
|
||||
@ -412,6 +433,7 @@ private:
|
||||
helper.push_back(unit_ptr->requested_ids.size() + helper.back());
|
||||
present_id_handlers.emplace_back(unit_ptr->present_id_handler);
|
||||
absent_id_handlers.emplace_back(unit_ptr->absent_id_handler);
|
||||
update_units.emplace_back(unit_ptr);
|
||||
}
|
||||
|
||||
concatenated_requested_ids.reserve(total_requested_keys_count);
|
||||
@ -428,31 +450,51 @@ private:
|
||||
|
||||
void informCallersAboutPresentId(Key id, size_t cell_idx)
|
||||
{
|
||||
for (size_t i = 0; i < concatenated_requested_ids.size(); ++i)
|
||||
for (size_t position = 0; position < concatenated_requested_ids.size(); ++position)
|
||||
{
|
||||
auto & curr = concatenated_requested_ids[i];
|
||||
if (curr == id)
|
||||
getPresentIdHandlerForPosition(i)(id, cell_idx);
|
||||
if (concatenated_requested_ids[position] == id)
|
||||
{
|
||||
auto unit_number = getUpdateUnitNumberForRequestedIdPosition(position);
|
||||
auto lock = getLockToCurrentUnit(unit_number);
|
||||
if (canUseCallback(unit_number))
|
||||
getPresentIdHandlerForPosition(unit_number)(id, cell_idx);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void informCallersAboutAbsentId(Key id, size_t cell_idx)
|
||||
{
|
||||
for (size_t i = 0; i < concatenated_requested_ids.size(); ++i)
|
||||
if (concatenated_requested_ids[i] == id)
|
||||
getAbsentIdHandlerForPosition(i)(id, cell_idx);
|
||||
for (size_t position = 0; position < concatenated_requested_ids.size(); ++position)
|
||||
if (concatenated_requested_ids[position] == id)
|
||||
{
|
||||
auto unit_number = getUpdateUnitNumberForRequestedIdPosition(position);
|
||||
auto lock = getLockToCurrentUnit(unit_number);
|
||||
if (canUseCallback(unit_number))
|
||||
getAbsentIdHandlerForPosition(unit_number)(id, cell_idx);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
private:
|
||||
PresentIdHandler & getPresentIdHandlerForPosition(size_t position)
|
||||
/// Needed to control the usage of the callback and avoid SEGFAULTs.
|
||||
bool canUseCallback(size_t unit_number)
|
||||
{
|
||||
return present_id_handlers[getUpdateUnitNumberForRequestedIdPosition(position)];
|
||||
return update_units[unit_number].get()->can_use_callback;
|
||||
}
|
||||
|
||||
AbsentIdHandler & getAbsentIdHandlerForPosition(size_t position)
|
||||
std::unique_lock<std::mutex> getLockToCurrentUnit(size_t unit_number)
|
||||
{
|
||||
return absent_id_handlers[getUpdateUnitNumberForRequestedIdPosition((position))];
|
||||
return std::unique_lock<std::mutex>(update_units[unit_number].get()->callback_mutex);
|
||||
}
|
||||
|
||||
PresentIdHandler & getPresentIdHandlerForPosition(size_t unit_number)
|
||||
{
|
||||
return update_units[unit_number].get()->present_id_handler;
|
||||
}
|
||||
|
||||
AbsentIdHandler & getAbsentIdHandlerForPosition(size_t unit_number)
|
||||
{
|
||||
return update_units[unit_number].get()->absent_id_handler;
|
||||
}
|
||||
|
||||
size_t getUpdateUnitNumberForRequestedIdPosition(size_t position)
|
||||
@ -464,6 +506,8 @@ private:
|
||||
std::vector<PresentIdHandler> present_id_handlers;
|
||||
std::vector<AbsentIdHandler> absent_id_handlers;
|
||||
|
||||
std::vector<std::reference_wrapper<UpdateUnitPtr>> update_units;
|
||||
|
||||
std::vector<size_t> helper;
|
||||
};
|
||||
|
||||
|
@ -75,6 +75,13 @@ void CacheDictionary::getItemsNumberImpl(
|
||||
|
||||
if (find_result.outdated)
|
||||
{
|
||||
/// Protection against reading very expired keys.
|
||||
if (now > cells[find_result.cell_idx].strict_max)
|
||||
{
|
||||
cache_not_found_ids[id].push_back(row);
|
||||
continue;
|
||||
}
|
||||
|
||||
cache_expired_ids[id].push_back(row);
|
||||
if (allow_read_expired_keys)
|
||||
update_routine();
|
||||
@ -249,6 +256,13 @@ void CacheDictionary::getItemsString(
|
||||
{
|
||||
if (find_result.outdated)
|
||||
{
|
||||
/// Protection against reading very expired keys.
|
||||
if (now > cells[find_result.cell_idx].strict_max)
|
||||
{
|
||||
cache_not_found_ids[id].push_back(row);
|
||||
continue;
|
||||
}
|
||||
|
||||
cache_expired_ids[id].push_back(row);
|
||||
|
||||
if (allow_read_expired_keys)
|
||||
|
@@ -11,5 +11,4 @@ using DictionaryConfigurationPtr = Poco::AutoPtr<Poco::Util::AbstractConfigurati
/// This function is necessary because the configurations of all loadable objects are Poco::AbstractConfiguration.
/// Can throw an exception if the query is ill-formed.
DictionaryConfigurationPtr getDictionaryConfigurationFromAST(const ASTCreateQuery & query);

}
|
||||
|
@ -35,7 +35,7 @@ public:
|
||||
return 1;
|
||||
}
|
||||
|
||||
bool isInjective(const Block &) override
|
||||
bool isInjective(const Block &) const override
|
||||
{
|
||||
return is_injective;
|
||||
}
|
||||
|
@ -115,7 +115,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return is_injective; }
|
||||
bool isInjective(const Block &) const override { return is_injective; }
|
||||
|
||||
bool useDefaultImplementationForConstants() const override { return true; }
|
||||
|
||||
|
@ -72,7 +72,7 @@ public:
|
||||
String getName() const override { return name; }
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -326,7 +326,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return mask_tail_octets == 0; }
|
||||
bool isInjective(const Block &) const override { return mask_tail_octets == 0; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -447,7 +447,7 @@ public:
|
||||
String getName() const override { return name; }
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -546,7 +546,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -739,7 +739,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -837,7 +837,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -941,7 +941,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -1224,7 +1224,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
@ -1313,7 +1313,7 @@ public:
|
||||
}
|
||||
|
||||
bool isVariadic() const override { return true; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
size_t getNumberOfArguments() const override { return 0; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
@ -1408,7 +1408,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
|
@ -913,7 +913,7 @@ public:
|
||||
|
||||
bool isVariadic() const override { return true; }
|
||||
size_t getNumberOfArguments() const override { return 0; }
|
||||
bool isInjective(const Block &) override { return std::is_same_v<Name, NameToString>; }
|
||||
bool isInjective(const Block &) const override { return std::is_same_v<Name, NameToString>; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
|
||||
{
|
||||
@ -1268,7 +1268,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 2; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
|
||||
{
|
||||
|
@ -592,7 +592,7 @@ public:
|
||||
|
||||
/// For the purpose of query optimization, we assume this function to be injective
|
||||
/// even in the face of the fact that there are many different cities named Moscow.
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
|
@ -243,7 +243,7 @@ private:
|
||||
bool useDefaultImplementationForConstants() const final { return true; }
|
||||
ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; }
|
||||
|
||||
bool isInjective(const Block & sample_block) override
|
||||
bool isInjective(const Block & sample_block) const override
|
||||
{
|
||||
return isDictGetFunctionInjective(dictionaries_loader, sample_block);
|
||||
}
|
||||
@ -769,7 +769,7 @@ private:
|
||||
bool useDefaultImplementationForConstants() const final { return true; }
|
||||
ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; }
|
||||
|
||||
bool isInjective(const Block & sample_block) override
|
||||
bool isInjective(const Block & sample_block) const override
|
||||
{
|
||||
return isDictGetFunctionInjective(dictionaries_loader, sample_block);
|
||||
}
|
||||
@ -1338,7 +1338,7 @@ private:
|
||||
bool useDefaultImplementationForConstants() const final { return true; }
|
||||
ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; }
|
||||
|
||||
bool isInjective(const Block & sample_block) override
|
||||
bool isInjective(const Block & sample_block) const override
|
||||
{
|
||||
return isDictGetFunctionInjective(dictionaries_loader, sample_block);
|
||||
}
|
||||
@ -1486,7 +1486,7 @@ private:
|
||||
bool useDefaultImplementationForConstants() const final { return true; }
|
||||
ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0, 1}; }
|
||||
|
||||
bool isInjective(const Block & sample_block) override
|
||||
bool isInjective(const Block & sample_block) const override
|
||||
{
|
||||
return isDictGetFunctionInjective(dictionaries_loader, sample_block);
|
||||
}
|
||||
@ -1627,7 +1627,7 @@ public:
|
||||
|
||||
private:
|
||||
size_t getNumberOfArguments() const override { return 2; }
|
||||
bool isInjective(const Block & /*sample_block*/) override { return true; }
|
||||
bool isInjective(const Block & /*sample_block*/) const override { return true; }
|
||||
|
||||
bool useDefaultImplementationForConstants() const final { return true; }
|
||||
ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0}; }
|
||||
|
@ -42,7 +42,7 @@ public:
|
||||
}
|
||||
|
||||
size_t getNumberOfArguments() const override { return 1; }
|
||||
bool isInjective(const Block &) override { return true; }
|
||||
bool isInjective(const Block &) const override { return true; }
|
||||
|
||||
DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
|
||||
{
|
||||
|
@ -134,7 +134,7 @@ public:
|
||||
*
|
||||
* sample_block should contain data types of arguments and values of constants, if relevant.
|
||||
*/
|
||||
virtual bool isInjective(const Block & /*sample_block*/) { return false; }
|
||||
virtual bool isInjective(const Block & /*sample_block*/) const { return false; }
|
||||
|
||||
/** Function is called "deterministic", if it returns same result for same values of arguments.
|
||||
* Most of functions are deterministic. Notable counterexample is rand().
|
||||
@ -189,6 +189,7 @@ public:
|
||||
/// See the comment for the same method in IFunctionBase
|
||||
virtual bool isDeterministic() const = 0;
|
||||
virtual bool isDeterministicInScopeOfQuery() const = 0;
|
||||
virtual bool isInjective(const Block &) const = 0;
|
||||
|
||||
/// Override and return true if function needs to depend on the state of the data.
|
||||
virtual bool isStateful() const = 0;
|
||||
|
@ -68,7 +68,7 @@ public:
|
||||
return impl->getResultIfAlwaysReturnsConstantAndHasArguments(block, arguments);
|
||||
}
|
||||
|
||||
-    bool isInjective(const Block & sample_block) final { return impl->isInjective(sample_block); }
+    bool isInjective(const Block & sample_block) const final { return impl->isInjective(sample_block); }
    bool isDeterministic() const final { return impl->isDeterministic(); }
    bool isDeterministicInScopeOfQuery() const final { return impl->isDeterministicInScopeOfQuery(); }
    bool hasInformationAboutMonotonicity() const final { return impl->hasInformationAboutMonotonicity(); }

@@ -96,6 +96,8 @@ public:
    bool isDeterministicInScopeOfQuery() const final { return impl->isDeterministicInScopeOfQuery(); }
    bool isInjective(const Block & block) const final { return impl->isInjective(block); }
    bool isStateful() const final { return impl->isStateful(); }
    bool isVariadic() const final { return impl->isVariadic(); }

@@ -195,7 +197,7 @@ public:
    bool isStateful() const override { return function->isStateful(); }
-    bool isInjective(const Block & sample_block) override { return function->isInjective(sample_block); }
+    bool isInjective(const Block & sample_block) const override { return function->isInjective(sample_block); }
    bool isDeterministic() const override { return function->isDeterministic(); }

@@ -226,6 +228,7 @@ public:
    bool isDeterministic() const override { return function->isDeterministic(); }
    bool isDeterministicInScopeOfQuery() const override { return function->isDeterministicInScopeOfQuery(); }
    bool isInjective(const Block & block) const override { return function->isInjective(block); }
    String getName() const override { return function->getName(); }
    bool isStateful() const override { return function->isStateful(); }

@@ -107,7 +107,7 @@ public:
    virtual bool isSuitableForConstantFolding() const { return true; }
    virtual ColumnPtr getResultIfAlwaysReturnsConstantAndHasArguments(const Block & /*block*/, const ColumnNumbers & /*arguments*/) const { return nullptr; }
-    virtual bool isInjective(const Block & /*sample_block*/) { return false; }
+    virtual bool isInjective(const Block & /*sample_block*/) const { return false; }
    virtual bool isDeterministic() const { return true; }
    virtual bool isDeterministicInScopeOfQuery() const { return true; }
    virtual bool hasInformationAboutMonotonicity() const { return false; }

@@ -152,6 +152,7 @@ public:
    /// Properties from IFunctionOverloadResolver. See comments in IFunction.h
    virtual bool isDeterministic() const { return true; }
    virtual bool isDeterministicInScopeOfQuery() const { return true; }
    virtual bool isInjective(const Block &) const { return false; }
    virtual bool isStateful() const { return false; }
    virtual bool isVariadic() const { return false; }

@@ -256,7 +257,7 @@ public:
    /// Properties from IFunctionBase (see IFunction.h)
    virtual bool isSuitableForConstantFolding() const { return true; }
    virtual ColumnPtr getResultIfAlwaysReturnsConstantAndHasArguments(const Block & /*block*/, const ColumnNumbers & /*arguments*/) const { return nullptr; }
-    virtual bool isInjective(const Block & /*sample_block*/) { return false; }
+    virtual bool isInjective(const Block & /*sample_block*/) const { return false; }
    virtual bool isDeterministic() const { return true; }
    virtual bool isDeterministicInScopeOfQuery() const { return true; }
    virtual bool isStateful() const { return false; }
140 src/Functions/LeastGreatestGeneric.h Normal file
@@ -0,0 +1,140 @@
#pragma once

#include <DataTypes/getLeastSupertype.h>
#include <DataTypes/NumberTraits.h>
#include <Interpreters/castColumn.h>
#include <Columns/ColumnsNumber.h>
#include <Functions/IFunctionImpl.h>
#include <Functions/FunctionFactory.h>
#include <ext/map.h>


namespace DB
{

namespace ErrorCodes
{
    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
}


enum class LeastGreatest
{
    Least,
    Greatest
};


template <LeastGreatest kind>
class FunctionLeastGreatestGeneric : public IFunction
{
public:
    static constexpr auto name = kind == LeastGreatest::Least ? "least" : "greatest";
    static FunctionPtr create(const Context &) { return std::make_shared<FunctionLeastGreatestGeneric<kind>>(); }

private:
    String getName() const override { return name; }
    size_t getNumberOfArguments() const override { return 0; }
    bool isVariadic() const override { return true; }
    bool useDefaultImplementationForConstants() const override { return true; }

    DataTypePtr getReturnTypeImpl(const DataTypes & types) const override
    {
        if (types.empty())
            throw Exception("Function " + getName() + " cannot be called without arguments", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);

        return getLeastSupertype(types);
    }

    void executeImpl(Block & block, const ColumnNumbers & arguments, size_t result, size_t input_rows_count) override
    {
        size_t num_arguments = arguments.size();
        if (1 == num_arguments)
        {
            block.getByPosition(result).column = block.getByPosition(arguments[0]).column;
            return;
        }

        auto result_type = block.getByPosition(result).type;

        Columns converted_columns(num_arguments);
        for (size_t arg = 0; arg < num_arguments; ++arg)
            converted_columns[arg] = castColumn(block.getByPosition(arguments[arg]), result_type)->convertToFullColumnIfConst();

        auto result_column = result_type->createColumn();
        result_column->reserve(input_rows_count);

        for (size_t row_num = 0; row_num < input_rows_count; ++row_num)
        {
            size_t best_arg = 0;
            for (size_t arg = 1; arg < num_arguments; ++arg)
            {
                auto cmp_result = converted_columns[arg]->compareAt(row_num, row_num, *converted_columns[best_arg], 1);

                if constexpr (kind == LeastGreatest::Least)
                {
                    if (cmp_result < 0)
                        best_arg = arg;
                }
                else
                {
                    if (cmp_result > 0)
                        best_arg = arg;
                }
            }

            result_column->insertFrom(*converted_columns[best_arg], row_num);
        }

        block.getByPosition(result).column = std::move(result_column);
    }
};


template <LeastGreatest kind, typename SpecializedFunction>
class LeastGreatestOverloadResolver : public IFunctionOverloadResolverImpl
{
public:
    static constexpr auto name = kind == LeastGreatest::Least ? "least" : "greatest";

    static FunctionOverloadResolverImplPtr create(const Context & context)
    {
        return std::make_unique<LeastGreatestOverloadResolver<kind, SpecializedFunction>>(context);
    }

    explicit LeastGreatestOverloadResolver(const Context & context_) : context(context_) {}

    String getName() const override { return name; }
    size_t getNumberOfArguments() const override { return 0; }
    bool isVariadic() const override { return true; }

    FunctionBaseImplPtr build(const ColumnsWithTypeAndName & arguments, const DataTypePtr & return_type) const override
    {
        DataTypes argument_types;

        /// More efficient specialization for two numeric arguments.
        if (arguments.size() == 2 && isNumber(arguments[0].type) && isNumber(arguments[1].type))
            return std::make_unique<DefaultFunction>(SpecializedFunction::create(context), argument_types, return_type);

        return std::make_unique<DefaultFunction>(
            FunctionLeastGreatestGeneric<kind>::create(context), argument_types, return_type);
    }

    DataTypePtr getReturnType(const DataTypes & types) const override
    {
        if (types.empty())
            throw Exception("Function " + getName() + " cannot be called without arguments", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH);

        if (types.size() == 2 && isNumber(types[0]) && isNumber(types[1]))
            return SpecializedFunction::create(context)->getReturnTypeImpl(types);

        return getLeastSupertype(types);
    }

private:
    const Context & context;
};

}
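The generic implementation above works in two steps: every argument column is first cast to the least common supertype of all arguments, and then, for each row, an argmin/argmax scan over the converted columns picks which argument's value to copy into the result. Below is a minimal standalone sketch of that per-row selection, assuming plain std::vector<double> columns in place of the IColumn/compareAt machinery; the function name, the Kind enum and the main() driver are illustrative only, not part of the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// Simplified model of the generic algorithm: every argument is assumed to be already
/// cast to a common supertype (here just double); per row we keep the index of the
/// argument that currently wins the comparison and copy its value into the result.
enum class Kind { Least, Greatest };

template <Kind kind>
std::vector<double> leastGreatest(const std::vector<std::vector<double>> & columns, size_t rows)
{
    std::vector<double> result;
    result.reserve(rows);

    for (size_t row = 0; row < rows; ++row)
    {
        size_t best_arg = 0;
        for (size_t arg = 1; arg < columns.size(); ++arg)
        {
            bool better = (kind == Kind::Least) ? columns[arg][row] < columns[best_arg][row]
                                                : columns[arg][row] > columns[best_arg][row];
            if (better)
                best_arg = arg;
        }
        result.push_back(columns[best_arg][row]);
    }
    return result;
}

int main()
{
    std::vector<std::vector<double>> cols = {{1, 5, 3}, {2, 4, 9}};
    auto least = leastGreatest<Kind::Least>(cols, 3);
    auto greatest = leastGreatest<Kind::Greatest>(cols, 3);
    assert(least[1] == 4);     /// min(5, 4)
    assert(greatest[2] == 9);  /// max(3, 9)
}
```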
@@ -41,7 +41,7 @@ public:
    size_t getNumberOfArguments() const override { return 0; }

-    bool isInjective(const Block &) override { return is_injective; }
+    bool isInjective(const Block &) const override { return is_injective; }

    bool useDefaultImplementationForConstants() const override { return true; }
@@ -1,6 +1,8 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionBinaryArithmetic.h>
#include <Core/AccurateComparison.h>
#include <Functions/LeastGreatestGeneric.h>

namespace DB
{

@@ -57,7 +59,7 @@ using FunctionGreatest = FunctionBinaryArithmetic<GreatestImpl, NameGreatest>;

void registerFunctionGreatest(FunctionFactory & factory)
{
-    factory.registerFunction<FunctionGreatest>(FunctionFactory::CaseInsensitive);
+    factory.registerFunction<LeastGreatestOverloadResolver<LeastGreatest::Greatest, FunctionGreatest>>(FunctionFactory::CaseInsensitive);
}

}
@@ -1,6 +1,8 @@
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionBinaryArithmetic.h>
#include <Core/AccurateComparison.h>
#include <Functions/LeastGreatestGeneric.h>

namespace DB
{

@@ -57,7 +59,7 @@ using FunctionLeast = FunctionBinaryArithmetic<LeastImpl, NameLeast>;

void registerFunctionLeast(FunctionFactory & factory)
{
-    factory.registerFunction<FunctionLeast>(FunctionFactory::CaseInsensitive);
+    factory.registerFunction<LeastGreatestOverloadResolver<LeastGreatest::Least, FunctionLeast>>(FunctionFactory::CaseInsensitive);
}

}
@@ -71,7 +71,7 @@ public:
        return 1;
    }

-    bool isInjective(const Block &) override
+    bool isInjective(const Block &) const override
    {
        return true;
    }

@@ -43,7 +43,7 @@ public:
        return 0;
    }

-    bool isInjective(const Block &) override
+    bool isInjective(const Block &) const override
    {
        return true;
    }
@@ -317,9 +317,11 @@ struct ContextShared
    MergeList merge_list; /// The list of executable merge (for (Replicated)?MergeTree)
    ConfigurationPtr users_config; /// Config with the users, profiles and quotas sections.
    InterserverIOHandler interserver_io_handler; /// Handler for interserver communication.
    std::optional<BackgroundSchedulePool> buffer_flush_schedule_pool; /// A thread pool that can do background flush for Buffer tables.
    std::optional<BackgroundProcessingPool> background_pool; /// The thread pool for the background work performed by the tables.
    std::optional<BackgroundProcessingPool> background_move_pool; /// The thread pool for the background moves performed by the tables.
    std::optional<BackgroundSchedulePool> schedule_pool; /// A thread pool that can run different jobs in background (used in replicated tables)
    std::optional<BackgroundSchedulePool> distributed_schedule_pool; /// A thread pool that can run different jobs in background (used for distributed sends)
    MultiVersion<Macros> macros; /// Substitutions extracted from config.
    std::unique_ptr<DDLWorker> ddl_worker; /// Process ddl commands from zk.
    /// Rules for selecting the compression settings, depending on the size of the part.

@@ -413,9 +415,11 @@ struct ContextShared
        embedded_dictionaries.reset();
        external_dictionaries_loader.reset();
        external_models_loader.reset();
        buffer_flush_schedule_pool.reset();
        background_pool.reset();
        background_move_pool.reset();
        schedule_pool.reset();
        distributed_schedule_pool.reset();
        ddl_worker.reset();

        /// Stop trace collector if any

@@ -1330,6 +1334,14 @@ BackgroundProcessingPool & Context::getBackgroundMovePool()
    return *shared->background_move_pool;
}

BackgroundSchedulePool & Context::getBufferFlushSchedulePool()
{
    auto lock = getLock();
    if (!shared->buffer_flush_schedule_pool)
        shared->buffer_flush_schedule_pool.emplace(settings.background_buffer_flush_schedule_pool_size);
    return *shared->buffer_flush_schedule_pool;
}

BackgroundSchedulePool & Context::getSchedulePool()
{
    auto lock = getLock();

@@ -1338,6 +1350,14 @@ BackgroundSchedulePool & Context::getSchedulePool()
    return *shared->schedule_pool;
}

BackgroundSchedulePool & Context::getDistributedSchedulePool()
{
    auto lock = getLock();
    if (!shared->distributed_schedule_pool)
        shared->distributed_schedule_pool.emplace(settings.background_distributed_schedule_pool_size);
    return *shared->distributed_schedule_pool;
}

void Context::setDDLWorker(std::unique_ptr<DDLWorker> ddl_worker)
{
    auto lock = getLock();

@@ -471,9 +471,11 @@ public:
     */
    void dropCaches() const;

    BackgroundSchedulePool & getBufferFlushSchedulePool();
    BackgroundProcessingPool & getBackgroundPool();
    BackgroundProcessingPool & getBackgroundMovePool();
    BackgroundSchedulePool & getSchedulePool();
    BackgroundSchedulePool & getDistributedSchedulePool();

    void setDDLWorker(std::unique_ptr<DDLWorker> ddl_worker);
    DDLWorker & getDDLWorker() const;
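The two new accessors follow the same pattern as the existing pool getters: take the Context lock, lazily construct the pool with the size configured in settings, and return a reference to it. A minimal sketch of that lazy-initialization-under-a-lock pattern is shown below; the Pool struct and the bare std::mutex are toy stand-ins for BackgroundSchedulePool and Context::getLock(), not the real types.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>

/// Toy stand-in for BackgroundSchedulePool; only the sized constructor matters here.
struct Pool
{
    explicit Pool(size_t size_) : size(size_) {}
    size_t size;
};

struct Shared
{
    std::mutex mutex;
    std::optional<Pool> distributed_schedule_pool; /// constructed on first use
};

/// Same shape as the accessors added above: lock, lazily construct the pool with the
/// configured size, and hand out a reference to the now-initialized pool.
Pool & getDistributedSchedulePool(Shared & shared, size_t configured_size)
{
    std::lock_guard lock(shared.mutex);
    if (!shared.distributed_schedule_pool)
        shared.distributed_schedule_pool.emplace(configured_size);
    return *shared.distributed_schedule_pool;
}

int main()
{
    Shared shared;
    Pool & pool = getDistributedSchedulePool(shared, 16);
    return pool.size == 16 ? 0 : 1;
}
```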
167 src/Interpreters/DictionaryReader.cpp Normal file
@@ -0,0 +1,167 @@
|
||||
#include <Interpreters/DictionaryReader.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Common/typeid_cast.h>
|
||||
#include <Columns/ColumnString.h>
|
||||
#include <Columns/ColumnConst.h>
|
||||
#include <DataTypes/DataTypeNullable.h>
|
||||
#include <DataTypes/DataTypeString.h>
|
||||
#include <DataTypes/DataTypesNumber.h>
|
||||
#include <Functions/FunctionFactory.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int NUMBER_OF_COLUMNS_DOESNT_MATCH;
|
||||
extern const int TYPE_MISMATCH;
|
||||
}
|
||||
|
||||
|
||||
DictionaryReader::FunctionWrapper::FunctionWrapper(FunctionOverloadResolverPtr resolver, const ColumnsWithTypeAndName & arguments,
|
||||
Block & block, const ColumnNumbers & arg_positions_, const String & column_name,
|
||||
TypeIndex expected_type)
|
||||
: arg_positions(arg_positions_)
|
||||
, result_pos(block.columns())
|
||||
{
|
||||
FunctionBasePtr prepared_function = resolver->build(arguments);
|
||||
|
||||
ColumnWithTypeAndName result;
|
||||
result.name = "get_" + column_name;
|
||||
result.type = prepared_function->getReturnType();
|
||||
if (result.type->getTypeId() != expected_type)
|
||||
throw Exception("Type mismatch in dictionary reader for: " + column_name, ErrorCodes::TYPE_MISMATCH);
|
||||
block.insert(result);
|
||||
|
||||
function = prepared_function->prepare(block, arg_positions, result_pos);
|
||||
}
|
||||
|
||||
static constexpr const size_t key_size = 1;
|
||||
|
||||
DictionaryReader::DictionaryReader(const String & dictionary_name, const Names & src_column_names, const NamesAndTypesList & result_columns,
|
||||
const Context & context)
|
||||
: result_header(makeResultBlock(result_columns))
|
||||
, key_position(key_size + result_header.columns())
|
||||
{
|
||||
if (src_column_names.size() != result_columns.size())
|
||||
throw Exception("Columns number mismatch in dictionary reader", ErrorCodes::NUMBER_OF_COLUMNS_DOESNT_MATCH);
|
||||
|
||||
ColumnWithTypeAndName dict_name;
|
||||
ColumnWithTypeAndName key;
|
||||
ColumnWithTypeAndName column_name;
|
||||
|
||||
{
|
||||
dict_name.name = "dict";
|
||||
dict_name.type = std::make_shared<DataTypeString>();
|
||||
dict_name.column = dict_name.type->createColumnConst(1, dictionary_name);
|
||||
|
||||
/// TODO: composite key (key_size > 1)
|
||||
key.name = "key";
|
||||
key.type = std::make_shared<DataTypeUInt64>();
|
||||
|
||||
column_name.name = "column";
|
||||
column_name.type = std::make_shared<DataTypeString>();
|
||||
}
|
||||
|
||||
/// dictHas('dict_name', id)
|
||||
ColumnsWithTypeAndName arguments_has;
|
||||
arguments_has.push_back(dict_name);
|
||||
arguments_has.push_back(key);
|
||||
|
||||
/// dictGet('dict_name', 'attr_name', id)
|
||||
ColumnsWithTypeAndName arguments_get;
|
||||
arguments_get.push_back(dict_name);
|
||||
arguments_get.push_back(column_name);
|
||||
arguments_get.push_back(key);
|
||||
|
||||
sample_block.insert(dict_name);
|
||||
|
||||
for (auto & columns_name : src_column_names)
|
||||
{
|
||||
ColumnWithTypeAndName name;
|
||||
name.name = "col_" + columns_name;
|
||||
name.type = std::make_shared<DataTypeString>();
|
||||
name.column = name.type->createColumnConst(1, columns_name);
|
||||
|
||||
sample_block.insert(name);
|
||||
}
|
||||
|
||||
sample_block.insert(key);
|
||||
|
||||
ColumnNumbers positions_has{0, key_position};
|
||||
function_has = std::make_unique<FunctionWrapper>(FunctionFactory::instance().get("dictHas", context),
|
||||
arguments_has, sample_block, positions_has, "has", DataTypeUInt8().getTypeId());
|
||||
functions_get.reserve(result_header.columns());
|
||||
|
||||
for (size_t i = 0; i < result_header.columns(); ++i)
|
||||
{
|
||||
size_t column_name_pos = key_size + i;
|
||||
auto & column = result_header.getByPosition(i);
|
||||
arguments_get[1].column = DataTypeString().createColumnConst(1, src_column_names[i]);
|
||||
ColumnNumbers positions_get{0, column_name_pos, key_position};
|
||||
functions_get.emplace_back(
|
||||
FunctionWrapper(FunctionFactory::instance().get("dictGet", context),
|
||||
arguments_get, sample_block, positions_get, column.name, column.type->getTypeId()));
|
||||
}
|
||||
}
|
||||
|
||||
void DictionaryReader::readKeys(const IColumn & keys, Block & out_block, ColumnVector<UInt8>::Container & found,
|
||||
std::vector<size_t> & positions) const
|
||||
{
|
||||
Block working_block = sample_block;
|
||||
size_t has_position = key_position + 1;
|
||||
size_t size = keys.size();
|
||||
|
||||
/// set keys for dictHas()
|
||||
ColumnWithTypeAndName & key_column = working_block.getByPosition(key_position);
|
||||
key_column.column = keys.cloneResized(size); /// just a copy we cannot avoid
|
||||
|
||||
/// calculate and extract dictHas()
|
||||
function_has->execute(working_block, size);
|
||||
ColumnWithTypeAndName & has_column = working_block.getByPosition(has_position);
|
||||
auto mutable_has = (*std::move(has_column.column)).mutate();
|
||||
found.swap(typeid_cast<ColumnVector<UInt8> &>(*mutable_has).getData());
|
||||
has_column.column = nullptr;
|
||||
|
||||
/// set mapping form source keys to resulting rows in output block
|
||||
positions.clear();
|
||||
positions.resize(size, 0);
|
||||
size_t pos = 0;
|
||||
for (size_t i = 0; i < size; ++i)
|
||||
if (found[i])
|
||||
positions[i] = pos++;
|
||||
|
||||
/// set keys for dictGet(): remove not found keys
|
||||
key_column.column = key_column.column->filter(found, -1);
|
||||
size_t rows = key_column.column->size();
|
||||
|
||||
/// calculate dictGet()
|
||||
for (auto & func : functions_get)
|
||||
func.execute(working_block, rows);
|
||||
|
||||
/// make result: copy header block with correct names and move data columns
|
||||
out_block = result_header.cloneEmpty();
|
||||
size_t first_get_position = has_position + 1;
|
||||
for (size_t i = 0; i < out_block.columns(); ++i)
|
||||
{
|
||||
auto & src_column = working_block.getByPosition(first_get_position + i);
|
||||
auto & dst_column = out_block.getByPosition(i);
|
||||
dst_column.column = src_column.column;
|
||||
src_column.column = nullptr;
|
||||
}
|
||||
}
|
||||
|
||||
Block DictionaryReader::makeResultBlock(const NamesAndTypesList & names)
|
||||
{
|
||||
Block block;
|
||||
for (auto & nm : names)
|
||||
{
|
||||
ColumnWithTypeAndName column{nullptr, nm.type, nm.name};
|
||||
if (column.type->isNullable())
|
||||
column.type = typeid_cast<const DataTypeNullable &>(*column.type).getNestedType();
|
||||
block.insert(std::move(column));
|
||||
}
|
||||
return block;
|
||||
}
|
||||
|
||||
}
|
46 src/Interpreters/DictionaryReader.h Normal file
@@ -0,0 +1,46 @@
#pragma once

#include <Core/Block.h>
#include <Columns/ColumnVector.h>
#include <Functions/IFunctionAdaptors.h>

namespace DB
{

class Context;

/// Read block of required columns from Dictionary by UInt64 key column. Rename columns if needed.
/// Current implementation uses dictHas() + N * dictGet() functions.
class DictionaryReader
{
public:
    struct FunctionWrapper
    {
        ExecutableFunctionPtr function;
        ColumnNumbers arg_positions;
        size_t result_pos = 0;

        FunctionWrapper(FunctionOverloadResolverPtr resolver, const ColumnsWithTypeAndName & arguments, Block & block,
                        const ColumnNumbers & arg_positions_, const String & column_name, TypeIndex expected_type);

        void execute(Block & block, size_t rows) const
        {
            function->execute(block, arg_positions, result_pos, rows, false);
        }
    };

    DictionaryReader(const String & dictionary_name, const Names & src_column_names, const NamesAndTypesList & result_columns,
                     const Context & context);
    void readKeys(const IColumn & keys, Block & out_block, ColumnVector<UInt8>::Container & found, std::vector<size_t> & positions) const;

private:
    Block result_header;
    Block sample_block; /// dictionary name, column names, key, dictHas() result, dictGet() results
    size_t key_position;
    std::unique_ptr<FunctionWrapper> function_has;
    std::vector<FunctionWrapper> functions_get;

    static Block makeResultBlock(const NamesAndTypesList & names);
};

}
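The readKeys() contract declared above is: given a column of UInt64 keys, report for each key whether the dictionary contains it (found), record which row of the compacted output each found key maps to (positions), and fill out_block with the requested attributes for the found keys only. Below is a toy model of that contract, using a std::unordered_map as the dictionary instead of the dictHas()/dictGet() function wrappers; the names and the single-attribute simplification are illustrative, not the commit's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

/// Toy stand-in for a dictionary: key -> single attribute value.
using Dictionary = std::unordered_map<uint64_t, int64_t>;

/// For every input key: mark whether it exists (found), remember which row of the
/// compacted output it maps to (positions), and collect the attribute values only
/// for the keys that were found.
std::vector<int64_t> readKeys(const Dictionary & dict, const std::vector<uint64_t> & keys,
                              std::vector<uint8_t> & found, std::vector<size_t> & positions)
{
    found.assign(keys.size(), 0);
    positions.assign(keys.size(), 0);

    std::vector<int64_t> values; /// like the dictGet() results over the filtered keys
    for (size_t i = 0; i < keys.size(); ++i)
    {
        auto it = dict.find(keys[i]);     /// existence check, like dictHas()
        if (it == dict.end())
            continue;
        found[i] = 1;
        positions[i] = values.size();     /// row in the compacted output block
        values.push_back(it->second);     /// attribute fetch, like dictGet()
    }
    return values;
}

int main()
{
    Dictionary dict{{1, 100}, {3, 300}};
    std::vector<uint8_t> found;
    std::vector<size_t> positions;
    auto values = readKeys(dict, {1, 2, 3}, found, positions);
    assert(found[1] == 0);                     /// key 2 is missing
    assert(values[positions[2]] == 300);       /// key 3 maps to the second output row
}
```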
@ -31,17 +31,20 @@
|
||||
#include <Interpreters/JoinSwitcher.h>
|
||||
#include <Interpreters/HashJoin.h>
|
||||
#include <Interpreters/MergeJoin.h>
|
||||
#include <Interpreters/DictionaryReader.h>
|
||||
|
||||
#include <AggregateFunctions/AggregateFunctionFactory.h>
|
||||
#include <AggregateFunctions/parseAggregateFunctionParameters.h>
|
||||
|
||||
#include <Storages/StorageDistributed.h>
|
||||
#include <Storages/StorageDictionary.h>
|
||||
#include <Storages/StorageJoin.h>
|
||||
|
||||
#include <DataStreams/copyData.h>
|
||||
#include <DataStreams/IBlockInputStream.h>
|
||||
|
||||
#include <Dictionaries/IDictionary.h>
|
||||
#include <Dictionaries/DictionaryStructure.h>
|
||||
|
||||
#include <Common/typeid_cast.h>
|
||||
#include <Common/StringUtils/StringUtils.h>
|
||||
@ -502,25 +505,11 @@ bool SelectQueryExpressionAnalyzer::appendJoin(ExpressionActionsChain & chain, b
|
||||
return true;
|
||||
}
|
||||
|
||||
static JoinPtr tryGetStorageJoin(const ASTTablesInSelectQueryElement & join_element, std::shared_ptr<TableJoin> analyzed_join,
|
||||
const Context & context)
|
||||
static JoinPtr tryGetStorageJoin(std::shared_ptr<TableJoin> analyzed_join)
|
||||
{
|
||||
const auto & table_to_join = join_element.table_expression->as<ASTTableExpression &>();
|
||||
|
||||
/// TODO This syntax does not support specifying a database name.
|
||||
if (table_to_join.database_and_table_name)
|
||||
{
|
||||
auto table_id = context.resolveStorageID(table_to_join.database_and_table_name);
|
||||
StoragePtr table = DatabaseCatalog::instance().tryGetTable(table_id);
|
||||
|
||||
if (table)
|
||||
{
|
||||
auto * storage_join = dynamic_cast<StorageJoin *>(table.get());
|
||||
if (storage_join)
|
||||
return storage_join->getJoin(analyzed_join);
|
||||
}
|
||||
}
|
||||
|
||||
if (auto * table = analyzed_join->joined_storage.get())
|
||||
if (auto * storage_join = dynamic_cast<StorageJoin *>(table))
|
||||
return storage_join->getJoin(analyzed_join);
|
||||
return {};
|
||||
}
|
||||
|
||||
@ -531,10 +520,44 @@ static ExpressionActionsPtr createJoinedBlockActions(const Context & context, co
|
||||
return ExpressionAnalyzer(expression_list, syntax_result, context).getActions(true, false);
|
||||
}
|
||||
|
||||
static std::shared_ptr<IJoin> makeJoin(std::shared_ptr<TableJoin> analyzed_join, const Block & sample_block)
|
||||
static bool allowDictJoin(StoragePtr joined_storage, const Context & context, String & dict_name, String & key_name)
|
||||
{
|
||||
auto * dict = dynamic_cast<const StorageDictionary *>(joined_storage.get());
|
||||
if (!dict)
|
||||
return false;
|
||||
|
||||
dict_name = dict->dictionaryName();
|
||||
auto dictionary = context.getExternalDictionariesLoader().getDictionary(dict_name);
|
||||
if (!dictionary)
|
||||
return false;
|
||||
|
||||
const DictionaryStructure & structure = dictionary->getStructure();
|
||||
if (structure.id)
|
||||
{
|
||||
key_name = structure.id->name;
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
static std::shared_ptr<IJoin> makeJoin(std::shared_ptr<TableJoin> analyzed_join, const Block & sample_block, const Context & context)
|
||||
{
|
||||
bool allow_merge_join = analyzed_join->allowMergeJoin();
|
||||
|
||||
/// HashJoin with Dictionary optimisation
|
||||
String dict_name;
|
||||
String key_name;
|
||||
if (analyzed_join->joined_storage && allowDictJoin(analyzed_join->joined_storage, context, dict_name, key_name))
|
||||
{
|
||||
Names original_names;
|
||||
NamesAndTypesList result_columns;
|
||||
if (analyzed_join->allowDictJoin(key_name, sample_block, original_names, result_columns))
|
||||
{
|
||||
analyzed_join->dictionary_reader = std::make_shared<DictionaryReader>(dict_name, original_names, result_columns, context);
|
||||
return std::make_shared<HashJoin>(analyzed_join, sample_block);
|
||||
}
|
||||
}
|
||||
|
||||
if (analyzed_join->forceHashJoin() || (analyzed_join->preferMergeJoin() && !allow_merge_join))
|
||||
return std::make_shared<HashJoin>(analyzed_join, sample_block);
|
||||
else if (analyzed_join->forceMergeJoin() || (analyzed_join->preferMergeJoin() && allow_merge_join))
|
||||
@ -550,48 +573,49 @@ JoinPtr SelectQueryExpressionAnalyzer::makeTableJoin(const ASTTablesInSelectQuer
|
||||
|
||||
SubqueryForSet & subquery_for_join = subqueries_for_sets[join_subquery_id];
|
||||
|
||||
/// Special case - if table name is specified on the right of JOIN, then the table has the type Join (the previously prepared mapping).
|
||||
/// Use StorageJoin if any.
|
||||
if (!subquery_for_join.join)
|
||||
subquery_for_join.join = tryGetStorageJoin(join_element, syntax->analyzed_join, context);
|
||||
subquery_for_join.join = tryGetStorageJoin(syntax->analyzed_join);
|
||||
|
||||
if (!subquery_for_join.join)
|
||||
{
|
||||
/// Actions which need to be calculated on joined block.
|
||||
ExpressionActionsPtr joined_block_actions = createJoinedBlockActions(context, analyzedJoin());
|
||||
|
||||
Names original_right_columns;
|
||||
if (!subquery_for_join.source)
|
||||
{
|
||||
NamesWithAliases required_columns_with_aliases =
|
||||
analyzedJoin().getRequiredColumns(joined_block_actions->getSampleBlock(), joined_block_actions->getRequiredColumns());
|
||||
makeSubqueryForJoin(join_element, std::move(required_columns_with_aliases), subquery_for_join);
|
||||
NamesWithAliases required_columns_with_aliases = analyzedJoin().getRequiredColumns(
|
||||
joined_block_actions->getSampleBlock(), joined_block_actions->getRequiredColumns());
|
||||
for (auto & pr : required_columns_with_aliases)
|
||||
original_right_columns.push_back(pr.first);
|
||||
|
||||
/** For GLOBAL JOINs (in the case, for example, of the push method for executing GLOBAL subqueries), the following occurs
|
||||
* - in the addExternalStorage function, the JOIN (SELECT ...) subquery is replaced with JOIN _data1,
|
||||
* in the subquery_for_set object this subquery is exposed as source and the temporary table _data1 as the `table`.
|
||||
* - this function shows the expression JOIN _data1.
|
||||
*/
|
||||
auto interpreter = interpretSubquery(join_element.table_expression, context, original_right_columns, query_options);
|
||||
|
||||
subquery_for_join.makeSource(interpreter, std::move(required_columns_with_aliases));
|
||||
}
|
||||
|
||||
/// TODO You do not need to set this up when JOIN is only needed on remote servers.
|
||||
subquery_for_join.setJoinActions(joined_block_actions); /// changes subquery_for_join.sample_block inside
|
||||
subquery_for_join.join = makeJoin(syntax->analyzed_join, subquery_for_join.sample_block);
|
||||
subquery_for_join.join = makeJoin(syntax->analyzed_join, subquery_for_join.sample_block, context);
|
||||
|
||||
/// Do not make subquery for join over dictionary.
|
||||
if (syntax->analyzed_join->dictionary_reader)
|
||||
{
|
||||
JoinPtr join = subquery_for_join.join;
|
||||
subqueries_for_sets.erase(join_subquery_id);
|
||||
return join;
|
||||
}
|
||||
}
|
||||
|
||||
return subquery_for_join.join;
|
||||
}
|
||||
|
||||
void SelectQueryExpressionAnalyzer::makeSubqueryForJoin(const ASTTablesInSelectQueryElement & join_element,
|
||||
NamesWithAliases && required_columns_with_aliases,
|
||||
SubqueryForSet & subquery_for_set) const
|
||||
{
|
||||
/** For GLOBAL JOINs (in the case, for example, of the push method for executing GLOBAL subqueries), the following occurs
|
||||
* - in the addExternalStorage function, the JOIN (SELECT ...) subquery is replaced with JOIN _data1,
|
||||
* in the subquery_for_set object this subquery is exposed as source and the temporary table _data1 as the `table`.
|
||||
* - this function shows the expression JOIN _data1.
|
||||
*/
|
||||
Names original_columns;
|
||||
for (auto & pr : required_columns_with_aliases)
|
||||
original_columns.push_back(pr.first);
|
||||
|
||||
auto interpreter = interpretSubquery(join_element.table_expression, context, original_columns, query_options);
|
||||
|
||||
subquery_for_set.makeSource(interpreter, std::move(required_columns_with_aliases));
|
||||
}
|
||||
|
||||
bool SelectQueryExpressionAnalyzer::appendPrewhere(
|
||||
ExpressionActionsChain & chain, bool only_types, const Names & additional_required_columns)
|
||||
{
|
||||
|
@ -276,8 +276,6 @@ private:
|
||||
SetPtr isPlainStorageSetInSubquery(const ASTPtr & subquery_or_table_name);
|
||||
|
||||
JoinPtr makeTableJoin(const ASTTablesInSelectQueryElement & join_element);
|
||||
void makeSubqueryForJoin(const ASTTablesInSelectQueryElement & join_element, NamesWithAliases && required_columns_with_aliases,
|
||||
SubqueryForSet & subquery_for_set) const;
|
||||
|
||||
const ASTSelectQuery * getAggregatingQuery() const;
|
||||
|
||||
|
@ -510,7 +510,7 @@ bool LLVMFunction::isSuitableForConstantFolding() const
|
||||
return true;
|
||||
}
|
||||
|
||||
bool LLVMFunction::isInjective(const Block & sample_block)
|
||||
bool LLVMFunction::isInjective(const Block & sample_block) const
|
||||
{
|
||||
for (const auto & f : originals)
|
||||
if (!f->isInjective(sample_block))
|
||||
|
@ -53,7 +53,7 @@ public:
|
||||
|
||||
bool isSuitableForConstantFolding() const override;
|
||||
|
||||
bool isInjective(const Block & sample_block) override;
|
||||
bool isInjective(const Block & sample_block) const override;
|
||||
|
||||
bool hasInformationAboutMonotonicity() const override;
|
||||
|
||||
|
@ -1,5 +1,6 @@
|
||||
#include <Interpreters/ExternalDictionariesLoader.h>
|
||||
#include <Dictionaries/DictionaryFactory.h>
|
||||
#include <Dictionaries/DictionaryStructure.h>
|
||||
|
||||
#if !defined(ARCADIA_BUILD)
|
||||
# include "config_core.h"
|
||||
@ -33,6 +34,19 @@ ExternalLoader::LoadablePtr ExternalDictionariesLoader::create(
|
||||
return DictionaryFactory::instance().create(name, config, key_in_config, context, dictionary_from_database);
|
||||
}
|
||||
|
||||
|
||||
DictionaryStructure
|
||||
ExternalDictionariesLoader::getDictionaryStructure(const Poco::Util::AbstractConfiguration & config, const std::string & key_in_config)
|
||||
{
|
||||
return {config, key_in_config + ".structure"};
|
||||
}
|
||||
|
||||
DictionaryStructure ExternalDictionariesLoader::getDictionaryStructure(const ObjectConfig & config)
|
||||
{
|
||||
return getDictionaryStructure(*config.config, config.key_in_config);
|
||||
}
|
||||
|
||||
|
||||
void ExternalDictionariesLoader::resetAll()
|
||||
{
|
||||
#if USE_MYSQL
|
||||
|
@ -23,14 +23,14 @@ public:
|
||||
return std::static_pointer_cast<const IDictionaryBase>(load(name));
|
||||
}
|
||||
|
||||
DictPtr tryGetDictionary(const std::string & name, bool load) const
|
||||
DictPtr tryGetDictionary(const std::string & name) const
|
||||
{
|
||||
if (load)
|
||||
return std::static_pointer_cast<const IDictionaryBase>(tryLoad(name));
|
||||
else
|
||||
return std::static_pointer_cast<const IDictionaryBase>(getCurrentLoadResult(name).object);
|
||||
return std::static_pointer_cast<const IDictionaryBase>(tryLoad(name));
|
||||
}
|
||||
|
||||
static DictionaryStructure getDictionaryStructure(const Poco::Util::AbstractConfiguration & config, const std::string & key_in_config = "dictionary");
|
||||
static DictionaryStructure getDictionaryStructure(const ObjectConfig & config);
|
||||
|
||||
static void resetAll();
|
||||
|
||||
protected:
|
||||
|
@ -94,15 +94,6 @@ namespace
|
||||
};
|
||||
}
|
||||
|
||||
struct ExternalLoader::ObjectConfig
|
||||
{
|
||||
Poco::AutoPtr<Poco::Util::AbstractConfiguration> config;
|
||||
String key_in_config;
|
||||
String repository_name;
|
||||
bool from_temp_repository = false;
|
||||
String path;
|
||||
};
|
||||
|
||||
|
||||
/** Reads configurations from configuration repository and parses it.
|
||||
*/
|
||||
@ -141,7 +132,7 @@ public:
|
||||
settings = settings_;
|
||||
}
|
||||
|
||||
using ObjectConfigsPtr = std::shared_ptr<const std::unordered_map<String /* object's name */, ObjectConfig>>;
|
||||
using ObjectConfigsPtr = std::shared_ptr<const std::unordered_map<String /* object's name */, std::shared_ptr<const ObjectConfig>>>;
|
||||
|
||||
/// Reads all repositories.
|
||||
ObjectConfigsPtr read()
|
||||
@ -176,8 +167,9 @@ private:
|
||||
struct FileInfo
|
||||
{
|
||||
Poco::Timestamp last_update_time = 0;
|
||||
std::vector<std::pair<String, ObjectConfig>> objects; // Parsed contents of the file.
|
||||
bool in_use = true; // Whether the `FileInfo` should be destroyed because the correspondent file is deleted.
|
||||
Poco::AutoPtr<Poco::Util::AbstractConfiguration> file_contents; // Parsed contents of the file.
|
||||
std::unordered_map<String /* object name */, String /* key in file_contents */> objects;
|
||||
};
|
||||
|
||||
struct RepositoryInfo
|
||||
@ -280,14 +272,15 @@ private:
|
||||
}
|
||||
|
||||
LOG_TRACE(log, "Loading config file '" << path << "'.");
|
||||
auto file_contents = repository.load(path);
|
||||
file_info.file_contents = repository.load(path);
|
||||
auto & file_contents = *file_info.file_contents;
|
||||
|
||||
/// get all objects' definitions
|
||||
Poco::Util::AbstractConfiguration::Keys keys;
|
||||
file_contents->keys(keys);
|
||||
file_contents.keys(keys);
|
||||
|
||||
/// for each object defined in repositories
|
||||
std::vector<std::pair<String, ObjectConfig>> object_configs_from_file;
|
||||
std::unordered_map<String, String> objects;
|
||||
for (const auto & key : keys)
|
||||
{
|
||||
if (!startsWith(key, settings.external_config))
|
||||
@ -297,7 +290,7 @@ private:
|
||||
continue;
|
||||
}
|
||||
|
||||
String object_name = file_contents->getString(key + "." + settings.external_name);
|
||||
String object_name = file_contents.getString(key + "." + settings.external_name);
|
||||
if (object_name.empty())
|
||||
{
|
||||
LOG_WARNING(log, path << ": node '" << key << "' defines " << type_name << " with an empty name. It's not allowed");
|
||||
@ -306,14 +299,14 @@ private:
|
||||
|
||||
String database;
|
||||
if (!settings.external_database.empty())
|
||||
database = file_contents->getString(key + "." + settings.external_database, "");
|
||||
database = file_contents.getString(key + "." + settings.external_database, "");
|
||||
if (!database.empty())
|
||||
object_name = database + "." + object_name;
|
||||
|
||||
object_configs_from_file.emplace_back(object_name, ObjectConfig{file_contents, key, {}, {}, {}});
|
||||
objects.emplace(object_name, key);
|
||||
}
|
||||
|
||||
file_info.objects = std::move(object_configs_from_file);
|
||||
file_info.objects = std::move(objects);
|
||||
file_info.last_update_time = update_time_from_repository;
|
||||
file_info.in_use = true;
|
||||
return true;
|
||||
@ -333,33 +326,36 @@ private:
|
||||
need_collect_object_configs = false;
|
||||
|
||||
// Generate new result.
|
||||
auto new_configs = std::make_shared<std::unordered_map<String /* object's name */, ObjectConfig>>();
|
||||
auto new_configs = std::make_shared<std::unordered_map<String /* object's name */, std::shared_ptr<const ObjectConfig>>>();
|
||||
|
||||
for (const auto & [repository, repository_info] : repositories)
|
||||
{
|
||||
for (const auto & [path, file_info] : repository_info.files)
|
||||
{
|
||||
for (const auto & [object_name, object_config] : file_info.objects)
|
||||
for (const auto & [object_name, key_in_config] : file_info.objects)
|
||||
{
|
||||
auto already_added_it = new_configs->find(object_name);
|
||||
if (already_added_it == new_configs->end())
|
||||
{
|
||||
auto & new_config = new_configs->emplace(object_name, object_config).first->second;
|
||||
new_config.from_temp_repository = repository->isTemporary();
|
||||
new_config.repository_name = repository->getName();
|
||||
new_config.path = path;
|
||||
auto new_config = std::make_shared<ObjectConfig>();
|
||||
new_config->config = file_info.file_contents;
|
||||
new_config->key_in_config = key_in_config;
|
||||
new_config->repository_name = repository->getName();
|
||||
new_config->from_temp_repository = repository->isTemporary();
|
||||
new_config->path = path;
|
||||
new_configs->emplace(object_name, std::move(new_config));
|
||||
}
|
||||
else
|
||||
{
|
||||
const auto & already_added = already_added_it->second;
|
||||
if (!already_added.from_temp_repository && !repository->isTemporary())
|
||||
if (!already_added->from_temp_repository && !repository->isTemporary())
|
||||
{
|
||||
LOG_WARNING(
|
||||
log,
|
||||
type_name << " '" << object_name << "' is found "
|
||||
<< (((path == already_added.path) && (repository->getName() == already_added.repository_name))
|
||||
<< (((path == already_added->path) && (repository->getName() == already_added->repository_name))
|
||||
? ("twice in the same file '" + path + "'")
|
||||
: ("both in file '" + already_added.path + "' and '" + path + "'")));
|
||||
: ("both in file '" + already_added->path + "' and '" + path + "'")));
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -440,13 +436,10 @@ public:
|
||||
else
|
||||
{
|
||||
const auto & new_config = new_config_it->second;
|
||||
bool config_is_same = isSameConfiguration(*info.object_config.config, info.object_config.key_in_config, *new_config.config, new_config.key_in_config);
|
||||
info.object_config = new_config;
|
||||
bool config_is_same = isSameConfiguration(*info.config->config, info.config->key_in_config, *new_config->config, new_config->key_in_config);
|
||||
info.config = new_config;
|
||||
if (!config_is_same)
|
||||
{
|
||||
/// Configuration has been changed.
|
||||
info.object_config = new_config;
|
||||
|
||||
if (info.triedToLoad())
|
||||
{
|
||||
/// The object has been tried to load before, so it is currently in use or was in use
|
||||
@ -531,7 +524,7 @@ public:
|
||||
|
||||
/// Returns the load result of the object.
|
||||
template <typename ReturnType>
|
||||
ReturnType getCurrentLoadResult(const String & name) const
|
||||
ReturnType getLoadResult(const String & name) const
|
||||
{
|
||||
std::lock_guard lock{mutex};
|
||||
const Info * info = getInfo(name);
|
||||
@ -543,13 +536,13 @@ public:
|
||||
/// Returns all the load results as a map.
|
||||
/// The function doesn't load anything, it just returns the current load results as is.
|
||||
template <typename ReturnType>
|
||||
ReturnType getCurrentLoadResults(const FilterByNameFunction & filter) const
|
||||
ReturnType getLoadResults(const FilterByNameFunction & filter) const
|
||||
{
|
||||
std::lock_guard lock{mutex};
|
||||
return collectLoadResults<ReturnType>(filter);
|
||||
}
|
||||
|
||||
size_t getNumberOfCurrentlyLoadedObjects() const
|
||||
size_t getNumberOfLoadedObjects() const
|
||||
{
|
||||
std::lock_guard lock{mutex};
|
||||
size_t count = 0;
|
||||
@ -562,7 +555,7 @@ public:
|
||||
return count;
|
||||
}
|
||||
|
||||
bool hasCurrentlyLoadedObjects() const
|
||||
bool hasLoadedObjects() const
|
||||
{
|
||||
std::lock_guard lock{mutex};
|
||||
for (auto & name_info : infos)
|
||||
@ -581,6 +574,12 @@ public:
|
||||
return names;
|
||||
}
|
||||
|
||||
size_t getNumberOfObjects() const
|
||||
{
|
||||
std::lock_guard lock{mutex};
|
||||
return infos.size();
|
||||
}
|
||||
|
||||
/// Tries to load a specified object during the timeout.
|
||||
template <typename ReturnType>
|
||||
ReturnType tryLoad(const String & name, Duration timeout)
|
||||
@ -698,7 +697,7 @@ public:
|
||||
private:
|
||||
struct Info
|
||||
{
|
||||
Info(const String & name_, const ObjectConfig & object_config_) : name(name_), object_config(object_config_) {}
|
||||
Info(const String & name_, const std::shared_ptr<const ObjectConfig> & config_) : name(name_), config(config_) {}
|
||||
|
||||
bool loaded() const { return object != nullptr; }
|
||||
bool failed() const { return !object && exception; }
|
||||
@ -737,8 +736,7 @@ private:
|
||||
result.loading_start_time = loading_start_time;
|
||||
result.last_successful_update_time = last_successful_update_time;
|
||||
result.loading_duration = loadingDuration();
|
||||
result.origin = object_config.path;
|
||||
result.repository_name = object_config.repository_name;
|
||||
result.config = config;
|
||||
return result;
|
||||
}
|
||||
else
|
||||
@ -750,7 +748,7 @@ private:
|
||||
|
||||
String name;
|
||||
LoadablePtr object;
|
||||
ObjectConfig object_config;
|
||||
std::shared_ptr<const ObjectConfig> config;
|
||||
TimePoint loading_start_time;
|
||||
TimePoint loading_end_time;
|
||||
TimePoint last_successful_update_time;
|
||||
@ -784,7 +782,7 @@ private:
|
||||
results.reserve(infos.size());
|
||||
for (const auto & [name, info] : infos)
|
||||
{
|
||||
if (filter(name))
|
||||
if (!filter || filter(name))
|
||||
{
|
||||
auto result = info.template getLoadResult<typename ReturnType::value_type>();
|
||||
if constexpr (std::is_same_v<typename ReturnType::value_type, LoadablePtr>)
|
||||
@ -838,7 +836,7 @@ private:
|
||||
bool all_ready = true;
|
||||
for (auto & [name, info] : infos)
|
||||
{
|
||||
if (!filter(name))
|
||||
if (filter && !filter(name))
|
||||
continue;
|
||||
|
||||
if (info.state_id >= min_id)
|
||||
@ -955,7 +953,7 @@ private:
|
||||
previous_version_as_base_for_loading = nullptr; /// Need complete reloading, cannot use the previous version.
|
||||
|
||||
/// Loading.
|
||||
auto [new_object, new_exception] = loadSingleObject(name, info->object_config, previous_version_as_base_for_loading);
|
||||
auto [new_object, new_exception] = loadSingleObject(name, *info->config, previous_version_as_base_for_loading);
|
||||
if (!new_object && !new_exception)
|
||||
throw Exception("No object created and no exception raised for " + type_name, ErrorCodes::LOGICAL_ERROR);
|
||||
|
||||
@ -1296,9 +1294,9 @@ void ExternalLoader::enablePeriodicUpdates(bool enable_)
|
||||
periodic_updater->enable(enable_);
|
||||
}
|
||||
|
||||
bool ExternalLoader::hasCurrentlyLoadedObjects() const
|
||||
bool ExternalLoader::hasLoadedObjects() const
|
||||
{
|
||||
return loading_dispatcher->hasCurrentlyLoadedObjects();
|
||||
return loading_dispatcher->hasLoadedObjects();
|
||||
}
|
||||
|
||||
ExternalLoader::Status ExternalLoader::getCurrentStatus(const String & name) const
|
||||
@ -1307,30 +1305,35 @@ ExternalLoader::Status ExternalLoader::getCurrentStatus(const String & name) con
|
||||
}
|
||||
|
||||
template <typename ReturnType, typename>
|
||||
ReturnType ExternalLoader::getCurrentLoadResult(const String & name) const
|
||||
ReturnType ExternalLoader::getLoadResult(const String & name) const
|
||||
{
|
||||
return loading_dispatcher->getCurrentLoadResult<ReturnType>(name);
|
||||
return loading_dispatcher->getLoadResult<ReturnType>(name);
|
||||
}
|
||||
|
||||
template <typename ReturnType, typename>
|
||||
ReturnType ExternalLoader::getCurrentLoadResults(const FilterByNameFunction & filter) const
|
||||
ReturnType ExternalLoader::getLoadResults(const FilterByNameFunction & filter) const
|
||||
{
|
||||
return loading_dispatcher->getCurrentLoadResults<ReturnType>(filter);
|
||||
return loading_dispatcher->getLoadResults<ReturnType>(filter);
|
||||
}
|
||||
|
||||
ExternalLoader::Loadables ExternalLoader::getCurrentlyLoadedObjects() const
|
||||
ExternalLoader::Loadables ExternalLoader::getLoadedObjects() const
|
||||
{
|
||||
return getCurrentLoadResults<Loadables>();
|
||||
return getLoadResults<Loadables>();
|
||||
}
|
||||
|
||||
ExternalLoader::Loadables ExternalLoader::getCurrentlyLoadedObjects(const FilterByNameFunction & filter) const
|
||||
ExternalLoader::Loadables ExternalLoader::getLoadedObjects(const FilterByNameFunction & filter) const
|
||||
{
|
||||
return getCurrentLoadResults<Loadables>(filter);
|
||||
return getLoadResults<Loadables>(filter);
|
||||
}
|
||||
|
||||
size_t ExternalLoader::getNumberOfCurrentlyLoadedObjects() const
|
||||
size_t ExternalLoader::getNumberOfLoadedObjects() const
|
||||
{
|
||||
return loading_dispatcher->getNumberOfCurrentlyLoadedObjects();
|
||||
return loading_dispatcher->getNumberOfLoadedObjects();
|
||||
}
|
||||
|
||||
size_t ExternalLoader::getNumberOfObjects() const
|
||||
{
|
||||
return loading_dispatcher->getNumberOfObjects();
|
||||
}
|
||||
|
||||
template <typename ReturnType, typename>
|
||||
@ -1456,10 +1459,10 @@ ExternalLoader::LoadablePtr ExternalLoader::createObject(
|
||||
return create(name, *config.config, config.key_in_config, config.repository_name);
|
||||
}
|
||||
|
||||
template ExternalLoader::LoadablePtr ExternalLoader::getCurrentLoadResult<ExternalLoader::LoadablePtr>(const String &) const;
|
||||
template ExternalLoader::LoadResult ExternalLoader::getCurrentLoadResult<ExternalLoader::LoadResult>(const String &) const;
|
||||
template ExternalLoader::Loadables ExternalLoader::getCurrentLoadResults<ExternalLoader::Loadables>(const FilterByNameFunction &) const;
|
||||
template ExternalLoader::LoadResults ExternalLoader::getCurrentLoadResults<ExternalLoader::LoadResults>(const FilterByNameFunction &) const;
|
||||
template ExternalLoader::LoadablePtr ExternalLoader::getLoadResult<ExternalLoader::LoadablePtr>(const String &) const;
|
||||
template ExternalLoader::LoadResult ExternalLoader::getLoadResult<ExternalLoader::LoadResult>(const String &) const;
|
||||
template ExternalLoader::Loadables ExternalLoader::getLoadResults<ExternalLoader::Loadables>(const FilterByNameFunction &) const;
|
||||
template ExternalLoader::LoadResults ExternalLoader::getLoadResults<ExternalLoader::LoadResults>(const FilterByNameFunction &) const;
|
||||
|
||||
template ExternalLoader::LoadablePtr ExternalLoader::tryLoad<ExternalLoader::LoadablePtr>(const String &, Duration) const;
|
||||
template ExternalLoader::LoadResult ExternalLoader::tryLoad<ExternalLoader::LoadResult>(const String &, Duration) const;
|
||||
|
@ -53,17 +53,25 @@ public:
|
||||
using Duration = std::chrono::milliseconds;
|
||||
using TimePoint = std::chrono::system_clock::time_point;
|
||||
|
||||
struct ObjectConfig
|
||||
{
|
||||
Poco::AutoPtr<Poco::Util::AbstractConfiguration> config;
|
||||
String key_in_config;
|
||||
String repository_name;
|
||||
bool from_temp_repository = false;
|
||||
String path;
|
||||
};
|
||||
|
||||
struct LoadResult
|
||||
{
|
||||
Status status = Status::NOT_EXIST;
|
||||
String name;
|
||||
LoadablePtr object;
|
||||
String origin;
|
||||
TimePoint loading_start_time;
|
||||
TimePoint last_successful_update_time;
|
||||
Duration loading_duration;
|
||||
std::exception_ptr exception;
|
||||
std::string repository_name;
|
||||
std::shared_ptr<const ObjectConfig> config;
|
||||
};
|
||||
|
||||
using LoadResults = std::vector<LoadResult>;
|
||||
@ -99,26 +107,32 @@ public:
|
||||
/// Returns the result of loading the object.
|
||||
/// The function doesn't load anything, it just returns the current load result as is.
|
||||
template <typename ReturnType = LoadResult, typename = std::enable_if_t<is_scalar_load_result_type<ReturnType>, void>>
|
||||
ReturnType getCurrentLoadResult(const String & name) const;
|
||||
ReturnType getLoadResult(const String & name) const;
|
||||
|
||||
using FilterByNameFunction = std::function<bool(const String &)>;
|
||||
|
||||
/// Returns all the load results as a map.
|
||||
/// The function doesn't load anything, it just returns the current load results as is.
|
||||
template <typename ReturnType = LoadResults, typename = std::enable_if_t<is_vector_load_result_type<ReturnType>, void>>
|
||||
ReturnType getCurrentLoadResults() const { return getCurrentLoadResults<ReturnType>(alwaysTrue); }
|
||||
ReturnType getLoadResults() const { return getLoadResults<ReturnType>(FilterByNameFunction{}); }
|
||||
|
||||
template <typename ReturnType = LoadResults, typename = std::enable_if_t<is_vector_load_result_type<ReturnType>, void>>
|
||||
ReturnType getCurrentLoadResults(const FilterByNameFunction & filter) const;
|
||||
ReturnType getLoadResults(const FilterByNameFunction & filter) const;
|
||||
|
||||
/// Returns all loaded objects as a map.
|
||||
/// The function doesn't load anything, it just returns the current load results as is.
|
||||
Loadables getCurrentlyLoadedObjects() const;
|
||||
Loadables getCurrentlyLoadedObjects(const FilterByNameFunction & filter) const;
|
||||
Loadables getLoadedObjects() const;
|
||||
Loadables getLoadedObjects(const FilterByNameFunction & filter) const;
|
||||
|
||||
/// Returns true if any object was loaded.
|
||||
bool hasCurrentlyLoadedObjects() const;
|
||||
size_t getNumberOfCurrentlyLoadedObjects() const;
|
||||
bool hasLoadedObjects() const;
|
||||
size_t getNumberOfLoadedObjects() const;
|
||||
|
||||
/// Returns true if there is no object.
|
||||
bool hasObjects() const { return getNumberOfObjects() == 0; }
|
||||
|
||||
/// Returns number of objects.
|
||||
size_t getNumberOfObjects() const;
|
||||
|
||||
static constexpr Duration NO_WAIT = Duration::zero();
|
||||
static constexpr Duration WAIT = Duration::max();
|
||||
@ -139,7 +153,7 @@ public:
|
||||
/// The function does nothing for already loaded objects, it just returns them.
|
||||
/// The function doesn't throw an exception if it's failed to load something.
|
||||
template <typename ReturnType = Loadables, typename = std::enable_if_t<is_vector_load_result_type<ReturnType>, void>>
|
||||
ReturnType tryLoadAll(Duration timeout = WAIT) const { return tryLoad<ReturnType>(alwaysTrue, timeout); }
|
||||
ReturnType tryLoadAll(Duration timeout = WAIT) const { return tryLoad<ReturnType>(FilterByNameFunction{}, timeout); }
|
||||
|
||||
/// Loads a specified object.
|
||||
/// The function does nothing if it's already loaded.
|
||||
@ -157,7 +171,7 @@ public:
|
||||
/// The function does nothing for already loaded objects, it just returns them.
|
||||
/// The function throws an exception if it's failed to load something.
|
||||
template <typename ReturnType = Loadables, typename = std::enable_if_t<is_vector_load_result_type<ReturnType>, void>>
|
||||
ReturnType loadAll() const { return load<ReturnType>(alwaysTrue); }
|
||||
ReturnType loadAll() const { return load<ReturnType>(FilterByNameFunction{}); }
|
||||
|
||||
/// Loads or reloads a specified object.
|
||||
/// The function reloads the object if it's already loaded.
|
||||
@ -174,7 +188,7 @@ public:
|
||||
/// Load or reloads all objects. Not recommended to use.
|
||||
/// The function throws an exception if it's failed to load or reload something.
|
||||
template <typename ReturnType = Loadables, typename = std::enable_if_t<is_vector_load_result_type<ReturnType>, void>>
|
||||
ReturnType loadOrReloadAll() const { return loadOrReload<ReturnType>(alwaysTrue); }
|
||||
ReturnType loadOrReloadAll() const { return loadOrReload<ReturnType>(FilterByNameFunction{}); }
|
||||
|
||||
/// Reloads objects by filter which were tried to load before (successfully or not).
|
||||
/// The function throws an exception if it's failed to load or reload something.
|
||||
@ -197,10 +211,8 @@ private:
|
||||
void checkLoaded(const LoadResult & result, bool check_no_errors) const;
|
||||
void checkLoaded(const LoadResults & results, bool check_no_errors) const;
|
||||
|
||||
static bool alwaysTrue(const String &) { return true; }
|
||||
Strings getAllTriedToLoadNames() const;
|
||||
|
||||
struct ObjectConfig;
|
||||
LoadablePtr createObject(const String & name, const ObjectConfig & config, const LoadablePtr & previous_version) const;
|
||||
|
||||
class LoadablesConfigReader;
|
||||
|
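The ExternalLoader hunks above replace the per-object ObjectConfig copies with std::shared_ptr<const ObjectConfig>: every object parsed from the same file now points at one shared, immutable configuration, and a reload publishes a new pointer instead of mutating or copying structs. A minimal sketch of that ownership pattern follows, with a toy ObjectConfig and registry that are not the actual ExternalLoader types beyond the field names visible in the diff.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

/// Toy stand-in for ExternalLoader::ObjectConfig: parsed config location plus its origin.
struct ObjectConfig
{
    std::string key_in_config;
    std::string path;
};

/// One shared, immutable config per object; many readers can hold the same pointer,
/// and a reload simply swaps the pointer in the registry instead of copying structs.
using ConfigPtr = std::shared_ptr<const ObjectConfig>;
using Registry = std::unordered_map<std::string, ConfigPtr>;

int main()
{
    Registry configs;
    configs["dict_a"] = std::make_shared<const ObjectConfig>(ObjectConfig{"dictionary[0]", "/etc/dicts.xml"});

    ConfigPtr in_use = configs["dict_a"]; /// e.g. held by a loaded object's Info

    /// Reload: publish a new config; holders of the old pointer stay valid until they drop it.
    configs["dict_a"] = std::make_shared<const ObjectConfig>(ObjectConfig{"dictionary[1]", "/etc/dicts.xml"});

    assert(in_use->key_in_config == "dictionary[0]");
    assert(configs["dict_a"]->key_in_config == "dictionary[1]");
}
```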
@ -1,30 +1,30 @@
|
||||
#include <Interpreters/ExternalLoaderDatabaseConfigRepository.h>
|
||||
#include <Interpreters/ExternalDictionariesLoader.h>
|
||||
#include <Dictionaries/getDictionaryConfigurationFromAST.h>
|
||||
#include <Common/StringUtils/StringUtils.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
namespace ErrorCodes
|
||||
{
|
||||
extern const int UNKNOWN_DICTIONARY;
|
||||
}
|
||||
|
||||
|
||||
namespace
|
||||
{
|
||||
String trimDatabaseName(const std::string & loadable_definition_name, const IDatabase & database)
|
||||
{
|
||||
const auto & dbname = database.getDatabaseName();
|
||||
if (!startsWith(loadable_definition_name, dbname))
|
||||
throw Exception(
|
||||
"Loadable '" + loadable_definition_name + "' is not from database '" + database.getDatabaseName(), ErrorCodes::UNKNOWN_DICTIONARY);
|
||||
/// dbname.loadable_name
|
||||
///--> remove <---
|
||||
return loadable_definition_name.substr(dbname.length() + 1);
|
||||
}
|
||||
String trimDatabaseName(const std::string & loadable_definition_name, const IDatabase & database)
|
||||
{
|
||||
const auto & dbname = database.getDatabaseName();
|
||||
if (!startsWith(loadable_definition_name, dbname))
|
||||
throw Exception(
|
||||
"Loadable '" + loadable_definition_name + "' is not from database '" + database.getDatabaseName(), ErrorCodes::UNKNOWN_DICTIONARY);
|
||||
/// dbname.loadable_name
|
||||
///--> remove <---
|
||||
return loadable_definition_name.substr(dbname.length() + 1);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
ExternalLoaderDatabaseConfigRepository::ExternalLoaderDatabaseConfigRepository(IDatabase & database_, const Context & context_)
|
||||
: name(database_.getDatabaseName())
|
||||
, database(database_)
|
||||
@ -34,8 +34,7 @@ ExternalLoaderDatabaseConfigRepository::ExternalLoaderDatabaseConfigRepository(I
|
||||
|
||||
LoadablesConfigurationPtr ExternalLoaderDatabaseConfigRepository::load(const std::string & loadable_definition_name)
|
||||
{
|
||||
String dictname = trimDatabaseName(loadable_definition_name, database);
|
||||
return getDictionaryConfigurationFromAST(database.getCreateDictionaryQuery(context, dictname)->as<const ASTCreateQuery &>());
|
||||
return database.getDictionaryConfiguration(trimDatabaseName(loadable_definition_name, database));
|
||||
}
|
||||
|
||||
bool ExternalLoaderDatabaseConfigRepository::exists(const std::string & loadable_definition_name)
|
||||
|
@ -4,16 +4,21 @@
|
||||
|
||||
#include <Columns/ColumnConst.h>
|
||||
#include <Columns/ColumnString.h>
|
||||
#include <Columns/ColumnVector.h>
|
||||
#include <Columns/ColumnFixedString.h>
|
||||
#include <Columns/ColumnNullable.h>
|
||||
|
||||
#include <DataTypes/DataTypeNullable.h>
|
||||
#include <DataTypes/DataTypeLowCardinality.h>
|
||||
|
||||
#include <Interpreters/HashJoin.h>
|
||||
#include <Interpreters/join_common.h>
|
||||
#include <Interpreters/TableJoin.h>
|
||||
#include <Interpreters/joinDispatch.h>
|
||||
#include <Interpreters/NullableUtils.h>
|
||||
#include <Interpreters/DictionaryReader.h>
|
||||
|
||||
#include <Storages/StorageDictionary.h>
|
||||
|
||||
#include <DataStreams/IBlockInputStream.h>
|
||||
#include <DataStreams/materializeBlock.h>
|
||||
@ -21,8 +26,6 @@
|
||||
#include <Core/ColumnNumbers.h>
|
||||
#include <Common/typeid_cast.h>
|
||||
#include <Common/assert_cast.h>
|
||||
#include <DataTypes/DataTypeLowCardinality.h>
|
||||
|
||||
|
||||
namespace DB
|
||||
{
|
||||
@ -282,6 +285,42 @@ static KeyGetter createKeyGetter(const ColumnRawPtrs & key_columns, const Sizes
|
||||
return KeyGetter(key_columns, key_sizes, nullptr);
|
||||
}
|
||||
|
||||
class KeyGetterForDict
|
||||
{
|
||||
public:
|
||||
using Mapped = JoinStuff::MappedOne;
|
||||
using FindResult = ColumnsHashing::columns_hashing_impl::FindResultImpl<Mapped>;
|
||||
|
||||
KeyGetterForDict(const ColumnRawPtrs & key_columns_, const Sizes &, void *)
|
||||
: key_columns(key_columns_)
|
||||
{}
|
||||
|
||||
FindResult findKey(const TableJoin & table_join, size_t row, const Arena &)
|
||||
{
|
||||
const DictionaryReader & reader = *table_join.dictionary_reader;
|
||||
if (!read_result)
|
||||
{
|
||||
reader.readKeys(*key_columns[0], read_result, found, positions);
|
||||
result.block = &read_result;
|
||||
|
||||
if (table_join.forceNullableRight())
|
||||
for (auto & column : read_result)
|
||||
if (table_join.rightBecomeNullable(column.type))
|
||||
JoinCommon::convertColumnToNullable(column);
|
||||
}
|
||||
|
||||
result.row_num = positions[row];
|
||||
return FindResult(&result, found[row]);
|
||||
}
|
||||
|
||||
private:
|
||||
const ColumnRawPtrs & key_columns;
|
||||
Block read_result;
|
||||
Mapped result;
|
||||
ColumnVector<UInt8>::Container found;
|
||||
std::vector<size_t> positions;
|
||||
};
|
||||
|
||||
template <HashJoin::Type type, typename Value, typename Mapped>
|
||||
struct KeyGetterForTypeImpl;
|
||||
|
||||
@ -351,7 +390,7 @@ size_t HashJoin::getTotalRowCount() const
|
||||
for (const auto & block : data->blocks)
|
||||
res += block.rows();
|
||||
}
|
||||
else
|
||||
else if (data->type != Type::DICT)
|
||||
{
|
||||
joinDispatch(kind, strictness, data->maps, [&](auto, auto, auto & map) { res += map.getTotalRowCount(data->type); });
|
||||
}
|
||||
@ -368,7 +407,7 @@ size_t HashJoin::getTotalByteCount() const
|
||||
for (const auto & block : data->blocks)
|
||||
res += block.bytes();
|
||||
}
|
||||
else
|
||||
else if (data->type != Type::DICT)
|
||||
{
|
||||
joinDispatch(kind, strictness, data->maps, [&](auto, auto, auto & map) { res += map.getTotalByteCountImpl(data->type); });
|
||||
res += data->pool.size();
|
||||
@ -400,7 +439,13 @@ void HashJoin::setSampleBlock(const Block & block)
|
||||
if (nullable_right_side)
|
||||
JoinCommon::convertColumnsToNullable(sample_block_with_columns_to_add);
|
||||
|
||||
if (strictness == ASTTableJoin::Strictness::Asof)
|
||||
if (table_join->dictionary_reader)
|
||||
{
|
||||
data->type = Type::DICT;
|
||||
std::get<MapsOne>(data->maps).create(Type::DICT);
|
||||
chooseMethod(key_columns, key_sizes); /// init key_sizes
|
||||
}
|
||||
else if (strictness == ASTTableJoin::Strictness::Asof)
|
||||
{
|
||||
if (kind != ASTTableJoin::Kind::Left and kind != ASTTableJoin::Kind::Inner)
|
||||
throw Exception("ASOF only supports LEFT and INNER as base joins", ErrorCodes::NOT_IMPLEMENTED);
|
||||
@@ -526,7 +571,8 @@ namespace
    switch (type)
    {
        case HashJoin::Type::EMPTY: break;
        case HashJoin::Type::CROSS: break; /// Do nothing. We have already saved block, and it is enough.
        case HashJoin::Type::DICT: break; /// Noone should call it with Type::DICT.

    #define M(TYPE) \
        case HashJoin::Type::TYPE: \
@@ -598,6 +644,8 @@ bool HashJoin::addJoinedBlock(const Block & source_block, bool check_limits)
{
    if (empty())
        throw Exception("Logical error: HashJoin was not initialized", ErrorCodes::LOGICAL_ERROR);
    if (overDictionary())
        throw Exception("Logical error: insert into hash-map in HashJoin over dictionary", ErrorCodes::LOGICAL_ERROR);

    /// There's no optimization for right side const columns. Remove constness if any.
    Block block = materializeBlock(source_block);
@@ -930,8 +978,7 @@ IColumn::Filter switchJoinRightColumns(const Maps & maps_, AddedColumns & added_
        case HashJoin::Type::TYPE: \
            return joinRightColumnsSwitchNullability<KIND, STRICTNESS,\
                typename KeyGetterForType<HashJoin::Type::TYPE, const std::remove_reference_t<decltype(*maps_.TYPE)>>::Type>(\
                *maps_.TYPE, added_columns, null_map);\
            break;
                *maps_.TYPE, added_columns, null_map);
        APPLY_FOR_JOIN_VARIANTS(M)
    #undef M

@@ -940,6 +987,20 @@ IColumn::Filter switchJoinRightColumns(const Maps & maps_, AddedColumns & added_
    }
}

template <ASTTableJoin::Kind KIND, ASTTableJoin::Strictness STRICTNESS>
IColumn::Filter dictionaryJoinRightColumns(const TableJoin & table_join, AddedColumns & added_columns, const ConstNullMapPtr & null_map)
{
    if constexpr (KIND == ASTTableJoin::Kind::Left &&
        (STRICTNESS == ASTTableJoin::Strictness::Any ||
         STRICTNESS == ASTTableJoin::Strictness::Semi ||
         STRICTNESS == ASTTableJoin::Strictness::Anti))
    {
        return joinRightColumnsSwitchNullability<KIND, STRICTNESS, KeyGetterForDict>(table_join, added_columns, null_map);
    }

    throw Exception("Logical error: wrong JOIN combination", ErrorCodes::LOGICAL_ERROR);
}

} /// nameless

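dictionaryJoinRightColumns above uses if constexpr so that only the supported kind/strictness combinations get a real body; every other template instantiation compiles down to the single throw. A small self-contained illustration of that compile-time gating (the enum and function names here are made up for the example, not ClickHouse's):

#include <iostream>
#include <stdexcept>

enum class Kind { Left, Inner };
enum class Strictness { Any, Semi, Anti, All };

/// Only the combinations handled by the `if constexpr` branch keep a real body;
/// every other instantiation collapses at compile time to the throw below.
template <Kind KIND, Strictness STRICTNESS>
int joinOverDictionary(int rows)
{
    if constexpr (KIND == Kind::Left &&
        (STRICTNESS == Strictness::Any || STRICTNESS == Strictness::Semi || STRICTNESS == Strictness::Anti))
    {
        return rows;  /// stand-in for the real per-row join work
    }
    throw std::logic_error("wrong JOIN combination");
}

int main()
{
    std::cout << joinOverDictionary<Kind::Left, Strictness::Any>(3) << "\n";   /// supported: prints 3
    try
    {
        joinOverDictionary<Kind::Inner, Strictness::All>(3);                   /// compiles, but throws at runtime
    }
    catch (const std::logic_error & e)
    {
        std::cout << e.what() << "\n";
    }
}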
@@ -1000,7 +1061,9 @@ void HashJoin::joinBlockImpl(
    bool has_required_right_keys = (required_right_keys.columns() != 0);
    added_columns.need_filter = need_filter || has_required_right_keys;

    IColumn::Filter row_filter = switchJoinRightColumns<KIND, STRICTNESS>(maps_, added_columns, data->type, null_map);
    IColumn::Filter row_filter = overDictionary() ?
        dictionaryJoinRightColumns<KIND, STRICTNESS>(*table_join, added_columns, null_map) :
        switchJoinRightColumns<KIND, STRICTNESS>(maps_, added_columns, data->type, null_map);

    for (size_t i = 0; i < added_columns.size(); ++i)
        block.insert(added_columns.moveColumn(i));
@@ -1211,7 +1274,36 @@ void HashJoin::joinBlock(Block & block, ExtraBlockPtr & not_processed)
    const Names & key_names_left = table_join->keyNamesLeft();
    JoinCommon::checkTypesOfKeys(block, key_names_left, right_table_keys, key_names_right);

    if (joinDispatch(kind, strictness, data->maps, [&](auto kind_, auto strictness_, auto & map)
    if (overDictionary())
    {
        using Kind = ASTTableJoin::Kind;
        using Strictness = ASTTableJoin::Strictness;

        auto & map = std::get<MapsOne>(data->maps);
        if (kind == Kind::Left)
        {
            switch (strictness)
            {
                case Strictness::Any:
                case Strictness::All:
                    joinBlockImpl<Kind::Left, Strictness::Any>(block, key_names_left, sample_block_with_columns_to_add, map);
                    break;
                case Strictness::Semi:
                    joinBlockImpl<Kind::Left, Strictness::Semi>(block, key_names_left, sample_block_with_columns_to_add, map);
                    break;
                case Strictness::Anti:
                    joinBlockImpl<Kind::Left, Strictness::Anti>(block, key_names_left, sample_block_with_columns_to_add, map);
                    break;
                default:
                    throw Exception("Logical error: wrong JOIN combination", ErrorCodes::LOGICAL_ERROR);
            }
        }
        else if (kind == Kind::Inner && strictness == Strictness::All)
            joinBlockImpl<Kind::Left, Strictness::Semi>(block, key_names_left, sample_block_with_columns_to_add, map);
        else
            throw Exception("Logical error: wrong JOIN combination", ErrorCodes::LOGICAL_ERROR);
    }
    else if (joinDispatch(kind, strictness, data->maps, [&](auto kind_, auto strictness_, auto & map)
    {
        joinBlockImpl<kind_, strictness_>(block, key_names_left, sample_block_with_columns_to_add, map);
    }))

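The INNER ... ALL branch above reuses the Kind::Left, Strictness::Semi instantiation. A dictionary has unique keys, so a left row can match at most one right row and ALL cannot multiply rows; a short standalone comparison of the two row counts (illustrative code, not taken from ClickHouse):

#include <iostream>
#include <map>
#include <vector>

/// ALL may emit several output rows per left row, but only when one key has
/// several matches on the right. With unique right-side keys the row counts
/// of INNER ALL and LEFT SEMI (after filtering non-matches) always coincide.
static size_t innerAllRows(const std::vector<int> & left, const std::multimap<int, int> & right)
{
    size_t rows = 0;
    for (int key : left)
    {
        auto [begin, end] = right.equal_range(key);
        for (auto it = begin; it != end; ++it)
            ++rows;                       /// one output row per match
    }
    return rows;
}

static size_t leftSemiRows(const std::vector<int> & left, const std::multimap<int, int> & right)
{
    size_t rows = 0;
    for (int key : left)
        if (right.count(key))
            ++rows;                       /// at most one output row per left row
    return rows;
}

int main()
{
    std::vector<int> left{1, 2, 2, 5};

    std::multimap<int, int> generic_table{{1, 10}, {1, 11}, {2, 20}};   /// duplicate keys allowed
    std::multimap<int, int> dictionary{{1, 10}, {2, 20}};               /// unique keys, like an external dictionary

    std::cout << innerAllRows(left, generic_table) << " vs " << leftSemiRows(left, generic_table) << "\n";  /// 4 vs 3: differ
    std::cout << innerAllRows(left, dictionary) << " vs " << leftSemiRows(left, dictionary) << "\n";        /// 3 vs 3: equal
}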
@ -27,6 +27,7 @@ namespace DB
|
||||
{
|
||||
|
||||
class TableJoin;
|
||||
class DictionaryReader;
|
||||
|
||||
namespace JoinStuff
|
||||
{
|
||||
@ -148,7 +149,8 @@ class HashJoin : public IJoin
|
||||
public:
|
||||
HashJoin(std::shared_ptr<TableJoin> table_join_, const Block & right_sample_block, bool any_take_last_row_ = false);
|
||||
|
||||
bool empty() { return data->type == Type::EMPTY; }
|
||||
bool empty() const { return data->type == Type::EMPTY; }
|
||||
bool overDictionary() const { return data->type == Type::DICT; }
|
||||
|
||||
/** Add block of data from right hand of JOIN to the map.
|
||||
* Returns false, if some limit was exceeded and you should not insert more data.
|
||||
@ -186,7 +188,7 @@ public:
|
||||
/// Sum size in bytes of all buffers, used for JOIN maps and for all memory pools.
|
||||
size_t getTotalByteCount() const final;
|
||||
|
||||
bool alwaysReturnsEmptySet() const final { return isInnerOrRight(getKind()) && data->empty; }
|
||||
bool alwaysReturnsEmptySet() const final { return isInnerOrRight(getKind()) && data->empty && !overDictionary(); }
|
||||
|
||||
ASTTableJoin::Kind getKind() const { return kind; }
|
||||
ASTTableJoin::Strictness getStrictness() const { return strictness; }
|
||||
@ -220,12 +222,12 @@ public:
|
||||
{
|
||||
EMPTY,
|
||||
CROSS,
|
||||
DICT,
|
||||
#define M(NAME) NAME,
|
||||
APPLY_FOR_JOIN_VARIANTS(M)
|
||||
#undef M
|
||||
};
|
||||
|
||||
|
||||
/** Different data structures, that are used to perform JOIN.
|
||||
*/
|
||||
template <typename Mapped>
|
||||
@ -247,6 +249,7 @@ public:
|
||||
{
|
||||
case Type::EMPTY: break;
|
||||
case Type::CROSS: break;
|
||||
case Type::DICT: break;
|
||||
|
||||
#define M(NAME) \
|
||||
case Type::NAME: NAME = std::make_unique<typename decltype(NAME)::element_type>(); break;
|
||||
@ -261,6 +264,7 @@ public:
|
||||
{
|
||||
case Type::EMPTY: return 0;
|
||||
case Type::CROSS: return 0;
|
||||
case Type::DICT: return 0;
|
||||
|
||||
#define M(NAME) \
|
||||
case Type::NAME: return NAME ? NAME->size() : 0;
|
||||
@ -277,6 +281,7 @@ public:
|
||||
{
|
||||
case Type::EMPTY: return 0;
|
||||
case Type::CROSS: return 0;
|
||||
case Type::DICT: return 0;
|
||||
|
||||
#define M(NAME) \
|
||||
case Type::NAME: return NAME ? NAME->getBufferSizeInBytes() : 0;
|
||||
|
@ -45,6 +45,8 @@
|
||||
#include <Databases/DatabaseFactory.h>
|
||||
#include <Databases/IDatabase.h>
|
||||
|
||||
#include <Dictionaries/getDictionaryConfigurationFromAST.h>
|
||||
|
||||
#include <Compression/CompressionFactory.h>
|
||||
|
||||
#include <Interpreters/InterpreterDropQuery.h>
|
||||
@ -703,7 +705,11 @@ BlockIO InterpreterCreateQuery::createDictionary(ASTCreateQuery & create)
|
||||
}
|
||||
|
||||
if (create.attach)
|
||||
database->attachDictionary(dictionary_name, context);
|
||||
{
|
||||
auto config = getDictionaryConfigurationFromAST(create);
|
||||
auto modification_time = database->getObjectMetadataModificationTime(dictionary_name);
|
||||
database->attachDictionary(dictionary_name, DictionaryAttachInfo{query_ptr, config, modification_time});
|
||||
}
|
||||
else
|
||||
database->createDictionary(context, dictionary_name, query_ptr);
|
||||
|
||||
|
@ -188,7 +188,7 @@ BlockIO InterpreterDropQuery::executeToDictionary(
|
||||
{
|
||||
/// Drop dictionary from memory, don't touch data and metadata
|
||||
context.checkAccess(AccessType::DROP_DICTIONARY, database_name, dictionary_name);
|
||||
database->detachDictionary(dictionary_name, context);
|
||||
database->detachDictionary(dictionary_name);
|
||||
}
|
||||
else if (kind == ASTDropQuery::Kind::Truncate)
|
||||
{
|
||||
@ -254,21 +254,26 @@ BlockIO InterpreterDropQuery::executeToDatabase(const String & database_name, AS
|
||||
bool drop = kind == ASTDropQuery::Kind::Drop;
|
||||
context.checkAccess(AccessType::DROP_DATABASE, database_name);
|
||||
|
||||
/// DETACH or DROP all tables and dictionaries inside database
|
||||
for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next())
|
||||
if (database->shouldBeEmptyOnDetach())
|
||||
{
|
||||
String current_table_name = iterator->name();
|
||||
executeToTable(database_name, current_table_name, kind, false, false, false);
|
||||
}
|
||||
/// DETACH or DROP all tables and dictionaries inside database.
|
||||
/// First we should DETACH or DROP dictionaries because StorageDictionary
|
||||
/// must be detached only by detaching corresponding dictionary.
|
||||
for (auto iterator = database->getDictionariesIterator(context); iterator->isValid(); iterator->next())
|
||||
{
|
||||
String current_dictionary = iterator->name();
|
||||
executeToDictionary(database_name, current_dictionary, kind, false, false, false);
|
||||
}
|
||||
|
||||
for (auto iterator = database->getDictionariesIterator(context); iterator->isValid(); iterator->next())
|
||||
{
|
||||
String current_dictionary = iterator->name();
|
||||
executeToDictionary(database_name, current_dictionary, kind, false, false, false);
|
||||
for (auto iterator = database->getTablesIterator(context); iterator->isValid(); iterator->next())
|
||||
{
|
||||
String current_table_name = iterator->name();
|
||||
executeToTable(database_name, current_table_name, kind, false, false, false);
|
||||
}
|
||||
}
|
||||
|
||||
/// DETACH or DROP database itself
|
||||
DatabaseCatalog::instance().detachDatabase(database_name, drop);
|
||||
DatabaseCatalog::instance().detachDatabase(database_name, drop, database->shouldBeEmptyOnDetach());
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -305,12 +305,13 @@ InterpreterSelectQuery::InterpreterSelectQuery(
|
||||
|
||||
max_streams = settings.max_threads;
|
||||
ASTSelectQuery & query = getSelectQuery();
|
||||
std::shared_ptr<TableJoin> table_join = joined_tables.makeTableJoin(query);
|
||||
|
||||
auto analyze = [&] (bool try_move_to_prewhere = true)
|
||||
{
|
||||
syntax_analyzer_result = SyntaxAnalyzer(*context).analyzeSelect(
|
||||
query_ptr, SyntaxAnalyzerResult(source_header.getNamesAndTypesList(), storage),
|
||||
options, joined_tables.tablesWithColumns(), required_result_column_names);
|
||||
options, joined_tables.tablesWithColumns(), required_result_column_names, table_join);
|
||||
|
||||
/// Save scalar sub queries's results in the query context
|
||||
if (context->hasQueryContext())
|
||||
|
@ -1,18 +1,26 @@
|
||||
#include <Interpreters/JoinedTables.h>
|
||||
#include <Interpreters/TableJoin.h>
|
||||
#include <Interpreters/Context.h>
|
||||
#include <Interpreters/getTableExpressions.h>
|
||||
#include <Interpreters/InJoinSubqueriesPreprocessor.h>
|
||||
#include <Interpreters/IdentifierSemantic.h>
|
||||
#include <Interpreters/InDepthNodeVisitor.h>
|
||||
|
||||
#include <Storages/IStorage.h>
|
||||
#include <Storages/ColumnsDescription.h>
|
||||
#include <Storages/StorageValues.h>
|
||||
#include <Storages/StorageJoin.h>
|
||||
#include <Storages/StorageDictionary.h>
|
||||
|
||||
#include <Parsers/ASTFunction.h>
|
||||
#include <Parsers/ASTSelectQuery.h>
|
||||
#include <Parsers/ASTSelectWithUnionQuery.h>
|
||||
#include <Parsers/ASTSubquery.h>
|
||||
#include <Parsers/ASTTablesInSelectQuery.h>
|
||||
#include <Parsers/ASTIdentifier.h>
|
||||
#include <Parsers/ASTQualifiedAsterisk.h>
|
||||
#include <Parsers/ParserTablesInSelectQuery.h>
|
||||
#include <Parsers/parseQuery.h>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
@ -26,6 +34,34 @@ namespace ErrorCodes
|
||||
namespace
|
||||
{
|
||||
|
||||
void replaceJoinedTable(const ASTSelectQuery & select_query)
|
||||
{
|
||||
const ASTTablesInSelectQueryElement * join = select_query.join();
|
||||
if (!join || !join->table_expression)
|
||||
return;
|
||||
|
||||
/// TODO: Push down for CROSS JOIN is not OK [disabled]
|
||||
const auto & table_join = join->table_join->as<ASTTableJoin &>();
|
||||
if (table_join.kind == ASTTableJoin::Kind::Cross)
|
||||
return;
|
||||
|
||||
auto & table_expr = join->table_expression->as<ASTTableExpression &>();
|
||||
if (table_expr.database_and_table_name)
|
||||
{
|
||||
const auto & table_id = table_expr.database_and_table_name->as<ASTIdentifier &>();
|
||||
String expr = "(select * from " + table_id.name + ") as " + table_id.shortName();
|
||||
|
||||
// FIXME: since the expression "a as b" exposes both "a" and "b" names, which is not equivalent to "(select * from a) as b",
|
||||
// we can't replace aliased tables.
|
||||
// FIXME: long table names include database name, which we can't save within alias.
|
||||
if (table_id.alias.empty() && table_id.isShort())
|
||||
{
|
||||
ParserTableExpression parser;
|
||||
table_expr = parseQuery(parser, expr, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH)->as<ASTTableExpression &>();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
void checkTablesWithColumns(const std::vector<T> & tables_with_columns, const Context & context)
|
||||
{
|
||||
@ -209,4 +245,35 @@ void JoinedTables::rewriteDistributedInAndJoins(ASTPtr & query)
|
||||
}
|
||||
}
|
||||
|
||||
std::shared_ptr<TableJoin> JoinedTables::makeTableJoin(const ASTSelectQuery & select_query)
|
||||
{
|
||||
if (tables_with_columns.size() < 2)
|
||||
return {};
|
||||
|
||||
auto settings = context.getSettingsRef();
|
||||
auto table_join = std::make_shared<TableJoin>(settings, context.getTemporaryVolume());
|
||||
|
||||
const ASTTablesInSelectQueryElement * ast_join = select_query.join();
|
||||
const auto & table_to_join = ast_join->table_expression->as<ASTTableExpression &>();
|
||||
|
||||
/// TODO This syntax does not support specifying a database name.
|
||||
if (table_to_join.database_and_table_name)
|
||||
{
|
||||
auto joined_table_id = context.resolveStorageID(table_to_join.database_and_table_name);
|
||||
StoragePtr table = DatabaseCatalog::instance().tryGetTable(joined_table_id);
|
||||
if (table)
|
||||
{
|
||||
if (dynamic_cast<StorageJoin *>(table.get()) ||
|
||||
dynamic_cast<StorageDictionary *>(table.get()))
|
||||
table_join->joined_storage = table;
|
||||
}
|
||||
}
|
||||
|
||||
if (!table_join->joined_storage &&
|
||||
settings.enable_optimize_predicate_expression)
|
||||
replaceJoinedTable(select_query);
|
||||
|
||||
return table_join;
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -10,6 +10,7 @@ namespace DB
|
||||
|
||||
class ASTSelectQuery;
|
||||
class Context;
|
||||
class TableJoin;
|
||||
struct SelectQueryOptions;
|
||||
|
||||
/// Joined tables' columns resolver.
|
||||
@ -30,6 +31,7 @@ public:
|
||||
|
||||
/// Make fake tables_with_columns[0] in case we have predefined input in InterpreterSelectQuery
|
||||
void makeFakeTable(StoragePtr storage, const Block & source_header);
|
||||
std::shared_ptr<TableJoin> makeTableJoin(const ASTSelectQuery & select_query);
|
||||
|
||||
const std::vector<TableWithColumnNamesAndTypes> & tablesWithColumns() const { return tables_with_columns; }
|
||||
|
||||
|
@ -29,10 +29,10 @@
|
||||
#include <Parsers/ASTOrderByElement.h>
|
||||
#include <Parsers/ASTSelectQuery.h>
|
||||
#include <Parsers/ASTTablesInSelectQuery.h>
|
||||
#include <Parsers/ParserTablesInSelectQuery.h>
|
||||
#include <Parsers/parseQuery.h>
|
||||
#include <Parsers/queryToString.h>
|
||||
|
||||
#include <Functions/FunctionFactory.h>
|
||||
|
||||
#include <DataTypes/NestedUtils.h>
|
||||
#include <DataTypes/DataTypeNullable.h>
|
||||
|
||||
@ -216,28 +216,6 @@ void executeScalarSubqueries(ASTPtr & query, const Context & context, size_t sub
|
||||
ExecuteScalarSubqueriesVisitor(visitor_data, log.stream()).visit(query);
|
||||
}
|
||||
|
||||
/** Calls to these functions in the GROUP BY statement would be
|
||||
* replaced by their immediate argument.
|
||||
*/
|
||||
const std::unordered_set<String> injective_function_names
|
||||
{
|
||||
"negate",
|
||||
"bitNot",
|
||||
"reverse",
|
||||
"reverseUTF8",
|
||||
"toString",
|
||||
"toFixedString",
|
||||
"IPv4NumToString",
|
||||
"IPv4StringToNum",
|
||||
"hex",
|
||||
"unhex",
|
||||
"bitmaskToList",
|
||||
"bitmaskToArray",
|
||||
"tuple",
|
||||
"regionToName",
|
||||
"concatAssumeInjective",
|
||||
};
|
||||
|
||||
const std::unordered_set<String> possibly_injective_function_names
|
||||
{
|
||||
"dictGetString",
|
||||
@ -278,6 +256,8 @@ void appendUnusedGroupByColumn(ASTSelectQuery * select_query, const NameSet & so
|
||||
/// Eliminates injective function calls and constant expressions from group by statement.
|
||||
void optimizeGroupBy(ASTSelectQuery * select_query, const NameSet & source_columns, const Context & context)
|
||||
{
|
||||
const FunctionFactory & function_factory = FunctionFactory::instance();
|
||||
|
||||
if (!select_query->groupBy())
|
||||
{
|
||||
// If there is a HAVING clause without GROUP BY, make sure we have some aggregation happen.
|
||||
@ -327,7 +307,7 @@ void optimizeGroupBy(ASTSelectQuery * select_query, const NameSet & source_colum
|
||||
continue;
|
||||
}
|
||||
}
|
||||
else if (!injective_function_names.count(function->name))
|
||||
else if (!function_factory.get(function->name, context)->isInjective(Block{}))
|
||||
{
|
||||
++i;
|
||||
continue;
|
||||
@ -565,34 +545,6 @@ void collectJoinedColumns(TableJoin & analyzed_join, const ASTSelectQuery & sele
|
||||
}
|
||||
}
|
||||
|
||||
void replaceJoinedTable(const ASTSelectQuery & select_query)
|
||||
{
|
||||
const ASTTablesInSelectQueryElement * join = select_query.join();
|
||||
if (!join || !join->table_expression)
|
||||
return;
|
||||
|
||||
/// TODO: Push down for CROSS JOIN is not OK [disabled]
|
||||
const auto & table_join = join->table_join->as<ASTTableJoin &>();
|
||||
if (table_join.kind == ASTTableJoin::Kind::Cross)
|
||||
return;
|
||||
|
||||
auto & table_expr = join->table_expression->as<ASTTableExpression &>();
|
||||
if (table_expr.database_and_table_name)
|
||||
{
|
||||
const auto & table_id = table_expr.database_and_table_name->as<ASTIdentifier &>();
|
||||
String expr = "(select * from " + table_id.name + ") as " + table_id.shortName();
|
||||
|
||||
// FIXME: since the expression "a as b" exposes both "a" and "b" names, which is not equivalent to "(select * from a) as b",
|
||||
// we can't replace aliased tables.
|
||||
// FIXME: long table names include database name, which we can't save within alias.
|
||||
if (table_id.alias.empty() && table_id.isShort())
|
||||
{
|
||||
ParserTableExpression parser;
|
||||
table_expr = parseQuery(parser, expr, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH)->as<ASTTableExpression &>();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
std::vector<const ASTFunction *> getAggregates(ASTPtr & query, const ASTSelectQuery & select_query)
|
||||
{
|
||||
/// There can not be aggregate functions inside the WHERE and PREWHERE.
|
||||
@ -799,7 +751,8 @@ SyntaxAnalyzerResultPtr SyntaxAnalyzer::analyzeSelect(
|
||||
SyntaxAnalyzerResult && result,
|
||||
const SelectQueryOptions & select_options,
|
||||
const std::vector<TableWithColumnNamesAndTypes> & tables_with_columns,
|
||||
const Names & required_result_columns) const
|
||||
const Names & required_result_columns,
|
||||
std::shared_ptr<TableJoin> table_join) const
|
||||
{
|
||||
auto * select_query = query->as<ASTSelectQuery>();
|
||||
if (!select_query)
|
||||
@ -811,14 +764,13 @@ SyntaxAnalyzerResultPtr SyntaxAnalyzer::analyzeSelect(
|
||||
const auto & settings = context.getSettingsRef();
|
||||
|
||||
const NameSet & source_columns_set = result.source_columns_set;
|
||||
result.analyzed_join = std::make_shared<TableJoin>(settings, context.getTemporaryVolume());
|
||||
result.analyzed_join = table_join;
|
||||
if (!result.analyzed_join) /// ExpressionAnalyzer expects some not empty object here
|
||||
result.analyzed_join = std::make_shared<TableJoin>();
|
||||
|
||||
if (remove_duplicates)
|
||||
renameDuplicatedColumns(select_query);
|
||||
|
||||
if (settings.enable_optimize_predicate_expression)
|
||||
replaceJoinedTable(*select_query);
|
||||
|
||||
/// TODO: Remove unneeded conversion
|
||||
std::vector<TableWithColumnNames> tables_with_column_names;
|
||||
tables_with_column_names.reserve(tables_with_columns.size());
|
||||
|
@ -94,7 +94,8 @@ public:
|
||||
SyntaxAnalyzerResult && result,
|
||||
const SelectQueryOptions & select_options = {},
|
||||
const std::vector<TableWithColumnNamesAndTypes> & tables_with_columns = {},
|
||||
const Names & required_result_columns = {}) const;
|
||||
const Names & required_result_columns = {},
|
||||
std::shared_ptr<TableJoin> table_join = {}) const;
|
||||
|
||||
private:
|
||||
const Context & context;
|
||||
|
@ -159,22 +159,26 @@ NamesWithAliases TableJoin::getRequiredColumns(const Block & sample, const Names
|
||||
return getNamesWithAliases(required_columns);
|
||||
}
|
||||
|
||||
bool TableJoin::leftBecomeNullable(const DataTypePtr & column_type) const
|
||||
{
|
||||
return forceNullableLeft() && column_type->canBeInsideNullable();
|
||||
}
|
||||
|
||||
bool TableJoin::rightBecomeNullable(const DataTypePtr & column_type) const
|
||||
{
|
||||
return forceNullableRight() && column_type->canBeInsideNullable();
|
||||
}
|
||||
|
||||
void TableJoin::addJoinedColumn(const NameAndTypePair & joined_column)
|
||||
{
|
||||
if (join_use_nulls && isLeftOrFull(table_join.kind))
|
||||
{
|
||||
auto type = joined_column.type->canBeInsideNullable() ? makeNullable(joined_column.type) : joined_column.type;
|
||||
columns_added_by_join.emplace_back(NameAndTypePair(joined_column.name, std::move(type)));
|
||||
}
|
||||
if (rightBecomeNullable(joined_column.type))
|
||||
columns_added_by_join.emplace_back(NameAndTypePair(joined_column.name, makeNullable(joined_column.type)));
|
||||
else
|
||||
columns_added_by_join.push_back(joined_column);
|
||||
}
|
||||
|
||||
void TableJoin::addJoinedColumnsAndCorrectNullability(Block & sample_block) const
|
||||
{
|
||||
bool right_or_full_join = isRightOrFull(table_join.kind);
|
||||
bool left_or_full_join = isLeftOrFull(table_join.kind);
|
||||
|
||||
for (auto & col : sample_block)
|
||||
{
|
||||
/// Materialize column.
|
||||
@ -183,9 +187,7 @@ void TableJoin::addJoinedColumnsAndCorrectNullability(Block & sample_block) cons
|
||||
if (col.column)
|
||||
col.column = nullptr;
|
||||
|
||||
bool make_nullable = join_use_nulls && right_or_full_join;
|
||||
|
||||
if (make_nullable && col.type->canBeInsideNullable())
|
||||
if (leftBecomeNullable(col.type))
|
||||
col.type = makeNullable(col.type);
|
||||
}
|
||||
|
||||
@ -193,9 +195,7 @@ void TableJoin::addJoinedColumnsAndCorrectNullability(Block & sample_block) cons
|
||||
{
|
||||
auto res_type = col.type;
|
||||
|
||||
bool make_nullable = join_use_nulls && left_or_full_join;
|
||||
|
||||
if (make_nullable && res_type->canBeInsideNullable())
|
||||
if (rightBecomeNullable(res_type))
|
||||
res_type = makeNullable(res_type);
|
||||
|
||||
sample_block.insert(ColumnWithTypeAndName(nullptr, res_type, col.name));
|
||||
@ -242,4 +242,31 @@ bool TableJoin::allowMergeJoin() const
|
||||
return allow_merge_join;
|
||||
}
|
||||
|
||||
bool TableJoin::allowDictJoin(const String & dict_key, const Block & sample_block, Names & names, NamesAndTypesList & result_columns) const
|
||||
{
|
||||
/// Support ALL INNER, [ANY | ALL | SEMI | ANTI] LEFT
|
||||
if (!isLeft(kind()) && !(isInner(kind()) && strictness() == ASTTableJoin::Strictness::All))
|
||||
return false;
|
||||
|
||||
const Names & right_keys = keyNamesRight();
|
||||
if (right_keys.size() != 1)
|
||||
return false;
|
||||
|
||||
for (auto & col : sample_block)
|
||||
{
|
||||
String original = original_names.find(col.name)->second;
|
||||
if (col.name == right_keys[0])
|
||||
{
|
||||
if (original != dict_key)
|
||||
return false; /// JOIN key != Dictionary key
|
||||
continue; /// do not extract key column
|
||||
}
|
||||
|
||||
names.push_back(original);
|
||||
result_columns.push_back({col.name, col.type});
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -8,6 +8,7 @@
|
||||
#include <Interpreters/asof.h>
|
||||
#include <DataStreams/IBlockStream_fwd.h>
|
||||
#include <DataStreams/SizeLimits.h>
|
||||
#include <Storages/IStorage_fwd.h>
|
||||
|
||||
#include <utility>
|
||||
#include <memory>
|
||||
@ -19,6 +20,7 @@ class Context;
|
||||
class ASTSelectQuery;
|
||||
struct DatabaseAndTableWithAlias;
|
||||
class Block;
|
||||
class DictionaryReader;
|
||||
|
||||
struct Settings;
|
||||
|
||||
@ -42,10 +44,10 @@ class TableJoin
|
||||
friend class SyntaxAnalyzer;
|
||||
|
||||
const SizeLimits size_limits;
|
||||
const size_t default_max_bytes;
|
||||
const bool join_use_nulls;
|
||||
const size_t default_max_bytes = 0;
|
||||
const bool join_use_nulls = false;
|
||||
const size_t max_joined_block_rows = 0;
|
||||
JoinAlgorithm join_algorithm;
|
||||
JoinAlgorithm join_algorithm = JoinAlgorithm::AUTO;
|
||||
const bool partial_merge_join_optimizations = false;
|
||||
const size_t partial_merge_join_rows_in_right_blocks = 0;
|
||||
|
||||
@ -69,6 +71,7 @@ class TableJoin
|
||||
VolumePtr tmp_volume;
|
||||
|
||||
public:
|
||||
TableJoin() = default;
|
||||
TableJoin(const Settings &, VolumePtr tmp_volume);
|
||||
|
||||
/// for StorageJoin
|
||||
@ -84,12 +87,16 @@ public:
|
||||
table_join.strictness = strictness;
|
||||
}
|
||||
|
||||
StoragePtr joined_storage;
|
||||
std::shared_ptr<DictionaryReader> dictionary_reader;
|
||||
|
||||
ASTTableJoin::Kind kind() const { return table_join.kind; }
|
||||
ASTTableJoin::Strictness strictness() const { return table_join.strictness; }
|
||||
bool sameStrictnessAndKind(ASTTableJoin::Strictness, ASTTableJoin::Kind) const;
|
||||
const SizeLimits & sizeLimits() const { return size_limits; }
|
||||
VolumePtr getTemporaryVolume() { return tmp_volume; }
|
||||
bool allowMergeJoin() const;
|
||||
bool allowDictJoin(const String & dict_key, const Block & sample_block, Names &, NamesAndTypesList &) const;
|
||||
bool preferMergeJoin() const { return join_algorithm == JoinAlgorithm::PREFER_PARTIAL_MERGE; }
|
||||
bool forceMergeJoin() const { return join_algorithm == JoinAlgorithm::PARTIAL_MERGE; }
|
||||
bool forceHashJoin() const { return join_algorithm == JoinAlgorithm::HASH; }
|
||||
@ -115,6 +122,8 @@ public:
|
||||
size_t rightKeyInclusion(const String & name) const;
|
||||
NameSet requiredRightKeys() const;
|
||||
|
||||
bool leftBecomeNullable(const DataTypePtr & column_type) const;
|
||||
bool rightBecomeNullable(const DataTypePtr & column_type) const;
|
||||
void addJoinedColumn(const NameAndTypePair & joined_column);
|
||||
void addJoinedColumnsAndCorrectNullability(Block & sample_block) const;
|
||||
|
||||
|
@ -24,14 +24,124 @@ namespace ErrorCodes
|
||||
}
|
||||
|
||||
MsgPackRowInputFormat::MsgPackRowInputFormat(const Block & header_, ReadBuffer & in_, Params params_)
|
||||
: IRowInputFormat(header_, in_, std::move(params_)), buf(in), ctx(&reference_func, nullptr, msgpack::unpack_limit()), data_types(header_.getDataTypes()) {}
|
||||
: IRowInputFormat(header_, in_, std::move(params_)), buf(in), parser(visitor), data_types(header_.getDataTypes()) {}
|
||||
|
||||
int MsgPackRowInputFormat::unpack(msgpack::zone & zone, size_t & offset)
|
||||
void MsgPackVisitor::set_info(IColumn & column, DataTypePtr type) // NOLINT
|
||||
{
|
||||
offset = 0;
|
||||
ctx.init();
|
||||
ctx.user().set_zone(zone);
|
||||
return ctx.execute(buf.position(), buf.buffer().end() - buf.position(), offset);
|
||||
while (!info_stack.empty())
|
||||
{
|
||||
info_stack.pop();
|
||||
}
|
||||
info_stack.push(Info{column, type});
|
||||
}
|
||||
|
||||
void MsgPackVisitor::insert_integer(UInt64 value) // NOLINT
|
||||
{
|
||||
Info & info = info_stack.top();
|
||||
switch (info.type->getTypeId())
|
||||
{
|
||||
case TypeIndex::UInt8:
|
||||
{
|
||||
assert_cast<ColumnUInt8 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::Date: [[fallthrough]];
|
||||
case TypeIndex::UInt16:
|
||||
{
|
||||
assert_cast<ColumnUInt16 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::DateTime: [[fallthrough]];
|
||||
case TypeIndex::UInt32:
|
||||
{
|
||||
assert_cast<ColumnUInt32 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::UInt64:
|
||||
{
|
||||
assert_cast<ColumnUInt64 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::Int8:
|
||||
{
|
||||
assert_cast<ColumnInt8 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::Int16:
|
||||
{
|
||||
assert_cast<ColumnInt16 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::Int32:
|
||||
{
|
||||
assert_cast<ColumnInt32 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::Int64:
|
||||
{
|
||||
assert_cast<ColumnInt64 &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
case TypeIndex::DateTime64:
|
||||
{
|
||||
assert_cast<DataTypeDateTime64::ColumnType &>(info.column).insertValue(value);
|
||||
break;
|
||||
}
|
||||
default:
|
||||
throw Exception("Type " + info.type->getName() + " is not supported for MsgPack input format", ErrorCodes::ILLEGAL_COLUMN);
|
||||
}
|
||||
}
|
||||
|
||||
bool MsgPackVisitor::visit_positive_integer(UInt64 value) // NOLINT
|
||||
{
|
||||
insert_integer(value);
|
||||
return true;
|
||||
}
|
||||
|
||||
bool MsgPackVisitor::visit_negative_integer(Int64 value) // NOLINT
|
||||
{
|
||||
insert_integer(value);
|
||||
return true;
|
||||
}
|
||||
|
||||
bool MsgPackVisitor::visit_str(const char* value, size_t size) // NOLINT
|
||||
{
|
||||
info_stack.top().column.insertData(value, size);
|
||||
return true;
|
||||
}
|
||||
|
||||
bool MsgPackVisitor::visit_float32(Float32 value) // NOLINT
|
||||
{
|
||||
assert_cast<ColumnFloat32 &>(info_stack.top().column).insertValue(value);
|
||||
return true;
|
||||
}
|
||||
|
||||
bool MsgPackVisitor::visit_float64(Float64 value) // NOLINT
|
||||
{
|
||||
assert_cast<ColumnFloat64 &>(info_stack.top().column).insertValue(value);
|
||||
return true;
|
||||
}
|
||||
|
||||
bool MsgPackVisitor::start_array(size_t size) // NOLINT
|
||||
{
|
||||
auto nested_type = assert_cast<const DataTypeArray &>(*info_stack.top().type).getNestedType();
|
||||
ColumnArray & column_array = assert_cast<ColumnArray &>(info_stack.top().column);
|
||||
ColumnArray::Offsets & offsets = column_array.getOffsets();
|
||||
IColumn & nested_column = column_array.getData();
|
||||
offsets.push_back(offsets.back() + size);
|
||||
info_stack.push(Info{nested_column, nested_type});
|
||||
return true;
|
||||
}
|
||||
|
||||
bool MsgPackVisitor::end_array() // NOLINT
|
||||
{
|
||||
info_stack.pop();
|
||||
return true;
|
||||
}
|
||||
|
||||
void MsgPackVisitor::parse_error(size_t, size_t) // NOLINT
|
||||
{
|
||||
throw Exception("Error occurred while parsing msgpack data.", ErrorCodes::INCORRECT_DATA);
|
||||
}
|
||||
|
||||
bool MsgPackRowInputFormat::readObject()
|
||||
@ -40,9 +150,8 @@ bool MsgPackRowInputFormat::readObject()
|
||||
return false;
|
||||
|
||||
PeekableReadBufferCheckpoint checkpoint{buf};
|
||||
std::unique_ptr<msgpack::zone> zone(new msgpack::zone);
|
||||
size_t offset;
|
||||
while (!unpack(*zone, offset))
|
||||
size_t offset = 0;
|
||||
while (!parser.execute(buf.position(), buf.available(), offset))
|
||||
{
|
||||
buf.position() = buf.buffer().end();
|
||||
if (buf.eof())
|
||||
@ -52,123 +161,19 @@ bool MsgPackRowInputFormat::readObject()
|
||||
buf.rollbackToCheckpoint();
|
||||
}
|
||||
buf.position() += offset;
|
||||
object_handle = msgpack::object_handle(ctx.data(), std::move(zone));
|
||||
return true;
|
||||
}
|
||||
|
||||
void MsgPackRowInputFormat::insertObject(IColumn & column, DataTypePtr data_type, const msgpack::object & object)
|
||||
{
|
||||
switch (data_type->getTypeId())
|
||||
{
|
||||
case TypeIndex::UInt8:
|
||||
{
|
||||
assert_cast<ColumnUInt8 &>(column).insertValue(object.as<uint8_t>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Date: [[fallthrough]];
|
||||
case TypeIndex::UInt16:
|
||||
{
|
||||
assert_cast<ColumnUInt16 &>(column).insertValue(object.as<UInt16>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::DateTime: [[fallthrough]];
|
||||
case TypeIndex::UInt32:
|
||||
{
|
||||
assert_cast<ColumnUInt32 &>(column).insertValue(object.as<UInt32>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::UInt64:
|
||||
{
|
||||
assert_cast<ColumnUInt64 &>(column).insertValue(object.as<UInt64>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Int8:
|
||||
{
|
||||
assert_cast<ColumnInt8 &>(column).insertValue(object.as<Int8>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Int16:
|
||||
{
|
||||
assert_cast<ColumnInt16 &>(column).insertValue(object.as<Int16>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Int32:
|
||||
{
|
||||
assert_cast<ColumnInt32 &>(column).insertValue(object.as<Int32>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Int64:
|
||||
{
|
||||
assert_cast<ColumnInt64 &>(column).insertValue(object.as<Int64>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Float32:
|
||||
{
|
||||
assert_cast<ColumnFloat32 &>(column).insertValue(object.as<Float32>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Float64:
|
||||
{
|
||||
assert_cast<ColumnFloat64 &>(column).insertValue(object.as<Float64>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::DateTime64:
|
||||
{
|
||||
assert_cast<DataTypeDateTime64::ColumnType &>(column).insertValue(object.as<UInt64>());
|
||||
return;
|
||||
}
|
||||
case TypeIndex::FixedString: [[fallthrough]];
|
||||
case TypeIndex::String:
|
||||
{
|
||||
msgpack::object_str obj_str = object.via.str;
|
||||
column.insertData(obj_str.ptr, obj_str.size);
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Array:
|
||||
{
|
||||
msgpack::object_array object_array = object.via.array;
|
||||
auto nested_type = assert_cast<const DataTypeArray &>(*data_type).getNestedType();
|
||||
ColumnArray & column_array = assert_cast<ColumnArray &>(column);
|
||||
ColumnArray::Offsets & offsets = column_array.getOffsets();
|
||||
IColumn & nested_column = column_array.getData();
|
||||
for (size_t i = 0; i != object_array.size; ++i)
|
||||
{
|
||||
insertObject(nested_column, nested_type, object_array.ptr[i]);
|
||||
}
|
||||
offsets.push_back(offsets.back() + object_array.size);
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Nullable:
|
||||
{
|
||||
auto nested_type = removeNullable(data_type);
|
||||
ColumnNullable & column_nullable = assert_cast<ColumnNullable &>(column);
|
||||
if (object.type == msgpack::type::NIL)
|
||||
column_nullable.insertDefault();
|
||||
else
|
||||
insertObject(column_nullable.getNestedColumn(), nested_type, object);
|
||||
return;
|
||||
}
|
||||
case TypeIndex::Nothing:
|
||||
{
|
||||
// Nothing to insert, MsgPack object is nil.
|
||||
return;
|
||||
}
|
||||
default:
|
||||
break;
|
||||
}
|
||||
throw Exception("Type " + data_type->getName() + " is not supported for MsgPack input format", ErrorCodes::ILLEGAL_COLUMN);
|
||||
}
|
||||
|
||||
bool MsgPackRowInputFormat::readRow(MutableColumns & columns, RowReadExtension &)
|
||||
{
|
||||
size_t column_index = 0;
|
||||
bool has_more_data = true;
|
||||
for (; column_index != columns.size(); ++column_index)
|
||||
{
|
||||
visitor.set_info(*columns[column_index], data_types[column_index]);
|
||||
has_more_data = readObject();
|
||||
if (!has_more_data)
|
||||
break;
|
||||
insertObject(*columns[column_index], data_types[column_index], object_handle.get());
|
||||
}
|
||||
if (!has_more_data)
|
||||
{
|
||||
|
@ -4,12 +4,44 @@
|
||||
#include <Formats/FormatFactory.h>
|
||||
#include <IO/PeekableReadBuffer.h>
|
||||
#include <msgpack.hpp>
|
||||
#include <stack>
|
||||
|
||||
namespace DB
|
||||
{
|
||||
|
||||
class ReadBuffer;
|
||||
|
||||
class MsgPackVisitor : public msgpack::null_visitor
|
||||
{
|
||||
public:
|
||||
struct Info
|
||||
{
|
||||
IColumn & column;
|
||||
DataTypePtr type;
|
||||
};
|
||||
|
||||
/// These functions are called when parser meets corresponding object in parsed data
|
||||
bool visit_positive_integer(UInt64 value);
|
||||
bool visit_negative_integer(Int64 value);
|
||||
bool visit_float32(Float32 value);
|
||||
bool visit_float64(Float64 value);
|
||||
bool visit_str(const char* value, size_t size);
|
||||
bool start_array(size_t size);
|
||||
bool end_array();
|
||||
|
||||
/// This function will be called if error occurs in parsing
|
||||
[[noreturn]] void parse_error(size_t parsed_offset, size_t error_offset);
|
||||
|
||||
/// Update info_stack
|
||||
void set_info(IColumn & column, DataTypePtr type);
|
||||
|
||||
void insert_integer(UInt64 value);
|
||||
|
||||
private:
|
||||
/// Stack is needed to process nested arrays
|
||||
std::stack<Info> info_stack;
|
||||
};
|
||||
|
||||
class MsgPackRowInputFormat : public IRowInputFormat
|
||||
{
|
||||
public:
|
||||
@ -19,15 +51,10 @@ public:
|
||||
String getName() const override { return "MagPackRowInputFormat"; }
|
||||
private:
|
||||
bool readObject();
|
||||
void insertObject(IColumn & column, DataTypePtr type, const msgpack::object & object);
|
||||
int unpack(msgpack::zone & zone, size_t & offset);
|
||||
|
||||
// msgpack makes a copy of object by default, this function tells unpacker not to copy.
|
||||
static bool reference_func(msgpack::type::object_type, size_t, void *) { return true; }
|
||||
|
||||
PeekableReadBuffer buf;
|
||||
msgpack::object_handle object_handle;
|
||||
msgpack::v1::detail::context ctx;
|
||||
MsgPackVisitor visitor;
|
||||
msgpack::detail::parse_helper<MsgPackVisitor> parser;
|
||||
DataTypes data_types;
|
||||
};
|
||||
|
||||
|
@ -1,7 +1,6 @@
|
||||
#include <DataStreams/RemoteBlockOutputStream.h>
|
||||
#include <DataStreams/NativeBlockInputStream.h>
|
||||
#include <Common/escapeForFileName.h>
|
||||
#include <Common/setThreadName.h>
|
||||
#include <Common/CurrentMetrics.h>
|
||||
#include <Common/StringUtils/StringUtils.h>
|
||||
#include <Common/ClickHouseRevision.h>
|
||||
@ -78,7 +77,7 @@ namespace
|
||||
|
||||
|
||||
StorageDistributedDirectoryMonitor::StorageDistributedDirectoryMonitor(
|
||||
StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_)
|
||||
StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_, BackgroundSchedulePool & bg_pool_)
|
||||
/// It's important to initialize members before `thread` to avoid race.
|
||||
: storage(storage_)
|
||||
, pool(std::move(pool_))
|
||||
@ -92,7 +91,10 @@ StorageDistributedDirectoryMonitor::StorageDistributedDirectoryMonitor(
|
||||
, max_sleep_time{storage.global_context.getSettingsRef().distributed_directory_monitor_max_sleep_time_ms.totalMilliseconds()}
|
||||
, log{&Logger::get(getLoggerName())}
|
||||
, monitor_blocker(monitor_blocker_)
|
||||
, bg_pool(bg_pool_)
|
||||
{
|
||||
task_handle = bg_pool.createTask(getLoggerName() + "/Bg", [this]{ run(); });
|
||||
task_handle->activateAndSchedule();
|
||||
}
|
||||
|
||||
|
||||
@ -100,12 +102,9 @@ StorageDistributedDirectoryMonitor::~StorageDistributedDirectoryMonitor()
|
||||
{
|
||||
if (!quit)
|
||||
{
|
||||
{
|
||||
quit = true;
|
||||
std::lock_guard lock{mutex};
|
||||
}
|
||||
quit = true;
|
||||
cond.notify_one();
|
||||
thread.join();
|
||||
task_handle->deactivate();
|
||||
}
|
||||
}
|
||||
|
||||
@ -122,12 +121,9 @@ void StorageDistributedDirectoryMonitor::shutdownAndDropAllData()
|
||||
{
|
||||
if (!quit)
|
||||
{
|
||||
{
|
||||
quit = true;
|
||||
std::lock_guard lock{mutex};
|
||||
}
|
||||
quit = true;
|
||||
cond.notify_one();
|
||||
thread.join();
|
||||
task_handle->deactivate();
|
||||
}
|
||||
|
||||
Poco::File(path).remove(true);
|
||||
@ -136,16 +132,11 @@ void StorageDistributedDirectoryMonitor::shutdownAndDropAllData()
|
||||
|
||||
void StorageDistributedDirectoryMonitor::run()
|
||||
{
|
||||
setThreadName("DistrDirMonitor");
|
||||
|
||||
std::unique_lock lock{mutex};
|
||||
|
||||
const auto quit_requested = [this] { return quit.load(std::memory_order_relaxed); };
|
||||
|
||||
while (!quit_requested())
|
||||
while (!quit)
|
||||
{
|
||||
auto do_sleep = true;
|
||||
|
||||
bool do_sleep = true;
|
||||
if (!monitor_blocker.isCancelled())
|
||||
{
|
||||
try
|
||||
@ -167,15 +158,25 @@ void StorageDistributedDirectoryMonitor::run()
|
||||
LOG_DEBUG(log, "Skipping send data over distributed table.");
|
||||
}
|
||||
|
||||
if (do_sleep)
|
||||
cond.wait_for(lock, sleep_time, quit_requested);
|
||||
|
||||
const auto now = std::chrono::system_clock::now();
|
||||
if (now - last_decrease_time > decrease_error_count_period)
|
||||
{
|
||||
error_count /= 2;
|
||||
last_decrease_time = now;
|
||||
}
|
||||
|
||||
if (do_sleep)
|
||||
break;
|
||||
}
|
||||
|
||||
if (!quit)
|
||||
{
|
||||
/// If there is no error, then it will be scheduled by the DistributedBlockOutputStream,
|
||||
/// so this is just in case, hence it is distributed_directory_monitor_max_sleep_time_ms
|
||||
if (error_count)
|
||||
task_handle->scheduleAfter(sleep_time.count());
|
||||
else
|
||||
task_handle->scheduleAfter(max_sleep_time.count());
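With this change the monitor runs as a BackgroundSchedulePool task and reschedules itself: a short error-dependent delay while sends are failing, and only the long distributed_directory_monitor_max_sleep_time_ms safety net otherwise, with error_count halved once per decrease period. A rough standalone sketch of such a policy (the exponential growth of the retry delay is an assumption for the example, not taken from this hunk):

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>

using namespace std::chrono;

/// Illustrative reschedule policy: failures grow a counter, the counter decays
/// by half once per decrease period, and the next wake-up is either an
/// error-dependent delay or the long safety-net delay.
struct MonitorSchedule
{
    milliseconds min_sleep{100};
    milliseconds max_sleep{30000};
    uint64_t error_count = 0;

    milliseconds nextDelay() const
    {
        if (error_count == 0)
            return max_sleep;                         /// nothing failed: only the safety-net wake-up
        auto scaled = min_sleep * (1ULL << std::min<uint64_t>(error_count, 8));
        return std::min(duration_cast<milliseconds>(scaled), max_sleep);
    }

    void onFailure() { ++error_count; }
    void onDecreasePeriod() { error_count /= 2; }     /// same decay as `error_count /= 2` above
};

int main()
{
    MonitorSchedule schedule;
    schedule.onFailure();
    schedule.onFailure();
    std::cout << schedule.nextDelay().count() << " ms after two failures\n";
    schedule.onDecreasePeriod();
    std::cout << schedule.nextDelay().count() << " ms after the decay period\n";
}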
|
||||
}
|
||||
}
|
||||
|
||||
@ -586,6 +587,13 @@ BlockInputStreamPtr StorageDistributedDirectoryMonitor::createStreamFromFile(con
|
||||
return std::make_shared<DirectoryMonitorBlockInputStream>(file_name);
|
||||
}
|
||||
|
||||
bool StorageDistributedDirectoryMonitor::scheduleAfter(size_t ms)
|
||||
{
|
||||
if (quit)
|
||||
return false;
|
||||
return task_handle->scheduleAfter(ms);
|
||||
}
|
||||
|
||||
void StorageDistributedDirectoryMonitor::processFilesWithBatching(const std::map<UInt64, std::string> & files)
|
||||
{
|
||||
std::unordered_set<UInt64> file_indices_to_skip;
|
||||
@ -714,8 +722,13 @@ std::string StorageDistributedDirectoryMonitor::getLoggerName() const
|
||||
void StorageDistributedDirectoryMonitor::updatePath(const std::string & new_path)
|
||||
{
|
||||
std::lock_guard lock{mutex};
|
||||
|
||||
task_handle->deactivate();
|
||||
|
||||
path = new_path;
|
||||
current_batch_file_path = path + "current_batch.txt";
|
||||
|
||||
task_handle->activateAndSchedule();
|
||||
}
|
||||
|
||||
}
|
||||
|
@ -1,10 +1,9 @@
|
||||
#pragma once
|
||||
|
||||
#include <Storages/StorageDistributed.h>
|
||||
#include <Common/ThreadPool.h>
|
||||
#include <Core/BackgroundSchedulePool.h>
|
||||
|
||||
#include <atomic>
|
||||
#include <thread>
|
||||
#include <mutex>
|
||||
#include <condition_variable>
|
||||
#include <IO/ReadBufferFromFile.h>
|
||||
@ -20,7 +19,7 @@ class StorageDistributedDirectoryMonitor
|
||||
{
|
||||
public:
|
||||
StorageDistributedDirectoryMonitor(
|
||||
StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_);
|
||||
StorageDistributed & storage_, std::string path_, ConnectionPoolPtr pool_, ActionBlocker & monitor_blocker_, BackgroundSchedulePool & bg_pool_);
|
||||
|
||||
~StorageDistributedDirectoryMonitor();
|
||||
|
||||
@ -33,6 +32,9 @@ public:
|
||||
void shutdownAndDropAllData();
|
||||
|
||||
static BlockInputStreamPtr createStreamFromFile(const String & file_name);
|
||||
|
||||
/// For scheduling via DistributedBlockOutputStream
|
||||
bool scheduleAfter(size_t ms);
|
||||
private:
|
||||
void run();
|
||||
bool processFiles();
|
||||
@ -67,7 +69,9 @@ private:
|
||||
std::condition_variable cond;
|
||||
Logger * log;
|
||||
ActionBlocker & monitor_blocker;
|
||||
ThreadFromGlobalPool thread{&StorageDistributedDirectoryMonitor::run, this};
|
||||
|
||||
BackgroundSchedulePool & bg_pool;
|
||||
BackgroundSchedulePoolTaskHolder task_handle;
|
||||
|
||||
/// Read insert query and insert settings for backward compatible.
|
||||
static void readHeader(ReadBuffer & in, Settings & insert_settings, std::string & insert_query, ClientInfo & client_info, Logger * log);
|
||||
|
@ -589,8 +589,8 @@ void DistributedBlockOutputStream::writeToShard(const Block & block, const std::
|
||||
const std::string path(disk + data_path + dir_name + '/');
|
||||
|
||||
/// ensure shard subdirectory creation and notify storage
|
||||
if (Poco::File(path).createDirectory())
|
||||
storage.requireDirectoryMonitor(disk, dir_name);
|
||||
Poco::File(path).createDirectory();
|
||||
auto & directory_monitor = storage.requireDirectoryMonitor(disk, dir_name);
|
||||
|
||||
const auto & file_name = toString(storage.file_names_increment.get()) + ".bin";
|
||||
const auto & block_file_path = path + file_name;
|
||||
@ -632,6 +632,9 @@ void DistributedBlockOutputStream::writeToShard(const Block & block, const std::
|
||||
stream.writePrefix();
|
||||
stream.write(block);
|
||||
stream.writeSuffix();
|
||||
|
||||
auto sleep_ms = context.getSettingsRef().distributed_directory_monitor_sleep_time_ms;
|
||||
directory_monitor.scheduleAfter(sleep_ms.totalMilliseconds());
|
||||
}
|
||||
|
||||
if (link(first_file_tmp_path.data(), block_file_path.data()))
|
||||
|
@ -255,23 +255,23 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart(
|
||||
const ReservationPtr reservation,
|
||||
PooledReadWriteBufferFromHTTP & in)
|
||||
{
|
||||
|
||||
size_t files;
|
||||
readBinary(files, in);
|
||||
|
||||
auto disk = reservation->getDisk();
|
||||
|
||||
static const String TMP_PREFIX = "tmp_fetch_";
|
||||
String tmp_prefix = tmp_prefix_.empty() ? TMP_PREFIX : tmp_prefix_;
|
||||
|
||||
String relative_part_path = String(to_detached ? "detached/" : "") + tmp_prefix + part_name;
|
||||
String absolute_part_path = Poco::Path(data.getFullPathOnDisk(reservation->getDisk()) + relative_part_path + "/").absolute().toString();
|
||||
Poco::File part_file(absolute_part_path);
|
||||
String part_relative_path = String(to_detached ? "detached/" : "") + tmp_prefix + part_name;
|
||||
String part_download_path = data.getRelativeDataPath() + part_relative_path + "/";
|
||||
|
||||
if (part_file.exists())
|
||||
throw Exception("Directory " + absolute_part_path + " already exists.", ErrorCodes::DIRECTORY_ALREADY_EXISTS);
|
||||
if (disk->exists(part_download_path))
|
||||
throw Exception("Directory " + fullPath(disk, part_download_path) + " already exists.", ErrorCodes::DIRECTORY_ALREADY_EXISTS);
|
||||
|
||||
CurrentMetrics::Increment metric_increment{CurrentMetrics::ReplicatedFetch};
|
||||
|
||||
part_file.createDirectory();
|
||||
disk->createDirectories(part_download_path);
|
||||
|
||||
MergeTreeData::DataPart::Checksums checksums;
|
||||
for (size_t i = 0; i < files; ++i)
|
||||
@ -284,21 +284,21 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart(
|
||||
|
||||
/// File must be inside "absolute_part_path" directory.
|
||||
/// Otherwise malicious ClickHouse replica may force us to write to arbitrary path.
|
||||
String absolute_file_path = Poco::Path(absolute_part_path + file_name).absolute().toString();
|
||||
if (!startsWith(absolute_file_path, absolute_part_path))
|
||||
throw Exception("File path (" + absolute_file_path + ") doesn't appear to be inside part path (" + absolute_part_path + ")."
|
||||
String absolute_file_path = Poco::Path(part_download_path + file_name).absolute().toString();
|
||||
if (!startsWith(absolute_file_path, Poco::Path(part_download_path).absolute().toString()))
|
||||
throw Exception("File path (" + absolute_file_path + ") doesn't appear to be inside part path (" + part_download_path + ")."
|
||||
" This may happen if we are trying to download part from malicious replica or logical error.",
|
||||
ErrorCodes::INSECURE_PATH);
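The check above normalises the target path and requires the download directory to be a prefix of it, so a crafted file name sent by a malicious replica cannot escape the part directory. A hypothetical std::filesystem sketch of the same idea (the Poco-based code here is the real implementation):

#include <algorithm>
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

/// Reject any file name whose normalised path leaves the download directory,
/// e.g. "../../etc/passwd".
static bool isInsideDirectory(const fs::path & dir, const std::string & file_name)
{
    auto base = fs::weakly_canonical(dir);
    auto candidate = fs::weakly_canonical(dir / file_name);
    /// Every component of `base` must match the beginning of `candidate`.
    return std::mismatch(base.begin(), base.end(), candidate.begin(), candidate.end()).first == base.end();
}

int main()
{
    std::cout << isInsideDirectory("/tmp/part", "checksums.txt") << "\n";      /// 1
    std::cout << isInsideDirectory("/tmp/part", "../../etc/passwd") << "\n";   /// 0
}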
|
||||
|
||||
WriteBufferFromFile file_out(absolute_file_path);
|
||||
HashingWriteBuffer hashing_out(file_out);
|
||||
auto file_out = disk->writeFile(part_download_path + file_name);
|
||||
HashingWriteBuffer hashing_out(*file_out);
|
||||
copyData(in, hashing_out, file_size, blocker.getCounter());
|
||||
|
||||
if (blocker.isCancelled())
|
||||
{
|
||||
/// NOTE The is_cancelled flag also makes sense to check every time you read over the network, performing a poll with a not very large timeout.
|
||||
/// And now we check it only between read chunks (in the `copyData` function).
|
||||
part_file.remove(true);
|
||||
disk->removeRecursive(part_download_path);
|
||||
throw Exception("Fetching of part was cancelled", ErrorCodes::ABORTED);
|
||||
}
|
||||
|
||||
@ -306,7 +306,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart(
|
||||
readPODBinary(expected_hash, in);
|
||||
|
||||
if (expected_hash != hashing_out.getHash())
|
||||
throw Exception("Checksum mismatch for file " + absolute_part_path + file_name + " transferred from " + replica_path,
|
||||
throw Exception("Checksum mismatch for file " + fullPath(disk, part_download_path + file_name) + " transferred from " + replica_path,
|
||||
ErrorCodes::CHECKSUM_DOESNT_MATCH);
|
||||
|
||||
if (file_name != "checksums.txt" &&
|
||||
@ -316,7 +316,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart(
|
||||
|
||||
assertEOF(in);
|
||||
|
||||
MergeTreeData::MutableDataPartPtr new_data_part = data.createPart(part_name, reservation->getDisk(), relative_part_path);
|
||||
MergeTreeData::MutableDataPartPtr new_data_part = data.createPart(part_name, reservation->getDisk(), part_relative_path);
|
||||
new_data_part->is_temp = true;
|
||||
new_data_part->modification_time = time(nullptr);
|
||||
new_data_part->loadColumnsChecksumsIndexes(true, false);
|
||||
|
@ -731,35 +731,43 @@ void IMergeTreeDataPart::remove() const
|
||||
return;
|
||||
}
|
||||
|
||||
try
|
||||
if (checksums.empty())
|
||||
{
|
||||
/// Remove each expected file in directory, then remove directory itself.
|
||||
|
||||
#if !__clang__
|
||||
# pragma GCC diagnostic push
|
||||
# pragma GCC diagnostic ignored "-Wunused-variable"
|
||||
#endif
|
||||
for (const auto & [file, _] : checksums.files)
|
||||
disk->remove(to + "/" + file);
|
||||
#if !__clang__
|
||||
# pragma GCC diagnostic pop
|
||||
#endif
|
||||
|
||||
for (const auto & file : {"checksums.txt", "columns.txt"})
|
||||
disk->remove(to + "/" + file);
|
||||
disk->removeIfExists(to + "/" + DELETE_ON_DESTROY_MARKER_PATH);
|
||||
|
||||
disk->remove(to);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
/// Recursive directory removal does many excessive "stat" syscalls under the hood.
|
||||
|
||||
LOG_ERROR(storage.log, "Cannot quickly remove directory " << fullPath(disk, to) << " by removing files; fallback to recursive removal. Reason: "
|
||||
<< getCurrentExceptionMessage(false));
|
||||
|
||||
/// If the part is not completely written, we cannot use fast path by listing files.
|
||||
disk->removeRecursive(to + "/");
|
||||
}
|
||||
else
|
||||
{
|
||||
try
|
||||
{
|
||||
/// Remove each expected file in directory, then remove directory itself.
|
||||
|
||||
#if !__clang__
|
||||
# pragma GCC diagnostic push
|
||||
# pragma GCC diagnostic ignored "-Wunused-variable"
|
||||
#endif
|
||||
for (const auto & [file, _] : checksums.files)
|
||||
disk->remove(to + "/" + file);
|
||||
#if !__clang__
|
||||
# pragma GCC diagnostic pop
|
||||
#endif
|
||||
|
||||
for (const auto & file : {"checksums.txt", "columns.txt"})
|
||||
disk->remove(to + "/" + file);
|
||||
disk->removeIfExists(to + "/" + DELETE_ON_DESTROY_MARKER_PATH);
|
||||
|
||||
disk->remove(to);
|
||||
}
|
||||
catch (...)
|
||||
{
|
||||
/// Recursive directory removal does many excessive "stat" syscalls under the hood.
|
||||
|
||||
LOG_ERROR(storage.log, "Cannot quickly remove directory " << fullPath(disk, to) << " by removing files; fallback to recursive removal. Reason: "
|
||||
<< getCurrentExceptionMessage(false));
|
||||
|
||||
disk->removeRecursive(to + "/");
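The rewritten remove() takes the fast path only when the checksums list the part's files: delete exactly those files plus the metadata files, then the directory, and fall back to recursive removal for incomplete parts or on error. A rough std::filesystem sketch of that split (illustrative names and paths, not the IDisk API used here):

#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

/// Fast path: remove the known files one by one, then the directory itself;
/// fall back to the slower recursive removal when the file list is unknown or
/// something unexpected is left behind.
static void removePart(const fs::path & part_dir, const std::vector<std::string> & listed_files)
{
    if (listed_files.empty())
    {
        fs::remove_all(part_dir);   /// part not completely written: cannot trust the fast path
        return;
    }

    std::error_code ec;
    for (const auto & name : listed_files)
        fs::remove(part_dir / name, ec);
    for (const auto & name : {"checksums.txt", "columns.txt"})
        fs::remove(part_dir / name, ec);

    if (!fs::remove(part_dir, ec) || ec)   /// leftover files keep the directory non-empty
        fs::remove_all(part_dir);
}

int main()
{
    fs::path dir = fs::temp_directory_path() / "part_demo";
    fs::create_directories(dir);
    std::ofstream(dir / "col.bin").put('x');
    removePart(dir, {"col.bin"});
    std::cout << "exists after removal: " << fs::exists(dir) << "\n";   /// 0
}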
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
String IMergeTreeDataPart::getRelativePathForDetachedPart(const String & prefix) const
|
||||
|
@ -107,7 +107,7 @@ public:
|
||||
void initSkipIndices();
|
||||
void initPrimaryIndex();
|
||||
|
||||
virtual void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync = false) = 0;
|
||||
virtual void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) = 0;
|
||||
void finishPrimaryIndexSerialization(MergeTreeData::DataPart::Checksums & checksums);
|
||||
void finishSkipIndicesSerialization(MergeTreeData::DataPart::Checksums & checksums);
|
||||
|
||||
|
@ -25,7 +25,7 @@ public:
|
||||
protected:
|
||||
using SerializationState = IDataType::SerializeBinaryBulkStatePtr;
|
||||
|
||||
IDataType::OutputStreamGetter createStreamGetter(const String & name, WrittenOffsetColumns & offset_columns, bool skip_offsets);
|
||||
IDataType::OutputStreamGetter createStreamGetter(const String & name, WrittenOffsetColumns & offset_columns);
|
||||
|
||||
/// Remove all columns marked expired in data_part. Also, clears checksums
|
||||
/// and columns array. Return set of removed files names.
|
||||
|
@ -620,6 +620,8 @@ public:
|
||||
return storage_settings.get();
|
||||
}
|
||||
|
||||
String getRelativeDataPath() const { return relative_data_path; }
|
||||
|
||||
/// Get table path on disk
|
||||
String getFullPathOnDisk(const DiskPtr & disk) const;
|
||||
|
||||
|
@ -876,9 +876,7 @@ MergeTreeData::MutableDataPartPtr MergeTreeDataMergerMutator::mergePartsToTempor
|
||||
MergedColumnOnlyOutputStream column_to(
|
||||
new_data_part,
|
||||
column_gathered_stream.getHeader(),
|
||||
false,
|
||||
compression_codec,
|
||||
false,
|
||||
/// we don't need to recalc indices here
|
||||
/// because all of them were already recalculated and written
|
||||
/// as key part of vertical merge
|
||||
@ -1588,9 +1586,7 @@ void MergeTreeDataMergerMutator::mutateSomePartColumns(
|
||||
MergedColumnOnlyOutputStream out(
|
||||
new_data_part,
|
||||
mutation_header,
|
||||
/* sync = */ false,
|
||||
compression_codec,
|
||||
/* skip_offsets = */ false,
|
||||
std::vector<MergeTreeIndexPtr>(indices_to_recalc.begin(), indices_to_recalc.end()),
|
||||
nullptr,
|
||||
source_part->index_granularity,
|
||||
|
@ -20,7 +20,6 @@ class MergeTreeDataPartCompact : public IMergeTreeDataPart
|
||||
public:
|
||||
static constexpr auto DATA_FILE_NAME = "data";
|
||||
static constexpr auto DATA_FILE_EXTENSION = ".bin";
|
||||
static constexpr auto TEMP_FILE_SUFFIX = "_temp";
|
||||
static constexpr auto DATA_FILE_NAME_WITH_EXTENSION = "data.bin";
|
||||
|
||||
MergeTreeDataPartCompact(
|
||||
|
@ -22,8 +22,6 @@ MergeTreeDataPartWriterCompact::MergeTreeDataPartWriterCompact(
|
||||
{
|
||||
using DataPart = MergeTreeDataPartCompact;
|
||||
String data_file_name = DataPart::DATA_FILE_NAME;
|
||||
if (settings.is_writing_temp_files)
|
||||
data_file_name += DataPart::TEMP_FILE_SUFFIX;
|
||||
|
||||
stream = std::make_unique<Stream>(
|
||||
data_file_name,
|
||||
@ -145,7 +143,7 @@ void MergeTreeDataPartWriterCompact::writeColumnSingleGranule(const ColumnWithTy
|
||||
column.type->serializeBinaryBulkStateSuffix(serialize_settings, state);
|
||||
}
|
||||
|
||||
void MergeTreeDataPartWriterCompact::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync)
|
||||
void MergeTreeDataPartWriterCompact::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums)
|
||||
{
|
||||
if (columns_buffer.size() != 0)
|
||||
writeBlock(header.cloneWithColumns(columns_buffer.releaseColumns()));
|
||||
@ -161,8 +159,6 @@ void MergeTreeDataPartWriterCompact::finishDataSerialization(IMergeTreeDataPart:
|
||||
}
|
||||
|
||||
stream->finalize();
|
||||
if (sync)
|
||||
stream->sync();
|
||||
stream->addToChecksums(checksums);
|
||||
stream.reset();
|
||||
}
|
||||
|
@ -21,7 +21,7 @@ public:
|
||||
void write(const Block & block, const IColumn::Permutation * permutation,
|
||||
const Block & primary_key_block, const Block & skip_indexes_block) override;
|
||||
|
||||
void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync) override;
|
||||
void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) override;
|
||||
|
||||
private:
|
||||
/// Write single granule of one column (rows between 2 marks)
|
||||
|
@@ -39,9 +39,6 @@ void MergeTreeDataPartWriterWide::addStreams(
 {
     IDataType::StreamCallback callback = [&] (const IDataType::SubstreamPath & substream_path)
     {
-        if (settings.skip_offsets && !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes)
-            return;
-
         String stream_name = IDataType::getFileNameForStream(name, substream_path);
         /// Shared offsets for Nested type.
         if (column_streams.count(stream_name))
@@ -69,8 +66,6 @@ IDataType::OutputStreamGetter MergeTreeDataPartWriterWide::createStreamGetter(
     return [&, this] (const IDataType::SubstreamPath & substream_path) -> WriteBuffer *
     {
         bool is_offsets = !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes;
-        if (is_offsets && settings.skip_offsets)
-            return nullptr;
 
         String stream_name = IDataType::getFileNameForStream(name, substream_path);
 
@@ -135,8 +130,6 @@ void MergeTreeDataPartWriterWide::writeSingleMark(
     type.enumerateStreams([&] (const IDataType::SubstreamPath & substream_path)
     {
         bool is_offsets = !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes;
-        if (is_offsets && settings.skip_offsets)
-            return;
 
         String stream_name = IDataType::getFileNameForStream(name, substream_path);
 
@@ -177,8 +170,6 @@ size_t MergeTreeDataPartWriterWide::writeSingleGranule(
     type.enumerateStreams([&] (const IDataType::SubstreamPath & substream_path)
     {
         bool is_offsets = !substream_path.empty() && substream_path.back().type == IDataType::Substream::ArraySizes;
-        if (is_offsets && settings.skip_offsets)
-            return;
 
         String stream_name = IDataType::getFileNameForStream(name, substream_path);
 
@@ -270,7 +261,7 @@ void MergeTreeDataPartWriterWide::writeColumn(
     next_index_offset = current_row - total_rows;
 }
 
-void MergeTreeDataPartWriterWide::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync)
+void MergeTreeDataPartWriterWide::finishDataSerialization(IMergeTreeDataPart::Checksums & checksums)
 {
     const auto & global_settings = storage.global_context.getSettingsRef();
     IDataType::SerializeBinaryBulkSettings serialize_settings;
@@ -300,8 +291,6 @@ void MergeTreeDataPartWriterWide::finishDataSerialization(IMergeTreeDataPart::Ch
     for (auto & stream : column_streams)
     {
         stream.second->finalize();
-        if (sync)
-            stream.second->sync();
         stream.second->addToChecksums(checksums);
     }
 
@@ -24,7 +24,7 @@ public:
     void write(const Block & block, const IColumn::Permutation * permutation,
         const Block & primary_key_block, const Block & skip_indexes_block) override;
 
-    void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums, bool sync) override;
+    void finishDataSerialization(IMergeTreeDataPart::Checksums & checksums) override;
 
     IDataType::OutputStreamGetter createStreamGetter(const String & name, WrittenOffsetColumns & offset_columns);
 
@@ -30,10 +30,7 @@ struct MergeTreeWriterSettings
     size_t aio_threshold;
     bool can_use_adaptive_granularity;
     bool blocks_are_granules_size;
-    /// true if we write temporary files during alter.
-    bool is_writing_temp_files = false;
+
     size_t estimated_size = 0;
-    /// used when ALTERing columns if we know that array offsets are not altered.
-    bool skip_offsets = false;
 };
 }
@@ -8,15 +8,14 @@ namespace ErrorCodes
 }
 
 MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream(
-    const MergeTreeDataPartPtr & data_part, const Block & header_, bool sync_,
-    CompressionCodecPtr default_codec, bool skip_offsets_,
+    const MergeTreeDataPartPtr & data_part,
+    const Block & header_,
+    CompressionCodecPtr default_codec,
     const std::vector<MergeTreeIndexPtr> & indices_to_recalc,
     WrittenOffsetColumns * offset_columns_,
     const MergeTreeIndexGranularity & index_granularity,
-    const MergeTreeIndexGranularityInfo * index_granularity_info,
-    bool is_writing_temp_files)
-    : IMergedBlockOutputStream(data_part),
-    header(header_), sync(sync_)
+    const MergeTreeIndexGranularityInfo * index_granularity_info)
+    : IMergedBlockOutputStream(data_part), header(header_)
 {
     const auto & global_settings = data_part->storage.global_context.getSettings();
     MergeTreeWriterSettings writer_settings(
@@ -24,11 +23,13 @@ MergedColumnOnlyOutputStream::MergedColumnOnlyOutputStream(
         index_granularity_info ? index_granularity_info->is_adaptive : data_part->storage.canUseAdaptiveGranularity(),
         global_settings.min_bytes_to_use_direct_io);
 
-    writer_settings.is_writing_temp_files = is_writing_temp_files;
-    writer_settings.skip_offsets = skip_offsets_;
+    writer = data_part->getWriter(
+        header.getNamesAndTypesList(),
+        indices_to_recalc,
+        default_codec,
+        std::move(writer_settings),
+        index_granularity);
 
-    writer = data_part->getWriter(header.getNamesAndTypesList(), indices_to_recalc,
-        default_codec,std::move(writer_settings), index_granularity);
     writer->setWrittenOffsetColumns(offset_columns_);
     writer->initSkipIndices();
 }
@@ -62,7 +63,7 @@ MergedColumnOnlyOutputStream::writeSuffixAndGetChecksums(MergeTreeData::MutableD
 {
     /// Finish columns serialization.
     MergeTreeData::DataPart::Checksums checksums;
-    writer->finishDataSerialization(checksums, sync);
+    writer->finishDataSerialization(checksums);
     writer->finishSkipIndicesSerialization(checksums);
 
     auto columns = new_part->getColumns();
@@ -11,17 +11,16 @@ class MergeTreeDataPartWriterWide;
 class MergedColumnOnlyOutputStream final : public IMergedBlockOutputStream
 {
 public:
-    /// skip_offsets: used when ALTERing columns if we know that array offsets are not altered.
     /// Pass empty 'already_written_offset_columns' first time then and pass the same object to subsequent instances of MergedColumnOnlyOutputStream
     /// if you want to serialize elements of Nested data structure in different instances of MergedColumnOnlyOutputStream.
     MergedColumnOnlyOutputStream(
-        const MergeTreeDataPartPtr & data_part, const Block & header_, bool sync_,
-        CompressionCodecPtr default_codec_, bool skip_offsets_,
+        const MergeTreeDataPartPtr & data_part,
+        const Block & header_,
+        CompressionCodecPtr default_codec_,
         const std::vector<MergeTreeIndexPtr> & indices_to_recalc_,
         WrittenOffsetColumns * offset_columns_ = nullptr,
        const MergeTreeIndexGranularity & index_granularity = {},
-        const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr,
-        bool is_writing_temp_files = false);
+        const MergeTreeIndexGranularityInfo * index_granularity_info_ = nullptr);
 
     Block getHeader() const override { return header; }
     void write(const Block & block) override;
@@ -31,7 +30,6 @@ public:
 
 private:
     Block header;
-    bool sync;
 };
 
 
@@ -13,7 +13,6 @@
 #include <Parsers/ASTIdentifier.h>
 #include <Parsers/ASTLiteral.h>
 #include <Parsers/ASTExpressionList.h>
-#include <Common/setThreadName.h>
 #include <Common/CurrentMetrics.h>
 #include <Common/MemoryTracker.h>
 #include <Common/FieldVisitors.h>
@@ -76,6 +75,7 @@ StorageBuffer::StorageBuffer(
     , destination_id(destination_id_)
     , allow_materialized(allow_materialized_)
     , log(&Logger::get("StorageBuffer (" + table_id_.getFullTableName() + ")"))
+    , bg_pool(global_context.getBufferFlushSchedulePool())
 {
     setColumns(columns_);
     setConstraints(constraints_);
@@ -83,12 +83,7 @@ StorageBuffer::StorageBuffer(
 
 StorageBuffer::~StorageBuffer()
 {
-    // Should not happen if shutdown was called
-    if (flush_thread.joinable())
-    {
-        shutdown_event.set();
-        flush_thread.join();
-    }
+    flush_handle->deactivate();
 }
 
 
@@ -397,6 +392,9 @@ public:
             least_busy_lock = std::unique_lock(least_busy_buffer->mutex);
         }
         insertIntoBuffer(block, *least_busy_buffer);
+        least_busy_lock.unlock();
+
+        storage.reschedule();
     }
 private:
     StorageBuffer & storage;
@@ -458,16 +456,15 @@ void StorageBuffer::startup()
             << " Set appropriate system_profile to fix this.");
     }
 
-    flush_thread = ThreadFromGlobalPool(&StorageBuffer::flushThread, this);
+
+    flush_handle = bg_pool.createTask(log->name() + "/Bg", [this]{ flushBack(); });
+    flush_handle->activateAndSchedule();
 }
 
 
 void StorageBuffer::shutdown()
 {
-    shutdown_event.set();
-
-    if (flush_thread.joinable())
-        flush_thread.join();
+    flush_handle->deactivate();
 
     try
     {
@@ -595,7 +592,7 @@ void StorageBuffer::flushBuffer(Buffer & buffer, bool check_thresholds, bool loc
 
     ProfileEvents::increment(ProfileEvents::StorageBufferFlush);
 
-    LOG_TRACE(log, "Flushing buffer with " << rows << " rows, " << bytes << " bytes, age " << time_passed << " seconds.");
+    LOG_TRACE(log, "Flushing buffer with " << rows << " rows, " << bytes << " bytes, age " << time_passed << " seconds " << (check_thresholds ? "(bg)" : "(direct)") << ".");
 
     if (!destination_id)
         return;
@@ -697,21 +694,42 @@ void StorageBuffer::writeBlockToDestination(const Block & block, StoragePtr tabl
 }
 
 
-void StorageBuffer::flushThread()
+void StorageBuffer::flushBack()
 {
-    setThreadName("BufferFlush");
-
-    do
+    try
     {
-        try
-        {
-            flushAllBuffers(true);
-        }
-        catch (...)
-        {
-            tryLogCurrentException(__PRETTY_FUNCTION__);
-        }
-    } while (!shutdown_event.tryWait(1000));
+        flushAllBuffers(true);
+    }
+    catch (...)
+    {
+        tryLogCurrentException(__PRETTY_FUNCTION__);
+    }
+
+    reschedule();
 }
+
+void StorageBuffer::reschedule()
+{
+    time_t min_first_write_time = std::numeric_limits<time_t>::max();
+    time_t rows = 0;
+
+    for (auto & buffer : buffers)
+    {
+        std::lock_guard lock(buffer.mutex);
+        min_first_write_time = buffer.first_write_time;
+        rows += buffer.data.rows();
+    }
+
+    /// will be rescheduled via INSERT
+    if (!rows)
+        return;
+
+    time_t current_time = time(nullptr);
+    time_t time_passed = current_time - min_first_write_time;
+
+    size_t min = std::max<ssize_t>(min_thresholds.time - time_passed, 1);
+    size_t max = std::max<ssize_t>(max_thresholds.time - time_passed, 1);
+    flush_handle->scheduleAfter(std::min(min, max) * 1000);
+}
 
 void StorageBuffer::checkAlterIsPossible(const AlterCommands & commands, const Settings & /* settings */)
@@ -4,7 +4,7 @@
 #include <thread>
 #include <ext/shared_ptr_helper.h>
 #include <Core/NamesAndTypes.h>
-#include <Common/ThreadPool.h>
+#include <Core/BackgroundSchedulePool.h>
 #include <Storages/IStorage.h>
 #include <DataStreams/IBlockOutputStream.h>
 #include <Poco/Event.h>
@@ -118,10 +118,6 @@ private:
 
     Poco::Logger * log;
 
-    Poco::Event shutdown_event;
-    /// Resets data by timeout.
-    ThreadFromGlobalPool flush_thread;
-
     void flushAllBuffers(bool check_thresholds = true);
     /// Reset the buffer. If check_thresholds is set - resets only if thresholds are exceeded.
     void flushBuffer(Buffer & buffer, bool check_thresholds, bool locked = false);
@@ -131,7 +127,11 @@ private:
     /// `table` argument is passed, as it is sometimes evaluated beforehand. It must match the `destination`.
    void writeBlockToDestination(const Block & block, StoragePtr table);
 
-    void flushThread();
+    void flushBack();
+    void reschedule();
+
+    BackgroundSchedulePool & bg_pool;
+    BackgroundSchedulePoolTaskHolder flush_handle;
 
 protected:
     /** num_shards - the level of internal parallelism (the number of independent buffers)
@@ -1,73 +1,48 @@
-#include <sstream>
-#include <DataTypes/DataTypesNumber.h>
-#include <DataTypes/DataTypeDate.h>
-#include <Dictionaries/IDictionarySource.h>
-#include <Dictionaries/DictionaryStructure.h>
 #include <Storages/StorageDictionary.h>
 #include <Storages/StorageFactory.h>
+#include <DataTypes/DataTypesNumber.h>
+#include <Dictionaries/DictionaryStructure.h>
 #include <Interpreters/Context.h>
 #include <Interpreters/evaluateConstantExpression.h>
 #include <Interpreters/ExternalDictionariesLoader.h>
 #include <Parsers/ASTLiteral.h>
 #include <common/logger_useful.h>
 #include <Common/typeid_cast.h>
 #include <Common/quoteString.h>
 #include <Processors/Sources/SourceFromInputStream.h>
 #include <Processors/Pipe.h>
+#include <sstream>
 
 
 namespace DB
 {
 
 namespace ErrorCodes
 {
-    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
     extern const int THERE_IS_NO_COLUMN;
-    extern const int UNKNOWN_TABLE;
+    extern const int CANNOT_DETACH_DICTIONARY_AS_TABLE;
 }
-
-
-StorageDictionary::StorageDictionary(
-    const StorageID & table_id_,
-    const ColumnsDescription & columns_,
-    const Context & context,
-    bool attach,
-    const String & dictionary_name_)
-    : IStorage(table_id_)
-    , dictionary_name(dictionary_name_)
-    , logger(&Poco::Logger::get("StorageDictionary"))
+namespace
 {
-    setColumns(columns_);
-
-    if (!attach)
+    void checkNamesAndTypesCompatibleWithDictionary(const String & dictionary_name, const ColumnsDescription & columns, const DictionaryStructure & dictionary_structure)
     {
-        const auto & dictionary = context.getExternalDictionariesLoader().getDictionary(dictionary_name);
-        const DictionaryStructure & dictionary_structure = dictionary->getStructure();
-        checkNamesAndTypesCompatibleWithDictionary(dictionary_structure);
+        auto dictionary_names_and_types = StorageDictionary::getNamesAndTypes(dictionary_structure);
+        std::set<NameAndTypePair> names_and_types_set(dictionary_names_and_types.begin(), dictionary_names_and_types.end());
+
+        for (const auto & column : columns.getOrdinary())
+        {
+            if (names_and_types_set.find(column) == names_and_types_set.end())
+            {
+                std::string message = "Not found column ";
+                message += column.name + " " + column.type->getName();
+                message += " in dictionary " + backQuote(dictionary_name) + ". ";
+                message += "There are only columns ";
+                message += StorageDictionary::generateNamesAndTypesDescription(dictionary_names_and_types);
+                throw Exception(message, ErrorCodes::THERE_IS_NO_COLUMN);
+            }
+        }
     }
 }
-
-void StorageDictionary::checkTableCanBeDropped() const
-{
-    throw Exception("Cannot detach dictionary " + backQuoteIfNeed(dictionary_name) + " as table, use DETACH DICTIONARY query.", ErrorCodes::UNKNOWN_TABLE);
-}
-
-Pipes StorageDictionary::read(
-    const Names & column_names,
-    const SelectQueryInfo & /*query_info*/,
-    const Context & context,
-    QueryProcessingStage::Enum /*processed_stage*/,
-    const size_t max_block_size,
-    const unsigned /*threads*/)
-{
-    auto dictionary = context.getExternalDictionariesLoader().getDictionary(dictionary_name);
-    auto stream = dictionary->getBlockInputStream(column_names, max_block_size);
-    auto source = std::make_shared<SourceFromInputStream>(stream);
-    /// TODO: update dictionary interface for processors.
-    Pipes pipes;
-    pipes.emplace_back(std::move(source));
-    return pipes;
-}
 
 NamesAndTypesList StorageDictionary::getNamesAndTypes(const DictionaryStructure & dictionary_structure)
 {
@@ -103,25 +78,55 @@ NamesAndTypesList StorageDictionary::getNamesAndTypes(const DictionaryStructure
     return dictionary_names_and_types;
 }
 
-void StorageDictionary::checkNamesAndTypesCompatibleWithDictionary(const DictionaryStructure & dictionary_structure) const
-{
-    auto dictionary_names_and_types = getNamesAndTypes(dictionary_structure);
-    std::set<NameAndTypePair> names_and_types_set(dictionary_names_and_types.begin(), dictionary_names_and_types.end());
-
-    for (const auto & column : getColumns().getOrdinary())
+String StorageDictionary::generateNamesAndTypesDescription(const NamesAndTypesList & list)
+{
+    std::stringstream ss;
+    bool first = true;
+    for (const auto & name_and_type : list)
     {
-        if (names_and_types_set.find(column) == names_and_types_set.end())
-        {
-            std::string message = "Not found column ";
-            message += column.name + " " + column.type->getName();
-            message += " in dictionary " + dictionary_name + ". ";
-            message += "There are only columns ";
-            message += generateNamesAndTypesDescription(dictionary_names_and_types.begin(), dictionary_names_and_types.end());
-            throw Exception(message, ErrorCodes::THERE_IS_NO_COLUMN);
-        }
+        if (!std::exchange(first, false))
+            ss << ", ";
+        ss << name_and_type.name << ' ' << name_and_type.type->getName();
     }
+    return ss.str();
 }
 
+
+StorageDictionary::StorageDictionary(
+    const StorageID & table_id_,
+    const String & dictionary_name_,
+    const DictionaryStructure & dictionary_structure_)
+    : IStorage(table_id_)
+    , dictionary_name(dictionary_name_)
+{
+    setColumns(ColumnsDescription{getNamesAndTypes(dictionary_structure_)});
+}
+
+
+void StorageDictionary::checkTableCanBeDropped() const
+{
+    throw Exception("Cannot detach dictionary " + backQuote(dictionary_name) + " as table, use DETACH DICTIONARY query.", ErrorCodes::CANNOT_DETACH_DICTIONARY_AS_TABLE);
+}
+
+Pipes StorageDictionary::read(
+    const Names & column_names,
+    const SelectQueryInfo & /*query_info*/,
+    const Context & context,
+    QueryProcessingStage::Enum /*processed_stage*/,
+    const size_t max_block_size,
+    const unsigned /*threads*/)
+{
+    auto dictionary = context.getExternalDictionariesLoader().getDictionary(dictionary_name);
+    auto stream = dictionary->getBlockInputStream(column_names, max_block_size);
+    auto source = std::make_shared<SourceFromInputStream>(stream);
+    /// TODO: update dictionary interface for processors.
+    Pipes pipes;
+    pipes.emplace_back(std::move(source));
+    return pipes;
+}
+
 
 void registerStorageDictionary(StorageFactory & factory)
 {
     factory.registerStorage("Dictionary", [](const StorageFactory::Arguments & args)
@@ -133,8 +138,11 @@ void registerStorageDictionary(StorageFactory & factory)
         args.engine_args[0] = evaluateConstantExpressionOrIdentifierAsLiteral(args.engine_args[0], args.local_context);
         String dictionary_name = args.engine_args[0]->as<ASTLiteral &>().value.safeGet<String>();
 
-        return StorageDictionary::create(
-            args.table_id, args.columns, args.context, args.attach, dictionary_name);
+        const auto & dictionary = args.context.getExternalDictionariesLoader().getDictionary(dictionary_name);
+        const DictionaryStructure & dictionary_structure = dictionary->getStructure();
+        checkNamesAndTypesCompatibleWithDictionary(dictionary_name, args.columns, dictionary_structure);
+
+        return StorageDictionary::create(args.table_id, dictionary_name, dictionary_structure);
     });
 }
 
@@ -1,23 +1,12 @@
 #pragma once
 
 #include <Storages/IStorage.h>
-#include <Core/Defines.h>
-#include <Common/MultiVersion.h>
 #include <ext/shared_ptr_helper.h>
-#include <IO/WriteBufferFromString.h>
-#include <IO/Operators.h>
-
-
-namespace Poco
-{
-class Logger;
-}
-
 namespace DB
 {
 struct DictionaryStructure;
 struct IDictionaryBase;
 class ExternalDictionaries;
 
 class StorageDictionary final : public ext::shared_ptr_helper<StorageDictionary>, public IStorage
 {
@@ -35,42 +24,18 @@ public:
         unsigned threads) override;
 
     static NamesAndTypesList getNamesAndTypes(const DictionaryStructure & dictionary_structure);
+    static String generateNamesAndTypesDescription(const NamesAndTypesList & list);
 
-    template <typename ForwardIterator>
-    static std::string generateNamesAndTypesDescription(ForwardIterator begin, ForwardIterator end)
-    {
-        std::string description;
-        {
-            WriteBufferFromString buffer(description);
-            bool first = true;
-            for (; begin != end; ++begin)
-            {
-                if (!first)
-                    buffer << ", ";
-                first = false;
-
-                buffer << begin->name << ' ' << begin->type->getName();
-            }
-        }
-
-        return description;
-    }
     const String & dictionaryName() const { return dictionary_name; }
 
 private:
-    using Ptr = MultiVersion<IDictionaryBase>::Version;
-
     String dictionary_name;
-    Poco::Logger * logger;
 
-    void checkNamesAndTypesCompatibleWithDictionary(const DictionaryStructure & dictionary_structure) const;
-
 protected:
     StorageDictionary(
         const StorageID & table_id_,
-        const ColumnsDescription & columns_,
-        const Context & context,
-        bool attach,
-        const String & dictionary_name_);
+        const String & dictionary_name_,
+        const DictionaryStructure & dictionary_structure);
 };
 
 }
@@ -577,15 +577,20 @@ void StorageDistributed::createDirectoryMonitors(const std::string & disk)
 }
 
 
-void StorageDistributed::requireDirectoryMonitor(const std::string & disk, const std::string & name)
+StorageDistributedDirectoryMonitor& StorageDistributed::requireDirectoryMonitor(const std::string & disk, const std::string & name)
 {
     const std::string path(disk + relative_data_path + name);
     const std::string key(disk + name);
 
     std::lock_guard lock(cluster_nodes_mutex);
     auto & node_data = cluster_nodes_data[key];
-    node_data.conneciton_pool = StorageDistributedDirectoryMonitor::createPool(name, *this);
-    node_data.directory_monitor = std::make_unique<StorageDistributedDirectoryMonitor>(*this, path, node_data.conneciton_pool, monitors_blocker);
+    if (!node_data.directory_monitor)
+    {
+        node_data.conneciton_pool = StorageDistributedDirectoryMonitor::createPool(name, *this);
+        node_data.directory_monitor = std::make_unique<StorageDistributedDirectoryMonitor>(
+            *this, path, node_data.conneciton_pool, monitors_blocker, global_context.getDistributedSchedulePool());
+    }
+    return *node_data.directory_monitor;
 }
 
 size_t StorageDistributed::getShardCount() const
@@ -109,7 +109,7 @@ public:
     /// create directory monitors for each existing subdirectory
     void createDirectoryMonitors(const std::string & disk);
     /// ensure directory monitor thread and connectoin pool creation by disk and subdirectory name
-    void requireDirectoryMonitor(const std::string & disk, const std::string & name);
+    StorageDistributedDirectoryMonitor & requireDirectoryMonitor(const std::string & disk, const std::string & name);
 
     void flushClusterNodesAllData();
 
Some files were not shown because too many files have changed in this diff.