Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into polygon-dict-basic-interface-improvements

This commit is contained in:
Andrei Chulkov 2020-01-28 15:06:36 +03:00
commit d2e4f4e778
582 changed files with 12961 additions and 3646 deletions


@@ -7,16 +7,20 @@ Changelog category (leave one):
- Performance Improvement
- Backward Incompatible Change
- Build/Testing/Packaging Improvement
- Documentation
- Documentation (changelog entry is not required)
- Other
- Non-significant (changelog entry is not needed)
- Non-significant (changelog entry is not required)
Changelog entry (up to few sentences, required except for Non-significant/Documentation categories):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
...
Detailed description (optional):
Detailed description / Documentation draft:
...
By adding documentation, you'll allow users to try your new feature immediately, not when someone else will have time to document it later. Documentation is necessary for all features that affect user experience in any way. You can add brief documentation draft above, or add documentation right into your patch as Markdown files in [docs](https://github.com/ClickHouse/ClickHouse/tree/master/docs) folder.
If you are doing this for the first time, it's recommended to read the lightweight [Contributing to ClickHouse Documentation](https://github.com/ClickHouse/ClickHouse/tree/master/docs/README.md) guide first.

.gitmodules (vendored): 7 additions

@@ -134,6 +134,13 @@
[submodule "contrib/libc-headers"]
path = contrib/libc-headers
url = https://github.com/ClickHouse-Extras/libc-headers.git
[submodule "contrib/replxx"]
path = contrib/replxx
url = https://github.com/AmokHuginnsson/replxx.git
[submodule "contrib/ryu"]
path = contrib/ryu
url = https://github.com/ClickHouse-Extras/ryu.git
[submodule "contrib/avro"]
path = contrib/avro
url = https://github.com/ClickHouse-Extras/avro.git
ignore = untracked

AUTHORS: 45 changed lines

@@ -1,43 +1,2 @@
The following authors have created the source code of "ClickHouse"
published and distributed by YANDEX LLC as the owner:
Alexander Makarov <asealback@yandex-team.ru>
Alexander Prudaev <aprudaev@yandex-team.ru>
Alexey Arno <af-arno@yandex-team.ru>
Alexey Milovidov <milovidov@yandex-team.ru>
Alexey Tronov <vkusny@yandex-team.ru>
Alexey Vasiliev <loudhorr@yandex-team.ru>
Alexey Zatelepin <ztlpn@yandex-team.ru>
Amy Krishnevsky <krishnevsky@yandex-team.ru>
Andrey M <hertz@yandex-team.ru>
Andrey Mironov <hertz@yandex-team.ru>
Andrey Urusov <drobus@yandex-team.ru>
Anton Tikhonov <rokerjoker@yandex-team.ru>
Dmitry Bilunov <kmeaw@yandex-team.ru>
Dmitry Galuza <galuza@yandex-team.ru>
Eugene Konkov <konkov@yandex-team.ru>
Evgeniy Gatov <egatov@yandex-team.ru>
Ilya Khomutov <robert@yandex-team.ru>
Ilya Korolev <breeze@yandex-team.ru>
Ivan Blinkov <blinkov@yandex-team.ru>
Maxim Nikulin <mnikulin@yandex-team.ru>
Michael Kolupaev <mkolupaev@yandex-team.ru>
Michael Razuvaev <razuvaev@yandex-team.ru>
Nikolai Kochetov <nik-kochetov@yandex-team.ru>
Nikolay Vasiliev <lonlylocly@yandex-team.ru>
Nikolay Volosatov <bamx23@yandex-team.ru>
Pavel Artemkin <stanly@yandex-team.ru>
Pavel Kartaviy <kartavyy@yandex-team.ru>
Roman Nozdrin <drrtuy@yandex-team.ru>
Roman Peshkurov <peshkurov@yandex-team.ru>
Sergey Fedorov <fets@yandex-team.ru>
Sergey Lazarev <hamilkar@yandex-team.ru>
Sergey Magidovich <mgsergio@yandex-team.ru>
Sergey Serebryanik <serebrserg@yandex-team.ru>
Sergey Veletskiy <velom@yandex-team.ru>
Vasily Okunev <okunev@yandex-team.ru>
Vitaliy Lyudvichenko <vludv@yandex-team.ru>
Vladimir Chebotarev <chebotarev@yandex-team.ru>
Vsevolod Orlov <vorloff@yandex-team.ru>
Vyacheslav Alipov <alipov@yandex-team.ru>
Yuriy Galitskiy <orantius@yandex-team.ru>
To see the list of authors who created the source code of ClickHouse, published and distributed by YANDEX LLC as the owner,
run "SELECT * FROM system.contributors;" query on any ClickHouse server.

CHANGELOG.md

@@ -1,3 +1,300 @@
## ClickHouse release v20.1
### Backward Incompatible Change
* Make the setting `merge_tree_uniform_read_distribution` obsolete. The server still recognizes this setting but it has no effect. [#8308](https://github.com/ClickHouse/ClickHouse/pull/8308) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Changed return type of the function `greatCircleDistance` to `Float32` because now the result of calculation is `Float32`. [#7993](https://github.com/ClickHouse/ClickHouse/pull/7993) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Now it's expected that query parameters are represented in "escaped" format. For example, to pass string `a<tab>b` you have to write `a\tb` or `a\<tab>b` and respectively, `a%5Ctb` or `a%5C%09b` in URL. This is needed to add the possibility to pass NULL as `\N`. This fixes [#7488](https://github.com/ClickHouse/ClickHouse/issues/7488). [#8517](https://github.com/ClickHouse/ClickHouse/pull/8517) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Enable `use_minimalistic_part_header_in_zookeeper` setting for `ReplicatedMergeTree` by default. This will significantly reduce amount of data stored in ZooKeeper. This setting is supported since version 19.1 and we already use it in production in multiple services without any issues for more than half a year. Disable this setting if you have a chance to downgrade to versions older than 19.1. [#6850](https://github.com/ClickHouse/ClickHouse/pull/6850) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Data skipping indices are production ready and enabled by default. The settings `allow_experimental_data_skipping_indices`, `allow_experimental_cross_to_join_conversion` and `allow_experimental_multiple_joins_emulation` are now obsolete and do nothing. [#7974](https://github.com/ClickHouse/ClickHouse/pull/7974) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add new `ANY JOIN` logic for `StorageJoin` consistent with `JOIN` operation. To upgrade without changes in behaviour you need to add `SETTINGS any_join_distinct_right_table_keys = 1` to Engine Join tables metadata or recreate these tables after upgrade (see the sketch after this list). [#8400](https://github.com/ClickHouse/ClickHouse/pull/8400) ([Artem Zuikov](https://github.com/4ertus2))
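A minimal sketch of the compatibility step described in the `StorageJoin` entry above; the setting name and its placement come from the entry, while the table definition itself is hypothetical:

```sql
-- Recreate (or adjust the metadata of) an Engine = Join table with the
-- compatibility setting so queries keep the pre-upgrade ANY JOIN semantics.
CREATE TABLE join_state (k UInt32, v String)
ENGINE = Join(ANY, LEFT, k)
SETTINGS any_join_distinct_right_table_keys = 1;
```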
### New Feature
* Added information about part paths to `system.merges`. [#8043](https://github.com/ClickHouse/ClickHouse/pull/8043) ([Vladimir Chebotarev](https://github.com/excitoon))
* Add ability to execute `SYSTEM RELOAD DICTIONARY` query in `ON CLUSTER` mode. [#8288](https://github.com/ClickHouse/ClickHouse/pull/8288) ([Guillaume Tassery](https://github.com/YiuRULE))
* Add ability to execute `CREATE DICTIONARY` queries in `ON CLUSTER` mode. [#8163](https://github.com/ClickHouse/ClickHouse/pull/8163) ([alesapin](https://github.com/alesapin))
* Now user's profile in `users.xml` can inherit multiple profiles. [#8343](https://github.com/ClickHouse/ClickHouse/pull/8343) ([Mikhail f. Shiryaev](https://github.com/Felixoid))
* Added `system.stack_trace` table that allows to look at stack traces of all server threads. This is useful for developers to introspect server state. This fixes [#7576](https://github.com/ClickHouse/ClickHouse/issues/7576). [#8344](https://github.com/ClickHouse/ClickHouse/pull/8344) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add `DateTime64` datatype with configurable sub-second precision. [#7170](https://github.com/ClickHouse/ClickHouse/pull/7170) ([Vasily Nemkov](https://github.com/Enmk))
* Add table function `clusterAllReplicas` which allows to query all the nodes in the cluster. [#8493](https://github.com/ClickHouse/ClickHouse/pull/8493) ([kiran sunkari](https://github.com/kiransunkari))
* Add aggregate function `categoricalInformationValue` which calculates the information value of a discrete feature. [#8117](https://github.com/ClickHouse/ClickHouse/pull/8117) ([hcz](https://github.com/hczhcz))
* Speed up parsing of data files in `CSV`, `TSV` and `JSONEachRow` format by doing it in parallel. [#7780](https://github.com/ClickHouse/ClickHouse/pull/7780) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Add function `bankerRound` which performs banker's rounding. [#8112](https://github.com/ClickHouse/ClickHouse/pull/8112) ([hcz](https://github.com/hczhcz))
* Support more languages in embedded dictionary for region names: 'ru', 'en', 'ua', 'uk', 'by', 'kz', 'tr', 'de', 'uz', 'lv', 'lt', 'et', 'pt', 'he', 'vi'. [#8189](https://github.com/ClickHouse/ClickHouse/pull/8189) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Improvements in consistency of `ANY JOIN` logic. Now `t1 ANY LEFT JOIN t2` equals `t2 ANY RIGHT JOIN t1`. [#7665](https://github.com/ClickHouse/ClickHouse/pull/7665) ([Artem Zuikov](https://github.com/4ertus2))
* Add setting `any_join_distinct_right_table_keys` which enables old behaviour for `ANY INNER JOIN`. [#7665](https://github.com/ClickHouse/ClickHouse/pull/7665) ([Artem Zuikov](https://github.com/4ertus2))
* Add new `SEMI` and `ANTI JOIN`. Old `ANY INNER JOIN` behaviour now available as `SEMI LEFT JOIN`. [#7665](https://github.com/ClickHouse/ClickHouse/pull/7665) ([Artem Zuikov](https://github.com/4ertus2))
* Added `Distributed` format for `File` engine and `file` table function which allows to read from `.bin` files generated by asynchronous inserts into `Distributed` table. [#8535](https://github.com/ClickHouse/ClickHouse/pull/8535) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Add optional reset column argument for `runningAccumulate` which allows to reset aggregation results for each new key value. [#8326](https://github.com/ClickHouse/ClickHouse/pull/8326) ([Sergey Kononenko](https://github.com/kononencheg))
* Add ability to alter materialized views with `ALTER <materialized view name> MODIFY QUERY <select_query>`. [#7533](https://github.com/ClickHouse/ClickHouse/pull/7533) ([nvartolomei](https://github.com/nvartolomei))
* Add ability to use ClickHouse as Prometheus endpoint. [#7900](https://github.com/ClickHouse/ClickHouse/pull/7900) ([vdimir](https://github.com/Vdimir))
* Add section `<remote_url_allow_hosts>` in `config.xml` which restricts allowed hosts for remote table engines and table functions `URL`, `S3`, `HDFS`. [#7154](https://github.com/ClickHouse/ClickHouse/pull/7154) ([Mikhail Korotov](https://github.com/millb))
* Added function `greatCircleAngle` which calculates the distance on a sphere in degrees. [#8105](https://github.com/ClickHouse/ClickHouse/pull/8105) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Changed Earth radius to be consistent with H3 library. [#8105](https://github.com/ClickHouse/ClickHouse/pull/8105) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added `JSONCompactEachRow` and `JSONCompactEachRowWithNamesAndTypes` formats for input and output. [#7841](https://github.com/ClickHouse/ClickHouse/pull/7841) ([Mikhail Korotov](https://github.com/millb))
* Added feature for file-related table engines and table functions (`File`, `S3`, `URL`, `HDFS`) which allows to read and write `gzip` files based on additional engine parameter or file extension. [#7840](https://github.com/ClickHouse/ClickHouse/pull/7840) ([Andrey Bodrov](https://github.com/apbodrov))
* Added the `randomASCII(length)` function, generating a string with a random set of [ASCII](https://en.wikipedia.org/wiki/ASCII#Printable_characters) printable characters. [#8401](https://github.com/ClickHouse/ClickHouse/pull/8401) ([BayoNet](https://github.com/BayoNet))
* Added function `JSONExtractArrayRaw` which returns an array of unparsed JSON array elements from `JSON` string. [#8081](https://github.com/ClickHouse/ClickHouse/pull/8081) ([Oleg Matrokhin](https://github.com/errx))
* Add `arrayZip` function which combines multiple arrays of equal length into one array of tuples (see the short example after this list). [#8149](https://github.com/ClickHouse/ClickHouse/pull/8149) ([Winter Zhang](https://github.com/zhang2014))
* Add ability to move data between disks according to configured `TTL`-expressions for `*MergeTree` table engines family. [#8140](https://github.com/ClickHouse/ClickHouse/pull/8140) ([Vladimir Chebotarev](https://github.com/excitoon))
* Added new aggregate function `avgWeighted` which allows to calculate weighted average. [#7898](https://github.com/ClickHouse/ClickHouse/pull/7898) ([Andrey Bodrov](https://github.com/apbodrov))
* Now parallel parsing is enabled by default for `TSV`, `TSKV`, `CSV` and `JSONEachRow` formats. [#7894](https://github.com/ClickHouse/ClickHouse/pull/7894) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Add several geo functions from `H3` library: `h3GetResolution`, `h3EdgeAngle`, `h3EdgeLength`, `h3IsValid` and `h3kRing`. [#8034](https://github.com/ClickHouse/ClickHouse/pull/8034) ([Konstantin Malanchev](https://github.com/hombit))
* Added support for brotli (`br`) compression in file-related storages and table functions. This fixes [#8156](https://github.com/ClickHouse/ClickHouse/issues/8156). [#8526](https://github.com/ClickHouse/ClickHouse/pull/8526) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add `groupBit*` functions for the `SimpleAggregateFunction` type. [#8485](https://github.com/ClickHouse/ClickHouse/pull/8485) ([Guillaume Tassery](https://github.com/YiuRULE))
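As referenced in the `arrayZip` entry above, a small usage sketch of two of the new functions (`arrayZip` and `avgWeighted`); the literal values are made up for illustration:

```sql
-- arrayZip combines arrays of equal length into an array of tuples.
SELECT arrayZip(['a', 'b', 'c'], [1, 2, 3]) AS zipped;   -- [('a',1),('b',2),('c',3)]

-- avgWeighted computes a weighted average: (1*2 + 3*1) / (2 + 1) = 1.666...
SELECT avgWeighted(x, w)
FROM (SELECT 1 AS x, 2 AS w UNION ALL SELECT 3 AS x, 1 AS w);
```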
### Bug Fix
* Fix rename of tables with `Distributed` engine. Fixes issue [#7868](https://github.com/ClickHouse/ClickHouse/issues/7868). [#8306](https://github.com/ClickHouse/ClickHouse/pull/8306) ([tavplubix](https://github.com/tavplubix))
* Now dictionaries support `EXPRESSION` for attributes as an arbitrary string in a non-ClickHouse SQL dialect. [#8098](https://github.com/ClickHouse/ClickHouse/pull/8098) ([alesapin](https://github.com/alesapin))
* Fix broken `INSERT SELECT FROM mysql(...)` query. This fixes [#8070](https://github.com/ClickHouse/ClickHouse/issues/8070) and [#7960](https://github.com/ClickHouse/ClickHouse/issues/7960). [#8234](https://github.com/ClickHouse/ClickHouse/pull/8234) ([tavplubix](https://github.com/tavplubix))
* Fix error "Mismatch column sizes" when inserting default `Tuple` from `JSONEachRow`. This fixes [#5653](https://github.com/ClickHouse/ClickHouse/issues/5653). [#8606](https://github.com/ClickHouse/ClickHouse/pull/8606) ([tavplubix](https://github.com/tavplubix))
* Now an exception will be thrown in case of using `WITH TIES` alongside `LIMIT BY`. Also add ability to use `TOP` with `LIMIT BY`. This fixes [#7472](https://github.com/ClickHouse/ClickHouse/issues/7472). [#7637](https://github.com/ClickHouse/ClickHouse/pull/7637) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Fix unintended dependency on a fresh glibc version in the `clickhouse-odbc-bridge` binary. [#8046](https://github.com/ClickHouse/ClickHouse/pull/8046) ([Amos Bird](https://github.com/amosbird))
* Fix bug in check function of `*MergeTree` engines family. Now it doesn't fail in case when we have equal amount of rows in last granule and last mark (non-final). [#8047](https://github.com/ClickHouse/ClickHouse/pull/8047) ([alesapin](https://github.com/alesapin))
* Fix insert into `Enum*` columns after `ALTER` query, when underlying numeric type is equal to table specified type. This fixes [#7836](https://github.com/ClickHouse/ClickHouse/issues/7836). [#7908](https://github.com/ClickHouse/ClickHouse/pull/7908) ([Anton Popov](https://github.com/CurtizJ))
* Allowed non-constant negative "size" argument for function `substring`. It was not allowed by mistake. This fixes [#4832](https://github.com/ClickHouse/ClickHouse/issues/4832). [#7703](https://github.com/ClickHouse/ClickHouse/pull/7703) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix parsing bug when wrong number of arguments passed to `(O|J)DBC` table engine. [#7709](https://github.com/ClickHouse/ClickHouse/pull/7709) ([alesapin](https://github.com/alesapin))
* Use the command name of the running clickhouse process when sending logs to syslog. In previous versions, an empty string was used instead of the command name. [#8460](https://github.com/ClickHouse/ClickHouse/pull/8460) ([Michael Nacharov](https://github.com/mnach))
* Fix check of allowed hosts for `localhost`. This PR fixes the solution provided in [#8241](https://github.com/ClickHouse/ClickHouse/pull/8241). [#8342](https://github.com/ClickHouse/ClickHouse/pull/8342) ([Vitaly Baranov](https://github.com/vitlibar))
* Fix rare crash in `argMin` and `argMax` functions for long string arguments, when result is used in `runningAccumulate` function. This fixes [#8325](https://github.com/ClickHouse/ClickHouse/issues/8325) [#8341](https://github.com/ClickHouse/ClickHouse/pull/8341) ([dinosaur](https://github.com/769344359))
* Fix memory overcommit for tables with `Buffer` engine. [#8345](https://github.com/ClickHouse/ClickHouse/pull/8345) ([Azat Khuzhin](https://github.com/azat))
* Fixed potential bug in functions that can take `NULL` as one of the arguments and return non-NULL. [#8196](https://github.com/ClickHouse/ClickHouse/pull/8196) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Better metrics calculations in thread pool for background processes for `MergeTree` table engines. [#8194](https://github.com/ClickHouse/ClickHouse/pull/8194) ([Vladimir Chebotarev](https://github.com/excitoon))
* Fix function `IN` inside `WHERE` statement when row-level table filter is present. Fixes [#6687](https://github.com/ClickHouse/ClickHouse/issues/6687) [#8357](https://github.com/ClickHouse/ClickHouse/pull/8357) ([Ivan](https://github.com/abyss7))
* Now an exception is thrown if the integral value is not parsed completely for settings values. [#7678](https://github.com/ClickHouse/ClickHouse/pull/7678) ([Mikhail Korotov](https://github.com/millb))
* Fix exception when aggregate function is used in query to distributed table with more than two local shards. [#8164](https://github.com/ClickHouse/ClickHouse/pull/8164) ([小路](https://github.com/nicelulu))
* Now bloom filter can handle zero length arrays and doesn't perform redundant calculations. [#8242](https://github.com/ClickHouse/ClickHouse/pull/8242) ([achimbab](https://github.com/achimbab))
* Fixed checking if a client host is allowed by matching the client host to `host_regexp` specified in `users.xml`. [#8241](https://github.com/ClickHouse/ClickHouse/pull/8241) ([Vitaly Baranov](https://github.com/vitlibar))
* Relax ambiguous column check that leads to false positives in multiple `JOIN ON` section. [#8385](https://github.com/ClickHouse/ClickHouse/pull/8385) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed possible server crash (`std::terminate`) when the server cannot send or write data in `JSON` or `XML` format with values of `String` data type (that require `UTF-8` validation) or when compressing result data with Brotli algorithm or in some other rare cases. This fixes [#7603](https://github.com/ClickHouse/ClickHouse/issues/7603) [#8384](https://github.com/ClickHouse/ClickHouse/pull/8384) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix race condition in `StorageDistributedDirectoryMonitor` found by CI. This fixes [#8364](https://github.com/ClickHouse/ClickHouse/issues/8364). [#8383](https://github.com/ClickHouse/ClickHouse/pull/8383) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Now background merges in `*MergeTree` table engines family preserve storage policy volume order more accurately. [#8549](https://github.com/ClickHouse/ClickHouse/pull/8549) ([Vladimir Chebotarev](https://github.com/excitoon))
* Now table engine `Kafka` works properly with `Native` format. This fixes [#6731](https://github.com/ClickHouse/ClickHouse/issues/6731) [#7337](https://github.com/ClickHouse/ClickHouse/issues/7337) [#8003](https://github.com/ClickHouse/ClickHouse/issues/8003). [#8016](https://github.com/ClickHouse/ClickHouse/pull/8016) ([filimonov](https://github.com/filimonov))
* Fixed formats with headers (like `CSVWithNames`) which were throwing exception about EOF for table engine `Kafka`. [#8016](https://github.com/ClickHouse/ClickHouse/pull/8016) ([filimonov](https://github.com/filimonov))
* Fixed a bug with making set from subquery in right part of `IN` section. This fixes [#5767](https://github.com/ClickHouse/ClickHouse/issues/5767) and [#2542](https://github.com/ClickHouse/ClickHouse/issues/2542). [#7755](https://github.com/ClickHouse/ClickHouse/pull/7755) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Fix possible crash while reading from storage `File`. [#7756](https://github.com/ClickHouse/ClickHouse/pull/7756) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fixed reading of the files in `Parquet` format containing columns of type `list`. [#8334](https://github.com/ClickHouse/ClickHouse/pull/8334) ([maxulan](https://github.com/maxulan))
* Fix error `Not found column` for distributed queries with `PREWHERE` condition dependent on sampling key if `max_parallel_replicas > 1`. [#7913](https://github.com/ClickHouse/ClickHouse/pull/7913) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fix error `Not found column` if query used `PREWHERE` dependent on table's alias and the result set was empty because of primary key condition. [#7911](https://github.com/ClickHouse/ClickHouse/pull/7911) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fixed return type for functions `rand` and `randConstant` in case of `Nullable` argument. Now functions always return `UInt32` and never `Nullable(UInt32)`. [#8204](https://github.com/ClickHouse/ClickHouse/pull/8204) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Disabled predicate push-down for `WITH FILL` expression. This fixes [#7784](https://github.com/ClickHouse/ClickHouse/issues/7784). [#7789](https://github.com/ClickHouse/ClickHouse/pull/7789) ([Winter Zhang](https://github.com/zhang2014))
* Fixed incorrect `count()` result for `SummingMergeTree` when `FINAL` section is used. [#3280](https://github.com/ClickHouse/ClickHouse/issues/3280) [#7786](https://github.com/ClickHouse/ClickHouse/pull/7786) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Fix possible incorrect result for constant functions from remote servers. It happened for queries with functions like `version()`, `uptime()`, etc. which returns different constant values for different servers. This fixes [#7666](https://github.com/ClickHouse/ClickHouse/issues/7666). [#7689](https://github.com/ClickHouse/ClickHouse/pull/7689) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fix complicated bug in push-down predicate optimization which leads to wrong results. This fixes a lot of issues on push-down predicate optimization. [#8503](https://github.com/ClickHouse/ClickHouse/pull/8503) ([Winter Zhang](https://github.com/zhang2014))
* Fix crash in `CREATE TABLE .. AS dictionary` query. [#8508](https://github.com/ClickHouse/ClickHouse/pull/8508) ([Azat Khuzhin](https://github.com/azat))
* Several improvements to the ClickHouse grammar in the `.g4` file. [#8294](https://github.com/ClickHouse/ClickHouse/pull/8294) ([taiyang-li](https://github.com/taiyang-li))
* Fix bug that leads to crashes in `JOIN`s with tables with engine `Join`. This fixes [#7556](https://github.com/ClickHouse/ClickHouse/issues/7556) [#8254](https://github.com/ClickHouse/ClickHouse/issues/8254) [#7915](https://github.com/ClickHouse/ClickHouse/issues/7915) [#8100](https://github.com/ClickHouse/ClickHouse/issues/8100). [#8298](https://github.com/ClickHouse/ClickHouse/pull/8298) ([Artem Zuikov](https://github.com/4ertus2))
* Fix redundant dictionaries reload on `CREATE DATABASE`. [#7916](https://github.com/ClickHouse/ClickHouse/pull/7916) ([Azat Khuzhin](https://github.com/azat))
* Limit maximum number of streams for read from `StorageFile` and `StorageHDFS`. Fixes [#7650](https://github.com/ClickHouse/ClickHouse/issues/7650). [#7981](https://github.com/ClickHouse/ClickHouse/pull/7981) ([alesapin](https://github.com/alesapin))
* Fix bug in `ALTER ... MODIFY ... CODEC` query, when user specifies both default expression and codec. Fixes [#8593](https://github.com/ClickHouse/ClickHouse/issues/8593). [#8614](https://github.com/ClickHouse/ClickHouse/pull/8614) ([alesapin](https://github.com/alesapin))
* Fix error in background merge of columns with `SimpleAggregateFunction(LowCardinality)` type. [#8613](https://github.com/ClickHouse/ClickHouse/pull/8613) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fixed type check in function `toDateTime64`. [#8375](https://github.com/ClickHouse/ClickHouse/pull/8375) ([Vasily Nemkov](https://github.com/Enmk))
* Now the server does not crash on `LEFT` or `FULL JOIN` with the Join engine and unsupported `join_use_nulls` settings. [#8479](https://github.com/ClickHouse/ClickHouse/pull/8479) ([Artem Zuikov](https://github.com/4ertus2))
* Now `DROP DICTIONARY IF EXISTS db.dict` query doesn't throw exception if `db` doesn't exist. [#8185](https://github.com/ClickHouse/ClickHouse/pull/8185) ([Vitaly Baranov](https://github.com/vitlibar))
* Fix possible crashes in table functions (`file`, `mysql`, `remote`) caused by usage of reference to removed `IStorage` object. Fix incorrect parsing of columns specified at insertion into table function. [#7762](https://github.com/ClickHouse/ClickHouse/pull/7762) ([tavplubix](https://github.com/tavplubix))
* Ensure the network is up before starting `clickhouse-server`. This fixes [#7507](https://github.com/ClickHouse/ClickHouse/issues/7507). [#8570](https://github.com/ClickHouse/ClickHouse/pull/8570) ([Zhichang Yu](https://github.com/yuzhichang))
* Fix timeout handling for secure connections, so queries don't hang indefinitely. This fixes [#8126](https://github.com/ClickHouse/ClickHouse/issues/8126). [#8128](https://github.com/ClickHouse/ClickHouse/pull/8128) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix `clickhouse-copier`'s redundant contention between concurrent workers. [#7816](https://github.com/ClickHouse/ClickHouse/pull/7816) ([Ding Xiang Fei](https://github.com/dingxiangfei2009))
* Now mutations don't skip attached parts, even if their mutation version is larger than the current mutation version. [#7812](https://github.com/ClickHouse/ClickHouse/pull/7812) ([Zhichang Yu](https://github.com/yuzhichang)) [#8250](https://github.com/ClickHouse/ClickHouse/pull/8250) ([alesapin](https://github.com/alesapin))
* Ignore redundant copies of `*MergeTree` data parts after move to another disk and server restart. [#7810](https://github.com/ClickHouse/ClickHouse/pull/7810) ([Vladimir Chebotarev](https://github.com/excitoon))
* Fix crash in `FULL JOIN` with `LowCardinality` in `JOIN` key. [#8252](https://github.com/ClickHouse/ClickHouse/pull/8252) ([Artem Zuikov](https://github.com/4ertus2))
* It is now forbidden to use a column name more than once in an insert query like `INSERT INTO tbl (x, y, x)` (see the illustration after this list). This fixes [#5465](https://github.com/ClickHouse/ClickHouse/issues/5465), [#7681](https://github.com/ClickHouse/ClickHouse/issues/7681). [#7685](https://github.com/ClickHouse/ClickHouse/pull/7685) ([alesapin](https://github.com/alesapin))
* Added fallback for detecting the number of physical CPU cores for unknown CPUs (using the number of logical CPU cores). This fixes [#5239](https://github.com/ClickHouse/ClickHouse/issues/5239). [#7726](https://github.com/ClickHouse/ClickHouse/pull/7726) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix `There's no column` error for materialized and alias columns. [#8210](https://github.com/ClickHouse/ClickHouse/pull/8210) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed server crash when `EXISTS` query was used without `TABLE` or `DICTIONARY` qualifier, just like `EXISTS t`. This fixes [#8172](https://github.com/ClickHouse/ClickHouse/issues/8172). This bug was introduced in version 19.17. [#8213](https://github.com/ClickHouse/ClickHouse/pull/8213) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix rare bug with error `"Sizes of columns doesn't match"` that might appear when using `SimpleAggregateFunction` column. [#7790](https://github.com/ClickHouse/ClickHouse/pull/7790) ([Boris Granveaud](https://github.com/bgranvea))
* Fix bug where user with empty `allow_databases` got access to all databases (and same for `allow_dictionaries`). [#7793](https://github.com/ClickHouse/ClickHouse/pull/7793) ([DeifyTheGod](https://github.com/DeifyTheGod))
* Fix client crash when server already disconnected from client. [#8071](https://github.com/ClickHouse/ClickHouse/pull/8071) ([Azat Khuzhin](https://github.com/azat))
* Fix `ORDER BY` behaviour in case of sorting by primary key prefix and non primary key suffix. [#7759](https://github.com/ClickHouse/ClickHouse/pull/7759) ([Anton Popov](https://github.com/CurtizJ))
* Check if qualified column present in the table. This fixes [#6836](https://github.com/ClickHouse/ClickHouse/issues/6836). [#7758](https://github.com/ClickHouse/ClickHouse/pull/7758) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed behavior of `ALTER MOVE` run immediately after a merge finishes, which moved a superpart of the specified part. Fixes [#8103](https://github.com/ClickHouse/ClickHouse/issues/8103). [#8104](https://github.com/ClickHouse/ClickHouse/pull/8104) ([Vladimir Chebotarev](https://github.com/excitoon))
* Fix possible server crash while using `UNION` with different number of columns. Fixes [#7279](https://github.com/ClickHouse/ClickHouse/issues/7279). [#7929](https://github.com/ClickHouse/ClickHouse/pull/7929) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fix size of result substring for function `substr` with negative size. [#8589](https://github.com/ClickHouse/ClickHouse/pull/8589) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Now server does not execute part mutation in `MergeTree` if there are not enough free threads in background pool. [#8588](https://github.com/ClickHouse/ClickHouse/pull/8588) ([tavplubix](https://github.com/tavplubix))
* Fix a minor typo on formatting `UNION ALL` AST. [#7999](https://github.com/ClickHouse/ClickHouse/pull/7999) ([litao91](https://github.com/litao91))
* Fixed incorrect bloom filter results for negative numbers. This fixes [#8317](https://github.com/ClickHouse/ClickHouse/issues/8317). [#8566](https://github.com/ClickHouse/ClickHouse/pull/8566) ([Winter Zhang](https://github.com/zhang2014))
* Fixed potential buffer overflow in decompression. A malicious user can pass fabricated compressed data that causes a read past the buffer. This issue was found by Eldar Zaitov from the Yandex information security team. [#8404](https://github.com/ClickHouse/ClickHouse/pull/8404) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix incorrect result because of integers overflow in `arrayIntersect`. [#7777](https://github.com/ClickHouse/ClickHouse/pull/7777) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Now `OPTIMIZE TABLE` query will not wait for offline replicas to perform the operation. [#8314](https://github.com/ClickHouse/ClickHouse/pull/8314) ([javi santana](https://github.com/javisantana))
* Fixed `ALTER TTL` parser for `Replicated*MergeTree` tables. [#8318](https://github.com/ClickHouse/ClickHouse/pull/8318) ([Vladimir Chebotarev](https://github.com/excitoon))
* Fix communication between server and client, so the server reads temporary tables info after query failure. [#8084](https://github.com/ClickHouse/ClickHouse/pull/8084) ([Azat Khuzhin](https://github.com/azat))
* Fix `bitmapAnd` function error when intersecting an aggregated bitmap and a scalar bitmap. [#8082](https://github.com/ClickHouse/ClickHouse/pull/8082) ([Yue Huang](https://github.com/moon03432))
* Refine the definition of `ZXid` according to the ZooKeeper Programmer's Guide which fixes bug in `clickhouse-cluster-copier`. [#8088](https://github.com/ClickHouse/ClickHouse/pull/8088) ([Ding Xiang Fei](https://github.com/dingxiangfei2009))
* `odbc` table function now respects `external_table_functions_use_nulls` setting. [#7506](https://github.com/ClickHouse/ClickHouse/pull/7506) ([Vasily Nemkov](https://github.com/Enmk))
* Fixed bug that led to a rare data race. [#8143](https://github.com/ClickHouse/ClickHouse/pull/8143) ([Alexander Kazakov](https://github.com/Akazz))
* Now `SYSTEM RELOAD DICTIONARY` reloads a dictionary completely, ignoring `update_field`. This fixes [#7440](https://github.com/ClickHouse/ClickHouse/issues/7440). [#8037](https://github.com/ClickHouse/ClickHouse/pull/8037) ([Vitaly Baranov](https://github.com/vitlibar))
* Add ability to check if dictionary exists in create query. [#8032](https://github.com/ClickHouse/ClickHouse/pull/8032) ([alesapin](https://github.com/alesapin))
* Fix `Float*` parsing in `Values` format. This fixes [#7817](https://github.com/ClickHouse/ClickHouse/issues/7817). [#7870](https://github.com/ClickHouse/ClickHouse/pull/7870) ([tavplubix](https://github.com/tavplubix))
* Fix crash when we cannot reserve space in some background operations of `*MergeTree` table engines family. [#7873](https://github.com/ClickHouse/ClickHouse/pull/7873) ([Vladimir Chebotarev](https://github.com/excitoon))
* Fix crash of merge operation when table contains `SimpleAggregateFunction(LowCardinality)` column. This fixes [#8515](https://github.com/ClickHouse/ClickHouse/issues/8515). [#8522](https://github.com/ClickHouse/ClickHouse/pull/8522) ([Azat Khuzhin](https://github.com/azat))
* Restore support of all ICU locales and add the ability to apply collations for constant expressions. Also add language name to `system.collations` table. [#8051](https://github.com/ClickHouse/ClickHouse/pull/8051) ([alesapin](https://github.com/alesapin))
* Fix bug when external dictionaries with zero minimal lifetime (`LIFETIME(MIN 0 MAX N)`, `LIFETIME(N)`) don't update in background. [#7983](https://github.com/ClickHouse/ClickHouse/pull/7983) ([alesapin](https://github.com/alesapin))
* Fix crash when external dictionary with ClickHouse source has subquery in query. [#8351](https://github.com/ClickHouse/ClickHouse/pull/8351) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fix incorrect parsing of file extension in table with engine `URL`. This fixes [#8157](https://github.com/ClickHouse/ClickHouse/issues/8157). [#8419](https://github.com/ClickHouse/ClickHouse/pull/8419) ([Andrey Bodrov](https://github.com/apbodrov))
* Fix `CHECK TABLE` query for `*MergeTree` tables without key. Fixes [#7543](https://github.com/ClickHouse/ClickHouse/issues/7543). [#7979](https://github.com/ClickHouse/ClickHouse/pull/7979) ([alesapin](https://github.com/alesapin))
* Fixed conversion of `Float64` to MySQL type. [#8079](https://github.com/ClickHouse/ClickHouse/pull/8079) ([Yuriy Baranov](https://github.com/yurriy))
* Now if table was not completely dropped because of server crash, server will try to restore and load it. [#8176](https://github.com/ClickHouse/ClickHouse/pull/8176) ([tavplubix](https://github.com/tavplubix))
* Fixed crash in table function `file` while inserting into file that doesn't exist. Now in this case file would be created and then insert would be processed. [#8177](https://github.com/ClickHouse/ClickHouse/pull/8177) ([Olga Khvostikova](https://github.com/stavrolia))
* Fix rare deadlock which can happen when `trace_log` is enabled. [#7838](https://github.com/ClickHouse/ClickHouse/pull/7838) ([filimonov](https://github.com/filimonov))
* Add ability to work with different types besides `Date` in `RangeHashed` external dictionary created from DDL query. Fixes [#7899](https://github.com/ClickHouse/ClickHouse/issues/7899). [#8275](https://github.com/ClickHouse/ClickHouse/pull/8275) ([alesapin](https://github.com/alesapin))
* Fixes crash when `now64()` is called with result of another function. [#8270](https://github.com/ClickHouse/ClickHouse/pull/8270) ([Vasily Nemkov](https://github.com/Enmk))
* Fixed bug with detecting client IP for connections through mysql wire protocol. [#7743](https://github.com/ClickHouse/ClickHouse/pull/7743) ([Dmitry Muzyka](https://github.com/dmitriy-myz))
* Fix empty array handling in `arraySplit` function. This fixes [#7708](https://github.com/ClickHouse/ClickHouse/issues/7708). [#7747](https://github.com/ClickHouse/ClickHouse/pull/7747) ([hcz](https://github.com/hczhcz))
* Fixed the issue when `pid-file` of another running `clickhouse-server` may be deleted. [#8487](https://github.com/ClickHouse/ClickHouse/pull/8487) ([Weiqing Xu](https://github.com/weiqxu))
* Fix dictionary reload if the dictionary has `invalidate_query` and updates stopped after an exception on previous update attempts. [#8029](https://github.com/ClickHouse/ClickHouse/pull/8029) ([alesapin](https://github.com/alesapin))
* Fixed error in function `arrayReduce` that may lead to "double free" and error in aggregate function combinator `Resample` that may lead to memory leak. Added aggregate function `aggThrow`. This function can be used for testing purposes. [#8446](https://github.com/ClickHouse/ClickHouse/pull/8446) ([alexey-milovidov](https://github.com/alexey-milovidov))
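A brief illustration of the duplicate-column check mentioned in the `INSERT INTO tbl (x, y, x)` entry above; the table definition is hypothetical:

```sql
CREATE TABLE tbl (x UInt8, y UInt8) ENGINE = Memory;

-- The column list names `x` twice, which is now rejected with an error
-- instead of being silently accepted.
INSERT INTO tbl (x, y, x) VALUES (1, 2, 3);
```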
### Improvement
* Improved logging when working with `S3` table engine. [#8251](https://github.com/ClickHouse/ClickHouse/pull/8251) ([Grigory Pervakov](https://github.com/GrigoryPervakov))
* Printed help message when no arguments are passed when calling `clickhouse-local`. This fixes [#5335](https://github.com/ClickHouse/ClickHouse/issues/5335). [#8230](https://github.com/ClickHouse/ClickHouse/pull/8230) ([Andrey Nagorny](https://github.com/Melancholic))
* Add setting `mutations_sync` which allows waiting for `ALTER UPDATE/DELETE` queries synchronously (see the sketch after this list). [#8237](https://github.com/ClickHouse/ClickHouse/pull/8237) ([alesapin](https://github.com/alesapin))
* Allow to set up relative `user_files_path` in `config.xml` (in the way similar to `format_schema_path`). [#7632](https://github.com/ClickHouse/ClickHouse/pull/7632) ([hcz](https://github.com/hczhcz))
* Add exception for illegal types for conversion functions with `-OrZero` postfix. [#7880](https://github.com/ClickHouse/ClickHouse/pull/7880) ([Andrey Konyaev](https://github.com/akonyaev90))
* Simplify format of the header of data sending to a shard in a distributed query. [#8044](https://github.com/ClickHouse/ClickHouse/pull/8044) ([Vitaly Baranov](https://github.com/vitlibar))
* `Live View` table engine refactoring. [#8519](https://github.com/ClickHouse/ClickHouse/pull/8519) ([vzakaznikov](https://github.com/vzakaznikov))
* Add additional checks for external dictionaries created from DDL-queries. [#8127](https://github.com/ClickHouse/ClickHouse/pull/8127) ([alesapin](https://github.com/alesapin))
* Fix error `Column ... already exists` while using `FINAL` and `SAMPLE` together, e.g. `select count() from table final sample 1/2`. Fixes [#5186](https://github.com/ClickHouse/ClickHouse/issues/5186). [#7907](https://github.com/ClickHouse/ClickHouse/pull/7907) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Now the first argument of the `joinGet` function can be a table identifier. [#7707](https://github.com/ClickHouse/ClickHouse/pull/7707) ([Amos Bird](https://github.com/amosbird))
* Allow using `MaterializedView` with subqueries above `Kafka` tables. [#8197](https://github.com/ClickHouse/ClickHouse/pull/8197) ([filimonov](https://github.com/filimonov))
* Now background moves between disks run in a separate thread pool. [#7670](https://github.com/ClickHouse/ClickHouse/pull/7670) ([Vladimir Chebotarev](https://github.com/excitoon))
* `SYSTEM RELOAD DICTIONARY` now executes synchronously. [#8240](https://github.com/ClickHouse/ClickHouse/pull/8240) ([Vitaly Baranov](https://github.com/vitlibar))
* Stack traces now display physical addresses (offsets in object file) instead of virtual memory addresses (where the object file was loaded). That allows the use of `addr2line` when binary is position independent and ASLR is active. This fixes [#8360](https://github.com/ClickHouse/ClickHouse/issues/8360). [#8387](https://github.com/ClickHouse/ClickHouse/pull/8387) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Support new syntax for row-level security filters: `<table name='table_name'>…</table>`. Fixes [#5779](https://github.com/ClickHouse/ClickHouse/issues/5779). [#8381](https://github.com/ClickHouse/ClickHouse/pull/8381) ([Ivan](https://github.com/abyss7))
* Now `cityHash` function can work with `Decimal` and `UUID` types. Fixes [#5184](https://github.com/ClickHouse/ClickHouse/issues/5184). [#7693](https://github.com/ClickHouse/ClickHouse/pull/7693) ([Mikhail Korotov](https://github.com/millb))
* Removed fixed index granularity (it was 1024) from system logs because it's obsolete after implementation of adaptive granularity. [#7698](https://github.com/ClickHouse/ClickHouse/pull/7698) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Enabled MySQL compatibility server when ClickHouse is compiled without SSL. [#7852](https://github.com/ClickHouse/ClickHouse/pull/7852) ([Yuriy Baranov](https://github.com/yurriy))
* Now server checksums distributed batches, which gives more verbose errors in case of corrupted data in batch. [#7914](https://github.com/ClickHouse/ClickHouse/pull/7914) ([Azat Khuzhin](https://github.com/azat))
* Support `DROP DATABASE`, `DETACH TABLE`, `DROP TABLE` and `ATTACH TABLE` for `MySQL` database engine. [#8202](https://github.com/ClickHouse/ClickHouse/pull/8202) ([Winter Zhang](https://github.com/zhang2014))
* Add authentication in S3 table function and table engine. [#7623](https://github.com/ClickHouse/ClickHouse/pull/7623) ([Vladimir Chebotarev](https://github.com/excitoon))
* Added check for extra parts of `MergeTree` on different disks, in order not to miss data parts on undefined disks. [#8118](https://github.com/ClickHouse/ClickHouse/pull/8118) ([Vladimir Chebotarev](https://github.com/excitoon))
* Enable SSL support for Mac client and server. [#8297](https://github.com/ClickHouse/ClickHouse/pull/8297) ([Ivan](https://github.com/abyss7))
* Now ClickHouse can work as MySQL federated server (see https://dev.mysql.com/doc/refman/5.7/en/federated-create-server.html). [#7717](https://github.com/ClickHouse/ClickHouse/pull/7717) ([Maxim Fedotov](https://github.com/MaxFedotov))
* `clickhouse-client` now only enables `bracketed-paste` when multiquery is on and multiline is off. This fixes [#7757](https://github.com/ClickHouse/ClickHouse/issues/7757). [#7761](https://github.com/ClickHouse/ClickHouse/pull/7761) ([Amos Bird](https://github.com/amosbird))
* Support `Array(Decimal)` in `if` function. [#7721](https://github.com/ClickHouse/ClickHouse/pull/7721) ([Artem Zuikov](https://github.com/4ertus2))
* Support Decimals in `arrayDifference`, `arrayCumSum` and `arrayCumSumNegative` functions. [#7724](https://github.com/ClickHouse/ClickHouse/pull/7724) ([Artem Zuikov](https://github.com/4ertus2))
* Added `lifetime` column to `system.dictionaries` table. [#6820](https://github.com/ClickHouse/ClickHouse/issues/6820) [#7727](https://github.com/ClickHouse/ClickHouse/pull/7727) ([kekekekule](https://github.com/kekekekule))
* Improved check for existing parts on different disks for `*MergeTree` table engines. Addresses [#7660](https://github.com/ClickHouse/ClickHouse/issues/7660). [#8440](https://github.com/ClickHouse/ClickHouse/pull/8440) ([Vladimir Chebotarev](https://github.com/excitoon))
* Integration with `AWS SDK` for `S3` interactions which allows to use all S3 features out of the box. [#8011](https://github.com/ClickHouse/ClickHouse/pull/8011) ([Pavel Kovalenko](https://github.com/Jokser))
* Added support for subqueries in `Live View` tables. [#7792](https://github.com/ClickHouse/ClickHouse/pull/7792) ([vzakaznikov](https://github.com/vzakaznikov))
* Check for using `Date` or `DateTime` column from `TTL` expressions was removed. [#7920](https://github.com/ClickHouse/ClickHouse/pull/7920) ([Vladimir Chebotarev](https://github.com/excitoon))
* Information about disk was added to `system.detached_parts` table. [#7833](https://github.com/ClickHouse/ClickHouse/pull/7833) ([Vladimir Chebotarev](https://github.com/excitoon))
* Now settings `max_(table|partition)_size_to_drop` can be changed without a restart. [#7779](https://github.com/ClickHouse/ClickHouse/pull/7779) ([Grigory Pervakov](https://github.com/GrigoryPervakov))
* Slightly better usability of error messages. Ask user not to remove the lines below `Stack trace:`. [#7897](https://github.com/ClickHouse/ClickHouse/pull/7897) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Better reading messages from `Kafka` engine in various formats after [#7935](https://github.com/ClickHouse/ClickHouse/issues/7935). [#8035](https://github.com/ClickHouse/ClickHouse/pull/8035) ([Ivan](https://github.com/abyss7))
* Better compatibility with MySQL clients which don't support `sha2_password` auth plugin. [#8036](https://github.com/ClickHouse/ClickHouse/pull/8036) ([Yuriy Baranov](https://github.com/yurriy))
* Support more column types in MySQL compatibility server. [#7975](https://github.com/ClickHouse/ClickHouse/pull/7975) ([Yuriy Baranov](https://github.com/yurriy))
* Implement `ORDER BY` optimization for `Merge`, `Buffer` and `Materialized View` storages with underlying `MergeTree` tables. [#8130](https://github.com/ClickHouse/ClickHouse/pull/8130) ([Anton Popov](https://github.com/CurtizJ))
* Now we always use POSIX implementation of `getrandom` to have better compatibility with old kernels (< 3.17). [#7940](https://github.com/ClickHouse/ClickHouse/pull/7940) ([Amos Bird](https://github.com/amosbird))
* Better check for valid destination in a move TTL rule. [#8410](https://github.com/ClickHouse/ClickHouse/pull/8410) ([Vladimir Chebotarev](https://github.com/excitoon))
* Better checks for broken insert batches for `Distributed` table engine. [#7933](https://github.com/ClickHouse/ClickHouse/pull/7933) ([Azat Khuzhin](https://github.com/azat))
* Add a column with an array of part names that mutations must process in the future to the `system.mutations` table. [#8179](https://github.com/ClickHouse/ClickHouse/pull/8179) ([alesapin](https://github.com/alesapin))
* Parallel merge sort optimization for processors. [#8552](https://github.com/ClickHouse/ClickHouse/pull/8552) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* The setting `mark_cache_min_lifetime` is now obsolete and does nothing. In previous versions, the mark cache could grow in memory larger than `mark_cache_size` to accommodate data within `mark_cache_min_lifetime` seconds. That was leading to confusion and higher memory usage than expected, which is especially bad on memory-constrained systems. If you see performance degradation after installing this release, you should increase `mark_cache_size`. [#8484](https://github.com/ClickHouse/ClickHouse/pull/8484) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Preparation to use `tid` everywhere. This is needed for [#7477](https://github.com/ClickHouse/ClickHouse/issues/7477). [#8276](https://github.com/ClickHouse/ClickHouse/pull/8276) ([alexey-milovidov](https://github.com/alexey-milovidov))
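A sketch of the new `mutations_sync` setting mentioned above; the table name `metrics`, the mutation itself, and the exact meaning of the value `1` are assumptions for illustration:

```sql
-- Assumed semantics: 1 waits for the mutation to finish on the current server
-- (0 keeps the old asynchronous behaviour).
SET mutations_sync = 1;
ALTER TABLE metrics UPDATE value = 0 WHERE date < '2019-01-01';
```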
### Performance Improvement
* Performance optimizations in processors pipeline. [#7988](https://github.com/ClickHouse/ClickHouse/pull/7988) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Non-blocking updates of expired keys in cache dictionaries (with permission to read old ones). [#8303](https://github.com/ClickHouse/ClickHouse/pull/8303) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Compile ClickHouse without `-fno-omit-frame-pointer` globally to spare one more register. [#8097](https://github.com/ClickHouse/ClickHouse/pull/8097) ([Amos Bird](https://github.com/amosbird))
* Speedup `greatCircleDistance` function and add performance tests for it. [#7307](https://github.com/ClickHouse/ClickHouse/pull/7307) ([Olga Khvostikova](https://github.com/stavrolia))
* Improved performance of function `roundDown`. [#8465](https://github.com/ClickHouse/ClickHouse/pull/8465) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Improved performance of `max`, `min`, `argMin`, `argMax` for `DateTime64` data type. [#8199](https://github.com/ClickHouse/ClickHouse/pull/8199) ([Vasily Nemkov](https://github.com/Enmk))
* Improved performance of sorting without a limit or with big limit and external sorting. [#8545](https://github.com/ClickHouse/ClickHouse/pull/8545) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Improved performance of formatting floating point numbers up to 6 times. [#8542](https://github.com/ClickHouse/ClickHouse/pull/8542) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Improved performance of `modulo` function. [#7750](https://github.com/ClickHouse/ClickHouse/pull/7750) ([Amos Bird](https://github.com/amosbird))
* Optimized `ORDER BY` and merging with single column key. [#8335](https://github.com/ClickHouse/ClickHouse/pull/8335) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Better implementation for `arrayReduce`, `-Array` and `-State` combinators. [#7710](https://github.com/ClickHouse/ClickHouse/pull/7710) ([Amos Bird](https://github.com/amosbird))
* Now `PREWHERE` should be optimized to be at least as efficient as `WHERE`. [#7769](https://github.com/ClickHouse/ClickHouse/pull/7769) ([Amos Bird](https://github.com/amosbird))
* Improve the way `round` and `roundBankers` handle negative numbers. [#8229](https://github.com/ClickHouse/ClickHouse/pull/8229) ([hcz](https://github.com/hczhcz))
* Improved decoding performance of `DoubleDelta` and `Gorilla` codecs by roughly 30-40%. This fixes [#7082](https://github.com/ClickHouse/ClickHouse/issues/7082). [#8019](https://github.com/ClickHouse/ClickHouse/pull/8019) ([Vasily Nemkov](https://github.com/Enmk))
* Improved performance of `base64` related functions. [#8444](https://github.com/ClickHouse/ClickHouse/pull/8444) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a function `geoDistance`. It is similar to `greatCircleDistance` but uses an approximation of the WGS-84 ellipsoid model. The performance of both functions is nearly the same (see the comparison sketch after this list). [#8086](https://github.com/ClickHouse/ClickHouse/pull/8086) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Faster `min` and `max` aggregation functions for `Decimal` data type. [#8144](https://github.com/ClickHouse/ClickHouse/pull/8144) ([Artem Zuikov](https://github.com/4ertus2))
* Vectorize processing `arrayReduce`. [#7608](https://github.com/ClickHouse/ClickHouse/pull/7608) ([Amos Bird](https://github.com/amosbird))
* `if` chains are now optimized as `multiIf`. [#8355](https://github.com/ClickHouse/ClickHouse/pull/8355) ([kamalov-ruslan](https://github.com/kamalov-ruslan))
* Fix performance regression of `Kafka` table engine introduced in 19.15. This fixes [#7261](https://github.com/ClickHouse/ClickHouse/issues/7261). [#7935](https://github.com/ClickHouse/ClickHouse/pull/7935) ([filimonov](https://github.com/filimonov))
* Removed "pie" code generation that `gcc` from Debian packages occasionally brings by default. [#8483](https://github.com/ClickHouse/ClickHouse/pull/8483) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Parallel parsing of data formats. [#6553](https://github.com/ClickHouse/ClickHouse/pull/6553) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Enable optimized parser of `Values` with expressions by default (`input_format_values_deduce_templates_of_expressions=1`). [#8231](https://github.com/ClickHouse/ClickHouse/pull/8231) ([tavplubix](https://github.com/tavplubix))
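As noted in the `geoDistance` entry above, a quick comparison sketch; the coordinates (roughly Moscow and Saint Petersburg, longitude first) are illustrative, and the argument order is assumed to match `greatCircleDistance`:

```sql
SELECT
    greatCircleDistance(37.62, 55.75, 30.31, 59.94) AS sphere_metres,    -- spherical model
    geoDistance(37.62, 55.75, 30.31, 59.94)         AS ellipsoid_metres; -- WGS-84 approximation
```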
### Build/Testing/Packaging Improvement
* Build fixes for `ARM` and in minimal mode. [#8304](https://github.com/ClickHouse/ClickHouse/pull/8304) ([proller](https://github.com/proller))
* Add coverage file flush for `clickhouse-server` when std::atexit is not called. Also slightly improved logging in stateless tests with coverage. [#8267](https://github.com/ClickHouse/ClickHouse/pull/8267) ([alesapin](https://github.com/alesapin))
* Update LLVM library in contrib. Avoid using LLVM from OS packages. [#8258](https://github.com/ClickHouse/ClickHouse/pull/8258) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Make bundled `curl` build fully quiet. [#8232](https://github.com/ClickHouse/ClickHouse/pull/8232) [#8203](https://github.com/ClickHouse/ClickHouse/pull/8203) ([Pavel Kovalenko](https://github.com/Jokser))
* Fix some `MemorySanitizer` warnings. [#8235](https://github.com/ClickHouse/ClickHouse/pull/8235) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Use `add_warning` and `no_warning` macros in `CMakeLists.txt`. [#8604](https://github.com/ClickHouse/ClickHouse/pull/8604) ([Ivan](https://github.com/abyss7))
* Add support of MinIO S3-compatible object storage (https://min.io/) for better integration tests. [#7863](https://github.com/ClickHouse/ClickHouse/pull/7863) [#7875](https://github.com/ClickHouse/ClickHouse/pull/7875) ([Pavel Kovalenko](https://github.com/Jokser))
* Imported `libc` headers to contrib. It allows to make builds more consistent across various systems (only for `x86_64-linux-gnu`). [#5773](https://github.com/ClickHouse/ClickHouse/pull/5773) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Remove `-fPIC` from some libraries. [#8464](https://github.com/ClickHouse/ClickHouse/pull/8464) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Clean `CMakeLists.txt` for curl. See https://github.com/ClickHouse/ClickHouse/pull/8011#issuecomment-569478910 [#8459](https://github.com/ClickHouse/ClickHouse/pull/8459) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Silence warnings in `CapNProto` library. [#8220](https://github.com/ClickHouse/ClickHouse/pull/8220) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add performance tests for short string optimized hash tables. [#7679](https://github.com/ClickHouse/ClickHouse/pull/7679) ([Amos Bird](https://github.com/amosbird))
* Now ClickHouse will build on `AArch64` even if `MADV_FREE` is not available. This fixes [#8027](https://github.com/ClickHouse/ClickHouse/issues/8027). [#8243](https://github.com/ClickHouse/ClickHouse/pull/8243) ([Amos Bird](https://github.com/amosbird))
* Update `zlib-ng` to fix memory sanitizer problems. [#7182](https://github.com/ClickHouse/ClickHouse/pull/7182) [#8206](https://github.com/ClickHouse/ClickHouse/pull/8206) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Enable internal MySQL library on non-Linux system, because usage of OS packages is very fragile and usually doesn't work at all. This fixes [#5765](https://github.com/ClickHouse/ClickHouse/issues/5765). [#8426](https://github.com/ClickHouse/ClickHouse/pull/8426) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed build on some systems after enabling `libc++`. This supersedes [#8374](https://github.com/ClickHouse/ClickHouse/issues/8374). [#8380](https://github.com/ClickHouse/ClickHouse/pull/8380) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Make `Field` methods more type-safe to find more errors. [#7386](https://github.com/ClickHouse/ClickHouse/pull/7386) [#8209](https://github.com/ClickHouse/ClickHouse/pull/8209) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Added missing files to the `libc-headers` submodule. [#8507](https://github.com/ClickHouse/ClickHouse/pull/8507) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix wrong `JSON` quoting in performance test output. [#8497](https://github.com/ClickHouse/ClickHouse/pull/8497) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Now stack trace is displayed for `std::exception` and `Poco::Exception`. In previous versions it was available only for `DB::Exception`. This improves diagnostics. [#8501](https://github.com/ClickHouse/ClickHouse/pull/8501) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Porting `clock_gettime` and `clock_nanosleep` for fresh glibc versions. [#8054](https://github.com/ClickHouse/ClickHouse/pull/8054) ([Amos Bird](https://github.com/amosbird))
* Enable `part_log` in example config for developers. [#8609](https://github.com/ClickHouse/ClickHouse/pull/8609) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix async nature of reload in `01036_no_superfluous_dict_reload_on_create_database*`. [#8111](https://github.com/ClickHouse/ClickHouse/pull/8111) ([Azat Khuzhin](https://github.com/azat))
* Fixed codec performance tests. [#8615](https://github.com/ClickHouse/ClickHouse/pull/8615) ([Vasily Nemkov](https://github.com/Enmk))
* Add install scripts for `.tgz` build and documentation for them. [#8612](https://github.com/ClickHouse/ClickHouse/pull/8612) [#8591](https://github.com/ClickHouse/ClickHouse/pull/8591) ([alesapin](https://github.com/alesapin))
* Removed old `ZSTD` test (it was created in 2016 to reproduce a bug that pre-1.0 versions of ZSTD had). This fixes [#8618](https://github.com/ClickHouse/ClickHouse/issues/8618). [#8619](https://github.com/ClickHouse/ClickHouse/pull/8619) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed build on Mac OS Catalina. [#8600](https://github.com/ClickHouse/ClickHouse/pull/8600) ([meo](https://github.com/meob))
* Increased number of rows in codec performance tests to make results noticeable. [#8574](https://github.com/ClickHouse/ClickHouse/pull/8574) ([Vasily Nemkov](https://github.com/Enmk))
* In debug builds, treat `LOGICAL_ERROR` exceptions as assertion failures, so that they are easier to notice. [#8475](https://github.com/ClickHouse/ClickHouse/pull/8475) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Make formats-related performance test more deterministic. [#8477](https://github.com/ClickHouse/ClickHouse/pull/8477) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Update `lz4` to fix a MemorySanitizer failure. [#8181](https://github.com/ClickHouse/ClickHouse/pull/8181) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Suppress a known MemorySanitizer false positive in exception handling. [#8182](https://github.com/ClickHouse/ClickHouse/pull/8182) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Update `gcc` and `g++` to version 9 in `build/docker/build.sh` [#7766](https://github.com/ClickHouse/ClickHouse/pull/7766) ([TLightSky](https://github.com/tlightsky))
* Add performance test case to test that `PREWHERE` is worse than `WHERE`. [#7768](https://github.com/ClickHouse/ClickHouse/pull/7768) ([Amos Bird](https://github.com/amosbird))
* Progress towards fixing one flaky test. [#8621](https://github.com/ClickHouse/ClickHouse/pull/8621) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Avoid MemorySanitizer report for data from `libunwind`. [#8539](https://github.com/ClickHouse/ClickHouse/pull/8539) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Updated `libc++` to the latest version. [#8324](https://github.com/ClickHouse/ClickHouse/pull/8324) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Build ICU library from sources. This fixes [#6460](https://github.com/ClickHouse/ClickHouse/issues/6460). [#8219](https://github.com/ClickHouse/ClickHouse/pull/8219) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Switched from `libressl` to `openssl`. ClickHouse should support TLS 1.3 and SNI after this change. This fixes [#8171](https://github.com/ClickHouse/ClickHouse/issues/8171). [#8218](https://github.com/ClickHouse/ClickHouse/pull/8218) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed UBSan report when using `chacha20_poly1305` from SSL (happens on connect to https://yandex.ru/). [#8214](https://github.com/ClickHouse/ClickHouse/pull/8214) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix mode of default password file for `.deb` linux distros. [#8075](https://github.com/ClickHouse/ClickHouse/pull/8075) ([proller](https://github.com/proller))
* Improved expression for getting `clickhouse-server` PID in `clickhouse-test`. [#8063](https://github.com/ClickHouse/ClickHouse/pull/8063) ([Alexander Kazakov](https://github.com/Akazz))
* Updated contrib/googletest to v1.10.0. [#8587](https://github.com/ClickHouse/ClickHouse/pull/8587) ([Alexander Burmak](https://github.com/Alex-Burmak))
* Fixed ThreadSanitizer report in `base64` library. Also updated this library to the latest version, but it doesn't matter. This fixes [#8397](https://github.com/ClickHouse/ClickHouse/issues/8397). [#8403](https://github.com/ClickHouse/ClickHouse/pull/8403) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix `00600_replace_running_query` for processors. [#8272](https://github.com/ClickHouse/ClickHouse/pull/8272) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Remove support for `tcmalloc` to make `CMakeLists.txt` simpler. [#8310](https://github.com/ClickHouse/ClickHouse/pull/8310) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Release gcc builds now use `libc++` instead of `libstdc++`. Recently `libc++` was used only with clang. This will improve consistency of build configurations and portability. [#8311](https://github.com/ClickHouse/ClickHouse/pull/8311) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Enable ICU library for build with MemorySanitizer. [#8222](https://github.com/ClickHouse/ClickHouse/pull/8222) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Suppress warnings from `CapNProto` library. [#8224](https://github.com/ClickHouse/ClickHouse/pull/8224) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Removed special cases of code for `tcmalloc`, because it's no longer supported. [#8225](https://github.com/ClickHouse/ClickHouse/pull/8225) ([alexey-milovidov](https://github.com/alexey-milovidov))
* In CI coverage task, kill the server gracefully to allow it to save the coverage report. This fixes incomplete coverage reports we've been seeing lately. [#8142](https://github.com/ClickHouse/ClickHouse/pull/8142) ([alesapin](https://github.com/alesapin))
* Performance tests for all codecs against `Float64` and `UInt64` values. [#8349](https://github.com/ClickHouse/ClickHouse/pull/8349) ([Vasily Nemkov](https://github.com/Enmk))
* `termcap` is heavily deprecated and leads to various problems (e.g. a missing "up" cap and echoing `^J` instead of starting a new line). Favor `terminfo` or the bundled `ncurses`. [#7737](https://github.com/ClickHouse/ClickHouse/pull/7737) ([Amos Bird](https://github.com/amosbird))
* Fix `test_storage_s3` integration test. [#7734](https://github.com/ClickHouse/ClickHouse/pull/7734) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Support `StorageFile(<format>, null)` to insert a block into the given format file without actually writing it to disk. This is required for performance tests (see the SQL sketch after this list). [#8455](https://github.com/ClickHouse/ClickHouse/pull/8455) ([Amos Bird](https://github.com/amosbird))
* Added argument `--print-time` to functional tests which prints execution time per test. [#8001](https://github.com/ClickHouse/ClickHouse/pull/8001) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Added asserts to `KeyCondition` while evaluating RPN. This fixes a warning from gcc-9. [#8279](https://github.com/ClickHouse/ClickHouse/pull/8279) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Dump cmake options in CI builds. [#8273](https://github.com/ClickHouse/ClickHouse/pull/8273) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Don't generate debug info for some fat libraries. [#8271](https://github.com/ClickHouse/ClickHouse/pull/8271) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Make `log_to_console.xml` always log to stderr, regardless of whether the session is interactive or not. [#8395](https://github.com/ClickHouse/ClickHouse/pull/8395) ([Alexander Kuzmenkov](https://github.com/akuzm))
* Removed some unused features from `clickhouse-performance-test` tool. [#8555](https://github.com/ClickHouse/ClickHouse/pull/8555) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Now we will also search for `lld-X` with corresponding `clang-X` version. [#8092](https://github.com/ClickHouse/ClickHouse/pull/8092) ([alesapin](https://github.com/alesapin))
* Parquet build improvement. [#8421](https://github.com/ClickHouse/ClickHouse/pull/8421) ([maxulan](https://github.com/maxulan))
* More GCC warnings [#8221](https://github.com/ClickHouse/ClickHouse/pull/8221) ([kreuzerkrieg](https://github.com/kreuzerkrieg))
* The package for Arch Linux now allows running the ClickHouse server, not only the client. [#8534](https://github.com/ClickHouse/ClickHouse/pull/8534) ([Vladimir Chebotarev](https://github.com/excitoon))
* Fix test with processors. Tiny performance fixes. [#7672](https://github.com/ClickHouse/ClickHouse/pull/7672) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Update contrib/protobuf. [#8256](https://github.com/ClickHouse/ClickHouse/pull/8256) ([Matwey V. Kornilov](https://github.com/matwey))
* In preparation for switching to C++20 as a new year celebration. "May the C++ force be with ClickHouse." [#8447](https://github.com/ClickHouse/ClickHouse/pull/8447) ([Amos Bird](https://github.com/amosbird))
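The `File(<format>, null)` engine mentioned above can be sketched roughly as follows. This is an illustrative example only, assuming the second engine argument is literally `null` as the entry's notation suggests; `format_bench` and its columns are hypothetical names:

```sql
-- Hypothetical sink table: inserted blocks are serialized as TSV but never written to disk.
CREATE TABLE format_bench (x UInt64, s String) ENGINE = File(TSV, null);

-- Useful in performance tests to measure the pure serialization cost of a format.
INSERT INTO format_bench SELECT number, toString(number) FROM numbers(1000000);
```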
### Experimental Feature
* Added experimental setting `min_bytes_to_use_mmap_io`. It allows reading big files without copying data from the kernel to userspace. The setting is disabled by default. The recommended threshold is about 64 MB, because mmap/munmap is slow (see the SQL sketch after this list). [#8520](https://github.com/ClickHouse/ClickHouse/pull/8520) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Reworked quotas as a part of access control system. Added new table `system.quotas`, new functions `currentQuota`, `currentQuotaKey`, new SQL syntax `CREATE QUOTA`, `ALTER QUOTA`, `DROP QUOTA`, `SHOW QUOTA`. [#7257](https://github.com/ClickHouse/ClickHouse/pull/7257) ([Vitaly Baranov](https://github.com/vitlibar))
* Allow skipping unknown settings with warnings instead of throwing exceptions. [#7653](https://github.com/ClickHouse/ClickHouse/pull/7653) ([Vitaly Baranov](https://github.com/vitlibar))
* Reworked row policies as a part of access control system. Added new table `system.row_policies`, new function `currentRowPolicies()`, new SQL syntax `CREATE POLICY`, `ALTER POLICY`, `DROP POLICY`, `SHOW CREATE POLICY`, `SHOW POLICIES`. [#7808](https://github.com/ClickHouse/ClickHouse/pull/7808) ([Vitaly Baranov](https://github.com/vitlibar))
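The experimental features above are driven through SQL. The sketch below is assembled only from the statement names and the setting listed in these entries; the exact clause grammar may differ in this release, and `analyst`, `db.table` and `customer_id` are hypothetical names:

```sql
-- Read files larger than ~64 MB (the recommended threshold above) via mmap.
SET min_bytes_to_use_mmap_io = 67108864;

-- Hypothetical quota: limit a user to 100 queries per hour.
CREATE QUOTA analyst_quota FOR INTERVAL 1 hour MAX queries = 100 TO analyst;

-- Hypothetical row policy: only rows matching the condition are visible to the user.
CREATE POLICY visible_rows ON db.table FOR SELECT USING customer_id = 42 TO analyst;

-- Inspect consumption against the current quota.
SHOW QUOTA;
```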
### Security Fix
* Fixed the possibility of reading directories structure in tables with `File` table engine. This fixes [#8536](https://github.com/ClickHouse/ClickHouse/issues/8536). [#8537](https://github.com/ClickHouse/ClickHouse/pull/8537) ([alexey-milovidov](https://github.com/alexey-milovidov))
## ClickHouse release v19.17
### ClickHouse release v19.17.6.36, 2019-12-27

View File

@ -328,7 +328,6 @@ include (cmake/find/xxhash.cmake)
include (cmake/find/sparsehash.cmake)
include (cmake/find/rt.cmake)
include (cmake/find/execinfo.cmake)
include (cmake/find/readline_edit.cmake)
include (cmake/find/re2.cmake)
include (cmake/find/libgsasl.cmake)
include (cmake/find/rdkafka.cmake)
@ -353,6 +352,7 @@ include (cmake/find/simdjson.cmake)
include (cmake/find/rapidjson.cmake)
include (cmake/find/fastops.cmake)
include (cmake/find/orc.cmake)
include (cmake/find/avro.cmake)
find_contrib_lib(cityhash)
find_contrib_lib(farmhash)

View File

@ -1,24 +1,30 @@
# Contributing to ClickHouse
## Technical info
Developer guide for writing code for ClickHouse is published on official website alongside the usage and operations documentation:
https://clickhouse.yandex/docs/en/development/architecture/
ClickHouse is an open project, and you can contribute to it in many ways. You can help with ideas, code, or documentation. We appreciate any efforts that help us to make the project better.
## Legal info
Thank you.
In order for us (YANDEX LLC) to accept patches and other contributions from you, you will have to adopt our Yandex Contributor License Agreement (the "**CLA**"). The current version of the CLA you may find here:
## Technical Info
We have a [developer's guide](https://clickhouse.yandex/docs/en/development/developer_instruction/) for writing code for ClickHouse. Besides this guide, you can find [Overview of ClickHouse Architecture](https://clickhouse.yandex/docs/en/development/architecture/) and instructions on how to build ClickHouse in different environments.
If you want to contribute to documentation, read the [Contributing to ClickHouse Documentation](docs/README.md) guide.
## Legal Info
In order for us (YANDEX LLC) to accept patches and other contributions from you, you may adopt our Yandex Contributor License Agreement (the "**CLA**"). The current version of the CLA you may find here:
1) https://yandex.ru/legal/cla/?lang=en (in English) and
2) https://yandex.ru/legal/cla/?lang=ru (in Russian).
By adopting the CLA, you state the following:
* You obviously wish and are willingly licensing your contributions to us for our open source projects under the terms of the CLA,
* You has read the terms and conditions of the CLA and agree with them in full,
* You have read the terms and conditions of the CLA and agree with them in full,
* You are legally able to provide and license your contributions as stated,
* We may use your contributions for our open source projects and for any other our project too,
* We rely on your assurances concerning the rights of third parties in relation to your contributes.
* We rely on your assurances concerning the rights of third parties in relation to your contributions.
If you agree with these principles, please read and adopt our CLA. By providing us your contributions, you hereby declare that you has already read and adopt our CLA, and we may freely merge your contributions with our corresponding open source project and use it in further in accordance with terms and conditions of the CLA.
If you agree with these principles, please read and adopt our CLA. By providing us your contributions, you hereby declare that you have already read and adopt our CLA, and we may freely merge your contributions with our corresponding open source project and use it in further in accordance with terms and conditions of the CLA.
If you have already adopted terms and conditions of the CLA, you are able to provide your contributes. When you submit your pull request, please add the following information into it:
@ -31,4 +37,7 @@ Replace the bracketed text as follows:
It is enough to provide us such notification once.
If you don't agree with the CLA, you still can open a pull request to provide your contributions.
As an alternative, you can provide DCO instead of CLA. You can find the text of DCO here: https://developercertificate.org/
It is enough to read and copy it verbatim to your pull request.
If you don't agree with the CLA and don't want to provide DCO, you still can open a pull request to provide your contributions.

View File

@ -1,4 +1,4 @@
Copyright 2016-2019 Yandex LLC
Copyright 2016-2020 Yandex LLC
Apache License
Version 2.0, January 2004
@ -188,7 +188,7 @@ Copyright 2016-2019 Yandex LLC
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2016-2019 Yandex LLC
Copyright 2016-2020 Yandex LLC
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

28
cmake/find/avro.cmake Normal file
View File

@ -0,0 +1,28 @@
option (ENABLE_AVRO "Enable Avro" ${ENABLE_LIBRARIES})
if (ENABLE_AVRO)
option (USE_INTERNAL_AVRO_LIBRARY "Set to FALSE to use system avro library instead of bundled" ${NOT_UNBUNDLED})
if(NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/avro/lang/c++/CMakeLists.txt")
if(USE_INTERNAL_AVRO_LIBRARY)
message(WARNING "submodule contrib/avro is missing. to fix try run: \n git submodule update --init --recursive")
endif()
set(MISSING_INTERNAL_AVRO_LIBRARY 1)
set(USE_INTERNAL_AVRO_LIBRARY 0)
endif()
if (NOT USE_INTERNAL_AVRO_LIBRARY)
elseif(NOT MISSING_INTERNAL_AVRO_LIBRARY)
include(cmake/find/snappy.cmake)
set(AVROCPP_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/avro/lang/c++/include")
set(AVROCPP_LIBRARY avrocpp)
endif ()
if (AVROCPP_LIBRARY AND AVROCPP_INCLUDE_DIR)
set(USE_AVRO 1)
endif()
endif()
message (STATUS "Using avro=${USE_AVRO}: ${AVROCPP_INCLUDE_DIR} : ${AVROCPP_LIBRARY}")

View File

@ -11,7 +11,6 @@ if (ENABLE_BASE64)
if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/base64")
message (WARNING "submodule contrib/base64 is missing. to fix try run: \n git submodule update --init --recursive")
else()
set (BASE64_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/base64/include)
set (BASE64_LIBRARY base64)
set (USE_BASE64 1)
endif()

View File

@ -31,6 +31,7 @@ if (NOT Boost_SYSTEM_LIBRARY AND NOT MISSING_INTERNAL_BOOST_LIBRARY)
set (Boost_SYSTEM_LIBRARY boost_system_internal)
set (Boost_PROGRAM_OPTIONS_LIBRARY boost_program_options_internal)
set (Boost_FILESYSTEM_LIBRARY boost_filesystem_internal ${Boost_SYSTEM_LIBRARY})
set (Boost_IOSTREAMS_LIBRARY boost_iostreams_internal)
set (Boost_REGEX_LIBRARY boost_regex_internal)
set (Boost_INCLUDE_DIRS)
@ -48,4 +49,4 @@ if (NOT Boost_SYSTEM_LIBRARY AND NOT MISSING_INTERNAL_BOOST_LIBRARY)
list (APPEND Boost_INCLUDE_DIRS "${ClickHouse_SOURCE_DIR}/contrib/boost")
endif ()
message (STATUS "Using Boost: ${Boost_INCLUDE_DIRS} : ${Boost_PROGRAM_OPTIONS_LIBRARY},${Boost_SYSTEM_LIBRARY},${Boost_FILESYSTEM_LIBRARY},${Boost_REGEX_LIBRARY}")
message (STATUS "Using Boost: ${Boost_INCLUDE_DIRS} : ${Boost_PROGRAM_OPTIONS_LIBRARY},${Boost_SYSTEM_LIBRARY},${Boost_FILESYSTEM_LIBRARY},${Boost_IOSTREAMS_LIBRARY},${Boost_REGEX_LIBRARY}")

View File

@ -14,6 +14,7 @@ if (NOT ENABLE_LIBRARIES)
set (ENABLE_POCO_REDIS ${ENABLE_LIBRARIES} CACHE BOOL "")
set (ENABLE_POCO_ODBC ${ENABLE_LIBRARIES} CACHE BOOL "")
set (ENABLE_POCO_SQL ${ENABLE_LIBRARIES} CACHE BOOL "")
set (ENABLE_POCO_JSON ${ENABLE_LIBRARIES} CACHE BOOL "")
endif ()
set (POCO_COMPONENTS Net XML SQL Data)
@ -34,6 +35,9 @@ if (NOT DEFINED ENABLE_POCO_ODBC OR ENABLE_POCO_ODBC)
list (APPEND POCO_COMPONENTS DataODBC)
list (APPEND POCO_COMPONENTS SQLODBC)
endif ()
if (NOT DEFINED ENABLE_POCO_JSON OR ENABLE_POCO_JSON)
list (APPEND POCO_COMPONENTS JSON)
endif ()
if (NOT USE_INTERNAL_POCO_LIBRARY)
find_package (Poco COMPONENTS ${POCO_COMPONENTS})
@ -112,6 +116,11 @@ elseif (NOT MISSING_INTERNAL_POCO_LIBRARY)
endif ()
endif ()
if (NOT DEFINED ENABLE_POCO_JSON OR ENABLE_POCO_JSON)
set (Poco_JSON_LIBRARY PocoJSON)
set (Poco_JSON_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/poco/JSON/include/")
endif ()
if (OPENSSL_FOUND AND (NOT DEFINED ENABLE_POCO_NETSSL OR ENABLE_POCO_NETSSL))
set (Poco_NetSSL_LIBRARY PocoNetSSL ${OPENSSL_LIBRARIES})
set (Poco_Crypto_LIBRARY PocoCrypto ${OPENSSL_LIBRARIES})
@ -145,8 +154,11 @@ endif ()
if (Poco_SQLODBC_LIBRARY AND ODBC_FOUND)
set (USE_POCO_SQLODBC 1)
endif ()
if (Poco_JSON_LIBRARY)
set (USE_POCO_JSON 1)
endif ()
message(STATUS "Using Poco: ${Poco_INCLUDE_DIRS} : ${Poco_Foundation_LIBRARY},${Poco_Util_LIBRARY},${Poco_Net_LIBRARY},${Poco_NetSSL_LIBRARY},${Poco_Crypto_LIBRARY},${Poco_XML_LIBRARY},${Poco_Data_LIBRARY},${Poco_DataODBC_LIBRARY},${Poco_SQL_LIBRARY},${Poco_SQLODBC_LIBRARY},${Poco_MongoDB_LIBRARY},${Poco_Redis_LIBRARY}; MongoDB=${USE_POCO_MONGODB}, Redis=${USE_POCO_REDIS}, DataODBC=${USE_POCO_DATAODBC}, NetSSL=${USE_POCO_NETSSL}")
message(STATUS "Using Poco: ${Poco_INCLUDE_DIRS} : ${Poco_Foundation_LIBRARY},${Poco_Util_LIBRARY},${Poco_Net_LIBRARY},${Poco_NetSSL_LIBRARY},${Poco_Crypto_LIBRARY},${Poco_XML_LIBRARY},${Poco_Data_LIBRARY},${Poco_DataODBC_LIBRARY},${Poco_SQL_LIBRARY},${Poco_SQLODBC_LIBRARY},${Poco_MongoDB_LIBRARY},${Poco_Redis_LIBRARY},${Poco_JSON_LIBRARY}; MongoDB=${USE_POCO_MONGODB}, Redis=${USE_POCO_REDIS}, DataODBC=${USE_POCO_DATAODBC}, NetSSL=${USE_POCO_NETSSL}, JSON=${USE_POCO_JSON}")
# How to make suitable poco:
# use branch:

View File

@ -1,60 +0,0 @@
include (CMakePushCheckState)
cmake_push_check_state ()
option (ENABLE_READLINE "Enable readline" ${ENABLE_LIBRARIES})
if (ENABLE_READLINE)
set (READLINE_PATHS "/usr/local/opt/readline/lib")
# First try find custom lib for macos users (default lib without history support)
find_library (READLINE_LIB NAMES readline PATHS ${READLINE_PATHS} NO_DEFAULT_PATH)
if (NOT READLINE_LIB)
find_library (READLINE_LIB NAMES readline PATHS ${READLINE_PATHS})
endif ()
list(APPEND CMAKE_FIND_LIBRARY_SUFFIXES .so.2)
find_library (EDIT_LIB NAMES edit)
set(READLINE_INCLUDE_PATHS "/usr/local/opt/readline/include")
if (READLINE_LIB AND TERMCAP_LIBRARY)
find_path (READLINE_INCLUDE_DIR NAMES readline/readline.h PATHS ${READLINE_INCLUDE_PATHS} NO_DEFAULT_PATH)
if (NOT READLINE_INCLUDE_DIR)
find_path (READLINE_INCLUDE_DIR NAMES readline/readline.h PATHS ${READLINE_INCLUDE_PATHS})
endif ()
if (READLINE_INCLUDE_DIR AND READLINE_LIB)
set (USE_READLINE 1)
set (LINE_EDITING_LIBS ${READLINE_LIB} ${TERMCAP_LIBRARY})
message (STATUS "Using line editing libraries (readline): ${READLINE_INCLUDE_DIR} : ${LINE_EDITING_LIBS}")
endif ()
elseif (EDIT_LIB AND TERMCAP_LIBRARY)
find_library (CURSES_LIB NAMES curses)
find_path (READLINE_INCLUDE_DIR NAMES editline/readline.h PATHS ${READLINE_INCLUDE_PATHS})
if (CURSES_LIB AND READLINE_INCLUDE_DIR)
set (USE_LIBEDIT 1)
set (LINE_EDITING_LIBS ${EDIT_LIB} ${CURSES_LIB} ${TERMCAP_LIBRARY})
message (STATUS "Using line editing libraries (edit): ${READLINE_INCLUDE_DIR} : ${LINE_EDITING_LIBS}")
endif ()
endif ()
endif ()
if (LINE_EDITING_LIBS AND READLINE_INCLUDE_DIR)
include (CheckCXXSourceRuns)
set (CMAKE_REQUIRED_LIBRARIES ${CMAKE_REQUIRED_LIBRARIES} ${LINE_EDITING_LIBS})
set (CMAKE_REQUIRED_INCLUDES ${CMAKE_REQUIRED_INCLUDES} ${READLINE_INCLUDE_DIR})
check_cxx_source_runs ("
#include <stdio.h>
#include <readline/readline.h>
#include <readline/history.h>
int main() {
add_history(NULL);
append_history(1,NULL);
return 0;
}
" HAVE_READLINE_HISTORY)
else ()
message (STATUS "Not using any library for line editing.")
endif ()
cmake_pop_check_state ()

View File

@ -48,12 +48,12 @@ if (SANITIZE)
set (ENABLE_EMBEDDED_COMPILER 0 CACHE BOOL "")
set (USE_INTERNAL_CAPNP_LIBRARY 0 CACHE BOOL "")
set (USE_SIMDJSON 0 CACHE BOOL "")
set (ENABLE_READLINE 0 CACHE BOOL "")
set (ENABLE_ORC 0 CACHE BOOL "")
set (ENABLE_PARQUET 0 CACHE BOOL "")
set (USE_CAPNP 0 CACHE BOOL "")
set (USE_INTERNAL_ORC_LIBRARY 0 CACHE BOOL "")
set (USE_ORC 0 CACHE BOOL "")
set (USE_AVRO 0 CACHE BOOL "")
set (ENABLE_SSL 0 CACHE BOOL "")
elseif (SANITIZE STREQUAL "thread")

View File

@ -15,7 +15,6 @@ if (CMAKE_CROSSCOMPILING)
set (USE_SNAPPY OFF CACHE INTERNAL "")
set (ENABLE_PROTOBUF OFF CACHE INTERNAL "")
set (ENABLE_PARQUET OFF CACHE INTERNAL "")
set (ENABLE_READLINE OFF CACHE INTERNAL "")
set (ENABLE_ICU OFF CACHE INTERNAL "")
set (ENABLE_FASTOPS OFF CACHE INTERNAL "")
elseif (OS_LINUX)

View File

@ -146,6 +146,20 @@ if (ENABLE_ICU AND USE_INTERNAL_ICU_LIBRARY)
add_subdirectory (icu-cmake)
endif ()
if(USE_INTERNAL_SNAPPY_LIBRARY)
set(SNAPPY_BUILD_TESTS 0 CACHE INTERNAL "")
if (NOT MAKE_STATIC_LIBRARIES)
set(BUILD_SHARED_LIBS 1) # TODO: set at root dir
endif()
add_subdirectory(snappy)
set (SNAPPY_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/snappy")
if(SANITIZE STREQUAL "undefined")
target_compile_options(${SNAPPY_LIBRARY} PRIVATE -fno-sanitize=undefined)
endif()
endif()
if (USE_INTERNAL_PARQUET_LIBRARY)
if (USE_INTERNAL_PARQUET_LIBRARY_NATIVE_CMAKE)
# We don't use arrow's cmakefiles because they use too many dependencies and download some libs at compile time
@ -189,20 +203,6 @@ if (USE_INTERNAL_PARQUET_LIBRARY_NATIVE_CMAKE)
endif()
else()
if(USE_INTERNAL_SNAPPY_LIBRARY)
set(SNAPPY_BUILD_TESTS 0 CACHE INTERNAL "")
if (NOT MAKE_STATIC_LIBRARIES)
set(BUILD_SHARED_LIBS 1) # TODO: set at root dir
endif()
add_subdirectory(snappy)
set (SNAPPY_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/snappy")
if(SANITIZE STREQUAL "undefined")
target_compile_options(${SNAPPY_LIBRARY} PRIVATE -fno-sanitize=undefined)
endif()
endif()
add_subdirectory(arrow-cmake)
# The library is large - avoid bloat.
@ -212,6 +212,10 @@ else()
endif()
endif()
if (USE_INTERNAL_AVRO_LIBRARY)
add_subdirectory(avro-cmake)
endif()
if (USE_INTERNAL_POCO_LIBRARY)
set (POCO_VERBOSE_MESSAGES 0 CACHE INTERNAL "")
set (save_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
@ -331,3 +335,5 @@ endif()
if (USE_FASTOPS)
add_subdirectory (fastops-cmake)
endif()
add_subdirectory(replxx-cmake)

1
contrib/avro vendored Submodule

@ -0,0 +1 @@
Subproject commit 5b2752041c8d2f75eb5c1dbec8b4c25fc0e24d12

View File

@ -0,0 +1,70 @@
set(AVROCPP_ROOT_DIR ${CMAKE_SOURCE_DIR}/contrib/avro/lang/c++)
set(AVROCPP_INCLUDE_DIR ${AVROCPP_ROOT_DIR}/api)
set(AVROCPP_SOURCE_DIR ${AVROCPP_ROOT_DIR}/impl)
set (CMAKE_CXX_STANDARD 17)
if (EXISTS ${AVROCPP_ROOT_DIR}/../../share/VERSION.txt)
file(READ "${AVROCPP_ROOT_DIR}/../../share/VERSION.txt"
AVRO_VERSION)
endif()
string(REPLACE "\n" "" AVRO_VERSION ${AVRO_VERSION})
set (AVRO_VERSION_MAJOR ${AVRO_VERSION})
set (AVRO_VERSION_MINOR "0")
set (AVROCPP_SOURCE_FILES
${AVROCPP_SOURCE_DIR}/Compiler.cc
${AVROCPP_SOURCE_DIR}/Node.cc
${AVROCPP_SOURCE_DIR}/LogicalType.cc
${AVROCPP_SOURCE_DIR}/NodeImpl.cc
${AVROCPP_SOURCE_DIR}/ResolverSchema.cc
${AVROCPP_SOURCE_DIR}/Schema.cc
${AVROCPP_SOURCE_DIR}/Types.cc
${AVROCPP_SOURCE_DIR}/ValidSchema.cc
${AVROCPP_SOURCE_DIR}/Zigzag.cc
${AVROCPP_SOURCE_DIR}/BinaryEncoder.cc
${AVROCPP_SOURCE_DIR}/BinaryDecoder.cc
${AVROCPP_SOURCE_DIR}/Stream.cc
${AVROCPP_SOURCE_DIR}/FileStream.cc
${AVROCPP_SOURCE_DIR}/Generic.cc
${AVROCPP_SOURCE_DIR}/GenericDatum.cc
${AVROCPP_SOURCE_DIR}/DataFile.cc
${AVROCPP_SOURCE_DIR}/parsing/Symbol.cc
${AVROCPP_SOURCE_DIR}/parsing/ValidatingCodec.cc
${AVROCPP_SOURCE_DIR}/parsing/JsonCodec.cc
${AVROCPP_SOURCE_DIR}/parsing/ResolvingDecoder.cc
${AVROCPP_SOURCE_DIR}/json/JsonIO.cc
${AVROCPP_SOURCE_DIR}/json/JsonDom.cc
${AVROCPP_SOURCE_DIR}/Resolver.cc
${AVROCPP_SOURCE_DIR}/Validator.cc
)
add_library (avrocpp ${AVROCPP_SOURCE_FILES})
set_target_properties (avrocpp PROPERTIES VERSION ${AVRO_VERSION_MAJOR}.${AVRO_VERSION_MINOR})
target_include_directories(avrocpp SYSTEM PUBLIC ${AVROCPP_INCLUDE_DIR})
target_include_directories(avrocpp SYSTEM PUBLIC ${Boost_INCLUDE_DIRS})
target_link_libraries (avrocpp ${Boost_IOSTREAMS_LIBRARY})
if (SNAPPY_INCLUDE_DIR AND SNAPPY_LIBRARY)
target_compile_definitions (avrocpp PUBLIC SNAPPY_CODEC_AVAILABLE)
target_include_directories (avrocpp PRIVATE ${SNAPPY_INCLUDE_DIR})
target_link_libraries (avrocpp ${SNAPPY_LIBRARY})
endif ()
if (COMPILER_GCC)
set (SUPPRESS_WARNINGS -Wno-non-virtual-dtor)
elseif (COMPILER_CLANG)
set (SUPPRESS_WARNINGS -Wno-non-virtual-dtor)
endif ()
target_compile_options(avrocpp PRIVATE ${SUPPRESS_WARNINGS})
# create a symlink to include headers with <avro/...>
ADD_CUSTOM_TARGET(avro_symlink_headers ALL
COMMAND ${CMAKE_COMMAND} -E make_directory ${AVROCPP_ROOT_DIR}/include
COMMAND ${CMAKE_COMMAND} -E create_symlink ${AVROCPP_ROOT_DIR}/api ${AVROCPP_ROOT_DIR}/include/avro
)
add_dependencies(avrocpp avro_symlink_headers)

View File

@ -74,7 +74,6 @@ file(GLOB S3_UNIFIED_SRC
)
set(S3_INCLUDES
"${CMAKE_CURRENT_SOURCE_DIR}/include/"
"${AWS_COMMON_LIBRARY_DIR}/include/"
"${AWS_EVENT_STREAM_LIBRARY_DIR}/include/"
"${AWS_S3_LIBRARY_DIR}/include/"
@ -96,7 +95,7 @@ target_compile_definitions(aws_s3 PUBLIC -DENABLE_CURL_CLIENT)
target_compile_definitions(aws_s3 PUBLIC "AWS_SDK_VERSION_MAJOR=1")
target_compile_definitions(aws_s3 PUBLIC "AWS_SDK_VERSION_MINOR=7")
target_compile_definitions(aws_s3 PUBLIC "AWS_SDK_VERSION_PATCH=231")
target_include_directories(aws_s3 PUBLIC ${S3_INCLUDES} "${CMAKE_BINARY_DIR}/install")
target_include_directories(aws_s3 PUBLIC ${S3_INCLUDES})
if (OPENSSL_FOUND)
target_compile_definitions(aws_s3 PUBLIC -DENABLE_OPENSSL_ENCRYPTION)

2
contrib/boost vendored

@ -1 +1 @@
Subproject commit 830e51edb59c4f37a8638138581e1e56c29ac44f
Subproject commit 86be2aef20bee2356b744e5569eed6eaded85dbe

View File

@ -37,3 +37,8 @@ target_link_libraries(boost_filesystem_internal PRIVATE boost_system_internal)
if (USE_INTERNAL_PARQUET_LIBRARY)
add_boost_lib(regex)
endif()
if (USE_INTERNAL_AVRO_LIBRARY)
add_boost_lib(iostreams)
target_link_libraries(boost_iostreams_internal PUBLIC ${ZLIB_LIBRARIES})
endif()

View File

@ -23,6 +23,10 @@ typedef unsigned __int64 uint64_t;
#endif // !defined(_MSC_VER)
#ifdef __cplusplus
extern "C" {
#endif
//-----------------------------------------------------------------------------
void MurmurHash3_x86_32 ( const void * key, int len, uint32_t seed, void * out );
@ -32,3 +36,7 @@ void MurmurHash3_x86_128 ( const void * key, int len, uint32_t seed, void * out
void MurmurHash3_x64_128 ( const void * key, int len, uint32_t seed, void * out );
//-----------------------------------------------------------------------------
#ifdef __cplusplus
}
#endif

1
contrib/replxx vendored Submodule

@ -0,0 +1 @@
Subproject commit 37582f0bb8c52513c6c6b76797c02d852d701dad

View File

@ -0,0 +1,57 @@
option (ENABLE_REPLXX "Enable replxx support" ${ENABLE_LIBRARIES})
if (ENABLE_REPLXX)
option (USE_INTERNAL_REPLXX "Use internal replxx library" ${NOT_UNBUNDLED})
if (USE_INTERNAL_REPLXX)
set (LIBRARY_DIR "${ClickHouse_SOURCE_DIR}/contrib/replxx")
set(SRCS
${LIBRARY_DIR}/src/conversion.cxx
${LIBRARY_DIR}/src/ConvertUTF.cpp
${LIBRARY_DIR}/src/escape.cxx
${LIBRARY_DIR}/src/history.cxx
${LIBRARY_DIR}/src/io.cxx
${LIBRARY_DIR}/src/prompt.cxx
${LIBRARY_DIR}/src/replxx_impl.cxx
${LIBRARY_DIR}/src/replxx.cxx
${LIBRARY_DIR}/src/util.cxx
${LIBRARY_DIR}/src/wcwidth.cpp
)
add_library (replxx ${SRCS})
target_include_directories(replxx PUBLIC ${LIBRARY_DIR}/include)
else ()
find_library(LIBRARY_REPLXX NAMES replxx replxx-static)
find_path(INCLUDE_REPLXX replxx.hxx)
add_library(replxx UNKNOWN IMPORTED)
set_property(TARGET replxx PROPERTY IMPORTED_LOCATION ${LIBRARY_REPLXX})
target_include_directories(replxx PUBLIC ${INCLUDE_REPLXX})
set(CMAKE_REQUIRED_LIBRARIES replxx)
check_cxx_source_compiles(
"
#include <replxx.hxx>
int main() {
replxx::Replxx rx;
}
"
EXTERNAL_REPLXX_WORKS
)
if (NOT EXTERNAL_REPLXX_WORKS)
message (FATAL_ERROR "replxx is unusable: ${LIBRARY_REPLXX} ${INCLUDE_REPLXX}")
endif ()
endif ()
target_compile_options(replxx PUBLIC -Wno-documentation)
target_compile_definitions(replxx PUBLIC USE_REPLXX=1)
message (STATUS "Using replxx")
else ()
add_library(replxx INTERFACE)
target_compile_definitions(replxx INTERFACE USE_REPLXX=0)
message (STATUS "Not using replxx (Beware! Runtime fallback to readline is possible!)")
endif ()

View File

@ -142,10 +142,10 @@ elseif (COMPILER_GCC)
add_cxx_compile_options(-Wmaybe-uninitialized)
# Warn when the indentation of the code does not reflect the block structure
add_cxx_compile_options(-Wmisleading-indentation)
# Warn if a global function is defined without a previous declaration
# Warn if a global function is defined without a previous declaration - disabled because of build times
# add_cxx_compile_options(-Wmissing-declarations)
# Warn if a user-supplied include directory does not exist
# add_cxx_compile_options(-Wmissing-include-dirs)
add_cxx_compile_options(-Wmissing-include-dirs)
# Obvious
add_cxx_compile_options(-Wnon-virtual-dtor)
# Obvious
@ -177,7 +177,7 @@ elseif (COMPILER_GCC)
# Warn for suspicious length parameters to certain string and memory built-in functions if the argument uses sizeof
add_cxx_compile_options(-Wsizeof-pointer-memaccess)
# Warn about overriding virtual functions that are not marked with the override keyword
# add_cxx_compile_options(-Wsuggest-override)
add_cxx_compile_options(-Wsuggest-override)
# Warn whenever a switch statement has an index of boolean type and the case values are outside the range of a boolean type
add_cxx_compile_options(-Wswitch-bool)
# Warn if a self-comparison always evaluates to true or false
@ -504,6 +504,10 @@ if (USE_POCO_NETSSL)
dbms_target_link_libraries (PRIVATE ${Poco_NetSSL_LIBRARY} ${Poco_Crypto_LIBRARY})
endif()
if (USE_POCO_JSON)
dbms_target_link_libraries (PRIVATE ${Poco_JSON_LIBRARY})
endif()
dbms_target_link_libraries (PRIVATE ${Poco_Foundation_LIBRARY})
if (USE_ICU)
@ -522,6 +526,11 @@ if (USE_PARQUET)
endif ()
endif ()
if (USE_AVRO)
dbms_target_link_libraries(PRIVATE ${AVROCPP_LIBRARY})
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${AVROCPP_INCLUDE_DIR})
endif ()
if (OPENSSL_CRYPTO_LIBRARY)
dbms_target_link_libraries (PRIVATE ${OPENSSL_CRYPTO_LIBRARY})
target_link_libraries (clickhouse_common_io PRIVATE ${OPENSSL_CRYPTO_LIBRARY})
@ -563,7 +572,7 @@ if (USE_JEMALLOC)
endif()
endif ()
dbms_target_include_directories (PUBLIC ${DBMS_INCLUDE_DIR} PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src/Formats/include)
dbms_target_include_directories (PUBLIC ${DBMS_INCLUDE_DIR})
target_include_directories (clickhouse_common_io PUBLIC ${DBMS_INCLUDE_DIR})
target_include_directories (clickhouse_common_io SYSTEM BEFORE PUBLIC ${DOUBLE_CONVERSION_INCLUDE_DIR})

View File

@ -0,0 +1,19 @@
#!/usr/bin/env bash
QUERIES_FILE="queries.sql"
TABLE=$1
TRIES=3
cat "$QUERIES_FILE" | sed "s|{table}|\"${TABLE}\"|g" | while read query; do
echo -n "["
for i in $(seq 1 $TRIES); do
while true; do
RES=$(command time -f %e -o /dev/stdout curl -sS --location-trusted -H "Authorization: OAuth $YT_TOKEN" "$YT_PROXY.yt.yandex.net/query?default_format=Null&database=*$YT_CLIQUE_ID" --data-binary @- <<< "$query" 2>/dev/null) && break;
done
[[ "$?" == "0" ]] && echo -n "${RES}" || echo -n "null"
[[ "$i" != $TRIES ]] && echo -n ", "
done
echo "],"
done

View File

@ -0,0 +1,19 @@
#!/usr/bin/env bash
QUERIES_FILE="queries.sql"
TABLE=$1
TRIES=3
cat "$QUERIES_FILE" | sed "s|{table}|\"${TABLE}\"|g" | while read query; do
echo -n "["
for i in $(seq 1 $TRIES); do
while true; do
RES=$(command time -f %e -o time ./yql --clickhouse --syntax-version 1 -f empty <<< "USE chyt.hume; PRAGMA max_memory_usage = 100000000000; PRAGMA max_memory_usage_for_all_queries = 100000000000; $query" >/dev/null 2>&1 && cat time) && break;
done
[[ "$?" == "0" ]] && echo -n "${RES}" || echo -n "null"
[[ "$i" != $TRIES ]] && echo -n ", "
done
echo "],"
done

View File

@ -101,7 +101,7 @@ public:
}
void initialize(Poco::Util::Application & self [[maybe_unused]])
void initialize(Poco::Util::Application & self [[maybe_unused]]) override
{
std::string home_path;
const char * home_path_cstr = getenv("HOME");
@ -111,7 +111,7 @@ public:
configReadClient(config(), home_path);
}
int main(const std::vector<std::string> &)
int main(const std::vector<std::string> &) override
{
if (!json_path.empty() && Poco::File(json_path).exists()) /// Clear file with previous results
Poco::File(json_path).remove();
@ -418,7 +418,7 @@ private:
std::cerr << percent << "%\t\t";
for (const auto & info : infos)
{
std::cerr << info->sampler.quantileInterpolated(percent / 100.0) << " sec." << "\t";
std::cerr << info->sampler.quantileNearest(percent / 100.0) << " sec." << "\t";
}
std::cerr << "\n";
};
@ -453,7 +453,7 @@ private:
auto print_percentile = [&json_out](Stats & info, auto percent, bool with_comma = true)
{
json_out << "\"" << percent << "\"" << ": " << info.sampler.quantileInterpolated(percent / 100.0) << (with_comma ? ",\n" : "\n");
json_out << "\"" << percent << "\"" << ": " << info.sampler.quantileNearest(percent / 100.0) << (with_comma ? ",\n" : "\n");
};
json_out << "{\n";
@ -492,7 +492,7 @@ private:
public:
~Benchmark()
~Benchmark() override
{
shutdown = true;
}

View File

@ -1,14 +1,10 @@
set(CLICKHOUSE_CLIENT_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/Client.cpp
${CMAKE_CURRENT_SOURCE_DIR}/ConnectionParameters.cpp
${CMAKE_CURRENT_SOURCE_DIR}/Suggest.cpp
)
set(CLICKHOUSE_CLIENT_LINK PRIVATE clickhouse_common_config clickhouse_functions clickhouse_aggregate_functions clickhouse_common_io clickhouse_parsers string_utils ${LINE_EDITING_LIBS} ${Boost_PROGRAM_OPTIONS_LIBRARY})
set(CLICKHOUSE_CLIENT_INCLUDE PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/include)
if (READLINE_INCLUDE_DIR)
set(CLICKHOUSE_CLIENT_INCLUDE ${CLICKHOUSE_CLIENT_INCLUDE} SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
endif ()
set(CLICKHOUSE_CLIENT_LINK PRIVATE clickhouse_common_config clickhouse_functions clickhouse_aggregate_functions clickhouse_common_io clickhouse_parsers string_utils ${Boost_PROGRAM_OPTIONS_LIBRARY})
include(CheckSymbolExists)
check_symbol_exists(readpassphrase readpassphrase.h HAVE_READPASSPHRASE)

View File

@ -1,7 +1,13 @@
#include "TestHint.h"
#include "ConnectionParameters.h"
#include "Suggest.h"
#if USE_REPLXX
# include <common/ReplxxLineReader.h>
#else
# include <common/LineReader.h>
#endif
#include <port/unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
@ -18,8 +24,8 @@
#include <Poco/String.h>
#include <Poco/File.h>
#include <Poco/Util/Application.h>
#include <common/readline_use.h>
#include <common/find_symbols.h>
#include <common/LineReader.h>
#include <Common/ClickHouseRevision.h>
#include <Common/Stopwatch.h>
#include <Common/Exception.h>
@ -69,10 +75,6 @@
#include <common/argsToConfig.h>
#include <Common/TerminalSize.h>
#if USE_READLINE
#include "Suggest.h"
#endif
#ifndef __clang__
#pragma GCC optimize("-fno-var-tracking-assignments")
#endif
@ -89,39 +91,6 @@
#define DISABLE_LINE_WRAPPING "\033[?7l"
#define ENABLE_LINE_WRAPPING "\033[?7h"
#if USE_READLINE && RL_VERSION_MAJOR >= 7
#define BRACK_PASTE_PREF "\033[200~"
#define BRACK_PASTE_SUFF "\033[201~"
#define BRACK_PASTE_LAST '~'
#define BRACK_PASTE_SLEN 6
/// This handler bypasses some unused macro/event checkings.
static int clickhouse_rl_bracketed_paste_begin(int /* count */, int /* key */)
{
std::string buf;
buf.reserve(128);
RL_SETSTATE(RL_STATE_MOREINPUT);
SCOPE_EXIT(RL_UNSETSTATE(RL_STATE_MOREINPUT));
int c;
while ((c = rl_read_key()) >= 0)
{
if (c == '\r')
c = '\n';
buf.push_back(c);
if (buf.size() >= BRACK_PASTE_SLEN && c == BRACK_PASTE_LAST && buf.substr(buf.size() - BRACK_PASTE_SLEN) == BRACK_PASTE_SUFF)
{
buf.resize(buf.size() - BRACK_PASTE_SLEN);
break;
}
}
return static_cast<size_t>(rl_insert_text(buf.c_str())) == buf.size() ? 0 : 1;
}
#endif
namespace DB
{
@ -136,7 +105,6 @@ namespace ErrorCodes
extern const int UNEXPECTED_PACKET_FROM_SERVER;
extern const int CLIENT_OUTPUT_FORMAT_SPECIFIED;
extern const int CANNOT_SET_SIGNAL_HANDLER;
extern const int CANNOT_READLINE;
extern const int SYSTEM_ERROR;
extern const int INVALID_USAGE_OF_INPUT;
}
@ -157,7 +125,7 @@ private:
"учшеж", "йгшеж", "дщпщгеж",
"q", "й", "\\q", "\\Q", "\\й", "\\Й", ":q", "Жй"
};
bool is_interactive = true; /// Use either readline interface or batch mode.
bool is_interactive = true; /// Use either interactive line editing interface or batch mode.
bool need_render_progress = true; /// Render query execution progress.
bool echo_queries = false; /// Print queries before execution in batch mode.
bool ignore_error = false; /// In case of errors, don't print error message, continue to next query. Only applicable for non-interactive mode.
@ -242,7 +210,7 @@ private:
ConnectionParameters connection_parameters;
void initialize(Poco::Util::Application & self)
void initialize(Poco::Util::Application & self) override
{
Poco::Util::Application::initialize(self);
@ -270,7 +238,7 @@ private:
}
int main(const std::vector<std::string> & /*args*/)
int main(const std::vector<std::string> & /*args*/) override
{
try
{
@ -514,26 +482,10 @@ private:
if (print_time_to_stderr)
throw Exception("time option could be specified only in non-interactive mode", ErrorCodes::BAD_ARGUMENTS);
#if USE_READLINE
SCOPE_EXIT({ Suggest::instance().finalize(); });
if (server_revision >= Suggest::MIN_SERVER_REVISION
&& !config().getBool("disable_suggestion", false))
{
if (server_revision >= Suggest::MIN_SERVER_REVISION && !config().getBool("disable_suggestion", false))
/// Load suggestion data from the server.
Suggest::instance().load(connection_parameters, config().getInt("suggestion_limit"));
/// Added '.' to the default list. Because it is used to separate database and table.
rl_basic_word_break_characters = " \t\n\r\"\\'`@$><=;|&{(.";
/// Not append whitespace after single suggestion. Because whitespace after function name is meaningless.
rl_completion_append_character = '\0';
rl_completion_entry_function = Suggest::generator;
}
else
/// Turn tab completion off.
rl_bind_key('\t', rl_insert);
#endif
/// Load command history if present.
if (config().has("history_file"))
history_file = config().getString("history_file");
@ -546,70 +498,56 @@ private:
history_file = home_path + "/.clickhouse-client-history";
}
if (!history_file.empty())
{
if (Poco::File(history_file).exists())
{
#if USE_READLINE
int res = read_history(history_file.c_str());
if (res)
std::cerr << "Cannot read history from file " + history_file + ": "+ errnoToString(ErrorCodes::CANNOT_READ_HISTORY);
if (!history_file.empty() && !Poco::File(history_file).exists())
Poco::File(history_file).createFile();
#if USE_REPLXX
ReplxxLineReader lr(Suggest::instance(), history_file, '\\', config().has("multiline") ? ';' : 0);
#else
LineReader lr(history_file, '\\', config().has("multiline") ? ';' : 0);
#endif
do
{
auto input = lr.readLine(prompt(), ":-] ");
if (input.empty())
break;
has_vertical_output_suffix = false;
if (input.ends_with("\\G"))
{
input.resize(input.size() - 2);
has_vertical_output_suffix = true;
}
try
{
if (!process(input))
break;
}
catch (const Exception & e)
{
actual_client_error = e.code();
if (!actual_client_error || actual_client_error != expected_client_error)
{
std::cerr << std::endl
<< "Exception on client:" << std::endl
<< "Code: " << e.code() << ". " << e.displayText() << std::endl;
if (config().getBool("stacktrace", false))
std::cerr << "Stack trace:" << std::endl << e.getStackTraceString() << std::endl;
std::cerr << std::endl;
}
/// Client-side exception during query execution can result in the loss of
/// sync in the connection protocol.
/// So we reconnect and allow to enter the next query.
connect();
}
else /// Create history file.
Poco::File(history_file).createFile();
}
#if USE_READLINE
/// Install Ctrl+C signal handler that will be used in interactive mode.
if (rl_initialize())
throw Exception("Cannot initialize readline", ErrorCodes::CANNOT_READLINE);
#if RL_VERSION_MAJOR >= 7
/// Enable bracketed-paste-mode only when multiquery is enabled and multiline is
/// disabled, so that we are able to paste and execute multiline queries in a whole
/// instead of erroring out, while be less intrusive.
if (config().has("multiquery") && !config().has("multiline"))
{
/// When bracketed paste mode is set, pasted text is bracketed with control sequences so
/// that the program can differentiate pasted text from typed-in text. This helps
/// clickhouse-client so that without -m flag, one can still paste multiline queries, and
/// possibly get better pasting performance. See https://cirw.in/blog/bracketed-paste for
/// more details.
rl_variable_bind("enable-bracketed-paste", "on");
/// Use our bracketed paste handler to get better user experience. See comments above.
rl_bind_keyseq(BRACK_PASTE_PREF, clickhouse_rl_bracketed_paste_begin);
}
#endif
auto clear_prompt_or_exit = [](int)
{
/// This is signal safe.
ssize_t res = write(STDOUT_FILENO, "\n", 1);
/// Allow to quit client while query is in progress by pressing Ctrl+C twice.
/// (First press to Ctrl+C will try to cancel query by InterruptListener).
if (res == 1 && rl_line_buffer[0] && !RL_ISSTATE(RL_STATE_DONE))
{
rl_replace_line("", 0);
if (rl_forced_update_display())
_exit(0);
}
else
{
/// A little dirty, but we struggle to find better way to correctly
/// force readline to exit after returning from the signal handler.
_exit(0);
}
};
if (signal(SIGINT, clear_prompt_or_exit) == SIG_ERR)
throwFromErrno("Cannot set signal handler.", ErrorCodes::CANNOT_SET_SIGNAL_HANDLER);
#endif
loop();
while (true);
if (isNewYearMode())
std::cout << "Happy new year." << std::endl;
@ -621,17 +559,6 @@ private:
}
else
{
/// This is intended for testing purposes.
if (config().getBool("always_load_suggestion_data", false))
{
#if USE_READLINE
SCOPE_EXIT({ Suggest::instance().finalize(); });
Suggest::instance().load(connection_parameters, config().getInt("suggestion_limit"));
#else
throw Exception("Command line suggestions cannot work without readline", ErrorCodes::BAD_ARGUMENTS);
#endif
}
query_id = config().getString("query_id", "");
nonInteractive();
@ -706,111 +633,11 @@ private:
}
/// Check if multi-line query is inserted from the paste buffer.
/// Allows delaying the start of query execution until the entirety of query is inserted.
static bool hasDataInSTDIN()
{
timeval timeout = { 0, 0 };
fd_set fds;
FD_ZERO(&fds);
FD_SET(STDIN_FILENO, &fds);
return select(1, &fds, nullptr, nullptr, &timeout) == 1;
}
inline const String prompt() const
{
return boost::replace_all_copy(prompt_by_server_display_name, "{database}", config().getString("database", "default"));
}
void loop()
{
String input;
String prev_input;
while (char * line_ = readline(input.empty() ? prompt().c_str() : ":-] "))
{
String line = line_;
free(line_);
size_t ws = line.size();
while (ws > 0 && isWhitespaceASCII(line[ws - 1]))
--ws;
if (ws == 0 || line.empty())
continue;
bool ends_with_semicolon = line[ws - 1] == ';';
bool ends_with_backslash = line[ws - 1] == '\\';
has_vertical_output_suffix = (ws >= 2) && (line[ws - 2] == '\\') && (line[ws - 1] == 'G');
if (ends_with_backslash)
line = line.substr(0, ws - 1);
input += line;
if (!ends_with_backslash && (ends_with_semicolon || has_vertical_output_suffix || (!config().has("multiline") && !hasDataInSTDIN())))
{
// TODO: should we do sensitive data masking on client too? History file can be source of secret leaks.
if (input != prev_input)
{
/// Replace line breaks with spaces to prevent the following problem.
/// Every line of multi-line query is saved to history file as a separate line.
/// If the user restarts the client then after pressing the "up" button
/// every line of the query will be displayed separately.
std::string logged_query = input;
if (config().has("multiline"))
std::replace(logged_query.begin(), logged_query.end(), '\n', ' ');
add_history(logged_query.c_str());
#if USE_READLINE && HAVE_READLINE_HISTORY
if (!history_file.empty() && append_history(1, history_file.c_str()))
std::cerr << "Cannot append history to file " + history_file + ": " + errnoToString(ErrorCodes::CANNOT_APPEND_HISTORY);
#endif
prev_input = input;
}
if (has_vertical_output_suffix)
input = input.substr(0, input.length() - 2);
try
{
if (!process(input))
break;
}
catch (const Exception & e)
{
actual_client_error = e.code();
if (!actual_client_error || actual_client_error != expected_client_error)
{
std::cerr << std::endl
<< "Exception on client:" << std::endl
<< "Code: " << e.code() << ". " << e.displayText() << std::endl;
if (config().getBool("stacktrace", false))
std::cerr << "Stack trace:" << std::endl
<< e.getStackTraceString() << std::endl;
std::cerr << std::endl;
}
/// Client-side exception during query execution can result in the loss of
/// sync in the connection protocol.
/// So we reconnect and allow to enter the next query.
connect();
}
input = "";
}
else
{
input += '\n';
}
}
}
void nonInteractive()
{
@ -2001,13 +1828,6 @@ public:
server_logs_file = options["server_logs_file"].as<std::string>();
if (options.count("disable_suggestion"))
config().setBool("disable_suggestion", true);
if (options.count("always_load_suggestion_data"))
{
if (options.count("disable_suggestion"))
throw Exception("Command line parameters disable_suggestion (-A) and always_load_suggestion_data cannot be specified simultaneously",
ErrorCodes::BAD_ARGUMENTS);
config().setBool("always_load_suggestion_data", true);
}
if (options.count("suggestion_limit"))
config().setInt("suggestion_limit", options["suggestion_limit"].as<int>());

View File

@ -0,0 +1,144 @@
#include "Suggest.h"
#include <Columns/ColumnString.h>
#include <Common/typeid_cast.h>
namespace DB
{
void Suggest::load(const ConnectionParameters & connection_parameters, size_t suggestion_limit)
{
loading_thread = std::thread([connection_parameters, suggestion_limit, this]
{
try
{
Connection connection(
connection_parameters.host,
connection_parameters.port,
connection_parameters.default_database,
connection_parameters.user,
connection_parameters.password,
"client",
connection_parameters.compression,
connection_parameters.security);
loadImpl(connection, connection_parameters.timeouts, suggestion_limit);
}
catch (...)
{
std::cerr << "Cannot load data for command line suggestions: " << getCurrentExceptionMessage(false, true) << "\n";
}
/// Note that keyword suggestions are available even if we cannot load data from server.
std::sort(words.begin(), words.end());
ready = true;
});
}
Suggest::Suggest()
{
/// Keywords may be not up to date with ClickHouse parser.
words = {"CREATE", "DATABASE", "IF", "NOT", "EXISTS", "TEMPORARY", "TABLE", "ON", "CLUSTER", "DEFAULT",
"MATERIALIZED", "ALIAS", "ENGINE", "AS", "VIEW", "POPULATE", "SETTINGS", "ATTACH", "DETACH", "DROP",
"RENAME", "TO", "ALTER", "ADD", "MODIFY", "CLEAR", "COLUMN", "AFTER", "COPY", "PROJECT",
"PRIMARY", "KEY", "CHECK", "PARTITION", "PART", "FREEZE", "FETCH", "FROM", "SHOW", "INTO",
"OUTFILE", "FORMAT", "TABLES", "DATABASES", "LIKE", "PROCESSLIST", "CASE", "WHEN", "THEN", "ELSE",
"END", "DESCRIBE", "DESC", "USE", "SET", "OPTIMIZE", "FINAL", "DEDUPLICATE", "INSERT", "VALUES",
"SELECT", "DISTINCT", "SAMPLE", "ARRAY", "JOIN", "GLOBAL", "LOCAL", "ANY", "ALL", "INNER",
"LEFT", "RIGHT", "FULL", "OUTER", "CROSS", "USING", "PREWHERE", "WHERE", "GROUP", "BY",
"WITH", "TOTALS", "HAVING", "ORDER", "COLLATE", "LIMIT", "UNION", "AND", "OR", "ASC",
"IN", "KILL", "QUERY", "SYNC", "ASYNC", "TEST", "BETWEEN", "TRUNCATE"};
}
void Suggest::loadImpl(Connection & connection, const ConnectionTimeouts & timeouts, size_t suggestion_limit)
{
std::stringstream query;
query << "SELECT DISTINCT arrayJoin(extractAll(name, '[\\\\w_]{2,}')) AS res FROM ("
"SELECT name FROM system.functions"
" UNION ALL "
"SELECT name FROM system.table_engines"
" UNION ALL "
"SELECT name FROM system.formats"
" UNION ALL "
"SELECT name FROM system.table_functions"
" UNION ALL "
"SELECT name FROM system.data_type_families"
" UNION ALL "
"SELECT name FROM system.settings"
" UNION ALL "
"SELECT cluster FROM system.clusters"
" UNION ALL "
"SELECT concat(func.name, comb.name) FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate";
/// The user may disable loading of databases, tables, columns by setting suggestion_limit to zero.
if (suggestion_limit > 0)
{
String limit_str = toString(suggestion_limit);
query <<
" UNION ALL "
"SELECT name FROM system.databases LIMIT " << limit_str
<< " UNION ALL "
"SELECT DISTINCT name FROM system.tables LIMIT " << limit_str
<< " UNION ALL "
"SELECT DISTINCT name FROM system.columns LIMIT " << limit_str;
}
query << ") WHERE notEmpty(res)";
fetch(connection, timeouts, query.str());
}
void Suggest::fetch(Connection & connection, const ConnectionTimeouts & timeouts, const std::string & query)
{
connection.sendQuery(timeouts, query);
while (true)
{
Packet packet = connection.receivePacket();
switch (packet.type)
{
case Protocol::Server::Data:
fillWordsFromBlock(packet.block);
continue;
case Protocol::Server::Progress:
continue;
case Protocol::Server::ProfileInfo:
continue;
case Protocol::Server::Totals:
continue;
case Protocol::Server::Extremes:
continue;
case Protocol::Server::Log:
continue;
case Protocol::Server::Exception:
packet.exception->rethrow();
return;
case Protocol::Server::EndOfStream:
return;
default:
throw Exception("Unknown packet from server", ErrorCodes::UNKNOWN_PACKET_FROM_SERVER);
}
}
}
void Suggest::fillWordsFromBlock(const Block & block)
{
if (!block)
return;
if (block.columns() != 1)
throw Exception("Wrong number of columns received for query to read words for suggestion", ErrorCodes::LOGICAL_ERROR);
const ColumnString & column = typeid_cast<const ColumnString &>(*block.getByPosition(0).column);
size_t rows = block.rows();
for (size_t i = 0; i < rows; ++i)
words.emplace_back(column.getDataAt(i).toString());
}
}

View File

@ -2,18 +2,9 @@
#include "ConnectionParameters.h"
#include <string>
#include <sstream>
#include <string.h>
#include <vector>
#include <algorithm>
#include <common/readline_use.h>
#include <Common/typeid_cast.h>
#include <Columns/ColumnString.h>
#include <Client/Connection.h>
#include <IO/ConnectionTimeouts.h>
#include <common/LineReader.h>
namespace DB
@ -24,141 +15,8 @@ namespace ErrorCodes
extern const int UNKNOWN_PACKET_FROM_SERVER;
}
class Suggest : private boost::noncopyable
class Suggest : public LineReader::Suggest, boost::noncopyable
{
private:
/// The vector will be filled with completion words from the server and sorted.
using Words = std::vector<std::string>;
/// Keywords may be not up to date with ClickHouse parser.
Words words
{
"CREATE", "DATABASE", "IF", "NOT", "EXISTS", "TEMPORARY", "TABLE", "ON", "CLUSTER", "DEFAULT", "MATERIALIZED", "ALIAS", "ENGINE",
"AS", "VIEW", "POPULATE", "SETTINGS", "ATTACH", "DETACH", "DROP", "RENAME", "TO", "ALTER", "ADD", "MODIFY", "CLEAR", "COLUMN", "AFTER",
"COPY", "PROJECT", "PRIMARY", "KEY", "CHECK", "PARTITION", "PART", "FREEZE", "FETCH", "FROM", "SHOW", "INTO", "OUTFILE", "FORMAT", "TABLES",
"DATABASES", "LIKE", "PROCESSLIST", "CASE", "WHEN", "THEN", "ELSE", "END", "DESCRIBE", "DESC", "USE", "SET", "OPTIMIZE", "FINAL", "DEDUPLICATE",
"INSERT", "VALUES", "SELECT", "DISTINCT", "SAMPLE", "ARRAY", "JOIN", "GLOBAL", "LOCAL", "ANY", "ALL", "INNER", "LEFT", "RIGHT", "FULL", "OUTER",
"CROSS", "USING", "PREWHERE", "WHERE", "GROUP", "BY", "WITH", "TOTALS", "HAVING", "ORDER", "COLLATE", "LIMIT", "UNION", "AND", "OR", "ASC", "IN",
"KILL", "QUERY", "SYNC", "ASYNC", "TEST", "BETWEEN", "TRUNCATE"
};
/// Words are fetched asynchronously.
std::thread loading_thread;
std::atomic<bool> ready{false};
/// Points to current word to suggest.
Words::const_iterator pos;
/// Points after the last possible match.
Words::const_iterator end;
/// Set iterators to the matched range of words if any.
void findRange(const char * prefix, size_t prefix_length)
{
std::string prefix_str(prefix);
std::tie(pos, end) = std::equal_range(words.begin(), words.end(), prefix_str,
[prefix_length](const std::string & s, const std::string & prefix_searched) { return strncmp(s.c_str(), prefix_searched.c_str(), prefix_length) < 0; });
}
/// Iterates through matched range.
char * nextMatch()
{
if (pos >= end)
return nullptr;
/// readline will free memory by itself.
char * word = strdup(pos->c_str());
++pos;
return word;
}
void loadImpl(Connection & connection, const ConnectionTimeouts & timeouts, size_t suggestion_limit)
{
std::stringstream query;
query << "SELECT DISTINCT arrayJoin(extractAll(name, '[\\\\w_]{2,}')) AS res FROM ("
"SELECT name FROM system.functions"
" UNION ALL "
"SELECT name FROM system.table_engines"
" UNION ALL "
"SELECT name FROM system.formats"
" UNION ALL "
"SELECT name FROM system.table_functions"
" UNION ALL "
"SELECT name FROM system.data_type_families"
" UNION ALL "
"SELECT name FROM system.settings"
" UNION ALL "
"SELECT concat(func.name, comb.name) FROM system.functions AS func CROSS JOIN system.aggregate_function_combinators AS comb WHERE is_aggregate";
/// The user may disable loading of databases, tables, columns by setting suggestion_limit to zero.
if (suggestion_limit > 0)
{
String limit_str = toString(suggestion_limit);
query <<
" UNION ALL "
"SELECT name FROM system.databases LIMIT " << limit_str
<< " UNION ALL "
"SELECT DISTINCT name FROM system.tables LIMIT " << limit_str
<< " UNION ALL "
"SELECT DISTINCT name FROM system.columns LIMIT " << limit_str;
}
query << ") WHERE notEmpty(res)";
fetch(connection, timeouts, query.str());
}
void fetch(Connection & connection, const ConnectionTimeouts & timeouts, const std::string & query)
{
connection.sendQuery(timeouts, query);
while (true)
{
Packet packet = connection.receivePacket();
switch (packet.type)
{
case Protocol::Server::Data:
fillWordsFromBlock(packet.block);
continue;
case Protocol::Server::Progress:
continue;
case Protocol::Server::ProfileInfo:
continue;
case Protocol::Server::Totals:
continue;
case Protocol::Server::Extremes:
continue;
case Protocol::Server::Log:
continue;
case Protocol::Server::Exception:
packet.exception->rethrow();
return;
case Protocol::Server::EndOfStream:
return;
default:
throw Exception("Unknown packet from server", ErrorCodes::UNKNOWN_PACKET_FROM_SERVER);
}
}
}
void fillWordsFromBlock(const Block & block)
{
if (!block)
return;
if (block.columns() != 1)
throw Exception("Wrong number of columns received for query to read words for suggestion", ErrorCodes::LOGICAL_ERROR);
const ColumnString & column = typeid_cast<const ColumnString &>(*block.getByPosition(0).column);
size_t rows = block.rows();
for (size_t i = 0; i < rows; ++i)
words.emplace_back(column.getDataAt(i).toString());
}
public:
static Suggest & instance()
{
@ -166,64 +24,25 @@ public:
return instance;
}
/// More old server versions cannot execute the query above.
void load(const ConnectionParameters & connection_parameters, size_t suggestion_limit);
/// Older server versions cannot execute the query above.
static constexpr int MIN_SERVER_REVISION = 54406;
void load(const ConnectionParameters & connection_parameters, size_t suggestion_limit)
{
loading_thread = std::thread([connection_parameters, suggestion_limit, this]
{
try
{
Connection connection(
connection_parameters.host,
connection_parameters.port,
connection_parameters.default_database,
connection_parameters.user,
connection_parameters.password,
"client",
connection_parameters.compression,
connection_parameters.security);
loadImpl(connection, connection_parameters.timeouts, suggestion_limit);
}
catch (...)
{
std::cerr << "Cannot load data for command line suggestions: " << getCurrentExceptionMessage(false, true) << "\n";
}
/// Note that keyword suggestions are available even if we cannot load data from server.
std::sort(words.begin(), words.end());
ready = true;
});
}
void finalize()
private:
Suggest();
~Suggest()
{
if (loading_thread.joinable())
loading_thread.join();
}
/// A function for readline.
static char * generator(const char * text, int state)
{
Suggest & suggest = Suggest::instance();
if (!suggest.ready)
return nullptr;
if (state == 0)
suggest.findRange(text, strlen(text));
void loadImpl(Connection & connection, const ConnectionTimeouts & timeouts, size_t suggestion_limit);
void fetch(Connection & connection, const ConnectionTimeouts & timeouts, const std::string & query);
void fillWordsFromBlock(const Block & block);
/// Do not append whitespace after the word. For an unknown reason, rl_completion_append_character = '\0' does not work.
rl_completion_suppress_append = 1;
return suggest.nextMatch();
}
~Suggest()
{
finalize();
}
/// Words are fetched asynchronously.
std::thread loading_thread;
};
}
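A hypothetical usage sketch (not part of this diff): the interactive client would start the background load once before entering the query loop, guarding on the minimum server revision; the option names below are assumptions for illustration.

    if (server_revision >= Suggest::MIN_SERVER_REVISION && !config().getBool("disable_suggestion", false))
    {
        /// Words are loaded in a background thread; the completer simply sees
        /// an empty word list until `ready` becomes true.
        Suggest::instance().load(connection_parameters, config().getInt("suggestion_limit", 10000));
    }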

View File

@ -10,4 +10,4 @@ set_target_properties(readpassphrase
PROPERTIES LINKER_LANGUAGE C
)
# . to allow #include <readpassphrase.h>
target_include_directories(readpassphrase PUBLIC . ${CMAKE_CURRENT_BINARY_DIR}/include ${CMAKE_CURRENT_BINARY_DIR}/../include)
target_include_directories(readpassphrase PUBLIC . ${CMAKE_CURRENT_BINARY_DIR}/include)

View File

@ -111,7 +111,7 @@ void LocalServer::tryInitPath()
/// In case of empty path set paths to helpful directories
std::string cd = Poco::Path::current();
context->setTemporaryPath(cd + "tmp");
context->setTemporaryStorage(cd + "tmp");
context->setFlagsPath(cd + "flags");
context->setUserFilesPath(""); // user's files are everywhere
}

View File

@ -15,20 +15,24 @@ set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE PUBLIC ${ClickHouse_SOURCE_DIR}/libs/libdaemo
if (USE_POCO_SQLODBC)
set(CLICKHOUSE_ODBC_BRIDGE_LINK ${CLICKHOUSE_ODBC_BRIDGE_LINK} PRIVATE ${Poco_SQLODBC_LIBRARY})
set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${ODBC_INCLUDE_DIRS} ${Poco_SQLODBC_INCLUDE_DIR})
# Wouldn't work anyway because of the way the list variable gets expanded in `target_include_directories`
# set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${ODBC_INCLUDE_DIRS} ${Poco_SQLODBC_INCLUDE_DIR})
endif ()
if (Poco_SQL_FOUND)
set(CLICKHOUSE_ODBC_BRIDGE_LINK ${CLICKHOUSE_ODBC_BRIDGE_LINK} PRIVATE ${Poco_SQL_LIBRARY})
set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${Poco_SQL_INCLUDE_DIR})
# Wouldn't work anyway because of the way the list variable gets expanded in `target_include_directories`
# set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${Poco_SQL_INCLUDE_DIR})
endif ()
if (USE_POCO_DATAODBC)
set(CLICKHOUSE_ODBC_BRIDGE_LINK ${CLICKHOUSE_ODBC_BRIDGE_LINK} PRIVATE ${Poco_DataODBC_LIBRARY})
set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${ODBC_INCLUDE_DIRS} ${Poco_DataODBC_INCLUDE_DIR})
# Wouldn't work anyway because of the way the list variable gets expanded in `target_include_directories`
# set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${ODBC_INCLUDE_DIRS} ${Poco_DataODBC_INCLUDE_DIR})
endif()
if (Poco_Data_FOUND)
set(CLICKHOUSE_ODBC_BRIDGE_LINK ${CLICKHOUSE_ODBC_BRIDGE_LINK} PRIVATE ${Poco_Data_LIBRARY})
set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${Poco_Data_INCLUDE_DIR})
# Wouldn't work anyway because of the way the list variable gets expanded in `target_include_directories`
# set(CLICKHOUSE_ODBC_BRIDGE_INCLUDE ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE} SYSTEM PRIVATE ${Poco_Data_INCLUDE_DIR})
endif ()
clickhouse_program_add_library(odbc-bridge)

View File

@ -17,6 +17,7 @@
#include <Common/setThreadName.h>
#include <Common/config.h>
#include <Common/SettingsChanges.h>
#include <Disks/DiskSpaceMonitor.h>
#include <Compression/CompressedReadBuffer.h>
#include <Compression/CompressedWriteBuffer.h>
#include <IO/ReadBufferFromIStream.h>
@ -351,7 +352,8 @@ void HTTPHandler::processQuery(
if (buffer_until_eof)
{
std::string tmp_path_template = context.getTemporaryPath() + "http_buffers/";
const std::string tmp_path(context.getTemporaryVolume()->getNextDisk()->getPath());
const std::string tmp_path_template(tmp_path + "http_buffers/");
auto create_tmp_disk_buffer = [tmp_path_template] (const WriteBufferPtr &)
{
@ -590,7 +592,11 @@ void HTTPHandler::processQuery(
customizeContext(context);
executeQuery(*in, *used_output.out_maybe_delayed_and_compressed, /* allow_into_outfile = */ false, context,
[&response] (const String & content_type) { response.setContentType(content_type); },
[&response] (const String & content_type, const String & format)
{
response.setContentType(content_type);
response.add("X-ClickHouse-Format", format);
},
[&response] (const String & current_query_id) { response.add("X-ClickHouse-Query-Id", current_query_id); });
if (used_output.hasDelayed())
@ -610,6 +616,8 @@ void HTTPHandler::trySendExceptionToClient(const std::string & s, int exception_
{
try
{
response.set("X-ClickHouse-Exception-Code", toString<int>(exception_code));
/// If HTTP method is POST and Keep-Alive is turned on, we should read the whole request body
/// to avoid reading part of the current request body in the next request.
if (request.getMethod() == Poco::Net::HTTPRequest::HTTP_POST

View File

@ -61,6 +61,10 @@ void InterserverIOHTTPHandler::processQuery(Poco::Net::HTTPServerRequest & reque
ReadBufferFromIStream body(request.stream());
auto endpoint = server.context().getInterserverIOHandler().getEndpoint(endpoint_name);
/// Locked for read while query processing
std::shared_lock lock(endpoint->rwlock);
if (endpoint->blocker.isCancelled())
throw Exception("Transferring part to replica was cancelled", ErrorCodes::ABORTED);
if (compress)
{

View File

@ -79,21 +79,19 @@ void MySQLHandler::run()
if (!connection_context.mysql.max_packet_size)
connection_context.mysql.max_packet_size = MAX_PACKET_LENGTH;
/* LOG_TRACE(log, "Capabilities: " << handshake_response.capability_flags
<< "\nmax_packet_size: "
LOG_TRACE(log, "Capabilities: " << handshake_response.capability_flags
<< ", max_packet_size: "
<< handshake_response.max_packet_size
<< "\ncharacter_set: "
<< handshake_response.character_set
<< "\nuser: "
<< ", character_set: "
<< static_cast<int>(handshake_response.character_set)
<< ", user: "
<< handshake_response.username
<< "\nauth_response length: "
<< ", auth_response length: "
<< handshake_response.auth_response.length()
<< "\nauth_response: "
<< handshake_response.auth_response
<< "\ndatabase: "
<< ", database: "
<< handshake_response.database
<< "\nauth_plugin_name: "
<< handshake_response.auth_plugin_name);*/
<< ", auth_plugin_name: "
<< handshake_response.auth_plugin_name);
client_capability_flags = handshake_response.capability_flags;
if (!(client_capability_flags & CLIENT_PROTOCOL_41))
@ -284,7 +282,8 @@ void MySQLHandler::comQuery(ReadBuffer & payload)
else
{
bool with_output = false;
std::function<void(const String &)> set_content_type = [&with_output](const String &) -> void {
std::function<void(const String &, const String &)> set_content_type_and_format = [&with_output](const String &, const String &) -> void
{
with_output = true;
};
@ -307,7 +306,7 @@ void MySQLHandler::comQuery(ReadBuffer & payload)
ReadBufferFromString replacement(replacement_query);
Context query_context = connection_context;
executeQuery(should_replace ? replacement : payload, *out, true, query_context, set_content_type, nullptr);
executeQuery(should_replace ? replacement : payload, *out, true, query_context, set_content_type_and_format, {});
if (!with_output)
packet_sender->sendPacket(OK_Packet(0x00, client_capability_flags, 0, 0, 0), true);

View File

@ -34,7 +34,7 @@ MySQLHandlerFactory::MySQLHandlerFactory(IServer & server_)
}
catch (...)
{
LOG_INFO(log, "Failed to create SSL context. SSL will be disabled. Error: " << getCurrentExceptionMessage(false));
LOG_TRACE(log, "Failed to create SSL context. SSL will be disabled. Error: " << getCurrentExceptionMessage(false));
ssl_enabled = false;
}
#endif
@ -47,7 +47,7 @@ MySQLHandlerFactory::MySQLHandlerFactory(IServer & server_)
}
catch (...)
{
LOG_WARNING(log, "Failed to read RSA keys. Error: " << getCurrentExceptionMessage(false));
LOG_TRACE(log, "Failed to read RSA key pair from server certificate. Error: " << getCurrentExceptionMessage(false));
generateRSAKeys();
}
#endif
@ -104,7 +104,7 @@ void MySQLHandlerFactory::readRSAKeys()
void MySQLHandlerFactory::generateRSAKeys()
{
LOG_INFO(log, "Generating new RSA key.");
LOG_TRACE(log, "Generating new RSA key pair.");
public_key.reset(RSA_new());
if (!public_key)
throw Exception("Failed to allocate RSA key. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);

View File

@ -77,6 +77,31 @@ namespace CurrentMetrics
extern const Metric VersionInteger;
}
namespace
{
void setupTmpPath(Logger * log, const std::string & path)
{
LOG_DEBUG(log, "Setting up " << path << " to store temporary data in it");
Poco::File(path).createDirectories();
/// Clearing old temporary files.
Poco::DirectoryIterator dir_end;
for (Poco::DirectoryIterator it(path); it != dir_end; ++it)
{
if (it->isFile() && startsWith(it.name(), "tmp"))
{
LOG_DEBUG(log, "Removing old temporary file " << it->path());
it->remove();
}
else
LOG_DEBUG(log, "Skipped file in temporary path " << it->path());
}
}
}
namespace DB
{
@ -331,22 +356,14 @@ int Server::main(const std::vector<std::string> & /*args*/)
DateLUT::instance();
LOG_TRACE(log, "Initialized DateLUT with time zone '" << DateLUT::instance().getTimeZone() << "'.");
/// Directory with temporary data for processing of heavy queries.
/// Storage with temporary data for processing of heavy queries.
{
std::string tmp_path = config().getString("tmp_path", path + "tmp/");
global_context->setTemporaryPath(tmp_path);
Poco::File(tmp_path).createDirectories();
/// Clearing old temporary files.
Poco::DirectoryIterator dir_end;
for (Poco::DirectoryIterator it(tmp_path); it != dir_end; ++it)
{
if (it->isFile() && startsWith(it.name(), "tmp"))
{
LOG_DEBUG(log, "Removing old temporary file " << it->path());
it->remove();
}
}
std::string tmp_policy = config().getString("tmp_policy", "");
const VolumePtr & volume = global_context->setTemporaryStorage(tmp_path, tmp_policy);
for (const DiskPtr & disk : volume->disks)
setupTmpPath(log, disk->getPath());
}
/** Directory with 'flags': files indicating temporary settings for the server set by system administrator.
@ -436,8 +453,10 @@ int Server::main(const std::vector<std::string> & /*args*/)
main_config_zk_changed_event,
[&](ConfigurationPtr config)
{
setTextLog(global_context->getTextLog());
buildLoggers(*config, logger());
// FIXME logging-related things need synchronization -- see the 'Logger * log' saved
// in a lot of places. For now, disable updating log configuration without server restart.
//setTextLog(global_context->getTextLog());
//buildLoggers(*config, logger());
global_context->setClustersConfig(config);
global_context->setMacros(std::make_unique<Macros>(*config, "macros"));
@ -862,6 +881,13 @@ int Server::main(const std::vector<std::string> & /*args*/)
for (auto & server : servers)
server->start();
{
String level_str = config().getString("text_log.level", "");
int level = level_str.empty() ? INT_MAX : Poco::Logger::parseLevel(level_str);
setTextLog(global_context->getTextLog(), level);
}
buildLoggers(config(), logger());
main_config_reloader->start();
users_config_reloader->start();
if (dns_cache_updater)

View File

@ -591,11 +591,9 @@ void TCPHandler::processOrdinaryQueryWithProcessors(size_t num_threads)
}
});
/// Wait in case of exception. Delete pipeline to release memory.
/// Wait in case an exception happened outside of the pool.
SCOPE_EXIT(
/// Clear the queue in case somebody is waiting for lazy_format to push.
lazy_format->finish();
lazy_format->clearQueue();
try
{
@ -604,72 +602,58 @@ void TCPHandler::processOrdinaryQueryWithProcessors(size_t num_threads)
catch (...)
{
/// If exception was thrown during pipeline execution, skip it while processing other exception.
tryLogCurrentException(log);
}
pipeline = QueryPipeline()
);
while (true)
while (!lazy_format->isFinished() && !exception)
{
Block block;
while (true)
if (isQueryCancelled())
{
if (isQueryCancelled())
{
/// A packet was received requesting to stop execution of the request.
executor->cancel();
break;
}
else
{
if (after_send_progress.elapsed() / 1000 >= query_context->getSettingsRef().interactive_delay)
{
/// Some time passed and there is a progress.
after_send_progress.restart();
sendProgress();
}
sendLogs();
if ((block = lazy_format->getBlock(query_context->getSettingsRef().interactive_delay / 1000)))
break;
if (lazy_format->isFinished())
break;
if (exception)
{
pool.wait();
break;
}
}
}
/** If data has run out, we will send the profiling data and total values to
* the last zero block to be able to use
* this information in the suffix output of stream.
* If the request was interrupted, then `sendTotals` and other methods could not be called,
* because we have not read all the data yet,
* and there could be ongoing calculations in other threads at the same time.
*/
if (!block && !isQueryCancelled())
{
pool.wait();
pipeline.finalize();
sendTotals(lazy_format->getTotals());
sendExtremes(lazy_format->getExtremes());
sendProfileInfo(lazy_format->getProfileInfo());
sendProgress();
sendLogs();
}
sendData(block);
if (!block)
/// A packet was received requesting to stop execution of the request.
executor->cancel();
break;
}
if (after_send_progress.elapsed() / 1000 >= query_context->getSettingsRef().interactive_delay)
{
/// Some time passed and there is a progress.
after_send_progress.restart();
sendProgress();
}
sendLogs();
if (auto block = lazy_format->getBlock(query_context->getSettingsRef().interactive_delay / 1000))
{
if (!state.io.null_format)
sendData(block);
}
}
/// Finish lazy_format before waiting. Otherwise some thread may still be writing into it, and the wait below would never finish.
lazy_format->finish();
pool.wait();
/** If data has run out, we will send the profiling data and total values to
* the last zero block to be able to use
* this information in the suffix output of stream.
* If the request was interrupted, then `sendTotals` and other methods could not be called,
* because we have not read all the data yet,
* and there could be ongoing calculations in other threads at the same time.
*/
if (!isQueryCancelled())
{
pipeline.finalize();
sendTotals(lazy_format->getTotals());
sendExtremes(lazy_format->getExtremes());
sendProfileInfo(lazy_format->getProfileInfo());
sendProgress();
sendLogs();
}
sendData({});
}
state.io.onFinish();
@ -993,7 +977,7 @@ bool TCPHandler::receiveData(bool scalar)
if (!(storage = query_context->tryGetExternalTable(name)))
{
NamesAndTypesList columns = block.getNamesAndTypesList();
storage = StorageMemory::create("_external", name, ColumnsDescription{columns}, ConstraintsDescription{});
storage = StorageMemory::create(StorageID("_external", name), ColumnsDescription{columns}, ConstraintsDescription{});
storage->startup();
query_context->addExternalTable(name, storage);
}

View File

@ -111,7 +111,7 @@ public:
server_display_name = server.config().getString("display_name", getFQDNOrHostName());
}
void run();
void run() override;
/// This method is called right before the query execution.
virtual void customizeContext(DB::Context & /*context*/) {}

View File

@ -3,27 +3,27 @@
NOTE: User and query level settings are set up in "users.xml" file.
-->
<yandex>
<!-- The list of hosts allowed to use in URL-related storage engines and table functions.
If this section is not present in configuration, all hosts are allowed.
-->
<remote_url_allow_hosts>
<!-- Host should be specified exactly as in URL. The name is checked before DNS resolution.
Example: "yandex.ru", "yandex.ru." and "www.yandex.ru" are different hosts.
If port is explicitly specified in URL, the host:port is checked as a whole.
If host specified here without port, any port with this host allowed.
"yandex.ru" -> "yandex.ru:443", "yandex.ru:80" etc. is allowed, but "yandex.ru:80" -> only "yandex.ru:80" is allowed.
If the host is specified as IP address, it is checked as specified in URL. Example: "[2a02:6b8:a::a]".
If there are redirects and support for redirects is enabled, every redirect (the Location field) is checked.
-->
<!-- The list of hosts allowed to use in URL-related storage engines and table functions.
If this section is not present in configuration, all hosts are allowed.
-->
<remote_url_allow_hosts>
<!-- Host should be specified exactly as in URL. The name is checked before DNS resolution.
Example: "yandex.ru", "yandex.ru." and "www.yandex.ru" are different hosts.
If port is explicitly specified in URL, the host:port is checked as a whole.
If host specified here without port, any port with this host allowed.
"yandex.ru" -> "yandex.ru:443", "yandex.ru:80" etc. is allowed, but "yandex.ru:80" -> only "yandex.ru:80" is allowed.
If the host is specified as IP address, it is checked as specified in URL. Example: "[2a02:6b8:a::a]".
If there are redirects and support for redirects is enabled, every redirect (the Location field) is checked.
-->
<!-- Regular expression can be specified. RE2 engine is used for regexps.
Regexps are not aligned: don't forget to add ^ and $. Also don't forget to escape dot (.) metacharacter
(forgetting to do so is a common source of error).
-->
</remote_url_allow_hosts>
<!-- Regular expression can be specified. RE2 engine is used for regexps.
Regexps are not aligned: don't forget to add ^ and $. Also don't forget to escape dot (.) metacharacter
(forgetting to do so is a common source of error).
-->
</remote_url_allow_hosts>
<logger>
<!-- Possible levels: https://github.com/pocoproject/poco/blob/develop/Foundation/include/Poco/Logger.h#L105 -->
<!-- Possible levels: https://github.com/pocoproject/poco/blob/poco-1.9.4-release/Foundation/include/Poco/Logger.h#L105 -->
<level>trace</level>
<log>/var/log/clickhouse-server/clickhouse-server.log</log>
<errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
@ -34,6 +34,7 @@
<!--display_name>production</display_name--> <!-- It is the name that will be shown in the client -->
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<mysql_port>9004</mysql_port>
<!-- For HTTPS and SSL over native protocol. -->
<!--
<https_port>8443</https_port>
@ -132,6 +133,17 @@
<!-- Path to temporary data for processing hard queries. -->
<tmp_path>/var/lib/clickhouse/tmp/</tmp_path>
<!-- Policy from the <storage_configuration> for the temporary files.
If not set <tmp_path> is used, otherwise <tmp_path> is ignored.
Notes:
- move_factor is ignored
- keep_free_space_bytes is ignored
- max_data_part_size_bytes is ignored
- you must have exactly one volume in that policy
-->
<!-- <tmp_policy>tmp</tmp_policy> -->
<!-- Directory with user provided files that are accessible by 'file' table function. -->
<user_files_path>/var/lib/clickhouse/user_files/</user_files_path>
@ -343,6 +355,11 @@
toStartOfHour(event_time)
-->
<partition_by>toYYYYMM(event_date)</partition_by>
<!-- Instead of partition_by, you can provide full engine expression (starting with ENGINE = ) with parameters,
Example: <engine>ENGINE = MergeTree PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, event_time) SETTINGS index_granularity = 1024</engine>
-->
<!-- Interval of flushing data. -->
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
@ -377,10 +394,12 @@
<!-- Uncomment to write text log into table.
Text log contains all information from usual server log but stores it in structured and efficient way.
The level of the messages that go to the table can be limited (<level>); if not specified, all messages will go to the table.
<text_log>
<database>system</database>
<table>text_log</table>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
<level></level>
</text_log>
-->

View File

@ -49,7 +49,7 @@
In first line will be password and in second - corresponding SHA256.
How to generate double SHA1:
Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | openssl dgst -sha1 -binary | openssl dgst -sha1
Execute: PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha1sum | tr -d '-' | xxd -r -p | sha1sum | tr -d '-'
In first line will be password and in second - corresponding double SHA1.
-->
<password></password>

View File

@ -487,10 +487,17 @@ void LogisticRegression::compute(
size_t row_num)
{
Float64 derivative = bias;
std::vector<Float64> values(weights.size());
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
derivative += weights[i] * value;
values[i] = (*columns[i]).getFloat64(row_num);
}
for (size_t i = 0; i < weights.size(); ++i)
{
derivative += weights[i] * values[i];
}
derivative *= target;
derivative = exp(derivative);
@ -498,8 +505,7 @@ void LogisticRegression::compute(
batch_gradient[weights.size()] += target / (derivative + 1);
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
batch_gradient[i] += target * value / (derivative + 1) - 2 * l2_reg_coef * weights[i];
batch_gradient[i] += target * values[i] / (derivative + 1) - 2 * l2_reg_coef * weights[i];
}
}
@ -558,18 +564,25 @@ void LinearRegression::compute(
size_t row_num)
{
Float64 derivative = (target - bias);
std::vector<Float64> values(weights.size());
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
derivative -= weights[i] * value;
values[i] = (*columns[i]).getFloat64(row_num);
}
for (size_t i = 0; i < weights.size(); ++i)
{
derivative -= weights[i] * values[i];
}
derivative *= 2;
batch_gradient[weights.size()] += derivative;
for (size_t i = 0; i < weights.size(); ++i)
{
auto value = (*columns[i]).getFloat64(row_num);
batch_gradient[i] += derivative * value - 2 * l2_reg_coef * weights[i];
batch_gradient[i] += derivative * values[i] - 2 * l2_reg_coef * weights[i];
}
}

View File

@ -309,7 +309,7 @@ protected:
/// Uses a DFA based approach in order to better handle patterns without
/// time assertions.
///
/// NOTE: This implementation relies on the assumption that the pattern are *small*.
/// NOTE: This implementation relies on the assumption that the pattern is *small*.
///
/// This algorithm performs in O(mn) (with m the number of DFA states and N the number
/// of events) with a memory consumption and memory allocations in O(m). It means that

View File

@ -217,7 +217,7 @@ UInt64 ColumnVector<T>::get64(size_t n) const
}
template <typename T>
Float64 ColumnVector<T>::getFloat64(size_t n) const
inline Float64 ColumnVector<T>::getFloat64(size_t n) const
{
return static_cast<Float64>(data[n]);
}

View File

@ -50,16 +50,21 @@
*
* P.S. This is also required, because tcmalloc can not allocate a chunk of
* memory greater than 16 GB.
*
* P.P.S. Note that the MMAP_THRESHOLD symbol is intentionally made weak. It allows
* overriding it at link time when ClickHouse is used as a library in
* third-party applications which may already use their own allocator doing mmaps
* in the implementation of alloc/realloc.
*/
#ifdef NDEBUG
static constexpr size_t MMAP_THRESHOLD = 64 * (1ULL << 20);
__attribute__((__weak__)) extern const size_t MMAP_THRESHOLD = 64 * (1ULL << 20);
#else
/**
* In debug build, use small mmap threshold to reproduce more memory
* stomping bugs. Along with ASLR it will hopefully detect more issues than
* ASan. The program may fail due to the limit on number of memory mappings.
*/
static constexpr size_t MMAP_THRESHOLD = 4096;
__attribute__((__weak__)) extern const size_t MMAP_THRESHOLD = 4096;
#endif
static constexpr size_t MMAP_MIN_ALIGNMENT = 4096;
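A minimal override sketch (assumption: an application linking ClickHouse as a library that wants a different threshold). A strong definition of the same global symbol in any of its translation units takes precedence over the weak one at link time; the 1 GiB value here is chosen arbitrarily for illustration:

    #include <cstddef>

    /// Strong definition overrides the weak MMAP_THRESHOLD above when linking.
    extern const size_t MMAP_THRESHOLD = 1ULL << 30;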

View File

@ -387,7 +387,6 @@ namespace ErrorCodes
extern const int PTHREAD_ERROR = 411;
extern const int NETLINK_ERROR = 412;
extern const int CANNOT_SET_SIGNAL_HANDLER = 413;
extern const int CANNOT_READLINE = 414;
extern const int ALL_REPLICAS_LOST = 415;
extern const int REPLICA_STATUS_CHANGED = 416;
extern const int EXPECTED_ALL_OR_ANY = 417;
@ -479,6 +478,7 @@ namespace ErrorCodes
extern const int FILE_ALREADY_EXISTS = 504;
extern const int CANNOT_DELETE_DIRECTORY = 505;
extern const int UNEXPECTED_ERROR_CODE = 506;
extern const int UNABLE_TO_SKIP_UNUSED_SHARDS = 507;
extern const int KEEPER_EXCEPTION = 999;
extern const int POCO_EXCEPTION = 1000;

View File

@ -195,7 +195,7 @@ std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded
<< ", e.displayText() = " << e.displayText()
<< (with_stacktrace ? getExceptionStackTraceString(e) : "")
<< (with_extra_info ? getExtraExceptionInfo(e) : "")
<< " (version " << VERSION_STRING << VERSION_OFFICIAL;
<< " (version " << VERSION_STRING << VERSION_OFFICIAL << ")";
}
catch (...) {}
}

View File

@ -112,7 +112,7 @@ void FileChecker::save() const
out->next();
}
disk->moveFile(tmp_files_info_path, files_info_path);
disk->replaceFile(tmp_files_info_path, files_info_path);
}
void FileChecker::load(Map & local_map, const String & path) const

View File

@ -141,7 +141,15 @@ QueryProfilerBase<ProfilerImpl>::QueryProfilerBase(const Int32 thread_id, const
sev._sigev_un._tid = thread_id;
#endif
if (timer_create(clock_type, &sev, &timer_id))
{
/// In Google Cloud Run, the function "timer_create" is implemented incorrectly as of 2020-01-25.
/// https://mybranch.dev/posts/clickhouse-on-cloud-run/
if (errno == 0)
throw Exception("Failed to create thread timer. The function 'timer_create' returned non-zero but didn't set errno. This is bug in your OS.",
ErrorCodes::CANNOT_CREATE_TIMER);
throwFromErrno("Failed to create thread timer", ErrorCodes::CANNOT_CREATE_TIMER);
}
/// Randomize offset as uniform random value from 0 to period - 1.
/// It will allow to sample short queries even if timer period is large.

View File

@ -1,12 +1,13 @@
#include <re2/re2.h>
#include <Common/RemoteHostFilter.h>
#include <Poco/URI.h>
#include <Formats/FormatFactory.h>
#include <Poco/Util/AbstractConfiguration.h>
#include <Formats/FormatFactory.h>
#include <Common/RemoteHostFilter.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/Exception.h>
#include <IO/WriteHelpers.h>
namespace DB
{
namespace ErrorCodes

View File

@ -1,17 +1,19 @@
#pragma once
#include <string>
#include <vector>
#include <unordered_set>
#include <Poco/URI.h>
#include <Poco/Util/AbstractConfiguration.h>
namespace Poco { class URI; }
namespace Poco { namespace Util { class AbstractConfiguration; } }
namespace DB
{
class RemoteHostFilter
{
/**
* This class checks if url is allowed.
* This class checks if URL is allowed.
* If primary_hosts and regexp_hosts are empty, all URLs are allowed.
*/
public:
@ -25,6 +27,7 @@ private:
std::unordered_set<std::string> primary_hosts; /// Allowed primary (<host>) URL from config.xml
std::vector<std::string> regexp_hosts; /// Allowed regexp (<host_regexp>) URL from config.xml
bool checkForDirectEntry(const std::string & str) const; /// Checks if the primary_hosts and regexp_hosts contain str. If primary_hosts and regexp_hosts are empty return true.
/// Checks if the primary_hosts and regexp_hosts contain str. If primary_hosts and regexp_hosts are empty return true.
bool checkForDirectEntry(const std::string & str) const;
};
}

View File

@ -23,7 +23,14 @@ namespace DB
static thread_local void * stack_address = nullptr;
static thread_local size_t max_stack_size = 0;
void checkStackSize()
/** It works fine when interpreters are instantiated by ClickHouse code in properly prepared threads,
* but there are cases when ClickHouse runs as a library inside another application.
* If the application is using user-space lightweight threads with manually allocated stacks,
* the current implementation is not suitable, as it has no way to properly check the remaining
* stack size without knowing the details of how stacks are allocated.
* We mark this function as a weak symbol so that it can be replaced in other ClickHouse-based products.
*/
__attribute__((__weak__)) void checkStackSize()
{
using namespace DB;
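A hedged replacement sketch (assumption: an embedding application with its own user-space threads provides a strong definition at global scope, matching the declaration in the corresponding header; the global scope is suggested by the `using namespace DB;` above). Here the check is simply disabled; a real product could consult its own fiber bookkeeping instead:

    /// Strong definition overrides the weak checkStackSize() above at link time.
    void checkStackSize()
    {
    }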

View File

@ -147,6 +147,14 @@ using BlocksList = std::list<Block>;
using BlocksPtr = std::shared_ptr<Blocks>;
using BlocksPtrs = std::shared_ptr<std::vector<BlocksPtr>>;
/// Extends block with extra data in derived classes
struct ExtraBlock
{
Block block;
};
using ExtraBlockPtr = std::shared_ptr<ExtraBlock>;
/// Compare number of columns, data types, column types, column names, and values of constant columns.
bool blocksHaveEqualStructure(const Block & lhs, const Block & rhs);

View File

@ -160,7 +160,7 @@ void ExternalTablesHandler::handlePart(const Poco::Net::MessageHeader & header,
/// Create table
NamesAndTypesList columns = sample_block.getNamesAndTypesList();
StoragePtr storage = StorageMemory::create("_external", data.second, ColumnsDescription{columns}, ConstraintsDescription{});
StoragePtr storage = StorageMemory::create(StorageID("_external", data.second), ColumnsDescription{columns}, ConstraintsDescription{});
storage->startup();
context.addExternalTable(data.second, storage);
BlockOutputStreamPtr output = storage->write(ASTPtr(), context);

View File

@ -99,7 +99,7 @@ class ExternalTablesHandler : public Poco::Net::PartHandler, BaseExternalTable
public:
ExternalTablesHandler(Context & context_, const Poco::Net::NameValueCollection & params_) : context(context_), params(params_) {}
void handlePart(const Poco::Net::MessageHeader & header, std::istream & stream);
void handlePart(const Poco::Net::MessageHeader & header, std::istream & stream) override;
private:
Context & context;

View File

@ -1030,6 +1030,7 @@ public:
LOG_TRACE(log, "Authentication method match.");
}
bool sent_public_key = false;
if (auth_response == "\1")
{
LOG_TRACE(log, "Client requests public key.");
@ -1050,6 +1051,7 @@ public:
AuthMoreData data(pem);
packet_sender->sendPacket(data, true);
sent_public_key = true;
AuthSwitchResponse response;
packet_sender->receivePacket(response);
@ -1069,13 +1071,15 @@ public:
*/
if (!is_secure_connection && !auth_response->empty() && auth_response != String("\0", 1))
{
LOG_TRACE(log, "Received nonempty password");
LOG_TRACE(log, "Received nonempty password.");
auto ciphertext = reinterpret_cast<unsigned char *>(auth_response->data());
unsigned char plaintext[RSA_size(&private_key)];
int plaintext_size = RSA_private_decrypt(auth_response->size(), ciphertext, plaintext, &private_key, RSA_PKCS1_OAEP_PADDING);
if (plaintext_size == -1)
{
if (!sent_public_key)
LOG_WARNING(log, "Client could have encrypted password with different public key since it didn't request it from server.");
throw Exception("Failed to decrypt auth data. Error: " + getOpenSSLErrors(), ErrorCodes::OPENSSL_ERROR);
}

View File

@ -52,6 +52,8 @@ struct Settings : public SettingsCollection<Settings>
M(SettingUInt64, max_insert_block_size, DEFAULT_INSERT_BLOCK_SIZE, "The maximum block size for insertion, if we control the creation of blocks for insertion.", 0) \
M(SettingUInt64, min_insert_block_size_rows, DEFAULT_INSERT_BLOCK_SIZE, "Squash blocks passed to INSERT query to specified size in rows, if blocks are not big enough.", 0) \
M(SettingUInt64, min_insert_block_size_bytes, (DEFAULT_INSERT_BLOCK_SIZE * 256), "Squash blocks passed to INSERT query to specified size in bytes, if blocks are not big enough.", 0) \
M(SettingUInt64, max_joined_block_size_rows, DEFAULT_BLOCK_SIZE, "Maximum block size for JOIN result (if join algorithm supports it). 0 means unlimited.", 0) \
M(SettingUInt64, max_insert_threads, 0, "The maximum number of threads to execute the INSERT SELECT query. By default, it is determined automatically.", 0) \
M(SettingMaxThreads, max_threads, 0, "The maximum number of threads to execute the request. By default, it is determined automatically.", 0) \
M(SettingMaxThreads, max_alter_threads, 0, "The maximum number of threads to execute the ALTER requests. By default, it is determined automatically.", 0) \
M(SettingUInt64, max_read_buffer_size, DBMS_DEFAULT_BUFFER_SIZE, "The maximum size of the buffer to read from the filesystem.", 0) \
@ -64,14 +66,14 @@ struct Settings : public SettingsCollection<Settings>
M(SettingSeconds, send_timeout, DBMS_DEFAULT_SEND_TIMEOUT_SEC, "", 0) \
M(SettingSeconds, tcp_keep_alive_timeout, 0, "The time in seconds the connection needs to remain idle before TCP starts sending keepalive probes", 0) \
M(SettingMilliseconds, queue_max_wait_ms, 0, "The wait time in the request queue, if the number of concurrent requests exceeds the maximum.", 0) \
M(SettingMilliseconds, connection_pool_max_wait_ms, 0, "The wait time when connection pool is full.", 0) \
M(SettingMilliseconds, connection_pool_max_wait_ms, 0, "The wait time when the connection pool is full.", 0) \
M(SettingMilliseconds, replace_running_query_max_wait_ms, 5000, "The wait time for running query with the same query_id to finish when setting 'replace_running_query' is active.", 0) \
M(SettingMilliseconds, kafka_max_wait_ms, 5000, "The wait time for reading from Kafka before retry.", 0) \
M(SettingUInt64, poll_interval, DBMS_DEFAULT_POLL_INTERVAL, "Block at the query wait loop on the server for the specified number of seconds.", 0) \
M(SettingUInt64, idle_connection_timeout, 3600, "Close idle TCP connections after specified number of seconds.", 0) \
M(SettingUInt64, distributed_connections_pool_size, DBMS_DEFAULT_DISTRIBUTED_CONNECTIONS_POOL_SIZE, "Maximum number of connections with one remote server in the pool.", 0) \
M(SettingUInt64, connections_with_failover_max_tries, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_MAX_TRIES, "The maximum number of attempts to connect to replicas.", 0) \
M(SettingUInt64, s3_min_upload_part_size, 512*1024*1024, "The mininum size of part to upload during multipart upload to S3.", 0) \
M(SettingUInt64, s3_min_upload_part_size, 512*1024*1024, "The minimum size of part to upload during multipart upload to S3.", 0) \
M(SettingBool, extremes, false, "Calculate minimums and maximums of the result columns. They can be output in JSON-formats.", IMPORTANT) \
M(SettingBool, use_uncompressed_cache, true, "Whether to use the cache of uncompressed blocks.", 0) \
M(SettingBool, replace_running_query, false, "Whether the running request should be canceled with the same id as the new one.", 0) \
@ -110,6 +112,7 @@ struct Settings : public SettingsCollection<Settings>
\
M(SettingBool, distributed_group_by_no_merge, false, "Do not merge aggregation states from different servers for distributed query processing - in case it is for certain that there are different keys on different shards.", 0) \
M(SettingBool, optimize_skip_unused_shards, false, "Assumes that data is distributed by sharding_key. Optimization to skip unused shards if SELECT query filters by sharding_key.", 0) \
M(SettingUInt64, force_optimize_skip_unused_shards, 0, "Throw an exception if unused shards cannot be skipped (1 - throw only if the table has the sharding key, 2 - always throw).", 0) \
\
M(SettingBool, input_format_parallel_parsing, true, "Enable parallel parsing for some data formats.", 0) \
M(SettingUInt64, min_chunk_bytes_for_parallel_parsing, (1024 * 1024), "The minimum chunk size in bytes, which each thread will parse in parallel.", 0) \
@ -183,9 +186,10 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, input_format_tsv_empty_as_default, false, "Treat empty fields in TSV input as default values.", 0) \
M(SettingBool, input_format_null_as_default, false, "For text input formats initialize null fields with default values if data type of this field is not nullable", 0) \
\
M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.", 0) \
M(SettingBool, input_format_values_deduce_templates_of_expressions, true, "For Values format: if field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.", 0) \
M(SettingBool, input_format_values_interpret_expressions, true, "For Values format: if the field could not be parsed by streaming parser, run SQL parser and try to interpret it as SQL expression.", 0) \
M(SettingBool, input_format_values_deduce_templates_of_expressions, true, "For Values format: if the field could not be parsed by streaming parser, run SQL parser, deduce template of the SQL expression, try to parse all rows using template and then interpret expression for all rows.", 0) \
M(SettingBool, input_format_values_accurate_types_of_literals, true, "For Values format: when parsing and interpreting expressions using template, check actual type of literal to avoid possible overflow and precision issues.", 0) \
M(SettingString, input_format_avro_schema_registry_url, "", "For AvroConfluent format: Confluent Schema Registry URL.", 0) \
\
M(SettingBool, output_format_json_quote_64bit_integers, true, "Controls quoting of 64-bit integers in JSON output format.", 0) \
\
@ -197,6 +201,8 @@ struct Settings : public SettingsCollection<Settings>
M(SettingUInt64, output_format_pretty_max_column_pad_width, 250, "Maximum width to pad all values in a column in Pretty formats.", 0) \
M(SettingBool, output_format_pretty_color, true, "Use ANSI escape sequences to paint colors in Pretty formats", 0) \
M(SettingUInt64, output_format_parquet_row_group_size, 1000000, "Row group size in rows.", 0) \
M(SettingString, output_format_avro_codec, "", "Compression codec used for output. Possible values: 'null', 'deflate', 'snappy'.", 0) \
M(SettingUInt64, output_format_avro_sync_interval, 16 * 1024, "Sync interval in bytes.", 0) \
\
M(SettingBool, use_client_time_zone, false, "Use client timezone for interpreting DateTime string values, instead of adopting server timezone.", 0) \
\
@ -212,7 +218,7 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, join_use_nulls, 0, "Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type.", IMPORTANT) \
\
M(SettingJoinStrictness, join_default_strictness, JoinStrictness::ALL, "Set default strictness in JOIN query. Possible values: empty string, 'ANY', 'ALL'. If empty, query without strictness will throw exception.", 0) \
M(SettingBool, any_join_distinct_right_table_keys, false, "Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for 't1 ANY LEFT JOIN t2' and 't2 ANY RIGHT JOIN t1'. ANY RIGHT JOIN needs one-to-many keys maping to be consistent with LEFT one.", IMPORTANT) \
M(SettingBool, any_join_distinct_right_table_keys, false, "Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing not equal results for 't1 ANY LEFT JOIN t2' and 't2 ANY RIGHT JOIN t1'. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with LEFT one.", IMPORTANT) \
\
M(SettingUInt64, preferred_block_size_bytes, 1000000, "", 0) \
\
@ -249,8 +255,8 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, empty_result_for_aggregation_by_empty_set, false, "Return empty result when aggregating without keys on empty set.", 0) \
M(SettingBool, allow_distributed_ddl, true, "If it is set to true, then a user is allowed to execute distributed DDL queries.", 0) \
M(SettingUInt64, odbc_max_field_size, 1024, "Max size of field that can be read from ODBC dictionary. Long strings are truncated.", 0) \
M(SettingUInt64, query_profiler_real_time_period_ns, 1000000000, "Highly experimental. Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
M(SettingUInt64, query_profiler_cpu_time_period_ns, 1000000000, "Highly experimental. Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
M(SettingUInt64, query_profiler_real_time_period_ns, 1000000000, "Period for real clock timer of query profiler (in nanoseconds). Set 0 value to turn off the real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
M(SettingUInt64, query_profiler_cpu_time_period_ns, 1000000000, "Period for CPU clock timer of query profiler (in nanoseconds). Set 0 value to turn off the CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.", 0) \
\
\
/** Limits during query execution are part of the settings. \
@ -310,9 +316,8 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, join_any_take_last_row, false, "When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key.", IMPORTANT) \
M(SettingBool, partial_merge_join, false, "Use partial merge join instead of hash join for LEFT and INNER JOINs.", 0) \
M(SettingBool, partial_merge_join_optimizations, false, "Enable optimizations in partial merge join", 0) \
M(SettingUInt64, default_max_bytes_in_join, 100000000, "Maximum size of right-side table if limit's required but max_bytes_in_join is not set.", 0) \
M(SettingUInt64, default_max_bytes_in_join, 100000000, "Maximum size of right-side table if limit is required but max_bytes_in_join is not set.", 0) \
M(SettingUInt64, partial_merge_join_rows_in_right_blocks, 10000, "Split right-hand joining data in blocks of specified size. It's a portion of data indexed by min-max values and possibly unloaded on disk.", 0) \
M(SettingUInt64, partial_merge_join_rows_in_left_blocks, 10000, "Group left-hand joining data in bigger blocks. Setting it to a bigger value increases JOIN performance and memory usage.", 0) \
\
M(SettingUInt64, max_rows_to_transfer, 0, "Maximum size (in rows) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.", 0) \
M(SettingUInt64, max_bytes_to_transfer, 0, "Maximum size (in uncompressed bytes) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.", 0) \
@ -360,7 +365,7 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, cancel_http_readonly_queries_on_client_close, false, "Cancel HTTP readonly queries when a client closes the connection without waiting for response.", 0) \
M(SettingBool, external_table_functions_use_nulls, true, "If it is set to true, external table functions will implicitly use Nullable type if needed. Otherwise NULLs will be substituted with default values. Currently supported only by 'mysql' and 'odbc' table functions.", 0) \
\
M(SettingBool, experimental_use_processors, false, "Use processors pipeline.", 0) \
M(SettingBool, experimental_use_processors, true, "Use processors pipeline.", 0) \
\
M(SettingBool, allow_hyperscan, true, "Allow functions that use Hyperscan library. Disable to avoid potentially long compilation times and excessive resource usage.", 0) \
M(SettingBool, allow_simdjson, true, "Allow using simdjson library in 'JSON*' functions if AVX2 instructions are available. If disabled rapidjson will be used.", 0) \
@ -371,7 +376,7 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, allow_drop_detached, false, "Allow ALTER TABLE ... DROP DETACHED PART[ITION] ... queries", 0) \
\
M(SettingSeconds, distributed_replica_error_half_life, DBMS_CONNECTION_POOL_WITH_FAILOVER_DEFAULT_DECREASE_ERROR_PERIOD, "Time period reduces replica error counter by 2 times.", 0) \
M(SettingUInt64, distributed_replica_error_cap, DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT, "Max number of errors per replica, prevents piling up increadible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time.", 0) \
M(SettingUInt64, distributed_replica_error_cap, DBMS_CONNECTION_POOL_WITH_FAILOVER_MAX_ERROR_COUNT, "Max number of errors per replica, prevents piling up an incredible amount of errors if replica was offline for some time and allows it to be reconsidered in a shorter amount of time.", 0) \
\
M(SettingBool, allow_experimental_live_view, false, "Enable LIVE VIEW. Not mature enough.", 0) \
M(SettingSeconds, live_view_heartbeat_interval, DEFAULT_LIVE_VIEW_HEARTBEAT_INTERVAL_SEC, "The heartbeat interval in seconds to indicate live query is alive.", 0) \
@ -394,6 +399,7 @@ struct Settings : public SettingsCollection<Settings>
M(SettingBool, allow_experimental_data_skipping_indices, true, "Obsolete setting, does nothing. Will be removed after 2020-05-31", 0) \
M(SettingBool, merge_tree_uniform_read_distribution, true, "Obsolete setting, does nothing. Will be removed after 2020-05-20", 0) \
M(SettingUInt64, mark_cache_min_lifetime, 0, "Obsolete setting, does nothing. Will be removed after 2020-05-31", 0) \
M(SettingUInt64, max_parser_depth, 1000, "Maximum parser depth.", 0) \
DECLARE_SETTINGS_COLLECTION(LIST_OF_SETTINGS)

View File

@ -62,7 +62,7 @@ void SettingNumber<Type>::set(const Field & x)
template <typename Type>
void SettingNumber<Type>::set(const String & x)
{
set(completeParse<Type>(x));
set(parseWithSizeSuffix<Type>(x));
}
template <>

View File

@ -31,7 +31,6 @@ enum class TypeIndex
Float64,
Date,
DateTime,
DateTime32 = DateTime,
DateTime64,
String,
FixedString,
@ -158,8 +157,6 @@ using Decimal32 = Decimal<Int32>;
using Decimal64 = Decimal<Int64>;
using Decimal128 = Decimal<Int128>;
// TODO (nemkov): consider making a strong typedef
//using DateTime32 = time_t;
using DateTime64 = Decimal64;
template <> struct TypeName<Decimal32> { static const char * get() { return "Decimal32"; } };

View File

@ -10,5 +10,6 @@
#cmakedefine01 USE_POCO_DATAODBC
#cmakedefine01 USE_POCO_MONGODB
#cmakedefine01 USE_POCO_REDIS
#cmakedefine01 USE_POCO_JSON
#cmakedefine01 USE_INTERNAL_LLVM_LIBRARY
#cmakedefine01 USE_SSL

View File

@ -47,7 +47,8 @@ std::ostream & operator<<(std::ostream & stream, const IDataType & what)
std::ostream & operator<<(std::ostream & stream, const IStorage & what)
{
stream << "IStorage(name = " << what.getName() << ", tableName = " << what.getTableName() << ") {"
auto table_id = what.getStorageID();
stream << "IStorage(name = " << what.getName() << ", tableName = " << table_id.table_name << ") {"
<< what.getColumns().getAllPhysical().toString() << "}";
return stream;
}

View File

@ -66,6 +66,8 @@ struct BlockIO
finish_callback = rhs.finish_callback;
exception_callback = rhs.exception_callback;
null_format = rhs.null_format;
return *this;
}
};

View File

@ -44,4 +44,29 @@ Block ExpressionBlockInputStream::readImpl()
return res;
}
Block InflatingExpressionBlockInputStream::readImpl()
{
if (!initialized)
{
if (expression->resultIsAlwaysEmpty())
return {};
initialized = true;
}
Block res;
if (likely(!not_processed))
{
res = children.back()->read();
if (res)
expression->execute(res, not_processed, action_number);
}
else
{
res = std::move(not_processed->block);
expression->execute(res, not_processed, action_number);
}
return res;
}
}
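A minimal consumption sketch (assumption: `stream` wraps some input with a joining ExpressionActions and `out` is a hypothetical sink). The caller simply keeps calling read(); a single input block may yield several output blocks, with the leftover carried between calls in `not_processed`:

    while (Block block = stream->read())
        out->write(block);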

View File

@ -15,10 +15,9 @@ class ExpressionActions;
*/
class ExpressionBlockInputStream : public IBlockInputStream
{
private:
public:
using ExpressionActionsPtr = std::shared_ptr<ExpressionActions>;
public:
ExpressionBlockInputStream(const BlockInputStreamPtr & input, const ExpressionActionsPtr & expression_);
String getName() const override;
@ -26,12 +25,29 @@ public:
Block getHeader() const override;
protected:
bool initialized = false;
ExpressionActionsPtr expression;
Block readImpl() override;
private:
ExpressionActionsPtr expression;
Block cached_header;
bool initialized = false;
};
/// ExpressionBlockInputStream that could generate many out blocks for single input block.
class InflatingExpressionBlockInputStream : public ExpressionBlockInputStream
{
public:
InflatingExpressionBlockInputStream(const BlockInputStreamPtr & input, const ExpressionActionsPtr & expression_)
: ExpressionBlockInputStream(input, expression_)
{}
protected:
Block readImpl() override;
private:
ExtraBlockPtr not_processed;
size_t action_number = 0;
};
}

View File

@ -12,5 +12,6 @@ class IBlockOutputStream;
using BlockInputStreamPtr = std::shared_ptr<IBlockInputStream>;
using BlockInputStreams = std::vector<BlockInputStreamPtr>;
using BlockOutputStreamPtr = std::shared_ptr<IBlockOutputStream>;
using BlockOutputStreams = std::vector<BlockOutputStreamPtr>;
}

View File

@ -7,6 +7,7 @@
#include <IO/WriteBufferFromFile.h>
#include <Compression/CompressedWriteBuffer.h>
#include <Interpreters/sortBlock.h>
#include <Disks/DiskSpaceMonitor.h>
namespace ProfileEvents
@ -21,10 +22,10 @@ namespace DB
MergeSortingBlockInputStream::MergeSortingBlockInputStream(
const BlockInputStreamPtr & input, SortDescription & description_,
size_t max_merged_block_size_, UInt64 limit_, size_t max_bytes_before_remerge_,
size_t max_bytes_before_external_sort_, const std::string & tmp_path_, size_t min_free_disk_space_)
size_t max_bytes_before_external_sort_, VolumePtr tmp_volume_, size_t min_free_disk_space_)
: description(description_), max_merged_block_size(max_merged_block_size_), limit(limit_),
max_bytes_before_remerge(max_bytes_before_remerge_),
max_bytes_before_external_sort(max_bytes_before_external_sort_), tmp_path(tmp_path_),
max_bytes_before_external_sort(max_bytes_before_external_sort_), tmp_volume(tmp_volume_),
min_free_disk_space(min_free_disk_space_)
{
children.push_back(input);
@ -78,10 +79,14 @@ Block MergeSortingBlockInputStream::readImpl()
*/
if (max_bytes_before_external_sort && sum_bytes_in_blocks > max_bytes_before_external_sort)
{
if (!enoughSpaceInDirectory(tmp_path, sum_bytes_in_blocks + min_free_disk_space))
throw Exception("Not enough space for external sort in " + tmp_path, ErrorCodes::NOT_ENOUGH_SPACE);
size_t size = sum_bytes_in_blocks + min_free_disk_space;
auto reservation = tmp_volume->reserve(size);
if (!reservation)
throw Exception("Not enough space for external sort in temporary storage", ErrorCodes::NOT_ENOUGH_SPACE);
const std::string tmp_path(reservation->getDisk()->getPath());
temporary_files.emplace_back(createTemporaryFile(tmp_path));
const std::string & path = temporary_files.back()->path();
MergeSortingBlocksBlockInputStream block_in(blocks, description, max_merged_block_size, limit);

View File

@ -18,6 +18,9 @@ namespace DB
struct TemporaryFileStream;
class Volume;
using VolumePtr = std::shared_ptr<Volume>;
namespace ErrorCodes
{
extern const int NOT_ENOUGH_SPACE;
@ -77,7 +80,7 @@ public:
MergeSortingBlockInputStream(const BlockInputStreamPtr & input, SortDescription & description_,
size_t max_merged_block_size_, UInt64 limit_,
size_t max_bytes_before_remerge_,
size_t max_bytes_before_external_sort_, const std::string & tmp_path_,
size_t max_bytes_before_external_sort_, VolumePtr tmp_volume_,
size_t min_free_disk_space_);
String getName() const override { return "MergeSorting"; }
@ -97,7 +100,7 @@ private:
size_t max_bytes_before_remerge;
size_t max_bytes_before_external_sort;
const std::string tmp_path;
VolumePtr tmp_volume;
size_t min_free_disk_space;
Logger * log = &Logger::get("MergeSortingBlockInputStream");
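A construction sketch under assumed settings names (not part of this diff), showing how a caller would now pass the temporary volume instead of a path:

    auto sorting_stream = std::make_shared<MergeSortingBlockInputStream>(
        input, sort_description, settings.max_block_size, limit,
        settings.max_bytes_before_remerge_sort,
        settings.max_bytes_before_external_sort,
        context.getTemporaryVolume(),                       /// VolumePtr configured via <tmp_path>/<tmp_policy>
        settings.min_free_disk_space_for_temporary_data);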

View File

@ -21,9 +21,19 @@ class NullAndDoCopyBlockInputStream : public IBlockInputStream
{
public:
NullAndDoCopyBlockInputStream(const BlockInputStreamPtr & input_, BlockOutputStreamPtr output_)
: input(input_), output(output_)
{
children.push_back(input_);
input_streams.push_back(input_);
output_streams.push_back(output_);
for (auto & input_stream : input_streams)
children.push_back(input_stream);
}
NullAndDoCopyBlockInputStream(const BlockInputStreams & input_, BlockOutputStreams & output_)
: input_streams(input_), output_streams(output_)
{
for (auto & input_stream : input_)
children.push_back(input_stream);
}
/// Suppress readPrefix and readSuffix, because they are called by copyData.
@ -39,13 +49,20 @@ public:
protected:
Block readImpl() override
{
copyData(*input, *output);
/// We do not use the cancel flag here.
/// If the query was cancelled, it will be handled by the child streams.
/// Part of the data will be processed.
if (input_streams.size() == 1 && output_streams.size() == 1)
copyData(*input_streams.at(0), *output_streams.at(0));
else
copyData(input_streams, output_streams);
return Block();
}
private:
BlockInputStreamPtr input;
BlockOutputStreamPtr output;
BlockInputStreams input_streams;
BlockOutputStreams output_streams;
};
}
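A brief usage sketch (with hypothetical stream variables): given equally sized vectors of inputs and outputs, e.g. the sources and sinks of a parallel INSERT ... SELECT, reading the copier once drives copyData() and returns an empty block when everything has been written:

    /// ins is a BlockInputStreams and outs is a BlockOutputStreams of the same size.
    auto copier = std::make_shared<NullAndDoCopyBlockInputStream>(ins, outs);
    copier->read();   /// performs the copy; the returned block is intentionally empty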

View File

@ -18,7 +18,7 @@ namespace DB
{
PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
const String & database, const String & table, const StoragePtr & storage_,
const StoragePtr & storage_,
const Context & context_, const ASTPtr & query_ptr_, bool no_destination)
: storage(storage_), context(context_), query_ptr(query_ptr_)
{
@ -32,47 +32,59 @@ PushingToViewsBlockOutputStream::PushingToViewsBlockOutputStream(
/// Moreover, deduplication for AggregatingMergeTree children could produce false positives due to low size of inserting blocks
bool disable_deduplication_for_children = !no_destination && storage->supportsDeduplication();
if (!table.empty())
auto table_id = storage->getStorageID();
Dependencies dependencies = context.getDependencies(table_id);
/// We need special context for materialized views insertions
if (!dependencies.empty())
{
Dependencies dependencies = context.getDependencies(database, table);
views_context = std::make_unique<Context>(context);
// Do not deduplicate insertions into MV if the main insertion is Ok
if (disable_deduplication_for_children)
views_context->getSettingsRef().insert_deduplicate = false;
}
/// We need special context for materialized views insertions
if (!dependencies.empty())
for (const auto & database_table : dependencies)
{
auto dependent_table = context.getTable(database_table);
ASTPtr query;
BlockOutputStreamPtr out;
if (auto * materialized_view = dynamic_cast<const StorageMaterializedView *>(dependent_table.get()))
{
views_context = std::make_unique<Context>(context);
// Do not deduplicate insertions into MV if the main insertion is Ok
if (disable_deduplication_for_children)
views_context->getSettingsRef().insert_deduplicate = false;
StoragePtr inner_table = materialized_view->getTargetTable();
auto inner_table_id = inner_table->getStorageID();
query = materialized_view->getInnerQuery();
std::unique_ptr<ASTInsertQuery> insert = std::make_unique<ASTInsertQuery>();
insert->database = inner_table_id.database_name;
insert->table = inner_table_id.table_name;
/// Get list of columns we get from select query.
auto header = InterpreterSelectQuery(query, *views_context, SelectQueryOptions().analyze())
.getSampleBlock();
/// Insert only columns returned by select.
auto list = std::make_shared<ASTExpressionList>();
for (auto & column : header)
/// But skip columns which storage doesn't have.
if (inner_table->hasColumn(column.name))
list->children.emplace_back(std::make_shared<ASTIdentifier>(column.name));
insert->columns = std::move(list);
ASTPtr insert_query_ptr(insert.release());
InterpreterInsertQuery interpreter(insert_query_ptr, *views_context);
BlockIO io = interpreter.execute();
out = io.out;
}
else if (dynamic_cast<const StorageLiveView *>(dependent_table.get()))
out = std::make_shared<PushingToViewsBlockOutputStream>(dependent_table, *views_context, ASTPtr(), true);
else
out = std::make_shared<PushingToViewsBlockOutputStream>(dependent_table, *views_context, ASTPtr());
for (const auto & database_table : dependencies)
{
auto dependent_table = context.getTable(database_table.first, database_table.second);
ASTPtr query;
BlockOutputStreamPtr out;
if (auto * materialized_view = dynamic_cast<const StorageMaterializedView *>(dependent_table.get()))
{
StoragePtr inner_table = materialized_view->getTargetTable();
query = materialized_view->getInnerQuery();
std::unique_ptr<ASTInsertQuery> insert = std::make_unique<ASTInsertQuery>();
insert->database = inner_table->getDatabaseName();
insert->table = inner_table->getTableName();
ASTPtr insert_query_ptr(insert.release());
InterpreterInsertQuery interpreter(insert_query_ptr, *views_context);
BlockIO io = interpreter.execute();
out = io.out;
}
else if (dynamic_cast<const StorageLiveView *>(dependent_table.get()))
out = std::make_shared<PushingToViewsBlockOutputStream>(
database_table.first, database_table.second, dependent_table, *views_context, ASTPtr(), true);
else
out = std::make_shared<PushingToViewsBlockOutputStream>(
database_table.first, database_table.second, dependent_table, *views_context, ASTPtr());
views.emplace_back(ViewInfo{std::move(query), database_table.first, database_table.second, std::move(out)});
}
views.emplace_back(ViewInfo{std::move(query), database_table, std::move(out)});
}
/* Do not push to destination table if the flag is set */
@ -161,7 +173,7 @@ void PushingToViewsBlockOutputStream::writePrefix()
}
catch (Exception & ex)
{
ex.addMessage("while write prefix to view " + view.database + "." + view.table);
ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs());
throw;
}
}
@ -180,7 +192,7 @@ void PushingToViewsBlockOutputStream::writeSuffix()
}
catch (Exception & ex)
{
ex.addMessage("while write prefix to view " + view.database + "." + view.table);
ex.addMessage("while write prefix to view " + view.table_id.getNameForLogs());
throw;
}
}
@ -223,7 +235,7 @@ void PushingToViewsBlockOutputStream::process(const Block & block, size_t view_n
/// InterpreterSelectQuery will do processing of alias columns.
Context local_context = *views_context;
local_context.addViewSource(
StorageValues::create(storage->getDatabaseName(), storage->getTableName(), storage->getColumns(),
StorageValues::create(storage->getStorageID(), storage->getColumns(),
block));
select.emplace(view.query, local_context, SelectQueryOptions());
in = std::make_shared<MaterializingBlockInputStream>(select->execute().in);
@ -250,7 +262,7 @@ void PushingToViewsBlockOutputStream::process(const Block & block, size_t view_n
}
catch (Exception & ex)
{
ex.addMessage("while pushing to view " + backQuoteIfNeed(view.database) + "." + backQuoteIfNeed(view.table));
ex.addMessage("while pushing to view " + view.table_id.getNameForLogs());
throw;
}
}

View File

@ -17,8 +17,7 @@ class ReplicatedMergeTreeBlockOutputStream;
class PushingToViewsBlockOutputStream : public IBlockOutputStream
{
public:
PushingToViewsBlockOutputStream(
const String & database, const String & table, const StoragePtr & storage_,
PushingToViewsBlockOutputStream(const StoragePtr & storage_,
const Context & context_, const ASTPtr & query_ptr_, bool no_destination = false);
Block getHeader() const override;
@ -39,8 +38,7 @@ private:
struct ViewInfo
{
ASTPtr query;
String database;
String table;
StorageID table_id;
BlockOutputStreamPtr out;
};

View File

@ -70,7 +70,7 @@ bool TTLBlockInputStream::isTTLExpired(time_t ttl)
Block TTLBlockInputStream::readImpl()
{
/// Skip all data if table ttl is expired for part
if (storage.hasTableTTL() && isTTLExpired(old_ttl_infos.table_ttl.max))
if (storage.hasRowsTTL() && isTTLExpired(old_ttl_infos.table_ttl.max))
{
rows_removed = data_part->rows_count;
return {};
@ -80,7 +80,7 @@ Block TTLBlockInputStream::readImpl()
if (!block)
return block;
if (storage.hasTableTTL() && (force || isTTLExpired(old_ttl_infos.table_ttl.min)))
if (storage.hasRowsTTL() && (force || isTTLExpired(old_ttl_infos.table_ttl.min)))
removeRowsWithExpiredTableTTL(block);
removeValuesWithExpiredColumnTTL(block);
@ -106,10 +106,10 @@ void TTLBlockInputStream::readSuffixImpl()
void TTLBlockInputStream::removeRowsWithExpiredTableTTL(Block & block)
{
storage.ttl_table_entry.expression->execute(block);
storage.rows_ttl_entry.expression->execute(block);
const IColumn * ttl_column =
block.getByName(storage.ttl_table_entry.result_column).column.get();
block.getByName(storage.rows_ttl_entry.result_column).column.get();
const auto & column_names = header.getNames();
MutableColumns result_columns;

View File

@ -1,6 +1,10 @@
#include <thread>
#include <DataStreams/IBlockInputStream.h>
#include <DataStreams/IBlockOutputStream.h>
#include <DataStreams/copyData.h>
#include <DataStreams/ParallelInputsProcessor.h>
#include <Common/ConcurrentBoundedQueue.h>
#include <Common/ThreadPool.h>
namespace DB
@ -51,6 +55,79 @@ void copyDataImpl(IBlockInputStream & from, IBlockOutputStream & to, TCancelCall
inline void doNothing(const Block &) {}
namespace
{
struct ParallelInsertsHandler
{
using CancellationHook = std::function<void()>;
explicit ParallelInsertsHandler(BlockOutputStreams & output_streams, CancellationHook cancellation_hook_, size_t num_threads)
: outputs(output_streams.size()), cancellation_hook(std::move(cancellation_hook_))
{
exceptions.resize(num_threads);
for (auto & output : output_streams)
outputs.push(output.get());
}
void onBlock(Block & block, size_t /*thread_num*/)
{
IBlockOutputStream * out = nullptr;
outputs.pop(out);
out->write(block);
outputs.push(out);
}
void onFinishThread(size_t /*thread_num*/) {}
void onFinish() {}
void onException(std::exception_ptr & exception, size_t thread_num)
{
exceptions[thread_num] = exception;
cancellation_hook();
}
void rethrowFirstException()
{
for (auto & exception : exceptions)
if (exception)
std::rethrow_exception(exception);
}
ConcurrentBoundedQueue<IBlockOutputStream *> outputs;
std::vector<std::exception_ptr> exceptions;
CancellationHook cancellation_hook;
};
}
static void copyDataImpl(BlockInputStreams & inputs, BlockOutputStreams & outputs)
{
for (auto & output : outputs)
output->writePrefix();
using Processor = ParallelInputsProcessor<ParallelInsertsHandler>;
Processor * processor_ptr = nullptr;
ParallelInsertsHandler handler(outputs, [&processor_ptr]() { processor_ptr->cancel(false); }, inputs.size());
ParallelInputsProcessor<ParallelInsertsHandler> processor(inputs, nullptr, inputs.size(), handler);
processor_ptr = &processor;
processor.process();
processor.wait();
handler.rethrowFirstException();
/// readPrefix is called in ParallelInputsProcessor.
for (auto & input : inputs)
input->readSuffix();
for (auto & output : outputs)
output->writeSuffix();
}
void copyData(IBlockInputStream & from, IBlockOutputStream & to, std::atomic<bool> * is_cancelled)
{
auto is_cancelled_pred = [is_cancelled] ()
@ -61,6 +138,10 @@ void copyData(IBlockInputStream & from, IBlockOutputStream & to, std::atomic<boo
copyDataImpl(from, to, is_cancelled_pred, doNothing);
}
void copyData(BlockInputStreams & inputs, BlockOutputStreams & outputs)
{
copyDataImpl(inputs, outputs);
}
void copyData(IBlockInputStream & from, IBlockOutputStream & to, const std::function<bool()> & is_cancelled)
{

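A condensed usage sketch of the new parallel overload (not part of the patch; the concrete stream classes and the empty header block are assumptions for illustration). Each input is driven by its own ParallelInputsProcessor thread, and the handler rotates the output streams through the bounded queue so that no sink is written to concurrently:
/// Hedged sketch, assuming OneBlockInputStream / NullBlockOutputStream are available.
#include <DataStreams/OneBlockInputStream.h>
#include <DataStreams/NullBlockOutputStream.h>
#include <DataStreams/copyData.h>
void copyInParallelExample()
{
    using namespace DB;
    Block header;   /// empty block, just to have matching structures on both sides
    BlockInputStreams inputs{std::make_shared<OneBlockInputStream>(header), std::make_shared<OneBlockInputStream>(header)};
    BlockOutputStreams outputs{std::make_shared<NullBlockOutputStream>(header)};
    copyData(inputs, outputs);   /// writePrefix()/writeSuffix() are called once per output inside copyDataImpl
}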
View File

@ -16,6 +16,8 @@ class Block;
*/
void copyData(IBlockInputStream & from, IBlockOutputStream & to, std::atomic<bool> * is_cancelled = nullptr);
void copyData(BlockInputStreams & inputs, BlockOutputStreams & outputs);
void copyData(IBlockInputStream & from, IBlockOutputStream & to, const std::function<bool()> & is_cancelled);
void copyData(IBlockInputStream & from, IBlockOutputStream & to, const std::function<bool()> & is_cancelled,

View File

@ -1,6 +1,7 @@
#pragma once
#include <DataTypes/IDataTypeDummy.h>
#include <Columns/ColumnSet.h>
namespace DB
@ -18,6 +19,9 @@ public:
bool equals(const IDataType & rhs) const override { return typeid(rhs) == typeid(*this); }
bool isParametric() const override { return true; }
// Used for expressions analysis.
MutableColumnPtr createColumn() const override { return ColumnSet::create(0, nullptr); }
// Used only for debugging, making it DUMPABLE
Field getDefault() const override { return Tuple(); }
};

View File

@ -60,7 +60,7 @@ std::ostream & operator<<(std::ostream & ostr, const TypesTestCase & test_case)
class TypeTest : public ::testing::TestWithParam<TypesTestCase>
{
public:
void SetUp()
void SetUp() override
{
const auto & p = GetParam();
from_types = typesFromString(p.from_types);

View File

@ -52,7 +52,7 @@ Tables DatabaseDictionary::listTables(const Context & context, const FilterByNam
auto dict_name = dict_ptr->getName();
const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
tables[dict_name] = StorageDictionary::create(getDatabaseName(), dict_name, ColumnsDescription{columns}, context, true, dict_name);
tables[dict_name] = StorageDictionary::create(StorageID(getDatabaseName(), dict_name), ColumnsDescription{columns}, context, true, dict_name);
}
}
return tables;
@ -74,7 +74,7 @@ StoragePtr DatabaseDictionary::tryGetTable(
{
const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
return StorageDictionary::create(getDatabaseName(), table_name, ColumnsDescription{columns}, context, true, table_name);
return StorageDictionary::create(StorageID(getDatabaseName(), table_name), ColumnsDescription{columns}, context, true, table_name);
}
return {};
@ -109,11 +109,12 @@ ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context,
buffer << ") Engine = Dictionary(" << backQuoteIfNeed(table_name) << ")";
}
auto settings = context.getSettingsRef();
ParserCreateQuery parser;
const char * pos = query.data();
std::string error_message;
auto ast = tryParseQuery(parser, pos, pos + query.size(), error_message,
/* hilite = */ false, "", /* allow_multi_statements = */ false, 0);
/* hilite = */ false, "", /* allow_multi_statements = */ false, 0, settings.max_parser_depth);
if (!ast && throw_on_error)
throw Exception(error_message, ErrorCodes::SYNTAX_ERROR);
@ -121,15 +122,16 @@ ASTPtr DatabaseDictionary::getCreateTableQueryImpl(const Context & context,
return ast;
}
ASTPtr DatabaseDictionary::getCreateDatabaseQuery() const
ASTPtr DatabaseDictionary::getCreateDatabaseQuery(const Context & context) const
{
String query;
{
WriteBufferFromString buffer(query);
buffer << "CREATE DATABASE " << backQuoteIfNeed(database_name) << " ENGINE = Dictionary";
}
auto settings = context.getSettingsRef();
ParserCreateQuery parser;
return parseQuery(parser, query.data(), query.data() + query.size(), "", 0);
return parseQuery(parser, query.data(), query.data() + query.size(), "", 0, settings.max_parser_depth);
}
void DatabaseDictionary::shutdown()

View File

@ -41,7 +41,7 @@ public:
bool empty(const Context & context) const override;
ASTPtr getCreateDatabaseQuery() const override;
ASTPtr getCreateDatabaseQuery(const Context & context) const override;
void shutdown() override;

View File

@ -122,7 +122,7 @@ StoragePtr DatabaseLazy::tryGetTable(
std::lock_guard lock(mutex);
auto it = tables_cache.find(table_name);
if (it == tables_cache.end())
throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE);
return {};
if (it->second.table)
{
@ -230,7 +230,7 @@ StoragePtr DatabaseLazy::loadTable(const Context & context, const String & table
StoragePtr table;
Context context_copy(context); /// some tables can change context, but not LogTables
auto ast = parseQueryFromMetadata(table_metadata_path, /*throw_on_error*/ true, /*remove_empty*/false);
auto ast = parseQueryFromMetadata(context, table_metadata_path, /*throw_on_error*/ true, /*remove_empty*/false);
if (ast)
{
auto & ast_create = ast->as<const ASTCreateQuery &>();

View File

@ -9,6 +9,7 @@ namespace DB
DatabaseMemory::DatabaseMemory(const String & name_)
: DatabaseWithOwnTablesBase(name_, "DatabaseMemory(" + name_ + ")")
, data_path("data/" + escapeForFileName(database_name) + "/")
{}
void DatabaseMemory::createTable(
@ -27,7 +28,7 @@ void DatabaseMemory::removeTable(
detachTable(table_name);
}
ASTPtr DatabaseMemory::getCreateDatabaseQuery() const
ASTPtr DatabaseMemory::getCreateDatabaseQuery(const Context & /*context*/) const
{
auto create_query = std::make_shared<ASTCreateQuery>();
create_query->database = database_name;

View File

@ -1,6 +1,8 @@
#pragma once
#include <Databases/DatabasesCommon.h>
#include <Common/escapeForFileName.h>
#include <Parsers/ASTCreateQuery.h>
namespace Poco { class Logger; }
@ -31,7 +33,17 @@ public:
const Context & context,
const String & table_name) override;
ASTPtr getCreateDatabaseQuery() const override;
ASTPtr getCreateDatabaseQuery(const Context & /*context*/) const override;
/// DatabaseMemory allows creating tables which store data on disk.
/// It's needed to create such tables in the default database of clickhouse-local.
/// TODO Maybe it's better to use DiskMemory for such tables.
/// To save data on disk it's possible to explicitly CREATE DATABASE db ENGINE=Ordinary in clickhouse-local.
String getTableDataPath(const String & table_name) const override { return data_path + escapeForFileName(table_name) + "/"; }
String getTableDataPath(const ASTCreateQuery & query) const override { return getTableDataPath(query.table); }
private:
String data_path;
};
}

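To make the new path layout concrete, a small illustration (the database and table names are assumed, not part of the patch):
DatabaseMemory db("_local");                    /// data_path becomes "data/_local/"
String path = db.getTableDataPath("events");    /// "data/_local/events/" - escapeForFileName() is applied to both parts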
View File

@ -132,9 +132,9 @@ static ASTPtr getCreateQueryFromStorage(const StoragePtr & storage, const ASTPtr
{
/// init create query.
create_table_query->table = storage->getTableName();
create_table_query->database = storage->getDatabaseName();
auto table_id = storage->getStorageID();
create_table_query->table = table_id.table_name;
create_table_query->database = table_id.database_name;
for (const auto & column_type_and_name : storage->getColumns().getOrdinary())
{
@ -144,7 +144,7 @@ static ASTPtr getCreateQueryFromStorage(const StoragePtr & storage, const ASTPtr
columns_expression_list->children.emplace_back(column_declaration);
}
auto mysql_table_name = std::make_shared<ASTLiteral>(storage->getTableName());
auto mysql_table_name = std::make_shared<ASTLiteral>(table_id.table_name);
auto storage_engine_arguments = table_storage_define->as<ASTStorage>()->engine->arguments;
storage_engine_arguments->children.insert(storage_engine_arguments->children.begin() + 2, mysql_table_name);
}
@ -181,7 +181,7 @@ time_t DatabaseMySQL::getObjectMetadataModificationTime(const String & table_nam
return time_t(local_tables_cache[table_name].first);
}
ASTPtr DatabaseMySQL::getCreateDatabaseQuery() const
ASTPtr DatabaseMySQL::getCreateDatabaseQuery(const Context & /*context*/) const
{
const auto & create_query = std::make_shared<ASTCreateQuery>();
create_query->database = database_name;
@ -239,7 +239,7 @@ void DatabaseMySQL::fetchLatestTablesStructureIntoCache(const std::map<String, U
}
local_tables_cache[table_name] = std::make_pair(table_modification_time, StorageMySQL::create(
database_name, table_name, std::move(mysql_pool), database_name_in_mysql, table_name,
StorageID(database_name, table_name), std::move(mysql_pool), database_name_in_mysql, table_name,
false, "", ColumnsDescription{columns_name_and_type}, ConstraintsDescription{}, global_context));
}
}

View File

@ -32,7 +32,7 @@ public:
DatabaseTablesIteratorPtr getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name = {}) override;
ASTPtr getCreateDatabaseQuery() const override;
ASTPtr getCreateDatabaseQuery(const Context & /*context*/) const override;
bool isTableExist(const Context & context, const String & name) const override;

View File

@ -68,9 +68,12 @@ std::pair<String, StoragePtr> createTableFromAST(
ast_create_query.table,
StorageFactory::instance().get(
ast_create_query,
table_data_path_relative, ast_create_query.table, database_name, context, context.getGlobalContext(),
columns, constraints,
true, has_force_restore_data_flag)
table_data_path_relative,
context,
context.getGlobalContext(),
columns,
constraints,
has_force_restore_data_flag)
};
}
@ -211,7 +214,7 @@ void DatabaseOnDisk::renameTable(
if (!table)
throw Exception("Table " + backQuote(getDatabaseName()) + "." + backQuote(table_name) + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE);
ASTPtr ast = parseQueryFromMetadata(getObjectMetadataPath(table_name));
ASTPtr ast = parseQueryFromMetadata(context, getObjectMetadataPath(table_name));
if (!ast)
throw Exception("There is no metadata file for table " + backQuote(table_name) + ".", ErrorCodes::FILE_DOESNT_EXIST);
auto & create = ast->as<ASTCreateQuery &>();
@ -244,7 +247,7 @@ ASTPtr DatabaseOnDisk::getCreateTableQueryImpl(const Context & context, const St
ASTPtr ast;
auto table_metadata_path = getObjectMetadataPath(table_name);
ast = getCreateQueryFromMetadata(table_metadata_path, throw_on_error);
ast = getCreateQueryFromMetadata(context, table_metadata_path, throw_on_error);
if (!ast && throw_on_error)
{
/// Handle system.* tables for which there are no table.sql files.
@ -260,20 +263,21 @@ ASTPtr DatabaseOnDisk::getCreateTableQueryImpl(const Context & context, const St
return ast;
}
ASTPtr DatabaseOnDisk::getCreateDatabaseQuery() const
ASTPtr DatabaseOnDisk::getCreateDatabaseQuery(const Context & context) const
{
ASTPtr ast;
auto settings = context.getSettingsRef();
auto metadata_dir_path = getMetadataPath();
auto database_metadata_path = metadata_dir_path.substr(0, metadata_dir_path.size() - 1) + ".sql";
ast = getCreateQueryFromMetadata(database_metadata_path, true);
ast = getCreateQueryFromMetadata(context, database_metadata_path, true);
if (!ast)
{
/// Handle databases (such as default) for which there are no database.sql files.
/// If database.sql doesn't exist, then engine is Ordinary
String query = "CREATE DATABASE " + backQuoteIfNeed(getDatabaseName()) + " ENGINE = Ordinary";
ParserCreateQuery parser;
ast = parseQuery(parser, query.data(), query.data() + query.size(), "", 0);
ast = parseQuery(parser, query.data(), query.data() + query.size(), "", 0, settings.max_parser_depth);
}
return ast;
@ -353,7 +357,7 @@ void DatabaseOnDisk::iterateMetadataFiles(const Context & context, const Iterati
}
}
ASTPtr DatabaseOnDisk::parseQueryFromMetadata(const String & metadata_file_path, bool throw_on_error /*= true*/, bool remove_empty /*= false*/) const
ASTPtr DatabaseOnDisk::parseQueryFromMetadata(const Context & context, const String & metadata_file_path, bool throw_on_error /*= true*/, bool remove_empty /*= false*/) const
{
String query;
@ -380,11 +384,12 @@ ASTPtr DatabaseOnDisk::parseQueryFromMetadata(const String & metadata_file_path,
return nullptr;
}
auto settings = context.getSettingsRef();
ParserCreateQuery parser;
const char * pos = query.data();
std::string error_message;
auto ast = tryParseQuery(parser, pos, pos + query.size(), error_message, /* hilite = */ false,
"in file " + getMetadataPath(), /* allow_multi_statements = */ false, 0);
"in file " + getMetadataPath(), /* allow_multi_statements = */ false, 0, settings.max_parser_depth);
if (!ast && throw_on_error)
throw Exception(error_message, ErrorCodes::SYNTAX_ERROR);
@ -394,9 +399,9 @@ ASTPtr DatabaseOnDisk::parseQueryFromMetadata(const String & metadata_file_path,
return ast;
}
ASTPtr DatabaseOnDisk::getCreateQueryFromMetadata(const String & database_metadata_path, bool throw_on_error) const
ASTPtr DatabaseOnDisk::getCreateQueryFromMetadata(const Context & context, const String & database_metadata_path, bool throw_on_error) const
{
ASTPtr ast = parseQueryFromMetadata(database_metadata_path, throw_on_error);
ASTPtr ast = parseQueryFromMetadata(context, database_metadata_path, throw_on_error);
if (ast)
{

View File

@ -52,7 +52,7 @@ public:
const String & to_table_name,
TableStructureWriteLockHolder & lock) override;
ASTPtr getCreateDatabaseQuery() const override;
ASTPtr getCreateDatabaseQuery(const Context & context) const override;
void drop(const Context & context) override;
@ -74,8 +74,8 @@ protected:
const String & table_name,
bool throw_on_error) const override;
ASTPtr parseQueryFromMetadata(const String & metadata_file_path, bool throw_on_error = true, bool remove_empty = false) const;
ASTPtr getCreateQueryFromMetadata(const String & metadata_path, bool throw_on_error) const;
ASTPtr parseQueryFromMetadata(const Context & context, const String & metadata_file_path, bool throw_on_error = true, bool remove_empty = false) const;
ASTPtr getCreateQueryFromMetadata(const Context & context, const String & metadata_path, bool throw_on_error) const;
const String metadata_path;

View File

@ -122,12 +122,12 @@ void DatabaseOrdinary::loadStoredObjects(
FileNames file_names;
size_t total_dictionaries = 0;
iterateMetadataFiles(context, [&file_names, &total_dictionaries, this](const String & file_name)
iterateMetadataFiles(context, [&context, &file_names, &total_dictionaries, this](const String & file_name)
{
String full_path = getMetadataPath() + file_name;
try
{
auto ast = parseQueryFromMetadata(full_path, /*throw_on_error*/ true, /*remove_empty*/false);
auto ast = parseQueryFromMetadata(context, full_path, /*throw_on_error*/ true, /*remove_empty*/false);
if (ast)
{
auto * create_query = ast->as<ASTCreateQuery>();

View File

@ -222,7 +222,7 @@ StoragePtr DatabaseWithDictionaries::getDictionaryStorage(const Context & contex
{
const DictionaryStructure & dictionary_structure = dict_ptr->getStructure();
auto columns = StorageDictionary::getNamesAndTypes(dictionary_structure);
return StorageDictionary::create(database_name, table_name, ColumnsDescription{columns}, context, true, dict_name);
return StorageDictionary::create(StorageID(database_name, table_name), ColumnsDescription{columns}, context, true, dict_name);
}
return nullptr;
}
@ -235,7 +235,7 @@ ASTPtr DatabaseWithDictionaries::getCreateDictionaryQueryImpl(
ASTPtr ast;
auto dictionary_metadata_path = getObjectMetadataPath(dictionary_name);
ast = getCreateQueryFromMetadata(dictionary_metadata_path, throw_on_error);
ast = getCreateQueryFromMetadata(context, dictionary_metadata_path, throw_on_error);
if (!ast && throw_on_error)
{
/// Handle system.* tables for which there are no table.sql files.

View File

@ -56,13 +56,13 @@ public:
DatabaseTablesSnapshotIterator(Tables && tables_) : tables(tables_), it(tables.begin()) {}
void next() { ++it; }
void next() override { ++it; }
bool isValid() const { return it != tables.end(); }
bool isValid() const override { return it != tables.end(); }
const String & name() const { return it->first; }
const String & name() const override { return it->first; }
const StoragePtr & table() const { return it->second; }
const StoragePtr & table() const override { return it->second; }
};
/// Copies list of dictionaries and iterates through such snapshot.
@ -262,7 +262,7 @@ public:
}
/// Get the CREATE DATABASE query for current database.
virtual ASTPtr getCreateDatabaseQuery() const = 0;
virtual ASTPtr getCreateDatabaseQuery(const Context & /*context*/) const = 0;
/// Get name of database.
String getDatabaseName() const { return database_name; }

View File

@ -23,6 +23,8 @@ namespace DB
class Context;
/** Create dictionary according to its layout.
*/
class DictionaryFactory : private boost::noncopyable
{
public:

View File

@ -15,7 +15,57 @@ namespace ErrorCodes
extern const int PATH_ACCESS_DENIED;
}
std::mutex DiskLocal::mutex;
std::mutex DiskLocal::reservation_mutex;
using DiskLocalPtr = std::shared_ptr<DiskLocal>;
class DiskLocalReservation : public IReservation
{
public:
DiskLocalReservation(const DiskLocalPtr & disk_, UInt64 size_)
: disk(disk_), size(size_), metric_increment(CurrentMetrics::DiskSpaceReservedForMerge, size_)
{
}
UInt64 getSize() const override { return size; }
DiskPtr getDisk() const override { return disk; }
void update(UInt64 new_size) override;
~DiskLocalReservation() override;
private:
DiskLocalPtr disk;
UInt64 size;
CurrentMetrics::Increment metric_increment;
};
class DiskLocalDirectoryIterator : public IDiskDirectoryIterator
{
public:
explicit DiskLocalDirectoryIterator(const String & disk_path_, const String & dir_path_) :
dir_path(dir_path_), iter(disk_path_ + dir_path_) {}
void next() override { ++iter; }
bool isValid() const override { return iter != Poco::DirectoryIterator(); }
String path() const override
{
if (iter->isDirectory())
return dir_path + iter.name() + '/';
else
return dir_path + iter.name();
}
private:
String dir_path;
Poco::DirectoryIterator iter;
};
ReservationPtr DiskLocal::reserve(UInt64 bytes)
{
@ -26,7 +76,7 @@ ReservationPtr DiskLocal::reserve(UInt64 bytes)
bool DiskLocal::tryReserve(UInt64 bytes)
{
std::lock_guard lock(mutex);
std::lock_guard lock(DiskLocal::reservation_mutex);
if (bytes == 0)
{
LOG_DEBUG(&Logger::get("DiskLocal"), "Reserving 0 bytes on disk " << backQuote(name));
@ -71,7 +121,7 @@ UInt64 DiskLocal::getAvailableSpace() const
UInt64 DiskLocal::getUnreservedSpace() const
{
std::lock_guard lock(mutex);
std::lock_guard lock(DiskLocal::reservation_mutex);
auto available_space = getAvailableSpace();
available_space -= std::min(available_space, reserved_bytes);
return available_space;
@ -161,20 +211,31 @@ std::unique_ptr<WriteBuffer> DiskLocal::writeFile(const String & path, size_t bu
return std::make_unique<WriteBufferFromFile>(disk_path + path, buf_size, flags);
}
void DiskLocal::remove(const String & path)
{
Poco::File(disk_path + path).remove(false);
}
void DiskLocal::removeRecursive(const String & path)
{
Poco::File(disk_path + path).remove(true);
}
void DiskLocalReservation::update(UInt64 new_size)
{
std::lock_guard lock(DiskLocal::mutex);
std::lock_guard lock(DiskLocal::reservation_mutex);
disk->reserved_bytes -= size;
size = new_size;
disk->reserved_bytes += size;
}
DiskLocalReservation::~DiskLocalReservation()
{
try
{
std::lock_guard lock(DiskLocal::mutex);
std::lock_guard lock(DiskLocal::reservation_mutex);
if (disk->reserved_bytes < size)
{
disk->reserved_bytes = 0;

View File

@ -4,7 +4,6 @@
#include <IO/ReadBufferFromFile.h>
#include <IO/WriteBufferFromFile.h>
#include <mutex>
#include <Poco/DirectoryIterator.h>
#include <Poco/File.h>
@ -71,6 +70,10 @@ public:
std::unique_ptr<WriteBuffer> writeFile(const String & path, size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, WriteMode mode = WriteMode::Rewrite) override;
void remove(const String & path) override;
void removeRecursive(const String & path) override;
private:
bool tryReserve(UInt64 bytes);
@ -79,61 +82,10 @@ private:
const String disk_path;
const UInt64 keep_free_space_bytes;
/// Used for reservation counters modification
static std::mutex mutex;
UInt64 reserved_bytes = 0;
UInt64 reservation_count = 0;
static std::mutex reservation_mutex;
};
using DiskLocalPtr = std::shared_ptr<DiskLocal>;
class DiskLocalDirectoryIterator : public IDiskDirectoryIterator
{
public:
explicit DiskLocalDirectoryIterator(const String & disk_path_, const String & dir_path_) :
dir_path(dir_path_), iter(disk_path_ + dir_path_) {}
void next() override { ++iter; }
bool isValid() const override { return iter != Poco::DirectoryIterator(); }
String path() const override
{
if (iter->isDirectory())
return dir_path + iter.name() + '/';
else
return dir_path + iter.name();
}
private:
String dir_path;
Poco::DirectoryIterator iter;
};
class DiskLocalReservation : public IReservation
{
public:
DiskLocalReservation(const DiskLocalPtr & disk_, UInt64 size_)
: disk(disk_), size(size_), metric_increment(CurrentMetrics::DiskSpaceReservedForMerge, size_)
{
}
UInt64 getSize() const override { return size; }
DiskPtr getDisk() const override { return disk; }
void update(UInt64 new_size) override;
~DiskLocalReservation() override;
private:
DiskLocalPtr disk;
UInt64 size;
CurrentMetrics::Increment metric_increment;
};
class DiskFactory;
void registerDiskLocal(DiskFactory & factory);
}

View File

@ -10,14 +10,33 @@ namespace DB
{
namespace ErrorCodes
{
extern const int UNKNOWN_ELEMENT_IN_CONFIG;
extern const int EXCESSIVE_ELEMENT_IN_CONFIG;
extern const int FILE_DOESNT_EXIST;
extern const int FILE_ALREADY_EXISTS;
extern const int DIRECTORY_DOESNT_EXIST;
extern const int CANNOT_DELETE_DIRECTORY;
}
class DiskMemoryDirectoryIterator : public IDiskDirectoryIterator
{
public:
explicit DiskMemoryDirectoryIterator(std::vector<String> && dir_file_paths_)
: dir_file_paths(std::move(dir_file_paths_)), iter(dir_file_paths.begin())
{
}
void next() override { ++iter; }
bool isValid() const override { return iter != dir_file_paths.end(); }
String path() const override { return *iter; }
private:
std::vector<String> dir_file_paths;
std::vector<String>::iterator iter;
};
ReservationPtr DiskMemory::reserve(UInt64 /*bytes*/)
{
throw Exception("Method reserve is not implemented for memory disks", ErrorCodes::NOT_IMPLEMENTED);
@ -73,7 +92,7 @@ size_t DiskMemory::getFileSize(const String & path) const
auto iter = files.find(path);
if (iter == files.end())
throw Exception("File " + path + " does not exist", ErrorCodes::FILE_DOESNT_EXIST);
throw Exception("File '" + path + "' does not exist", ErrorCodes::FILE_DOESNT_EXIST);
return iter->second.data.size();
}
@ -88,7 +107,7 @@ void DiskMemory::createDirectory(const String & path)
String parent_path = parentPath(path);
if (!parent_path.empty() && files.find(parent_path) == files.end())
throw Exception(
"Failed to create directory " + path + ". Parent directory " + parent_path + " does not exist",
"Failed to create directory '" + path + "'. Parent directory " + parent_path + " does not exist",
ErrorCodes::DIRECTORY_DOESNT_EXIST);
files.emplace(path, FileData{FileType::Directory});
@ -118,7 +137,7 @@ void DiskMemory::clearDirectory(const String & path)
std::lock_guard lock(mutex);
if (files.find(path) == files.end())
throw Exception("Directory " + path + " does not exist", ErrorCodes::DIRECTORY_DOESNT_EXIST);
throw Exception("Directory '" + path + "' does not exist", ErrorCodes::DIRECTORY_DOESNT_EXIST);
for (auto iter = files.begin(); iter != files.end();)
{
@ -130,7 +149,7 @@ void DiskMemory::clearDirectory(const String & path)
if (iter->second.type == FileType::Directory)
throw Exception(
"Failed to clear directory " + path + ". " + iter->first + " is a directory", ErrorCodes::CANNOT_DELETE_DIRECTORY);
"Failed to clear directory '" + path + "'. " + iter->first + " is a directory", ErrorCodes::CANNOT_DELETE_DIRECTORY);
files.erase(iter++);
}
@ -146,7 +165,7 @@ DiskDirectoryIteratorPtr DiskMemory::iterateDirectory(const String & path)
std::lock_guard lock(mutex);
if (!path.empty() && files.find(path) == files.end())
throw Exception("Directory " + path + " does not exist", ErrorCodes::DIRECTORY_DOESNT_EXIST);
throw Exception("Directory '" + path + "' does not exist", ErrorCodes::DIRECTORY_DOESNT_EXIST);
std::vector<String> dir_file_paths;
for (const auto & file : files)
@ -205,7 +224,7 @@ std::unique_ptr<ReadBuffer> DiskMemory::readFile(const String & path, size_t /*b
auto iter = files.find(path);
if (iter == files.end())
throw Exception("File " + path + " does not exist", ErrorCodes::FILE_DOESNT_EXIST);
throw Exception("File '" + path + "' does not exist", ErrorCodes::FILE_DOESNT_EXIST);
return std::make_unique<ReadBufferFromString>(iter->second.data);
}
@ -220,7 +239,7 @@ std::unique_ptr<WriteBuffer> DiskMemory::writeFile(const String & path, size_t /
String parent_path = parentPath(path);
if (!parent_path.empty() && files.find(parent_path) == files.end())
throw Exception(
"Failed to create file " + path + ". Directory " + parent_path + " does not exist", ErrorCodes::DIRECTORY_DOESNT_EXIST);
"Failed to create file '" + path + "'. Directory " + parent_path + " does not exist", ErrorCodes::DIRECTORY_DOESNT_EXIST);
iter = files.emplace(path, FileData{FileType::File}).first;
}
@ -231,6 +250,46 @@ std::unique_ptr<WriteBuffer> DiskMemory::writeFile(const String & path, size_t /
return std::make_unique<WriteBufferFromString>(iter->second.data);
}
void DiskMemory::remove(const String & path)
{
std::lock_guard lock(mutex);
auto file_it = files.find(path);
if (file_it == files.end())
throw Exception("File '" + path + "' doesn't exist", ErrorCodes::FILE_DOESNT_EXIST);
if (file_it->second.type == FileType::Directory)
{
files.erase(file_it);
if (std::any_of(files.begin(), files.end(), [path](const auto & file) { return parentPath(file.first) == path; }))
throw Exception("Directory '" + path + "' is not empty", ErrorCodes::CANNOT_DELETE_DIRECTORY);
}
else
{
files.erase(file_it);
}
}
void DiskMemory::removeRecursive(const String & path)
{
std::lock_guard lock(mutex);
auto file_it = files.find(path);
if (file_it == files.end())
throw Exception("File '" + path + "' doesn't exist", ErrorCodes::FILE_DOESNT_EXIST);
for (auto iter = files.begin(); iter != files.end();)
{
if (iter->first.size() >= path.size() && std::string_view(iter->first.data(), path.size()) == path)
iter = files.erase(iter);
else
++iter;
}
}
using DiskMemoryPtr = std::shared_ptr<DiskMemory>;
void registerDiskMemory(DiskFactory & factory)
{

View File

@ -1,19 +1,24 @@
#pragma once
#include <Disks/IDisk.h>
#include <IO/ReadBuffer.h>
#include <IO/WriteBuffer.h>
#include <mutex>
#include <memory>
#include <unordered_map>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
class ReadBuffer;
class WriteBuffer;
/** Implementation of Disk intended only for testing purposes.
* All filesystem objects are stored in memory and lost on server restart.
*
* NOTE Work in progress. Currently the interface is not viable enough to support MergeTree or even StripeLog tables.
* Please delete this interface if it has not been finished by 2020-06-18.
*/
class DiskMemory : public IDisk
{
public:
@ -62,6 +67,10 @@ public:
size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE,
WriteMode mode = WriteMode::Rewrite) override;
void remove(const String & path) override;
void removeRecursive(const String & path) override;
private:
void createDirectoriesImpl(const String & path);
void replaceFileImpl(const String & from_path, const String & to_path);
@ -88,30 +97,4 @@ private:
mutable std::mutex mutex;
};
using DiskMemoryPtr = std::shared_ptr<DiskMemory>;
class DiskMemoryDirectoryIterator : public IDiskDirectoryIterator
{
public:
explicit DiskMemoryDirectoryIterator(std::vector<String> && dir_file_paths_)
: dir_file_paths(std::move(dir_file_paths_)), iter(dir_file_paths.begin())
{
}
void next() override { ++iter; }
bool isValid() const override { return iter != dir_file_paths.end(); }
String path() const override { return *iter; }
private:
std::vector<String> dir_file_paths;
std::vector<String>::iterator iter;
};
class DiskFactory;
void registerDiskMemory(DiskFactory & factory);
}

439
dbms/src/Disks/DiskS3.cpp Normal file
View File

@ -0,0 +1,439 @@
#include "DiskS3.h"
#if USE_AWS_S3
# include "DiskFactory.h"
# include <random>
# include <IO/S3Common.h>
# include <IO/ReadBufferFromS3.h>
# include <IO/WriteBufferFromS3.h>
# include <IO/ReadBufferFromFile.h>
# include <IO/WriteBufferFromFile.h>
# include <IO/ReadHelpers.h>
# include <IO/WriteHelpers.h>
# include <Poco/File.h>
# include <Common/checkStackSize.h>
# include <Common/quoteString.h>
# include <Common/thread_local_rng.h>
# include <aws/s3/model/CopyObjectRequest.h>
# include <aws/s3/model/DeleteObjectRequest.h>
# include <aws/s3/model/GetObjectRequest.h>
namespace DB
{
namespace ErrorCodes
{
extern const int FILE_ALREADY_EXISTS;
extern const int FILE_DOESNT_EXIST;
extern const int PATH_ACCESS_DENIED;
}
namespace
{
template <typename Result, typename Error>
void throwIfError(Aws::Utils::Outcome<Result, Error> && response)
{
if (!response.IsSuccess())
{
const auto & err = response.GetError();
throw Exception(err.GetMessage(), static_cast<int>(err.GetErrorType()));
}
}
String readKeyFromFile(const String & path)
{
String key;
ReadBufferFromFile buf(path, 1024); /* reasonable buffer size for small file */
readStringUntilEOF(key, buf);
return key;
}
void writeKeyToFile(const String & key, const String & path)
{
WriteBufferFromFile buf(path, 1024);
writeString(key, buf);
buf.next();
}
/// Stores data in S3 and the object key in a file in the local filesystem.
class WriteIndirectBufferFromS3 : public WriteBufferFromS3
{
public:
WriteIndirectBufferFromS3(
std::shared_ptr<Aws::S3::S3Client> & client_ptr_,
const String & bucket_,
const String & metadata_path_,
const String & s3_path_,
size_t buf_size_)
: WriteBufferFromS3(client_ptr_, bucket_, s3_path_, DEFAULT_BLOCK_SIZE, buf_size_)
, metadata_path(metadata_path_)
, s3_path(s3_path_)
{
}
void finalize() override
{
WriteBufferFromS3::finalize();
writeKeyToFile(s3_path, metadata_path);
finalized = true;
}
~WriteIndirectBufferFromS3() override
{
if (finalized)
return;
try
{
finalize();
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
private:
bool finalized = false;
const String metadata_path;
const String s3_path;
};
}
class DiskS3DirectoryIterator : public IDiskDirectoryIterator
{
public:
DiskS3DirectoryIterator(const String & full_path, const String & folder_path_) : iter(full_path), folder_path(folder_path_) {}
void next() override { ++iter; }
bool isValid() const override { return iter != Poco::DirectoryIterator(); }
String path() const override
{
if (iter->isDirectory())
return folder_path + iter.name() + '/';
else
return folder_path + iter.name();
}
private:
Poco::DirectoryIterator iter;
String folder_path;
};
using DiskS3Ptr = std::shared_ptr<DiskS3>;
class DiskS3Reservation : public IReservation
{
public:
DiskS3Reservation(const DiskS3Ptr & disk_, UInt64 size_)
: disk(disk_), size(size_), metric_increment(CurrentMetrics::DiskSpaceReservedForMerge, size_)
{
}
UInt64 getSize() const override { return size; }
DiskPtr getDisk() const override { return disk; }
void update(UInt64 new_size) override
{
std::lock_guard lock(disk->reservation_mutex);
disk->reserved_bytes -= size;
size = new_size;
disk->reserved_bytes += size;
}
~DiskS3Reservation() override;
private:
DiskS3Ptr disk;
UInt64 size;
CurrentMetrics::Increment metric_increment;
};
DiskS3::DiskS3(String name_, std::shared_ptr<Aws::S3::S3Client> client_, String bucket_, String s3_root_path_, String metadata_path_)
: name(std::move(name_))
, client(std::move(client_))
, bucket(std::move(bucket_))
, s3_root_path(std::move(s3_root_path_))
, metadata_path(std::move(metadata_path_))
{
}
ReservationPtr DiskS3::reserve(UInt64 bytes)
{
if (!tryReserve(bytes))
return {};
return std::make_unique<DiskS3Reservation>(std::static_pointer_cast<DiskS3>(shared_from_this()), bytes);
}
bool DiskS3::exists(const String & path) const
{
return Poco::File(metadata_path + path).exists();
}
bool DiskS3::isFile(const String & path) const
{
return Poco::File(metadata_path + path).isFile();
}
bool DiskS3::isDirectory(const String & path) const
{
return Poco::File(metadata_path + path).isDirectory();
}
size_t DiskS3::getFileSize(const String & path) const
{
Aws::S3::Model::GetObjectRequest request;
request.SetBucket(bucket);
request.SetKey(getS3Path(path));
auto outcome = client->GetObject(request);
if (!outcome.IsSuccess())
{
auto & err = outcome.GetError();
throw Exception(err.GetMessage(), static_cast<int>(err.GetErrorType()));
}
else
{
return outcome.GetResult().GetContentLength();
}
}
void DiskS3::createDirectory(const String & path)
{
Poco::File(metadata_path + path).createDirectory();
}
void DiskS3::createDirectories(const String & path)
{
Poco::File(metadata_path + path).createDirectories();
}
DiskDirectoryIteratorPtr DiskS3::iterateDirectory(const String & path)
{
return std::make_unique<DiskS3DirectoryIterator>(metadata_path + path, path);
}
void DiskS3::clearDirectory(const String & path)
{
for (auto it{iterateDirectory(path)}; it->isValid(); it->next())
if (isFile(it->path()))
remove(it->path());
}
void DiskS3::moveFile(const String & from_path, const String & to_path)
{
if (exists(to_path))
throw Exception("File already exists " + to_path, ErrorCodes::FILE_ALREADY_EXISTS);
Poco::File(metadata_path + from_path).renameTo(metadata_path + to_path);
}
void DiskS3::replaceFile(const String & from_path, const String & to_path)
{
Poco::File from_file(metadata_path + from_path);
Poco::File to_file(metadata_path + to_path);
if (to_file.exists())
{
Poco::File tmp_file(metadata_path + to_path + ".old");
to_file.renameTo(tmp_file.path());
from_file.renameTo(metadata_path + to_path);
remove(to_path + ".old");
}
else
from_file.renameTo(to_file.path());
}
void DiskS3::copyFile(const String & from_path, const String & to_path)
{
if (exists(to_path))
remove(to_path);
String s3_from_path = readKeyFromFile(metadata_path + from_path);
String s3_to_path = s3_root_path + getRandomName();
Aws::S3::Model::CopyObjectRequest req;
req.SetBucket(bucket);
req.SetCopySource(s3_from_path);
req.SetKey(s3_to_path);
throwIfError(client->CopyObject(req));
writeKeyToFile(s3_to_path, metadata_path + to_path);
}
std::unique_ptr<ReadBuffer> DiskS3::readFile(const String & path, size_t buf_size) const
{
return std::make_unique<ReadBufferFromS3>(client, bucket, getS3Path(path), buf_size);
}
std::unique_ptr<WriteBuffer> DiskS3::writeFile(const String & path, size_t buf_size, WriteMode mode)
{
if (!exists(path) || mode == WriteMode::Rewrite)
{
String new_s3_path = s3_root_path + getRandomName();
return std::make_unique<WriteIndirectBufferFromS3>(client, bucket, metadata_path + path, new_s3_path, buf_size);
}
else
{
auto old_s3_path = getS3Path(path);
ReadBufferFromS3 read_buffer(client, bucket, old_s3_path, buf_size);
auto write_buffer = std::make_unique<WriteIndirectBufferFromS3>(client, bucket, metadata_path + path, old_s3_path, buf_size);
std::vector<char> buffer(buf_size);
while (!read_buffer.eof())
write_buffer->write(buffer.data(), read_buffer.read(buffer.data(), buf_size));
return write_buffer;
}
}
void DiskS3::remove(const String & path)
{
Poco::File file(metadata_path + path);
if (file.isFile())
{
Aws::S3::Model::DeleteObjectRequest request;
request.SetBucket(bucket);
request.SetKey(getS3Path(path));
throwIfError(client->DeleteObject(request));
}
file.remove();
}
void DiskS3::removeRecursive(const String & path)
{
checkStackSize(); /// This is needed to prevent stack overflow in case of cyclic symlinks.
Poco::File file(metadata_path + path);
if (file.isFile())
{
Aws::S3::Model::DeleteObjectRequest request;
request.SetBucket(bucket);
request.SetKey(getS3Path(path));
throwIfError(client->DeleteObject(request));
}
else
{
for (auto it{iterateDirectory(path)}; it->isValid(); it->next())
removeRecursive(it->path());
}
file.remove();
}
String DiskS3::getS3Path(const String & path) const
{
if (!exists(path))
throw Exception("File not found: " + path, ErrorCodes::FILE_DOESNT_EXIST);
return readKeyFromFile(metadata_path + path);
}
String DiskS3::getRandomName() const
{
std::uniform_int_distribution<int> distribution('a', 'z');
String res(32, ' '); /// The number of bits of entropy should be not less than 128.
for (auto & c : res)
c = distribution(thread_local_rng);
return res;
}
bool DiskS3::tryReserve(UInt64 bytes)
{
std::lock_guard lock(reservation_mutex);
if (bytes == 0)
{
LOG_DEBUG(&Logger::get("DiskS3"), "Reserving 0 bytes on s3 disk " << backQuote(name));
++reservation_count;
return true;
}
auto available_space = getAvailableSpace();
UInt64 unreserved_space = available_space - std::min(available_space, reserved_bytes);
if (unreserved_space >= bytes)
{
LOG_DEBUG(
&Logger::get("DiskS3"),
"Reserving " << formatReadableSizeWithBinarySuffix(bytes) << " on disk " << backQuote(name) << ", having unreserved "
<< formatReadableSizeWithBinarySuffix(unreserved_space) << ".");
++reservation_count;
reserved_bytes += bytes;
return true;
}
return false;
}
DiskS3Reservation::~DiskS3Reservation()
{
try
{
std::lock_guard lock(disk->reservation_mutex);
if (disk->reserved_bytes < size)
{
disk->reserved_bytes = 0;
LOG_ERROR(&Logger::get("DiskLocal"), "Unbalanced reservations size for disk '" + disk->getName() + "'.");
}
else
{
disk->reserved_bytes -= size;
}
if (disk->reservation_count == 0)
LOG_ERROR(&Logger::get("DiskLocal"), "Unbalanced reservation count for disk '" + disk->getName() + "'.");
else
--disk->reservation_count;
}
catch (...)
{
tryLogCurrentException(__PRETTY_FUNCTION__);
}
}
void registerDiskS3(DiskFactory & factory)
{
auto creator = [](const String & name,
const Poco::Util::AbstractConfiguration & config,
const String & config_prefix,
const Context & context) -> DiskPtr {
Poco::File disk{context.getPath() + "disks/" + name};
disk.createDirectories();
S3::URI uri(Poco::URI(config.getString(config_prefix + ".endpoint")));
auto client = S3::ClientFactory::instance().create(
uri.endpoint,
config.getString(config_prefix + ".access_key_id", ""),
config.getString(config_prefix + ".secret_access_key", ""));
if (uri.key.back() != '/')
throw Exception("S3 path must ends with '/', but '" + uri.key + "' doesn't.", ErrorCodes::LOGICAL_ERROR);
String metadata_path = context.getPath() + "disks/" + name + "/";
auto s3disk = std::make_shared<DiskS3>(name, client, uri.bucket, uri.key, metadata_path);
/// This code is used only to check access to the corresponding disk.
{
auto file = s3disk->writeFile("test_acl", DBMS_DEFAULT_BUFFER_SIZE, WriteMode::Rewrite);
file->write("test", 4);
}
{
auto file = s3disk->readFile("test_acl", DBMS_DEFAULT_BUFFER_SIZE);
String buf(4, '0');
file->readStrict(buf.data(), 4);
if (buf != "test")
throw Exception("No read accecss to S3 bucket in disk " + name, ErrorCodes::PATH_ACCESS_DENIED);
}
{
s3disk->remove("test_acl");
}
return s3disk;
};
factory.registerDiskType("s3", creator);
}
}
#endif

93
dbms/src/Disks/DiskS3.h Normal file
View File

@ -0,0 +1,93 @@
#pragma once
#include <Common/config.h>
#if USE_AWS_S3
# include "DiskFactory.h"
# include <aws/s3/S3Client.h>
# include <Poco/DirectoryIterator.h>
namespace DB
{
/**
* Storage for persisting data in S3 and metadata on the local disk.
* Each file is represented by a file in the local filesystem (clickhouse_root/disks/disk_name/path/to/file)
* that contains the S3 object key pointing to the actual data.
*/
class DiskS3 : public IDisk
{
public:
friend class DiskS3Reservation;
DiskS3(String name_, std::shared_ptr<Aws::S3::S3Client> client_, String bucket_, String s3_root_path_, String metadata_path_);
const String & getName() const override { return name; }
const String & getPath() const override { return s3_root_path; }
ReservationPtr reserve(UInt64 bytes) override;
UInt64 getTotalSpace() const override { return std::numeric_limits<UInt64>::max(); }
UInt64 getAvailableSpace() const override { return std::numeric_limits<UInt64>::max(); }
UInt64 getUnreservedSpace() const override { return std::numeric_limits<UInt64>::max(); }
UInt64 getKeepingFreeSpace() const override { return 0; }
bool exists(const String & path) const override;
bool isFile(const String & path) const override;
bool isDirectory(const String & path) const override;
size_t getFileSize(const String & path) const override;
void createDirectory(const String & path) override;
void createDirectories(const String & path) override;
void clearDirectory(const String & path) override;
void moveDirectory(const String & from_path, const String & to_path) override { moveFile(from_path, to_path); }
DiskDirectoryIteratorPtr iterateDirectory(const String & path) override;
void moveFile(const String & from_path, const String & to_path) override;
void replaceFile(const String & from_path, const String & to_path) override;
void copyFile(const String & from_path, const String & to_path) override;
std::unique_ptr<ReadBuffer> readFile(const String & path, size_t buf_size) const override;
std::unique_ptr<WriteBuffer> writeFile(const String & path, size_t buf_size, WriteMode mode) override;
void remove(const String & path) override;
void removeRecursive(const String & path) override;
private:
String getS3Path(const String & path) const;
String getRandomName() const;
bool tryReserve(UInt64 bytes);
private:
const String name;
std::shared_ptr<Aws::S3::S3Client> client;
const String bucket;
const String s3_root_path;
const String metadata_path;
UInt64 reserved_bytes = 0;
UInt64 reservation_count = 0;
std::mutex reservation_mutex;
};
}
#endif

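An end-to-end sketch of the layout described above (not part of the patch; the client, bucket and paths are assumptions). Writing allocates a random S3 key and records it in a small metadata file under metadata_path, and reading resolves that key before streaming from S3:
/// `client` is an Aws::S3::S3Client assumed to be created elsewhere (e.g. via S3::ClientFactory).
auto disk = std::make_shared<DiskS3>("s3", client, "my-bucket", "data/", "/var/lib/clickhouse/disks/s3/");
{
    auto out = disk->writeFile("part_1.bin", DBMS_DEFAULT_BUFFER_SIZE, WriteMode::Rewrite);
    out->write("payload", 7);    /// data goes to S3 under "data/" + a random 32-character key
}                                /// the key is written to the local metadata file on finalize
auto in = disk->readFile("part_1.bin", DBMS_DEFAULT_BUFFER_SIZE);   /// key is read back from the metadata file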
View File

@ -111,6 +111,12 @@ Volume::Volume(
<< " < " << formatReadableSizeWithBinarySuffix(MIN_PART_SIZE) << ")");
}
DiskPtr Volume::getNextDisk()
{
size_t start_from = last_used.fetch_add(1u, std::memory_order_relaxed);
size_t index = start_from % disks.size();
return disks[index];
}
ReservationPtr Volume::reserve(UInt64 expected_size)
{

View File

@ -67,6 +67,13 @@ public:
const String & config_prefix,
const DiskSelector & disk_selector);
/// Next disk (round-robin)
///
/// - Used with policy for temporary data
/// - Ignores all limitations
/// - Shares last access with reserve()
DiskPtr getNextDisk();
/// Uses Round-robin to choose disk for reservation.
/// Returns valid reservation or nullptr if there is no space left on any disk.
ReservationPtr reserve(UInt64 bytes) override;

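A short sketch of the intended use for temporary data, per the comment above (the volume pointer and file name are assumed): getNextDisk() only rotates over the volume's disks and never goes through reserve(), so no free-space limits are checked.
DiskPtr disk = volume->getNextDisk();               /// round-robin; shares the last_used counter with reserve()
auto tmp = disk->writeFile("tmp_merge_state.bin");  /// defaults from IDisk::writeFile apply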
View File

@ -2,6 +2,7 @@
namespace DB
{
bool IDisk::isDirectoryEmpty(const String & path)
{
return !iterateDirectory(path)->isValid();

View File

@ -6,6 +6,7 @@
#include <Common/Exception.h>
#include <memory>
#include <mutex>
#include <utility>
#include <boost/noncopyable.hpp>
#include <Poco/Path.h>
@ -97,7 +98,7 @@ public:
/// Create directory and all parent directories if necessary.
virtual void createDirectories(const String & path) = 0;
/// Remove all files from the directory.
/// Remove all files from the directory. Directories are not removed.
virtual void clearDirectory(const String & path) = 0;
/// Move directory from `from_path` to `to_path`.
@ -125,6 +126,12 @@ public:
/// Open the file for write and return WriteBuffer object.
virtual std::unique_ptr<WriteBuffer> writeFile(const String & path, size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, WriteMode mode = WriteMode::Rewrite) = 0;
/// Remove file or directory. Throws exception if file doesn't exist or if directory is not empty.
virtual void remove(const String & path) = 0;
/// Remove file or directory with all children. Use with extra caution. Throws exception if file doesn't exist.
virtual void removeRecursive(const String & path) = 0;
};
using DiskPtr = std::shared_ptr<IDisk>;
@ -151,7 +158,7 @@ public:
/**
* Information about reserved size on particular disk.
*/
class IReservation
class IReservation : boost::noncopyable
{
public:
/// Get reservation size.

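A minimal sketch contrasting the two new removal methods (the paths are assumed), matching the contracts stated in the comments above:
disk->remove("dir/file.bin");      /// throws if the path doesn't exist, or if it is a non-empty directory
disk->removeRecursive("dir");      /// removes the directory together with everything inside it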
Some files were not shown because too many files have changed in this diff.