diff --git a/CHANGELOG.md b/CHANGELOG.md
index 950bdc7e374..e1764f07acf 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,145 @@
+## ClickHouse release 20.8
+
+### ClickHouse release v20.8.2.3-stable, 2020-09-08
+
+#### Backward Incompatible Change
+
+* Now the `OPTIMIZE FINAL` query doesn't recalculate TTL for parts that were added before the TTL was created. Use `ALTER TABLE ... MATERIALIZE TTL` once to calculate them; after that `OPTIMIZE FINAL` will evaluate TTLs properly. This behavior never worked for replicated tables. [#14220](https://github.com/ClickHouse/ClickHouse/pull/14220) ([alesapin](https://github.com/alesapin)).
+* Extend the `parallel_distributed_insert_select` setting, adding an option to run `INSERT` into a local table. The setting changes type from `Bool` to `UInt64`, so the values `false` and `true` are no longer supported. If you have these values in the server configuration, the server will not start. Please replace them with `0` and `1`, respectively. [#14060](https://github.com/ClickHouse/ClickHouse/pull/14060) ([Azat Khuzhin](https://github.com/azat)).
+* Remove support for the `ODBCDriver` input/output format. This was a deprecated format once used for communication with the ClickHouse ODBC driver, now long superseded by the `ODBCDriver2` format. Resolves [#13629](https://github.com/ClickHouse/ClickHouse/issues/13629). [#13847](https://github.com/ClickHouse/ClickHouse/pull/13847) ([hexiaoting](https://github.com/hexiaoting)).
+
+#### New Feature
+
+* ClickHouse can now work as a MySQL replica - this is implemented by the `MaterializeMySQL` database engine. Implements [#4006](https://github.com/ClickHouse/ClickHouse/issues/4006). [#10851](https://github.com/ClickHouse/ClickHouse/pull/10851) ([Winter Zhang](https://github.com/zhang2014)).
+* Add the ability to specify the `Default` compression codec for columns; it corresponds to the compression settings specified in `config.xml`. Implements [#9074](https://github.com/ClickHouse/ClickHouse/issues/9074). [#14049](https://github.com/ClickHouse/ClickHouse/pull/14049) ([alesapin](https://github.com/alesapin)).
+* Support Kerberos authentication in Kafka, using the `krb5` and `cyrus-sasl` libraries. [#12771](https://github.com/ClickHouse/ClickHouse/pull/12771) ([Ilya Golshtein](https://github.com/ilejn)).
+* Add function `normalizeQuery` that replaces literals, sequences of literals and complex aliases with placeholders. Add function `normalizedQueryHash` that returns identical 64-bit hash values for similar queries. This helps to analyze the query log (see the example below). This closes [#11271](https://github.com/ClickHouse/ClickHouse/issues/11271). [#13816](https://github.com/ClickHouse/ClickHouse/pull/13816) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add `system.time_zones` table. [#13880](https://github.com/ClickHouse/ClickHouse/pull/13880) ([Bharat Nallan](https://github.com/bharatnc)).
+* Add function `defaultValueOfTypeName` that returns the default value for a given type. [#13877](https://github.com/ClickHouse/ClickHouse/pull/13877) ([hcz](https://github.com/hczhcz)).
+* Add `countDigits(x)` function that counts the number of decimal digits in an integer or decimal column. Add `isDecimalOverflow(d, [p])` function that checks whether the value in a Decimal column is out of its (or the specified) precision. [#14151](https://github.com/ClickHouse/ClickHouse/pull/14151) ([Artem Zuikov](https://github.com/4ertus2)).
+* Add `quantileExactLow` and `quantileExactHigh` implementations with the respective aliases `medianExactLow` and `medianExactHigh`. [#13818](https://github.com/ClickHouse/ClickHouse/pull/13818) ([Bharat Nallan](https://github.com/bharatnc)).
+* Added `date_trunc` function that truncates a date/time value to a specified date/time part. [#13888](https://github.com/ClickHouse/ClickHouse/pull/13888) ([Vladimir Golovchenko](https://github.com/vladimir-golovchenko)).
+* Add new optional section `<user_directories>` to the main config. [#13425](https://github.com/ClickHouse/ClickHouse/pull/13425) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Add `ALTER SAMPLE BY` statement that allows changing the table sampling clause. [#13280](https://github.com/ClickHouse/ClickHouse/pull/13280) ([Amos Bird](https://github.com/amosbird)).
+* Function `position` now supports an optional `start_pos` argument. [#13237](https://github.com/ClickHouse/ClickHouse/pull/13237) ([vdimir](https://github.com/vdimir)).
+
+#### Bug Fix
+
+* Fix visible data clobbering by the progress bar in the client in interactive mode. This fixes [#12562](https://github.com/ClickHouse/ClickHouse/issues/12562) and [#13369](https://github.com/ClickHouse/ClickHouse/issues/13369) and [#13584](https://github.com/ClickHouse/ClickHouse/issues/13584) and fixes [#12964](https://github.com/ClickHouse/ClickHouse/issues/12964). [#13691](https://github.com/ClickHouse/ClickHouse/pull/13691) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fixed incorrect sorting order of `LowCardinality` columns when sorting by multiple columns. This fixes [#13958](https://github.com/ClickHouse/ClickHouse/issues/13958). [#14223](https://github.com/ClickHouse/ClickHouse/pull/14223) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Check for array size overflow in the `topK` aggregate function. Without this check the user could send a query with carefully crafted parameters that would lead to a server crash. This closes [#14452](https://github.com/ClickHouse/ClickHouse/issues/14452). [#14467](https://github.com/ClickHouse/ClickHouse/pull/14467) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix bug which could lead to wrong merge assignment if a table has partitions with a single part. [#14444](https://github.com/ClickHouse/ClickHouse/pull/14444) ([alesapin](https://github.com/alesapin)).
+* Stop query execution if an exception happened in `PipelineExecutor` itself. This prevents a rare possible query hang. Continuation of [#14334](https://github.com/ClickHouse/ClickHouse/issues/14334). [#14402](https://github.com/ClickHouse/ClickHouse/pull/14402) [#14334](https://github.com/ClickHouse/ClickHouse/pull/14334) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix crash during an `ALTER` query for a table which was created `AS table_function`. Fixes [#14212](https://github.com/ClickHouse/ClickHouse/issues/14212). [#14326](https://github.com/ClickHouse/ClickHouse/pull/14326) ([alesapin](https://github.com/alesapin)).
+* Fix exception during `ALTER LIVE VIEW` query with the `REFRESH` command. Live view is an experimental feature. [#14320](https://github.com/ClickHouse/ClickHouse/pull/14320) ([Bharat Nallan](https://github.com/bharatnc)).
+* Fix QueryPlan lifetime (for `EXPLAIN PIPELINE graph=1`) for queries with a nested interpreter. [#14315](https://github.com/ClickHouse/ClickHouse/pull/14315) ([Azat Khuzhin](https://github.com/azat)).
+* Fix segfault in `clickhouse-odbc-bridge` during schema fetch from some external sources. This fixes [#13861](https://github.com/ClickHouse/ClickHouse/issues/13861). [#14267](https://github.com/ClickHouse/ClickHouse/pull/14267) ([Vitaly Baranov](https://github.com/vitlibar)).
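To illustrate the new `normalizeQuery`/`normalizedQueryHash` functions from the New Feature list above, a minimal sketch (the queries are arbitrary examples, and the exact placeholder rendering may vary by version):

```sql
-- Literals are replaced with placeholders, so structurally identical queries collapse together.
SELECT normalizeQuery('SELECT 1, ''hello'', [2, 3]');

-- Queries that differ only in literal values hash to the same 64-bit value,
-- which makes it easy to group entries in system.query_log.
SELECT normalizedQueryHash('SELECT 1') = normalizedQueryHash('SELECT 2') AS same_hash;  -- 1
```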
+* Fix crash in mark inclusion search introduced in [#12277](https://github.com/ClickHouse/ClickHouse/pull/12277). [#14225](https://github.com/ClickHouse/ClickHouse/pull/14225) ([Amos Bird](https://github.com/amosbird)).
+* Fix creation of tables with named tuples. This fixes [#13027](https://github.com/ClickHouse/ClickHouse/issues/13027). [#14143](https://github.com/ClickHouse/ClickHouse/pull/14143) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix formatting of minimal negative decimal numbers. This fixes [#14111](https://github.com/ClickHouse/ClickHouse/issues/14111). [#14119](https://github.com/ClickHouse/ClickHouse/pull/14119) ([Alexander Kuzmenkov](https://github.com/akuzm)).
+* Fix `DistributedFilesToInsert` metric (it was zeroed when it should not have been). [#14095](https://github.com/ClickHouse/ClickHouse/pull/14095) ([Azat Khuzhin](https://github.com/azat)).
+* Fix `pointInPolygon` with a const 2d array as the polygon. [#14079](https://github.com/ClickHouse/ClickHouse/pull/14079) ([Alexey Ilyukhov](https://github.com/livace)).
+* Fixed wrong mount point in extra info for `Poco::Exception: no space left on device`. [#14050](https://github.com/ClickHouse/ClickHouse/pull/14050) ([tavplubix](https://github.com/tavplubix)).
+* Fix `GRANT ALL` statement when executed on a non-global level. [#13987](https://github.com/ClickHouse/ClickHouse/pull/13987) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix the parser to reject `CREATE TABLE ... AS table_function()` with an explicit engine. [#13940](https://github.com/ClickHouse/ClickHouse/pull/13940) ([hcz](https://github.com/hczhcz)).
+* Fix wrong results in `SELECT` queries with the `DISTINCT` keyword and subqueries with `UNION ALL` when the `optimize_duplicate_order_by_and_distinct` setting is enabled. [#13925](https://github.com/ClickHouse/ClickHouse/pull/13925) ([Artem Zuikov](https://github.com/4ertus2)).
+* Fixed potential deadlock when renaming a `Distributed` table. [#13922](https://github.com/ClickHouse/ClickHouse/pull/13922) ([tavplubix](https://github.com/tavplubix)).
+* Fix incorrect sorting for `FixedString` columns when sorting by multiple columns. Fixes [#13182](https://github.com/ClickHouse/ClickHouse/issues/13182). [#13887](https://github.com/ClickHouse/ClickHouse/pull/13887) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix potentially imprecise result of `topK`/`topKWeighted` merge (with non-default parameters). [#13817](https://github.com/ClickHouse/ClickHouse/pull/13817) ([Azat Khuzhin](https://github.com/azat)).
+* Fix failure when reading from a MergeTree table with an `INDEX` of type `SET` while comparing against `NULL`. This fixes [#13686](https://github.com/ClickHouse/ClickHouse/issues/13686). [#13793](https://github.com/ClickHouse/ClickHouse/pull/13793) ([Amos Bird](https://github.com/amosbird)).
+* Fix `arrayJoin` capturing in lambda (LOGICAL_ERROR). [#13792](https://github.com/ClickHouse/ClickHouse/pull/13792) ([Azat Khuzhin](https://github.com/azat)).
+* Add step overflow check in function `range`. [#13790](https://github.com/ClickHouse/ClickHouse/pull/13790) ([Azat Khuzhin](https://github.com/azat)).
+* Fixed `Directory not empty` error when concurrently executing `DROP DATABASE` and `CREATE TABLE`. [#13756](https://github.com/ClickHouse/ClickHouse/pull/13756) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add range check for the `h3KRing` function. This fixes [#13633](https://github.com/ClickHouse/ClickHouse/issues/13633). [#13752](https://github.com/ClickHouse/ClickHouse/pull/13752) ([alexey-milovidov](https://github.com/alexey-milovidov)).
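For the `OPTIMIZE FINAL` TTL change under Backward Incompatible Change above, the one-time migration step looks roughly like this (`my_table` is a hypothetical name):

```sql
-- One-time recalculation of TTL for parts created before the TTL clause was added;
-- afterwards OPTIMIZE FINAL evaluates TTLs properly, per the note above.
ALTER TABLE my_table MATERIALIZE TTL;
OPTIMIZE TABLE my_table FINAL;
```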
+* Fix race condition between `DETACH` and background merges: parts could revive after detach. This is a continuation of [#8602](https://github.com/ClickHouse/ClickHouse/issues/8602) that did not fix the issue but introduced a test that started to fail in very rare cases, demonstrating the issue. [#13746](https://github.com/ClickHouse/ClickHouse/pull/13746) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix logging of Settings.Names/Values when `log_queries_min_type` is greater than `QUERY_START`. [#13737](https://github.com/ClickHouse/ClickHouse/pull/13737) ([Azat Khuzhin](https://github.com/azat)).
+* Fix `/replicas_status` endpoint response status code when `verbose=1`. [#13722](https://github.com/ClickHouse/ClickHouse/pull/13722) ([javi santana](https://github.com/javisantana)).
+* Fix incorrect message in `clickhouse-server.init` while checking user and group. [#13711](https://github.com/ClickHouse/ClickHouse/pull/13711) ([ylchou](https://github.com/ylchou)).
+* Do not optimize `any(arrayJoin()) -> arrayJoin()` under the `optimize_move_functions_out_of_any` setting. [#13681](https://github.com/ClickHouse/ClickHouse/pull/13681) ([Azat Khuzhin](https://github.com/azat)).
+* Fix crash in `JOIN` with `StorageMerge` and `set enable_optimize_predicate_expression=1`. [#13679](https://github.com/ClickHouse/ClickHouse/pull/13679) ([Artem Zuikov](https://github.com/4ertus2)).
+* Fix typo in error message about `The value of 'number_of_free_entries_in_pool_to_lower_max_size_of_merge' setting`. [#13678](https://github.com/ClickHouse/ClickHouse/pull/13678) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Concurrent `ALTER ... REPLACE/MOVE PARTITION ...` queries might cause a deadlock. It's fixed. [#13626](https://github.com/ClickHouse/ClickHouse/pull/13626) ([tavplubix](https://github.com/tavplubix)).
+* Fixed the behaviour when a cache dictionary sometimes returned the default value instead of the value present in the source. [#13624](https://github.com/ClickHouse/ClickHouse/pull/13624) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Fix secondary indices corruption in compact parts. Compact parts are an experimental feature. [#13538](https://github.com/ClickHouse/ClickHouse/pull/13538) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix premature `ON CLUSTER` timeouts for queries that must be executed on a single replica. Fixes [#6704](https://github.com/ClickHouse/ClickHouse/issues/6704), [#7228](https://github.com/ClickHouse/ClickHouse/issues/7228), [#13361](https://github.com/ClickHouse/ClickHouse/issues/13361), [#11884](https://github.com/ClickHouse/ClickHouse/issues/11884). [#13450](https://github.com/ClickHouse/ClickHouse/pull/13450) ([alesapin](https://github.com/alesapin)).
+* Fix wrong code in function `netloc`. This fixes [#13335](https://github.com/ClickHouse/ClickHouse/issues/13335). [#13446](https://github.com/ClickHouse/ClickHouse/pull/13446) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix possible race in `StorageMemory`. [#13416](https://github.com/ClickHouse/ClickHouse/pull/13416) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix missing or excessive headers in `TSV/CSVWithNames` formats in the HTTP protocol. This fixes [#12504](https://github.com/ClickHouse/ClickHouse/issues/12504). [#13343](https://github.com/ClickHouse/ClickHouse/pull/13343) ([Azat Khuzhin](https://github.com/azat)).
+* Fix parsing of row policies from users.xml when names of databases or tables contain dots. This fixes [#5779](https://github.com/ClickHouse/ClickHouse/issues/5779), [#12527](https://github.com/ClickHouse/ClickHouse/issues/12527). [#13199](https://github.com/ClickHouse/ClickHouse/pull/13199) ([Vitaly Baranov](https://github.com/vitlibar)).
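A minimal sketch of the `countDigits` and `isDecimalOverflow` helpers from the New Feature list above (exact results depend on the Decimal precision and scale):

```sql
-- toDecimal32(1.25, 2) stores 125 internally, which has 3 decimal digits.
SELECT countDigits(toDecimal32(1.25, 2));                  -- 3

-- 1000000000 does not fit into 9 digits of precision, so an overflow is reported.
SELECT isDecimalOverflow(toDecimal32(1000000000, 0), 9);   -- 1
```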
+* Fix access to a `redis` dictionary after the connection was dropped once. It may happen with `cache` and `direct` dictionary layouts. [#13082](https://github.com/ClickHouse/ClickHouse/pull/13082) ([Anton Popov](https://github.com/CurtizJ)).
+* Removed wrong auth access check when using ClickHouseDictionarySource to query remote tables. [#12756](https://github.com/ClickHouse/ClickHouse/pull/12756) ([sundyli](https://github.com/sundy-li)).
+* Properly distinguish subqueries in some cases for common subexpression elimination. This fixes [#8333](https://github.com/ClickHouse/ClickHouse/issues/8333). [#8367](https://github.com/ClickHouse/ClickHouse/pull/8367) ([Amos Bird](https://github.com/amosbird)).
+
+#### Improvement
+
+* Disallow `CODEC` on the `ALIAS` column type. Fixes [#13911](https://github.com/ClickHouse/ClickHouse/issues/13911). [#14263](https://github.com/ClickHouse/ClickHouse/pull/14263) ([Bharat Nallan](https://github.com/bharatnc)).
+* When waiting for a dictionary update to complete, use the timeout specified by the `query_wait_timeout_milliseconds` setting instead of a hard-coded value. [#14105](https://github.com/ClickHouse/ClickHouse/pull/14105) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Add setting `min_index_granularity_bytes` that protects against accidentally creating a table with a very low `index_granularity_bytes` setting. [#14139](https://github.com/ClickHouse/ClickHouse/pull/14139) ([Bharat Nallan](https://github.com/bharatnc)).
+* Now it's possible to fetch partitions from clusters that use a different ZooKeeper: `ALTER TABLE table_name FETCH PARTITION partition_expr FROM 'zk-name:/path-in-zookeeper'`. It's useful for shipping data to new clusters; see the example below. [#14155](https://github.com/ClickHouse/ClickHouse/pull/14155) ([Amos Bird](https://github.com/amosbird)).
+* Slightly better performance of the Memory table if it was constructed from a huge number of very small blocks (that's unlikely). Author of the idea: [Mark Papadakis](https://github.com/markpapadakis). Closes [#14043](https://github.com/ClickHouse/ClickHouse/issues/14043). [#14056](https://github.com/ClickHouse/ClickHouse/pull/14056) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Conditional aggregate functions (for example: `avgIf`, `sumIf`, `maxIf`) now return `NULL` when no rows match and Nullable arguments are used. [#13964](https://github.com/ClickHouse/ClickHouse/pull/13964) ([Winter Zhang](https://github.com/zhang2014)).
+* Increase limit in the -Resample combinator to 1M. [#13947](https://github.com/ClickHouse/ClickHouse/pull/13947) ([Mikhail f. Shiryaev](https://github.com/Felixoid)).
+* Corrected an error in the AvroConfluent format that caused the Kafka table engine to stop processing messages when an abnormally small, malformed message was received. [#13941](https://github.com/ClickHouse/ClickHouse/pull/13941) ([Gervasio Varela](https://github.com/gervarela)).
+* Fix wrong error for long queries. It was possible to get a syntax error other than `Max query size exceeded` for a correct query. [#13928](https://github.com/ClickHouse/ClickHouse/pull/13928) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Better error message for NULL values in the `TabSeparated` format. [#13906](https://github.com/ClickHouse/ClickHouse/pull/13906) ([jiang tao](https://github.com/tomjiang1987)).
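A sketch of the cross-ZooKeeper `FETCH PARTITION` mentioned above (all names here are hypothetical; `zk-aux` is assumed to be configured as an auxiliary ZooKeeper in the server config):

```sql
-- Fetch a partition from a table whose replication metadata lives in another ZooKeeper,
-- then attach it; fetched parts land in the detached directory first.
ALTER TABLE dst_table FETCH PARTITION 202009 FROM 'zk-aux:/clickhouse/tables/01/src_table';
ALTER TABLE dst_table ATTACH PARTITION 202009;
```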
+* Function `arrayCompact` will compare NaNs bitwise if the type of array elements is Float32/Float64. In previous versions NaNs were always not equal if the type of array elements is Float32/Float64 and were always equal if the type is more complex, like Nullable(Float64). This closes [#13857](https://github.com/ClickHouse/ClickHouse/issues/13857). [#13868](https://github.com/ClickHouse/ClickHouse/pull/13868) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix data race in the `lgamma` function. This race was caught only in `tsan`; no side effects really happened. [#13842](https://github.com/ClickHouse/ClickHouse/pull/13842) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Avoid too slow queries when arrays are manipulated as fields. Throw an exception instead. [#13753](https://github.com/ClickHouse/ClickHouse/pull/13753) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Added Redis requirepass authorization (for the redis dictionary source). [#13688](https://github.com/ClickHouse/ClickHouse/pull/13688) ([Ivan Torgashov](https://github.com/it1804)).
+* Add MergeTree Write-Ahead-Log (WAL) dump tool. WAL is an experimental feature. [#13640](https://github.com/ClickHouse/ClickHouse/pull/13640) ([BohuTANG](https://github.com/BohuTANG)).
+* In previous versions the `lcm` function could produce an assertion violation in debug builds if called with specifically crafted arguments. This fixes [#13368](https://github.com/ClickHouse/ClickHouse/issues/13368). [#13510](https://github.com/ClickHouse/ClickHouse/pull/13510) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Provide monotonicity for `toDate/toDateTime` functions in more cases. Monotonicity information is used for index analysis (more complex queries will be able to use the index). Now the input arguments are saturated more naturally and provide better monotonicity. [#13497](https://github.com/ClickHouse/ClickHouse/pull/13497) ([Amos Bird](https://github.com/amosbird)).
+* Support compound identifiers for custom settings. Custom settings are an integration point of the ClickHouse codebase with other codebases (no benefits for ClickHouse itself). [#13496](https://github.com/ClickHouse/ClickHouse/pull/13496) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Move parts from DiskLocal to DiskS3 in parallel. `DiskS3` is an experimental feature. [#13459](https://github.com/ClickHouse/ClickHouse/pull/13459) ([Pavel Kovalenko](https://github.com/Jokser)).
+* Enable mixed granularity parts by default. [#13449](https://github.com/ClickHouse/ClickHouse/pull/13449) ([alesapin](https://github.com/alesapin)).
+* Proper remote host checking in S3 redirects (security-related). [#13404](https://github.com/ClickHouse/ClickHouse/pull/13404) ([Vladimir Chebotarev](https://github.com/excitoon)).
+* Add `QueryTimeMicroseconds`, `SelectQueryTimeMicroseconds` and `InsertQueryTimeMicroseconds` to `system.events`. [#13336](https://github.com/ClickHouse/ClickHouse/pull/13336) ([ianton-ru](https://github.com/ianton-ru)).
+* Fix debug assertion when Decimal has too large a negative exponent. Fixes [#13188](https://github.com/ClickHouse/ClickHouse/issues/13188). [#13228](https://github.com/ClickHouse/ClickHouse/pull/13228) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Added a cache layer for DiskS3 (caches mark and index files on the local disk). `DiskS3` is an experimental feature. [#13076](https://github.com/ClickHouse/ClickHouse/pull/13076) ([Pavel Kovalenko](https://github.com/Jokser)).
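The changed conditional-aggregate behavior noted earlier in this Improvement list can be seen in a query like the following sketch (`numbers(5)` yields 0 through 4, so nothing matches the condition):

```sql
-- With a Nullable argument and no matching rows, the result is now NULL rather than 0.
SELECT sumIf(toNullable(number), number > 10) FROM numbers(5);
```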
+* Fix readline so that it now dumps history to a file. [#13600](https://github.com/ClickHouse/ClickHouse/pull/13600) ([Amos Bird](https://github.com/amosbird)).
+* Create the `system` database with the `Atomic` engine by default (a preparation to enable the `Atomic` database engine by default everywhere). [#13680](https://github.com/ClickHouse/ClickHouse/pull/13680) ([tavplubix](https://github.com/tavplubix)).
+
+#### Performance Improvement
+
+* Slightly optimize very short queries with `LowCardinality`. [#14129](https://github.com/ClickHouse/ClickHouse/pull/14129) ([Anton Popov](https://github.com/CurtizJ)).
+* Enable parallel INSERTs for table engines `Null`, `Memory`, `Distributed` and `Buffer` when the setting `max_insert_threads` is set. [#14120](https://github.com/ClickHouse/ClickHouse/pull/14120) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fail fast if the `max_rows_to_read` limit is exceeded on a parts scan. The motivation behind this change is to skip the range scan for all selected parts if it is clear that `max_rows_to_read` is already exceeded. The change is quite noticeable for queries over a big number of parts. [#13677](https://github.com/ClickHouse/ClickHouse/pull/13677) ([Roman Khavronenko](https://github.com/hagen1778)).
+* Slightly improve performance of aggregation by UInt8/UInt16 keys. [#13099](https://github.com/ClickHouse/ClickHouse/pull/13099) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Optimize `has()`, `indexOf()` and `countEqual()` functions for `Array(LowCardinality(T))` and constant right arguments. [#12550](https://github.com/ClickHouse/ClickHouse/pull/12550) ([myrrc](https://github.com/myrrc)).
+* When performing trivial `INSERT SELECT` queries, automatically set `max_threads` to 1 or `max_insert_threads`, and set `max_block_size` to `min_insert_block_size_rows`. Related to [#5907](https://github.com/ClickHouse/ClickHouse/issues/5907). [#12195](https://github.com/ClickHouse/ClickHouse/pull/12195) ([flynn](https://github.com/ucasFL)).
+
+#### Experimental Feature
+
+* Add types `Int128`, `Int256`, `UInt256` and related functions for them. Extend Decimals with Decimal256 (precision up to 76 digits). The new types are behind the setting `allow_experimental_bigint_types`. They work extremely slowly and badly, and the implementation is incomplete. Please don't use this feature. [#13097](https://github.com/ClickHouse/ClickHouse/pull/13097) ([Artem Zuikov](https://github.com/4ertus2)).
+
+#### Build/Testing/Packaging Improvement
+
+* Added `clickhouse install` script, which is useful if you only have a single binary. [#13528](https://github.com/ClickHouse/ClickHouse/pull/13528) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Allow running the `clickhouse` binary without configuration. [#13515](https://github.com/ClickHouse/ClickHouse/pull/13515) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Enable check for typos in code with `codespell`. [#13513](https://github.com/ClickHouse/ClickHouse/pull/13513) [#13511](https://github.com/ClickHouse/ClickHouse/pull/13511) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Enable Shellcheck in CI as a linter of .sh tests. This closes [#13168](https://github.com/ClickHouse/ClickHouse/issues/13168). [#13530](https://github.com/ClickHouse/ClickHouse/pull/13530) [#13529](https://github.com/ClickHouse/ClickHouse/pull/13529) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add a CMake option to fail configuration instead of auto-reconfiguration, enabled by default. [#13687](https://github.com/ClickHouse/ClickHouse/pull/13687) ([Konstantin](https://github.com/podshumok)).
+* Expose the version of the embedded tzdata via `TZDATA_VERSION` in `system.build_options`. [#13648](https://github.com/ClickHouse/ClickHouse/pull/13648) ([filimonov](https://github.com/filimonov)).
+* Improve generation of the `system.time_zones` table during build. Closes [#14209](https://github.com/ClickHouse/ClickHouse/issues/14209). [#14215](https://github.com/ClickHouse/ClickHouse/pull/14215) ([filimonov](https://github.com/filimonov)).
+* Build ClickHouse with the latest tzdata from the package repository. [#13623](https://github.com/ClickHouse/ClickHouse/pull/13623) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add the ability to write JS-style comments in `skip_list.json`. [#14159](https://github.com/ClickHouse/ClickHouse/pull/14159) ([alesapin](https://github.com/alesapin)).
+* Ensure that there is no copy-pasted GPL code. [#13514](https://github.com/ClickHouse/ClickHouse/pull/13514) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Switch tests docker images to use the test-base parent. [#14167](https://github.com/ClickHouse/ClickHouse/pull/14167) ([Ilya Yatsishin](https://github.com/qoega)).
+* Add retry logic when bringing up the docker-compose cluster; increase COMPOSE_HTTP_TIMEOUT. [#14112](https://github.com/ClickHouse/ClickHouse/pull/14112) ([vzakaznikov](https://github.com/vzakaznikov)).
+* Enabled `system.text_log` in stress tests to find more bugs. [#13855](https://github.com/ClickHouse/ClickHouse/pull/13855) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Testflows LDAP module: add missing certificates and dhparam.pem for openldap4. [#13780](https://github.com/ClickHouse/ClickHouse/pull/13780) ([vzakaznikov](https://github.com/vzakaznikov)).
+* ZooKeeper cannot work reliably in unit tests in CI infrastructure. Using unit tests for ZooKeeper interaction with a real ZooKeeper was a bad idea from the start (unit tests are not supposed to verify complex distributed systems). We are already using integration tests for this purpose, and they are better suited. [#13745](https://github.com/ClickHouse/ClickHouse/pull/13745) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Added a docker image for style check. Added a style check verifying that all docker and docker compose files are located in the `docker` directory. [#13724](https://github.com/ClickHouse/ClickHouse/pull/13724) ([Ilya Yatsishin](https://github.com/qoega)).
+* Fix cassandra build on Mac OS. [#13708](https://github.com/ClickHouse/ClickHouse/pull/13708) ([Ilya Yatsishin](https://github.com/qoega)).
+* Fix link error in shared build. [#13700](https://github.com/ClickHouse/ClickHouse/pull/13700) ([Amos Bird](https://github.com/amosbird)).
+* Update the LDAP user authentication suite to check that it works with RBAC. [#13656](https://github.com/ClickHouse/ClickHouse/pull/13656) ([vzakaznikov](https://github.com/vzakaznikov)).
+* Removed `-DENABLE_CURL_CLIENT` for `contrib/aws`. [#13628](https://github.com/ClickHouse/ClickHouse/pull/13628) ([Vladimir Chebotarev](https://github.com/excitoon)).
+* Increase health-check timeouts for ClickHouse nodes and add support for dumping docker-compose logs if unhealthy containers are found. [#13612](https://github.com/ClickHouse/ClickHouse/pull/13612) ([vzakaznikov](https://github.com/vzakaznikov)).
+* Make sure [#10977](https://github.com/ClickHouse/ClickHouse/issues/10977) is invalid. [#13539](https://github.com/ClickHouse/ClickHouse/pull/13539) ([Amos Bird](https://github.com/amosbird)).
+* Skip PRs from robot-clickhouse. [#13489](https://github.com/ClickHouse/ClickHouse/pull/13489) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Move Dockerfiles from integration tests to the `docker/test` directory. docker_compose files are available in the `runner` docker container. Docker images are built in CI and not in integration tests. [#13448](https://github.com/ClickHouse/ClickHouse/pull/13448) ([Ilya Yatsishin](https://github.com/qoega)).
+
+
 ## ClickHouse release 20.7

 ### ClickHouse release v20.7.2.30-stable, 2020-08-31

@@ -22,14 +164,14 @@
 * Add setting `allow_non_metadata_alters` which restricts to execute `ALTER` queries which modify data on disk. Disabled be default. Closes [#11547](https://github.com/ClickHouse/ClickHouse/issues/11547). [#12635](https://github.com/ClickHouse/ClickHouse/pull/12635) ([alesapin](https://github.com/alesapin)).
 * A function `formatRow` is added to support turning arbitrary expressions into a string via given format. It's useful for manipulating SQL outputs and is quite versatile combined with the `columns` function. [#12574](https://github.com/ClickHouse/ClickHouse/pull/12574) ([Amos Bird](https://github.com/amosbird)).
 * Add `FROM_UNIXTIME` function for compatibility with MySQL, related to [12149](https://github.com/ClickHouse/ClickHouse/issues/12149). [#12484](https://github.com/ClickHouse/ClickHouse/pull/12484) ([flynn](https://github.com/ucasFL)).
-* Allow Nullable types as keys in MergeTree tables if `allow_nullable_key` table setting is enabled. https://github.com/ClickHouse/ClickHouse/issues/5319. [#12433](https://github.com/ClickHouse/ClickHouse/pull/12433) ([Amos Bird](https://github.com/amosbird)).
+* Allow Nullable types as keys in MergeTree tables if `allow_nullable_key` table setting is enabled. Closes [#5319](https://github.com/ClickHouse/ClickHouse/issues/5319). [#12433](https://github.com/ClickHouse/ClickHouse/pull/12433) ([Amos Bird](https://github.com/amosbird)).
 * Integration with [COS](https://intl.cloud.tencent.com/product/cos). [#12386](https://github.com/ClickHouse/ClickHouse/pull/12386) ([fastio](https://github.com/fastio)).
 * Add mapAdd and mapSubtract functions for adding/subtracting key-mapped values. [#11735](https://github.com/ClickHouse/ClickHouse/pull/11735) ([Ildus Kurbangaliev](https://github.com/ildus)).

 #### Bug Fix

 * Fix premature `ON CLUSTER` timeouts for queries that must be executed on a single replica. Fixes [#6704](https://github.com/ClickHouse/ClickHouse/issues/6704), [#7228](https://github.com/ClickHouse/ClickHouse/issues/7228), [#13361](https://github.com/ClickHouse/ClickHouse/issues/13361), [#11884](https://github.com/ClickHouse/ClickHouse/issues/11884). [#13450](https://github.com/ClickHouse/ClickHouse/pull/13450) ([alesapin](https://github.com/alesapin)).
-* Fix crash in mark inclusion search introduced in https://github.com/ClickHouse/ClickHouse/pull/12277. [#14225](https://github.com/ClickHouse/ClickHouse/pull/14225) ([Amos Bird](https://github.com/amosbird)).
+* Fix crash in mark inclusion search introduced in [#12277](https://github.com/ClickHouse/ClickHouse/pull/12277). [#14225](https://github.com/ClickHouse/ClickHouse/pull/14225) ([Amos Bird](https://github.com/amosbird)).
 * Fix race condition in external dictionaries with cache layout which can lead server crash. [#12566](https://github.com/ClickHouse/ClickHouse/pull/12566) ([alesapin](https://github.com/alesapin)).
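For the 20.7 `formatRow` and `FROM_UNIXTIME` entries above, a brief sketch of how they are called (the values are arbitrary):

```sql
-- Serialize arbitrary expressions through a given output format, one string per result row.
SELECT formatRow('CSV', number, 'good') FROM numbers(2);

-- MySQL-compatible conversion of a Unix timestamp to DateTime.
SELECT FROM_UNIXTIME(1600000000);
```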
 * Fix visible data clobbering by progress bar in client in interactive mode. This fixes [#12562](https://github.com/ClickHouse/ClickHouse/issues/12562) and [#13369](https://github.com/ClickHouse/ClickHouse/issues/13369) and [#13584](https://github.com/ClickHouse/ClickHouse/issues/13584) and fixes [#12964](https://github.com/ClickHouse/ClickHouse/issues/12964). [#13691](https://github.com/ClickHouse/ClickHouse/pull/13691) ([alexey-milovidov](https://github.com/alexey-milovidov)).
 * Fixed incorrect sorting order for `LowCardinality` columns when ORDER BY multiple columns is used. This fixes [#13958](https://github.com/ClickHouse/ClickHouse/issues/13958). [#14223](https://github.com/ClickHouse/ClickHouse/pull/14223) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).

@@ -71,7 +213,7 @@
 * Fix function if with nullable constexpr as cond that is not literal NULL. Fixes [#12463](https://github.com/ClickHouse/ClickHouse/issues/12463). [#13226](https://github.com/ClickHouse/ClickHouse/pull/13226) ([alexey-milovidov](https://github.com/alexey-milovidov)).
 * Fix assert in `arrayElement` function in case of array elements are Nullable and array subscript is also Nullable. This fixes [#12172](https://github.com/ClickHouse/ClickHouse/issues/12172). [#13224](https://github.com/ClickHouse/ClickHouse/pull/13224) ([alexey-milovidov](https://github.com/alexey-milovidov)).
 * Fix DateTime64 conversion functions with constant argument. [#13205](https://github.com/ClickHouse/ClickHouse/pull/13205) ([Azat Khuzhin](https://github.com/azat)).
-* Fix parsing row policies from users.xml when names of databases or tables contain dots. This fixes https://github.com/ClickHouse/ClickHouse/issues/5779, https://github.com/ClickHouse/ClickHouse/issues/12527. [#13199](https://github.com/ClickHouse/ClickHouse/pull/13199) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix parsing row policies from users.xml when names of databases or tables contain dots. This fixes [#5779](https://github.com/ClickHouse/ClickHouse/issues/5779), [#12527](https://github.com/ClickHouse/ClickHouse/issues/12527). [#13199](https://github.com/ClickHouse/ClickHouse/pull/13199) ([Vitaly Baranov](https://github.com/vitlibar)).
 * Fix access to `redis` dictionary after connection was dropped once. It may happen with `cache` and `direct` dictionary layouts. [#13082](https://github.com/ClickHouse/ClickHouse/pull/13082) ([Anton Popov](https://github.com/CurtizJ)).
 * Fix wrong index analysis with functions. It could lead to some data parts being skipped when reading from `MergeTree` tables. Fixes [#13060](https://github.com/ClickHouse/ClickHouse/issues/13060). Fixes [#12406](https://github.com/ClickHouse/ClickHouse/issues/12406). [#13081](https://github.com/ClickHouse/ClickHouse/pull/13081) ([Anton Popov](https://github.com/CurtizJ)).
 * Fix error `Cannot convert column because it is constant but values of constants are different in source and result` for remote queries which use deterministic functions in scope of query, but not deterministic between queries, like `now()`, `now64()`, `randConstant()`. Fixes [#11327](https://github.com/ClickHouse/ClickHouse/issues/11327). [#13075](https://github.com/ClickHouse/ClickHouse/pull/13075) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).

@@ -89,7 +231,7 @@
 * Fixed [#10572](https://github.com/ClickHouse/ClickHouse/issues/10572) fix bloom filter index with const expression. [#12659](https://github.com/ClickHouse/ClickHouse/pull/12659) ([Winter Zhang](https://github.com/zhang2014)).
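A minimal sketch for the `allow_nullable_key` setting from the 20.7 New Feature list (table and column names are hypothetical):

```sql
-- With the setting enabled, Nullable columns may appear in the sorting/primary key.
CREATE TABLE t_nullable_key (k Nullable(Int32), v String)
ENGINE = MergeTree ORDER BY k
SETTINGS allow_nullable_key = 1;
```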
 * Fix SIGSEGV in StorageKafka when broker is unavailable (and not only). [#12658](https://github.com/ClickHouse/ClickHouse/pull/12658) ([Azat Khuzhin](https://github.com/azat)).
 * Add support for function `if` with `Array(UUID)` arguments. This fixes [#11066](https://github.com/ClickHouse/ClickHouse/issues/11066). [#12648](https://github.com/ClickHouse/ClickHouse/pull/12648) ([alexey-milovidov](https://github.com/alexey-milovidov)).
-* CREATE USER IF NOT EXISTS now doesn't throw exception if the user exists. This fixes https://github.com/ClickHouse/ClickHouse/issues/12507. [#12646](https://github.com/ClickHouse/ClickHouse/pull/12646) ([Vitaly Baranov](https://github.com/vitlibar)).
+* CREATE USER IF NOT EXISTS now doesn't throw exception if the user exists. This fixes [#12507](https://github.com/ClickHouse/ClickHouse/issues/12507). [#12646](https://github.com/ClickHouse/ClickHouse/pull/12646) ([Vitaly Baranov](https://github.com/vitlibar)).
 * Exception `There is no supertype...` can be thrown during `ALTER ... UPDATE` in unexpected cases (e.g. when subtracting from UInt64 column). This fixes [#7306](https://github.com/ClickHouse/ClickHouse/issues/7306). This fixes [#4165](https://github.com/ClickHouse/ClickHouse/issues/4165). [#12633](https://github.com/ClickHouse/ClickHouse/pull/12633) ([alexey-milovidov](https://github.com/alexey-milovidov)).
 * Fix possible `Pipeline stuck` error for queries with external sorting. Fixes [#12617](https://github.com/ClickHouse/ClickHouse/issues/12617). [#12618](https://github.com/ClickHouse/ClickHouse/pull/12618) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
 * Fix error `Output of TreeExecutor is not sorted` for `OPTIMIZE DEDUPLICATE`. Fixes [#11572](https://github.com/ClickHouse/ClickHouse/issues/11572). [#12613](https://github.com/ClickHouse/ClickHouse/pull/12613) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).

@@ -123,7 +265,7 @@
 * Fix assert in `parseDateTimeBestEffort`. This fixes [#12649](https://github.com/ClickHouse/ClickHouse/issues/12649). [#13227](https://github.com/ClickHouse/ClickHouse/pull/13227) ([alexey-milovidov](https://github.com/alexey-milovidov)).
 * Minor optimization in Processors/PipelineExecutor: breaking out of a loop because it makes sense to do so. [#13058](https://github.com/ClickHouse/ClickHouse/pull/13058) ([Mark Papadakis](https://github.com/markpapadakis)).
 * Support TRUNCATE table without TABLE keyword. [#12653](https://github.com/ClickHouse/ClickHouse/pull/12653) ([Winter Zhang](https://github.com/zhang2014)).
-* Fix explain query format overwrite by default, issue https://github.com/ClickHouse/ClickHouse/issues/12432. [#12541](https://github.com/ClickHouse/ClickHouse/pull/12541) ([BohuTANG](https://github.com/BohuTANG)).
+* Fix explain query format overwrite by default. This fixes [#12432](https://github.com/ClickHouse/ClickHouse/issues/12432). [#12541](https://github.com/ClickHouse/ClickHouse/pull/12541) ([BohuTANG](https://github.com/BohuTANG)).
 * Allow to set JOIN kind and type in more standad way: `LEFT SEMI JOIN` instead of `SEMI LEFT JOIN`. For now both are correct. [#12520](https://github.com/ClickHouse/ClickHouse/pull/12520) ([Artem Zuikov](https://github.com/4ertus2)).
 * Changes default value for `multiple_joins_rewriter_version` to 2. It enables new multiple joins rewriter that knows about column names. [#12469](https://github.com/ClickHouse/ClickHouse/pull/12469) ([Artem Zuikov](https://github.com/4ertus2)).
 * Add several metrics for requests to S3 storages. [#12464](https://github.com/ClickHouse/ClickHouse/pull/12464) ([ianton-ru](https://github.com/ianton-ru)).
diff --git a/README.md b/README.md
index 300ef4555a2..f1c8e17086b 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ ClickHouse is an open-source column-oriented database management system that all
 * [Contacts](https://clickhouse.tech/#contacts) can help to get your questions answered if there are any.
 * You can also [fill this form](https://clickhouse.tech/#meet) to meet Yandex ClickHouse team in person.

-## Upcoming Events
+## Upcoming Events

-* [ClickHouse Data Integration Virtual Meetup](https://www.eventbrite.com/e/clickhouse-september-virtual-meetup-data-integration-tickets-117421895049) on September 10, 2020.
+* [eBay migrating from Druid](https://us02web.zoom.us/webinar/register/tZMkfu6rpjItHtaQ1DXcgPWcSOnmM73HLGKL) on September 23, 2020.
+* [ClickHouse for Edge Analytics](https://ones2020.sched.com/event/bWPs) on September 29, 2020.
diff --git a/base/common/arithmeticOverflow.h b/base/common/arithmeticOverflow.h
index e228af287e2..c20fd635924 100644
--- a/base/common/arithmeticOverflow.h
+++ b/base/common/arithmeticOverflow.h
@@ -1,6 +1,6 @@
 #pragma once

-#include <common/types.h>
+#include <common/extended_types.h>

 namespace common
 {
diff --git a/base/common/extended_types.h b/base/common/extended_types.h
new file mode 100644
index 00000000000..fe5f7184954
--- /dev/null
+++ b/base/common/extended_types.h
@@ -0,0 +1,108 @@
+#pragma once
+
+#include <type_traits>
+
+#include <common/types.h>
+#include <common/wide_integer.h>
+
+using Int128 = __int128;
+
+using wInt256 = wide::integer<256, signed>;
+using wUInt256 = wide::integer<256, unsigned>;
+
+static_assert(sizeof(wInt256) == 32);
+static_assert(sizeof(wUInt256) == 32);
+
+/// The standard library type traits, such as std::is_arithmetic, with one exception
+/// (std::common_type), are "set in stone". Attempting to specialize them causes undefined behavior.
+/// So instead of using the std type_traits, we use our own version which allows extension.
+template <typename T>
+struct is_signed
+{
+    static constexpr bool value = std::is_signed_v<T>;
+};
+
+template <> struct is_signed<Int128> { static constexpr bool value = true; };
+template <> struct is_signed<wInt256> { static constexpr bool value = true; };
+
+template <typename T>
+inline constexpr bool is_signed_v = is_signed<T>::value;
+
+template <typename T>
+struct is_unsigned
+{
+    static constexpr bool value = std::is_unsigned_v<T>;
+};
+
+template <> struct is_unsigned<wUInt256> { static constexpr bool value = true; };
+
+template <typename T>
+inline constexpr bool is_unsigned_v = is_unsigned<T>::value;
+
+
+/// TODO: is_integral includes char, char8_t and wchar_t.
+template <typename T>
+struct is_integer
+{
+    static constexpr bool value = std::is_integral_v<T>;
+};
+
+template <> struct is_integer<Int128> { static constexpr bool value = true; };
+template <> struct is_integer<wInt256> { static constexpr bool value = true; };
+template <> struct is_integer<wUInt256> { static constexpr bool value = true; };
+
+template <typename T>
+inline constexpr bool is_integer_v = is_integer<T>::value;
+
+
+template <typename T>
+struct is_arithmetic
+{
+    static constexpr bool value = std::is_arithmetic_v<T>;
+};
+
+template <> struct is_arithmetic<__int128> { static constexpr bool value = true; };
+
+template <typename T>
+inline constexpr bool is_arithmetic_v = is_arithmetic<T>::value;
+
+template <typename T>
+struct make_unsigned
+{
+    typedef std::make_unsigned_t<T> type;
+};
+
+template <> struct make_unsigned<Int128> { using type = unsigned __int128; };
+template <> struct make_unsigned<wInt256> { using type = wUInt256; };
+template <> struct make_unsigned<wUInt256> { using type = wUInt256; };
+
+template <typename T> using make_unsigned_t = typename make_unsigned<T>::type;
+
+template <typename T>
+struct make_signed
+{
+    typedef std::make_signed_t<T> type;
+};
+
+template <> struct make_signed<wInt256> { using type = wInt256; };
+template <> struct make_signed<wUInt256> { using type = wInt256; };
+
+template <typename T> using make_signed_t = typename make_signed<T>::type;
+
+template <typename T>
+struct is_big_int
+{
+    static constexpr bool value = false;
+};
+
+template <> struct is_big_int<wInt256> { static constexpr bool value = true; };
+template <> struct is_big_int<wUInt256> { static constexpr bool value = true; };
+
+template <typename T>
+inline constexpr bool is_big_int_v = is_big_int<T>::value;
+
+template <typename To, typename From>
+inline To bigint_cast(const From & x [[maybe_unused]])
+{
+    return static_cast<To>(x);
+}
diff --git a/base/common/throwError.h b/base/common/throwError.h
new file mode 100644
index 00000000000..b495a0fbc7a
--- /dev/null
+++ b/base/common/throwError.h
@@ -0,0 +1,13 @@
+#pragma once
+#include <stdexcept>
+
+/// Throw a DB::Exception-like exception before its definition.
+/// DB::Exception is derived from Poco::Exception, which is derived from std::exception.
+/// DB::Exception is generally caught as Poco::Exception. std::exception generally has other catch blocks and could lead to other outcomes.
+/// DB::Exception is not defined yet. It'd be better to throw Poco::Exception, but we do not want to include any big header here, even <string>.
+/// So we throw some std::exception instead in the hope its catch block is the same as DB::Exception one.
+template <typename T>
+inline void throwError(const T & err)
+{
+    throw std::runtime_error(err);
+}
diff --git a/base/common/types.h b/base/common/types.h
index 682fe94366c..f3572da2972 100644
--- a/base/common/types.h
+++ b/base/common/types.h
@@ -1,12 +1,7 @@
 #pragma once

-#include
 #include <cstdint>
-#include
 #include <string>
-#include
-
-#include

 using Int8 = int8_t;
 using Int16 = int16_t;
 using Int32 = int32_t;
 using Int64 = int64_t;

@@ -23,112 +18,24 @@
 using UInt16 = uint16_t;
 using UInt32 = uint32_t;
 using UInt64 = uint64_t;

-using Int128 = __int128;
+using String = std::string;

-using wInt256 = std::wide_integer<256, signed>;
-using wUInt256 = std::wide_integer<256, unsigned>;
+namespace DB
+{

-static_assert(sizeof(wInt256) == 32);
-static_assert(sizeof(wUInt256) == 32);
+using UInt8 = ::UInt8;
+using UInt16 = ::UInt16;
+using UInt32 = ::UInt32;
+using UInt64 = ::UInt64;
+
+using Int8 = ::Int8;
+using Int16 = ::Int16;
+using Int32 = ::Int32;
+using Int64 = ::Int64;
+
+using Float32 = float;
+using Float64 = double;

 using String = std::string;

-/// The standard library type traits, such as std::is_arithmetic, with one exception
-/// (std::common_type), are "set in stone". Attempting to specialize them causes undefined behavior.
-/// So instead of using the std type_traits, we use our own version which allows extension. -template -struct is_signed -{ - static constexpr bool value = std::is_signed_v; -}; - -template <> struct is_signed { static constexpr bool value = true; }; -template <> struct is_signed { static constexpr bool value = true; }; - -template -inline constexpr bool is_signed_v = is_signed::value; - -template -struct is_unsigned -{ - static constexpr bool value = std::is_unsigned_v; -}; - -template <> struct is_unsigned { static constexpr bool value = true; }; - -template -inline constexpr bool is_unsigned_v = is_unsigned::value; - - -/// TODO: is_integral includes char, char8_t and wchar_t. -template -struct is_integer -{ - static constexpr bool value = std::is_integral_v; -}; - -template <> struct is_integer { static constexpr bool value = true; }; -template <> struct is_integer { static constexpr bool value = true; }; -template <> struct is_integer { static constexpr bool value = true; }; - -template -inline constexpr bool is_integer_v = is_integer::value; - - -template -struct is_arithmetic -{ - static constexpr bool value = std::is_arithmetic_v; -}; - -template <> struct is_arithmetic<__int128> { static constexpr bool value = true; }; - -template -inline constexpr bool is_arithmetic_v = is_arithmetic::value; - -template -struct make_unsigned -{ - typedef std::make_unsigned_t type; -}; - -template <> struct make_unsigned { using type = unsigned __int128; }; -template <> struct make_unsigned { using type = wUInt256; }; -template <> struct make_unsigned { using type = wUInt256; }; - -template using make_unsigned_t = typename make_unsigned::type; - -template -struct make_signed -{ - typedef std::make_signed_t type; -}; - -template <> struct make_signed { using type = wInt256; }; -template <> struct make_signed { using type = wInt256; }; - -template using make_signed_t = typename make_signed::type; - -template -struct is_big_int -{ - static constexpr bool value = false; -}; - -template <> struct is_big_int { static constexpr bool value = true; }; -template <> struct is_big_int { static constexpr bool value = true; }; - -template -inline constexpr bool is_big_int_v = is_big_int::value; - -template -inline std::string bigintToString(const T & x) -{ - return to_string(x); -} - -template -inline To bigint_cast(const From & x [[maybe_unused]]) -{ - return static_cast(x); } diff --git a/base/common/wide_integer.h b/base/common/wide_integer.h index 67d0b3f04da..2aeac072b3f 100644 --- a/base/common/wide_integer.h +++ b/base/common/wide_integer.h @@ -22,79 +22,87 @@ * without express or implied warranty. 
*/ -#include // CHAR_BIT -#include #include #include #include +#include + +namespace wide +{ +template +class integer; +} namespace std { -template -class wide_integer; template -struct common_type, wide_integer>; +struct common_type, wide::integer>; template -struct common_type, Arithmetic>; +struct common_type, Arithmetic>; template -struct common_type>; +struct common_type>; + +} + +namespace wide +{ template -class wide_integer +class integer { public: using base_type = uint8_t; using signed_base_type = int8_t; // ctors - wide_integer() = default; + integer() = default; template - constexpr wide_integer(T rhs) noexcept; + constexpr integer(T rhs) noexcept; template - constexpr wide_integer(std::initializer_list il) noexcept; + constexpr integer(std::initializer_list il) noexcept; // assignment template - constexpr wide_integer & operator=(const wide_integer & rhs) noexcept; + constexpr integer & operator=(const integer & rhs) noexcept; template - constexpr wide_integer & operator=(Arithmetic rhs) noexcept; + constexpr integer & operator=(Arithmetic rhs) noexcept; template - constexpr wide_integer & operator*=(const Arithmetic & rhs); + constexpr integer & operator*=(const Arithmetic & rhs); template - constexpr wide_integer & operator/=(const Arithmetic & rhs); + constexpr integer & operator/=(const Arithmetic & rhs); template - constexpr wide_integer & operator+=(const Arithmetic & rhs) noexcept(is_same::value); + constexpr integer & operator+=(const Arithmetic & rhs) noexcept(std::is_same_v); template - constexpr wide_integer & operator-=(const Arithmetic & rhs) noexcept(is_same::value); + constexpr integer & operator-=(const Arithmetic & rhs) noexcept(std::is_same_v); template - constexpr wide_integer & operator%=(const Integral & rhs); + constexpr integer & operator%=(const Integral & rhs); template - constexpr wide_integer & operator&=(const Integral & rhs) noexcept; + constexpr integer & operator&=(const Integral & rhs) noexcept; template - constexpr wide_integer & operator|=(const Integral & rhs) noexcept; + constexpr integer & operator|=(const Integral & rhs) noexcept; template - constexpr wide_integer & operator^=(const Integral & rhs) noexcept; + constexpr integer & operator^=(const Integral & rhs) noexcept; - constexpr wide_integer & operator<<=(int n); - constexpr wide_integer & operator>>=(int n) noexcept; + constexpr integer & operator<<=(int n) noexcept; + constexpr integer & operator>>=(int n) noexcept; - constexpr wide_integer & operator++() noexcept(is_same::value); - constexpr wide_integer operator++(int) noexcept(is_same::value); - constexpr wide_integer & operator--() noexcept(is_same::value); - constexpr wide_integer operator--(int) noexcept(is_same::value); + constexpr integer & operator++() noexcept(std::is_same_v); + constexpr integer operator++(int) noexcept(std::is_same_v); + constexpr integer & operator--() noexcept(std::is_same_v); + constexpr integer operator--(int) noexcept(std::is_same_v); // observers @@ -114,10 +122,10 @@ public: private: template - friend class wide_integer; + friend class integer; - friend class numeric_limits>; - friend class numeric_limits>; + friend class std::numeric_limits>; + friend class std::numeric_limits>; base_type m_arr[_impl::arr_size]; }; @@ -134,115 +142,117 @@ using __only_integer = typename std::enable_if() && IntegralC // Unary operators template -constexpr wide_integer operator~(const wide_integer & lhs) noexcept; +constexpr integer operator~(const integer & lhs) noexcept; template -constexpr wide_integer 
operator-(const wide_integer & lhs) noexcept(is_same::value); +constexpr integer operator-(const integer & lhs) noexcept(std::is_same_v); template -constexpr wide_integer operator+(const wide_integer & lhs) noexcept(is_same::value); +constexpr integer operator+(const integer & lhs) noexcept(std::is_same_v); // Binary operators template -std::common_type_t, wide_integer> constexpr -operator*(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator*(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator*(const Arithmetic & rhs, const Arithmetic2 & lhs); template -std::common_type_t, wide_integer> constexpr -operator/(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator/(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator/(const Arithmetic & rhs, const Arithmetic2 & lhs); template -std::common_type_t, wide_integer> constexpr -operator+(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator+(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator+(const Arithmetic & rhs, const Arithmetic2 & lhs); template -std::common_type_t, wide_integer> constexpr -operator-(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator-(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator-(const Arithmetic & rhs, const Arithmetic2 & lhs); template -std::common_type_t, wide_integer> constexpr -operator%(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator%(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator%(const Integral & rhs, const Integral2 & lhs); template -std::common_type_t, wide_integer> constexpr -operator&(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator&(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator&(const Integral & rhs, const Integral2 & lhs); template -std::common_type_t, wide_integer> constexpr -operator|(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator|(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator|(const Integral & rhs, const Integral2 & lhs); template -std::common_type_t, wide_integer> constexpr -operator^(const wide_integer & lhs, const wide_integer & rhs); +std::common_type_t, integer> constexpr +operator^(const integer & lhs, const integer & rhs); template > std::common_type_t constexpr operator^(const Integral & rhs, const Integral2 & lhs); // TODO: Integral template -constexpr wide_integer operator<<(const wide_integer & lhs, int n) noexcept; +constexpr integer operator<<(const integer & lhs, int n) noexcept; template -constexpr wide_integer operator>>(const wide_integer & lhs, int n) noexcept; +constexpr integer operator>>(const integer & lhs, int n) noexcept; template >> -constexpr wide_integer operator<<(const wide_integer & lhs, Int n) noexcept +constexpr integer operator<<(const integer & lhs, Int n) noexcept { return lhs << int(n); } template >> -constexpr wide_integer operator>>(const wide_integer & lhs, Int n) noexcept +constexpr integer operator>>(const integer & lhs, Int n) noexcept { return lhs >> int(n); } template -constexpr bool operator<(const wide_integer 
& lhs, const wide_integer & rhs); +constexpr bool operator<(const integer & lhs, const integer & rhs); template > constexpr bool operator<(const Arithmetic & rhs, const Arithmetic2 & lhs); template -constexpr bool operator>(const wide_integer & lhs, const wide_integer & rhs); +constexpr bool operator>(const integer & lhs, const integer & rhs); template > constexpr bool operator>(const Arithmetic & rhs, const Arithmetic2 & lhs); template -constexpr bool operator<=(const wide_integer & lhs, const wide_integer & rhs); +constexpr bool operator<=(const integer & lhs, const integer & rhs); template > constexpr bool operator<=(const Arithmetic & rhs, const Arithmetic2 & lhs); template -constexpr bool operator>=(const wide_integer & lhs, const wide_integer & rhs); +constexpr bool operator>=(const integer & lhs, const integer & rhs); template > constexpr bool operator>=(const Arithmetic & rhs, const Arithmetic2 & lhs); template -constexpr bool operator==(const wide_integer & lhs, const wide_integer & rhs); +constexpr bool operator==(const integer & lhs, const integer & rhs); template > constexpr bool operator==(const Arithmetic & rhs, const Arithmetic2 & lhs); template -constexpr bool operator!=(const wide_integer & lhs, const wide_integer & rhs); +constexpr bool operator!=(const integer & lhs, const integer & rhs); template > constexpr bool operator!=(const Arithmetic & rhs, const Arithmetic2 & lhs); -template -std::string to_string(const wide_integer & n); +} + +namespace std +{ template -struct hash>; +struct hash>; } diff --git a/base/common/wide_integer_impl.h b/base/common/wide_integer_impl.h index c77a9120a55..26bd6704bdc 100644 --- a/base/common/wide_integer_impl.h +++ b/base/common/wide_integer_impl.h @@ -1,19 +1,47 @@ /// Original is here https://github.com/cerevra/int #pragma once -#include "wide_integer.h" +#include "throwError.h" -#include -#include +#ifndef CHAR_BIT +#define CHAR_BIT 8 +#endif + +namespace wide +{ + +template +struct IsWideInteger +{ + static const constexpr bool value = false; +}; + +template +struct IsWideInteger> +{ + static const constexpr bool value = true; +}; + +template +static constexpr bool ArithmeticConcept() noexcept +{ + return std::is_arithmetic_v || IsWideInteger::value; +} + +template +static constexpr bool IntegralConcept() noexcept +{ + return std::is_integral_v || IsWideInteger::value; +} + +} namespace std { -#define CT(x) \ - std::common_type_t, std::decay_t> { x } // numeric limits template -class numeric_limits> +class numeric_limits> { public: static constexpr bool is_specialized = true; @@ -40,103 +68,84 @@ public: static constexpr bool traps = true; static constexpr bool tinyness_before = false; - static constexpr wide_integer min() noexcept + static constexpr wide::integer min() noexcept { if (is_same::value) { - using T = wide_integer; + using T = wide::integer; T res{}; - res.m_arr[T::_impl::big(0)] = std::numeric_limits::signed_base_type>::min(); + res.m_arr[T::_impl::big(0)] = std::numeric_limits::signed_base_type>::min(); return res; } return 0; } - static constexpr wide_integer max() noexcept + static constexpr wide::integer max() noexcept { - using T = wide_integer; + using T = wide::integer; T res{}; res.m_arr[T::_impl::big(0)] = is_same::value - ? std::numeric_limits::signed_base_type>::max() - : std::numeric_limits::base_type>::max(); - for (int i = 1; i < wide_integer::_impl::arr_size; ++i) + ? 
std::numeric_limits::signed_base_type>::max() + : std::numeric_limits::base_type>::max(); + for (int i = 1; i < wide::integer::_impl::arr_size; ++i) { - res.m_arr[T::_impl::big(i)] = std::numeric_limits::base_type>::max(); + res.m_arr[T::_impl::big(i)] = std::numeric_limits::base_type>::max(); } return res; } - static constexpr wide_integer lowest() noexcept { return min(); } - static constexpr wide_integer epsilon() noexcept { return 0; } - static constexpr wide_integer round_error() noexcept { return 0; } - static constexpr wide_integer infinity() noexcept { return 0; } - static constexpr wide_integer quiet_NaN() noexcept { return 0; } - static constexpr wide_integer signaling_NaN() noexcept { return 0; } - static constexpr wide_integer denorm_min() noexcept { return 0; } + static constexpr wide::integer lowest() noexcept { return min(); } + static constexpr wide::integer epsilon() noexcept { return 0; } + static constexpr wide::integer round_error() noexcept { return 0; } + static constexpr wide::integer infinity() noexcept { return 0; } + static constexpr wide::integer quiet_NaN() noexcept { return 0; } + static constexpr wide::integer signaling_NaN() noexcept { return 0; } + static constexpr wide::integer denorm_min() noexcept { return 0; } }; -template -struct IsWideInteger -{ - static const constexpr bool value = false; -}; - -template -struct IsWideInteger> -{ - static const constexpr bool value = true; -}; - -template -static constexpr bool ArithmeticConcept() noexcept -{ - return std::is_arithmetic_v || IsWideInteger::value; -} - -template -static constexpr bool IntegralConcept() noexcept -{ - return std::is_integral_v || IsWideInteger::value; -} - // type traits template -struct common_type, wide_integer> +struct common_type, wide::integer> { using type = std::conditional_t < Bits == Bits2, - wide_integer< + wide::integer< Bits, - std::conditional_t<(std::is_same::value && std::is_same::value), signed, unsigned>>, - std::conditional_t, wide_integer>>; + std::conditional_t<(std::is_same_v && std::is_same_v), signed, unsigned>>, + std::conditional_t, wide::integer>>; }; template -struct common_type, Arithmetic> +struct common_type, Arithmetic> { - static_assert(ArithmeticConcept(), ""); + static_assert(wide::ArithmeticConcept()); using type = std::conditional_t< - std::is_floating_point::value, + std::is_floating_point_v, Arithmetic, std::conditional_t< sizeof(Arithmetic) < Bits * sizeof(long), - wide_integer, + wide::integer, std::conditional_t< Bits * sizeof(long) < sizeof(Arithmetic), Arithmetic, std::conditional_t< - Bits * sizeof(long) == sizeof(Arithmetic) && (is_same::value || std::is_signed::value), + Bits * sizeof(long) == sizeof(Arithmetic) && (std::is_same_v || std::is_signed_v), Arithmetic, - wide_integer>>>>; + wide::integer>>>>; }; template -struct common_type> : std::common_type, Arithmetic> +struct common_type> : common_type, Arithmetic> { }; +} + +namespace wide +{ + template -struct wide_integer::_impl +struct integer::_impl { static_assert(Bits % CHAR_BIT == 0, "=)"); @@ -152,7 +161,7 @@ struct wide_integer::_impl static constexpr unsigned any(unsigned idx) { return idx; } template - constexpr static bool is_negative(const wide_integer & n) noexcept + constexpr static bool is_negative(const integer & n) noexcept { if constexpr (std::is_same_v) return static_cast(n.m_arr[big(0)]) < 0; @@ -161,7 +170,7 @@ struct wide_integer::_impl } template - constexpr static wide_integer make_positive(const wide_integer & n) noexcept + constexpr static integer 
make_positive(const integer & n) noexcept { return is_negative(n) ? operator_unary_minus(n) : n; } @@ -178,7 +187,7 @@ struct wide_integer::_impl } template - constexpr static void wide_integer_from_bultin(wide_integer & self, Integral rhs) noexcept + constexpr static void wide_integer_from_bultin(integer & self, Integral rhs) noexcept { auto r = _impl::to_Integral(rhs); @@ -197,7 +206,7 @@ struct wide_integer::_impl } } - constexpr static void wide_integer_from_bultin(wide_integer & self, double rhs) noexcept + constexpr static void wide_integer_from_bultin(integer & self, double rhs) noexcept { if ((rhs > 0 && rhs < std::numeric_limits::max()) || (rhs < 0 && rhs > std::numeric_limits::min())) { @@ -223,10 +232,10 @@ struct wide_integer::_impl template constexpr static void - wide_integer_from_wide_integer(wide_integer & self, const wide_integer & rhs) noexcept + wide_integer_from_wide_integer(integer & self, const integer & rhs) noexcept { // int Bits_to_copy = std::min(arr_size, rhs.arr_size); - auto rhs_arr_size = wide_integer::_impl::arr_size; + auto rhs_arr_size = integer::_impl::arr_size; int base_elems_to_copy = _impl::arr_size < rhs_arr_size ? _impl::arr_size : rhs_arr_size; for (int i = 0; i < base_elems_to_copy; ++i) { @@ -244,14 +253,14 @@ struct wide_integer::_impl return sizeof(T) * CHAR_BIT <= Bits; } - constexpr static wide_integer shift_left(const wide_integer & rhs, int n) + constexpr static integer shift_left(const integer & rhs, int n) noexcept { if (static_cast(n) >= base_bits * arr_size) return 0; if (n <= 0) return rhs; - wide_integer lhs = rhs; + integer lhs = rhs; int bit_shift = n % base_bits; unsigned n_bytes = n / base_bits; if (bit_shift) @@ -275,23 +284,19 @@ struct wide_integer::_impl return lhs; } - constexpr static wide_integer shift_left(const wide_integer & rhs, int n) + constexpr static integer shift_left(const integer & rhs, int n) noexcept { - // static_assert(is_negative(rhs), "shift left for negative lhsbers is underfined!"); - if (is_negative(rhs)) - throw std::runtime_error("shift left for negative lhsbers is underfined!"); - - return wide_integer(shift_left(wide_integer(rhs), n)); + return integer(shift_left(integer(rhs), n)); } - constexpr static wide_integer shift_right(const wide_integer & rhs, int n) noexcept + constexpr static integer shift_right(const integer & rhs, int n) noexcept { if (static_cast(n) >= base_bits * arr_size) return 0; if (n <= 0) return rhs; - wide_integer lhs = rhs; + integer lhs = rhs; int bit_shift = n % base_bits; unsigned n_bytes = n / base_bits; if (bit_shift) @@ -315,7 +320,7 @@ struct wide_integer::_impl return lhs; } - constexpr static wide_integer shift_right(const wide_integer & rhs, int n) noexcept + constexpr static integer shift_right(const integer & rhs, int n) noexcept { if (static_cast(n) >= base_bits * arr_size) return 0; @@ -324,14 +329,14 @@ struct wide_integer::_impl bool is_neg = is_negative(rhs); if (!is_neg) - return shift_right(wide_integer(rhs), n); + return shift_right(integer(rhs), n); - wide_integer lhs = rhs; + integer lhs = rhs; int bit_shift = n % base_bits; unsigned n_bytes = n / base_bits; if (bit_shift) { - lhs = shift_right(wide_integer(lhs), bit_shift); + lhs = shift_right(integer(lhs), bit_shift); lhs.m_arr[big(0)] |= std::numeric_limits::max() << (base_bits - bit_shift); } if (n_bytes) @@ -349,8 +354,8 @@ struct wide_integer::_impl } template - constexpr static wide_integer - operator_plus_T(const wide_integer & lhs, T rhs) noexcept(is_same::value) + constexpr static integer + 
operator_plus_T(const integer & lhs, T rhs) noexcept(std::is_same_v) { if (rhs < 0) return _operator_minus_T(lhs, -rhs); @@ -360,10 +365,10 @@ struct wide_integer::_impl private: template - constexpr static wide_integer - _operator_minus_T(const wide_integer & lhs, T rhs) noexcept(is_same::value) + constexpr static integer + _operator_minus_T(const integer & lhs, T rhs) noexcept(std::is_same_v) { - wide_integer res = lhs; + integer res = lhs; bool is_underflow = false; int r_idx = 0; @@ -399,10 +404,10 @@ private: } template - constexpr static wide_integer - _operator_plus_T(const wide_integer & lhs, T rhs) noexcept(is_same::value) + constexpr static integer + _operator_plus_T(const integer & lhs, T rhs) noexcept(std::is_same_v) { - wide_integer res = lhs; + integer res = lhs; bool is_overflow = false; int r_idx = 0; @@ -438,27 +443,27 @@ private: } public: - constexpr static wide_integer operator_unary_tilda(const wide_integer & lhs) noexcept + constexpr static integer operator_unary_tilda(const integer & lhs) noexcept { - wide_integer res{}; + integer res{}; for (int i = 0; i < arr_size; ++i) res.m_arr[any(i)] = ~lhs.m_arr[any(i)]; return res; } - constexpr static wide_integer - operator_unary_minus(const wide_integer & lhs) noexcept(is_same::value) + constexpr static integer + operator_unary_minus(const integer & lhs) noexcept(std::is_same_v) { return operator_plus_T(operator_unary_tilda(lhs), 1); } template - constexpr static auto operator_plus(const wide_integer & lhs, const T & rhs) noexcept(is_same::value) + constexpr static auto operator_plus(const integer & lhs, const T & rhs) noexcept(std::is_same_v) { if constexpr (should_keep_size()) { - wide_integer t = rhs; + integer t = rhs; if (is_negative(t)) return _operator_minus_wide_integer(lhs, operator_unary_minus(t)); else @@ -467,17 +472,17 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, wide_integer>::_impl::operator_plus( - wide_integer(lhs), rhs); + return std::common_type_t, integer>::_impl::operator_plus( + integer(lhs), rhs); } } template - constexpr static auto operator_minus(const wide_integer & lhs, const T & rhs) noexcept(is_same::value) + constexpr static auto operator_minus(const integer & lhs, const T & rhs) noexcept(std::is_same_v) { if constexpr (should_keep_size()) { - wide_integer t = rhs; + integer t = rhs; if (is_negative(t)) return _operator_plus_wide_integer(lhs, operator_unary_minus(t)); else @@ -486,16 +491,16 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, wide_integer>::_impl::operator_minus( - wide_integer(lhs), rhs); + return std::common_type_t, integer>::_impl::operator_minus( + integer(lhs), rhs); } } private: - constexpr static wide_integer _operator_minus_wide_integer( - const wide_integer & lhs, const wide_integer & rhs) noexcept(is_same::value) + constexpr static integer _operator_minus_wide_integer( + const integer & lhs, const integer & rhs) noexcept(std::is_same_v) { - wide_integer res = lhs; + integer res = lhs; bool is_underflow = false; for (int idx = 0; idx < arr_size; ++idx) @@ -518,10 +523,10 @@ private: return res; } - constexpr static wide_integer _operator_plus_wide_integer( - const wide_integer & lhs, const wide_integer & rhs) noexcept(is_same::value) + constexpr static integer _operator_plus_wide_integer( + const integer & lhs, const integer & rhs) noexcept(std::is_same_v) { - wide_integer res = lhs; + integer res = lhs; bool is_overflow = false; for (int idx = 0; idx < arr_size; ++idx) @@ 
-546,14 +551,14 @@ private: public: template - constexpr static auto operator_star(const wide_integer & lhs, const T & rhs) + constexpr static auto operator_star(const integer & lhs, const T & rhs) { if constexpr (should_keep_size()) { - const wide_integer a = make_positive(lhs); - wide_integer t = make_positive(wide_integer(rhs)); + const integer a = make_positive(lhs); + integer t = make_positive(integer(rhs)); - wide_integer res = 0; + integer res = 0; for (size_t i = 0; i < arr_size * base_bits; ++i) { @@ -563,7 +568,7 @@ public: t = shift_right(t, 1); } - if (is_same::value && is_negative(wide_integer(rhs)) != is_negative(lhs)) + if (std::is_same_v && is_negative(integer(rhs)) != is_negative(lhs)) res = operator_unary_minus(res); return res; @@ -571,19 +576,19 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, T>::_impl::operator_star(T(lhs), rhs); + return std::common_type_t, T>::_impl::operator_star(T(lhs), rhs); } } template - constexpr static bool operator_more(const wide_integer & lhs, const T & rhs) noexcept + constexpr static bool operator_more(const integer & lhs, const T & rhs) noexcept { if constexpr (should_keep_size()) { // static_assert(Signed == std::is_signed::value, // "warning: operator_more: comparison of integers of different signs"); - wide_integer t = rhs; + integer t = rhs; if (std::numeric_limits::is_signed && (is_negative(lhs) != is_negative(t))) return is_negative(t); @@ -599,19 +604,19 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, T>::_impl::operator_more(T(lhs), rhs); + return std::common_type_t, T>::_impl::operator_more(T(lhs), rhs); } } template - constexpr static bool operator_less(const wide_integer & lhs, const T & rhs) noexcept + constexpr static bool operator_less(const integer & lhs, const T & rhs) noexcept { if constexpr (should_keep_size()) { // static_assert(Signed == std::is_signed::value, // "warning: operator_less: comparison of integers of different signs"); - wide_integer t = rhs; + integer t = rhs; if (std::numeric_limits::is_signed && (is_negative(lhs) != is_negative(t))) return is_negative(lhs); @@ -625,16 +630,16 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, T>::_impl::operator_less(T(lhs), rhs); + return std::common_type_t, T>::_impl::operator_less(T(lhs), rhs); } } template - constexpr static bool operator_eq(const wide_integer & lhs, const T & rhs) noexcept + constexpr static bool operator_eq(const integer & lhs, const T & rhs) noexcept { if constexpr (should_keep_size()) { - wide_integer t = rhs; + integer t = rhs; for (int i = 0; i < arr_size; ++i) if (lhs.m_arr[any(i)] != t.m_arr[any(i)]) @@ -645,17 +650,17 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, T>::_impl::operator_eq(T(lhs), rhs); + return std::common_type_t, T>::_impl::operator_eq(T(lhs), rhs); } } template - constexpr static auto operator_pipe(const wide_integer & lhs, const T & rhs) noexcept + constexpr static auto operator_pipe(const integer & lhs, const T & rhs) noexcept { if constexpr (should_keep_size()) { - wide_integer t = rhs; - wide_integer res = lhs; + integer t = rhs; + integer res = lhs; for (int i = 0; i < arr_size; ++i) res.m_arr[any(i)] |= t.m_arr[any(i)]; @@ -664,17 +669,17 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, T>::_impl::operator_pipe(T(lhs), rhs); + return std::common_type_t, T>::_impl::operator_pipe(T(lhs), rhs); } 
} template - constexpr static auto operator_amp(const wide_integer & lhs, const T & rhs) noexcept + constexpr static auto operator_amp(const integer & lhs, const T & rhs) noexcept { if constexpr (should_keep_size()) { - wide_integer t = rhs; - wide_integer res = lhs; + integer t = rhs; + integer res = lhs; for (int i = 0; i < arr_size; ++i) res.m_arr[any(i)] &= t.m_arr[any(i)]; @@ -683,7 +688,7 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, T>::_impl::operator_amp(T(lhs), rhs); + return std::common_type_t, T>::_impl::operator_amp(T(lhs), rhs); } } @@ -702,7 +707,7 @@ private: } if (is_zero) - throw std::domain_error("divide by zero"); + throwError("divide by zero"); T n = lhserator; T d = denominator; @@ -733,15 +738,15 @@ private: public: template - constexpr static auto operator_slash(const wide_integer & lhs, const T & rhs) + constexpr static auto operator_slash(const integer & lhs, const T & rhs) { if constexpr (should_keep_size()) { - wide_integer o = rhs; - wide_integer quotient{}, remainder{}; + integer o = rhs; + integer quotient{}, remainder{}; divide(make_positive(lhs), make_positive(o), quotient, remainder); - if (is_same::value && is_negative(o) != is_negative(lhs)) + if (std::is_same_v && is_negative(o) != is_negative(lhs)) quotient = operator_unary_minus(quotient); return quotient; @@ -749,20 +754,20 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, wide_integer>::operator_slash(T(lhs), rhs); + return std::common_type_t, integer>::operator_slash(T(lhs), rhs); } } template - constexpr static auto operator_percent(const wide_integer & lhs, const T & rhs) + constexpr static auto operator_percent(const integer & lhs, const T & rhs) { if constexpr (should_keep_size()) { - wide_integer o = rhs; - wide_integer quotient{}, remainder{}; + integer o = rhs; + integer quotient{}, remainder{}; divide(make_positive(lhs), make_positive(o), quotient, remainder); - if (is_same::value && is_negative(lhs)) + if (std::is_same_v && is_negative(lhs)) remainder = operator_unary_minus(remainder); return remainder; @@ -770,18 +775,18 @@ public: else { static_assert(T::_impl::_is_wide_integer, ""); - return std::common_type_t, wide_integer>::operator_percent(T(lhs), rhs); + return std::common_type_t, integer>::operator_percent(T(lhs), rhs); } } // ^ template - constexpr static auto operator_circumflex(const wide_integer & lhs, const T & rhs) noexcept + constexpr static auto operator_circumflex(const integer & lhs, const T & rhs) noexcept { if constexpr (should_keep_size()) { - wide_integer t(rhs); - wide_integer res = lhs; + integer t(rhs); + integer res = lhs; for (int i = 0; i < arr_size; ++i) res.m_arr[any(i)] ^= t.m_arr[any(i)]; @@ -794,11 +799,11 @@ public: } } - constexpr static wide_integer from_str(const char * c) + constexpr static integer from_str(const char * c) { - wide_integer res = 0; + integer res = 0; - bool is_neg = is_same::value && *c == '-'; + bool is_neg = std::is_same_v && *c == '-'; if (is_neg) ++c; @@ -827,7 +832,7 @@ public: ++c; } else - throw std::runtime_error("invalid char from"); + throwError("invalid char from"); } } else @@ -835,7 +840,7 @@ public: while (*c) { if (*c < '0' || *c > '9') - throw std::runtime_error("invalid char from"); + throwError("invalid char from"); res = operator_star(res, 10U); res = operator_plus_T(res, *c - '0'); @@ -854,7 +859,7 @@ public: template template -constexpr wide_integer::wide_integer(T rhs) noexcept +constexpr integer::integer(T rhs) noexcept : 
m_arr{} { if constexpr (IsWideInteger::value) @@ -865,7 +870,7 @@ constexpr wide_integer::wide_integer(T rhs) noexcept template template -constexpr wide_integer::wide_integer(std::initializer_list il) noexcept +constexpr integer::integer(std::initializer_list il) noexcept : m_arr{} { if (il.size() == 1) @@ -881,7 +886,7 @@ constexpr wide_integer::wide_integer(std::initializer_list il) template template -constexpr wide_integer & wide_integer::operator=(const wide_integer & rhs) noexcept +constexpr integer & integer::operator=(const integer & rhs) noexcept { _impl::wide_integer_from_wide_integer(*this, rhs); return *this; @@ -889,7 +894,7 @@ constexpr wide_integer & wide_integer::operator=(con template template -constexpr wide_integer & wide_integer::operator=(T rhs) noexcept +constexpr integer & integer::operator=(T rhs) noexcept { _impl::wide_integer_from_bultin(*this, rhs); return *this; @@ -897,7 +902,7 @@ constexpr wide_integer & wide_integer::operator=(T r template template -constexpr wide_integer & wide_integer::operator*=(const T & rhs) +constexpr integer & integer::operator*=(const T & rhs) { *this = *this * rhs; return *this; @@ -905,7 +910,7 @@ constexpr wide_integer & wide_integer::operator*=(co template template -constexpr wide_integer & wide_integer::operator/=(const T & rhs) +constexpr integer & integer::operator/=(const T & rhs) { *this = *this / rhs; return *this; @@ -913,7 +918,7 @@ constexpr wide_integer & wide_integer::operator/=(co template template -constexpr wide_integer & wide_integer::operator+=(const T & rhs) noexcept(is_same::value) +constexpr integer & integer::operator+=(const T & rhs) noexcept(std::is_same_v) { *this = *this + rhs; return *this; @@ -921,7 +926,7 @@ constexpr wide_integer & wide_integer::operator+=(co template template -constexpr wide_integer & wide_integer::operator-=(const T & rhs) noexcept(is_same::value) +constexpr integer & integer::operator-=(const T & rhs) noexcept(std::is_same_v) { *this = *this - rhs; return *this; @@ -929,7 +934,7 @@ constexpr wide_integer & wide_integer::operator-=(co template template -constexpr wide_integer & wide_integer::operator%=(const T & rhs) +constexpr integer & integer::operator%=(const T & rhs) { *this = *this % rhs; return *this; @@ -937,7 +942,7 @@ constexpr wide_integer & wide_integer::operator%=(co template template -constexpr wide_integer & wide_integer::operator&=(const T & rhs) noexcept +constexpr integer & integer::operator&=(const T & rhs) noexcept { *this = *this & rhs; return *this; @@ -945,7 +950,7 @@ constexpr wide_integer & wide_integer::operator&=(co template template -constexpr wide_integer & wide_integer::operator|=(const T & rhs) noexcept +constexpr integer & integer::operator|=(const T & rhs) noexcept { *this = *this | rhs; return *this; @@ -953,35 +958,35 @@ constexpr wide_integer & wide_integer::operator|=(co template template -constexpr wide_integer & wide_integer::operator^=(const T & rhs) noexcept +constexpr integer & integer::operator^=(const T & rhs) noexcept { *this = *this ^ rhs; return *this; } template -constexpr wide_integer & wide_integer::operator<<=(int n) +constexpr integer & integer::operator<<=(int n) noexcept { *this = _impl::shift_left(*this, n); return *this; } template -constexpr wide_integer & wide_integer::operator>>=(int n) noexcept +constexpr integer & integer::operator>>=(int n) noexcept { *this = _impl::shift_right(*this, n); return *this; } template -constexpr wide_integer & wide_integer::operator++() noexcept(is_same::value) +constexpr integer & 
integer::operator++() noexcept(std::is_same_v) { *this = _impl::operator_plus(*this, 1); return *this; } template -constexpr wide_integer wide_integer::operator++(int) noexcept(is_same::value) +constexpr integer integer::operator++(int) noexcept(std::is_same_v) { auto tmp = *this; *this = _impl::operator_plus(*this, 1); @@ -989,14 +994,14 @@ constexpr wide_integer wide_integer::operator++(int) } template -constexpr wide_integer & wide_integer::operator--() noexcept(is_same::value) +constexpr integer & integer::operator--() noexcept(std::is_same_v) { *this = _impl::operator_minus(*this, 1); return *this; } template -constexpr wide_integer wide_integer::operator--(int) noexcept(is_same::value) +constexpr integer integer::operator--(int) noexcept(std::is_same_v) { auto tmp = *this; *this = _impl::operator_minus(*this, 1); @@ -1004,14 +1009,14 @@ constexpr wide_integer wide_integer::operator--(int) } template -constexpr wide_integer::operator bool() const noexcept +constexpr integer::operator bool() const noexcept { return !_impl::operator_eq(*this, 0); } template template -constexpr wide_integer::operator T() const noexcept +constexpr integer::operator T() const noexcept { static_assert(std::numeric_limits::is_integer, ""); T res = 0; @@ -1023,12 +1028,12 @@ constexpr wide_integer::operator T() const noexcept } template -constexpr wide_integer::operator long double() const noexcept +constexpr integer::operator long double() const noexcept { if (_impl::operator_eq(*this, 0)) return 0; - wide_integer tmp = *this; + integer tmp = *this; if (_impl::is_negative(*this)) tmp = -tmp; @@ -1048,42 +1053,45 @@ constexpr wide_integer::operator long double() const noexcept } template -constexpr wide_integer::operator double() const noexcept +constexpr integer::operator double() const noexcept { return static_cast(*this); } template -constexpr wide_integer::operator float() const noexcept +constexpr integer::operator float() const noexcept { return static_cast(*this); } // Unary operators template -constexpr wide_integer operator~(const wide_integer & lhs) noexcept +constexpr integer operator~(const integer & lhs) noexcept { - return wide_integer::_impl::operator_unary_tilda(lhs); + return integer::_impl::operator_unary_tilda(lhs); } template -constexpr wide_integer operator-(const wide_integer & lhs) noexcept(is_same::value) +constexpr integer operator-(const integer & lhs) noexcept(std::is_same_v) { - return wide_integer::_impl::operator_unary_minus(lhs); + return integer::_impl::operator_unary_minus(lhs); } template -constexpr wide_integer operator+(const wide_integer & lhs) noexcept(is_same::value) +constexpr integer operator+(const integer & lhs) noexcept(std::is_same_v) { return lhs; } +#define CT(x) \ + std::common_type_t, std::decay_t> { x } + // Binary operators template -std::common_type_t, wide_integer> constexpr -operator*(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator*(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_star(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_star(lhs, rhs); } template @@ -1093,10 +1101,10 @@ std::common_type_t constexpr operator*(const Arithmetic } template -std::common_type_t, wide_integer> constexpr -operator/(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator/(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_slash(lhs, rhs); + return 
std::common_type_t, integer>::_impl::operator_slash(lhs, rhs); } template std::common_type_t constexpr operator/(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1105,10 +1113,10 @@ std::common_type_t constexpr operator/(const Arithmetic } template -std::common_type_t, wide_integer> constexpr -operator+(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator+(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_plus(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_plus(lhs, rhs); } template std::common_type_t constexpr operator+(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1117,10 +1125,10 @@ std::common_type_t constexpr operator+(const Arithmetic } template -std::common_type_t, wide_integer> constexpr -operator-(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator-(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_minus(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_minus(lhs, rhs); } template std::common_type_t constexpr operator-(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1129,10 +1137,10 @@ std::common_type_t constexpr operator-(const Arithmetic } template -std::common_type_t, wide_integer> constexpr -operator%(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator%(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_percent(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_percent(lhs, rhs); } template std::common_type_t constexpr operator%(const Integral & lhs, const Integral2 & rhs) @@ -1141,10 +1149,10 @@ std::common_type_t constexpr operator%(const Integral & lhs } template -std::common_type_t, wide_integer> constexpr -operator&(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator&(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_amp(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_amp(lhs, rhs); } template std::common_type_t constexpr operator&(const Integral & lhs, const Integral2 & rhs) @@ -1153,10 +1161,10 @@ std::common_type_t constexpr operator&(const Integral & lhs } template -std::common_type_t, wide_integer> constexpr -operator|(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator|(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_pipe(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_pipe(lhs, rhs); } template std::common_type_t constexpr operator|(const Integral & lhs, const Integral2 & rhs) @@ -1165,10 +1173,10 @@ std::common_type_t constexpr operator|(const Integral & lhs } template -std::common_type_t, wide_integer> constexpr -operator^(const wide_integer & lhs, const wide_integer & rhs) +std::common_type_t, integer> constexpr +operator^(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_circumflex(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_circumflex(lhs, rhs); } template std::common_type_t constexpr operator^(const Integral & lhs, const Integral2 & rhs) @@ -1177,20 +1185,20 @@ std::common_type_t constexpr operator^(const Integral & lhs } template -constexpr wide_integer operator<<(const 
wide_integer & lhs, int n) noexcept +constexpr integer operator<<(const integer & lhs, int n) noexcept { - return wide_integer::_impl::shift_left(lhs, n); + return integer::_impl::shift_left(lhs, n); } template -constexpr wide_integer operator>>(const wide_integer & lhs, int n) noexcept +constexpr integer operator>>(const integer & lhs, int n) noexcept { - return wide_integer::_impl::shift_right(lhs, n); + return integer::_impl::shift_right(lhs, n); } template -constexpr bool operator<(const wide_integer & lhs, const wide_integer & rhs) +constexpr bool operator<(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_less(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_less(lhs, rhs); } template constexpr bool operator<(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1199,9 +1207,9 @@ constexpr bool operator<(const Arithmetic & lhs, const Arithmetic2 & rhs) } template -constexpr bool operator>(const wide_integer & lhs, const wide_integer & rhs) +constexpr bool operator>(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_more(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_more(lhs, rhs); } template constexpr bool operator>(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1210,10 +1218,10 @@ constexpr bool operator>(const Arithmetic & lhs, const Arithmetic2 & rhs) } template -constexpr bool operator<=(const wide_integer & lhs, const wide_integer & rhs) +constexpr bool operator<=(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_less(lhs, rhs) - || std::common_type_t, wide_integer>::_impl::operator_eq(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_less(lhs, rhs) + || std::common_type_t, integer>::_impl::operator_eq(lhs, rhs); } template constexpr bool operator<=(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1222,10 +1230,10 @@ constexpr bool operator<=(const Arithmetic & lhs, const Arithmetic2 & rhs) } template -constexpr bool operator>=(const wide_integer & lhs, const wide_integer & rhs) +constexpr bool operator>=(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_more(lhs, rhs) - || std::common_type_t, wide_integer>::_impl::operator_eq(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_more(lhs, rhs) + || std::common_type_t, integer>::_impl::operator_eq(lhs, rhs); } template constexpr bool operator>=(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1234,9 +1242,9 @@ constexpr bool operator>=(const Arithmetic & lhs, const Arithmetic2 & rhs) } template -constexpr bool operator==(const wide_integer & lhs, const wide_integer & rhs) +constexpr bool operator==(const integer & lhs, const integer & rhs) { - return std::common_type_t, wide_integer>::_impl::operator_eq(lhs, rhs); + return std::common_type_t, integer>::_impl::operator_eq(lhs, rhs); } template constexpr bool operator==(const Arithmetic & lhs, const Arithmetic2 & rhs) @@ -1245,9 +1253,9 @@ constexpr bool operator==(const Arithmetic & lhs, const Arithmetic2 & rhs) } template -constexpr bool operator!=(const wide_integer & lhs, const wide_integer & rhs) +constexpr bool operator!=(const integer & lhs, const integer & rhs) { - return !std::common_type_t, wide_integer>::_impl::operator_eq(lhs, rhs); + return !std::common_type_t, integer>::_impl::operator_eq(lhs, rhs); } template constexpr bool operator!=(const Arithmetic & lhs, const Arithmetic2 & 
rhs) @@ -1255,35 +1263,17 @@ constexpr bool operator!=(const Arithmetic & lhs, const Arithmetic2 & rhs) return CT(lhs) != CT(rhs); } -template -inline std::string to_string(const wide_integer & n) -{ - std::string res; - if (wide_integer::_impl::operator_eq(n, 0U)) - return "0"; +#undef CT - wide_integer t; - bool is_neg = wide_integer::_impl::is_negative(n); - if (is_neg) - t = wide_integer::_impl::operator_unary_minus(n); - else - t = n; - - while (!wide_integer::_impl::operator_eq(t, 0U)) - { - res.insert(res.begin(), '0' + char(wide_integer::_impl::operator_percent(t, 10U))); - t = wide_integer::_impl::operator_slash(t, 10U); - } - - if (is_neg) - res.insert(res.begin(), '-'); - return res; } -template -struct hash> +namespace std { - std::size_t operator()(const wide_integer & lhs) const + +template +struct hash> +{ + std::size_t operator()(const wide::integer & lhs) const { static_assert(Bits % (sizeof(size_t) * 8) == 0); @@ -1293,9 +1283,8 @@ struct hash> size_t res = 0; for (unsigned i = 0; i < count; ++i) res ^= ptr[i]; - return hash()(res); + return res; } }; -#undef CT } diff --git a/base/common/wide_integer_to_string.h b/base/common/wide_integer_to_string.h new file mode 100644 index 00000000000..9908ef4be7a --- /dev/null +++ b/base/common/wide_integer_to_string.h @@ -0,0 +1,35 @@ +#pragma once + +#include + +#include "wide_integer.h" + +namespace wide +{ + +template +inline std::string to_string(const integer & n) +{ + std::string res; + if (integer::_impl::operator_eq(n, 0U)) + return "0"; + + integer t; + bool is_neg = integer::_impl::is_negative(n); + if (is_neg) + t = integer::_impl::operator_unary_minus(n); + else + t = n; + + while (!integer::_impl::operator_eq(t, 0U)) + { + res.insert(res.begin(), '0' + char(integer::_impl::operator_percent(t, 10U))); + t = integer::_impl::operator_slash(t, 10U); + } + + if (is_neg) + res.insert(res.begin(), '-'); + return res; +} + +} diff --git a/base/mysqlxx/ResultBase.h b/base/mysqlxx/ResultBase.h index 126a5c1ecca..b72b5682122 100644 --- a/base/mysqlxx/ResultBase.h +++ b/base/mysqlxx/ResultBase.h @@ -1,9 +1,7 @@ #pragma once -#include #include - namespace mysqlxx { @@ -22,6 +20,11 @@ class ResultBase public: ResultBase(MYSQL_RES * res_, Connection * conn_, const Query * query_); + ResultBase(const ResultBase &) = delete; + ResultBase & operator=(const ResultBase &) = delete; + ResultBase(ResultBase &&) = default; + ResultBase & operator=(ResultBase &&) = default; + Connection * getConnection() { return conn; } MYSQL_FIELDS getFields() { return fields; } unsigned getNumFields() { return num_fields; } diff --git a/base/mysqlxx/Value.h b/base/mysqlxx/Value.h index 9fdb33a442d..dfa86e8aa7d 100644 --- a/base/mysqlxx/Value.h +++ b/base/mysqlxx/Value.h @@ -254,7 +254,23 @@ template <> inline std::string Value::get() cons template <> inline LocalDate Value::get() const { return getDate(); } template <> inline LocalDateTime Value::get() const { return getDateTime(); } -template inline T Value::get() const { return T(*this); } + +namespace details +{ +// To avoid stack overflow when converting to type with no appropriate c-tor, +// resulting in endless recursive calls from `Value::get()` to `Value::operator T()` to `Value::get()` to ... 
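The comment above captures the failure mode: for a type with no suitable constructor, `T(*this)` can bounce between `Value::get()` and `Value::operator T()` until the stack overflows. The hunk that follows breaks the cycle by routing `get()` through a helper whose template is constrained to constructible types. Below is a minimal self-contained sketch of the same SFINAE dispatch; the helper name, the `MyWrapper` type, and the exact `std::is_constructible_v` constraint are illustrative assumptions, not the verbatim mysqlxx code.

```cpp
#include <string>
#include <type_traits>

struct Value
{
    std::string data;

    template <typename T>
    T get() const;
};

namespace details
{

/// Instantiable only for types that really have a constructor taking Value.
/// Without the constraint, T(val) can resolve to a conversion-operator path
/// that calls Value::get<T>() again, recursing until the stack overflows.
template <typename T, typename = std::enable_if_t<std::is_constructible_v<T, Value>>>
T constructFromValue(const Value & val)
{
    return T(val);
}

}

struct MyWrapper
{
    explicit MyWrapper(const Value & v) : payload(v.data) {}
    std::string payload;
};

template <typename T>
T Value::get() const
{
    return details::constructFromValue<T>(*this);
}

int main()
{
    Value v{"hello"};
    MyWrapper w = v.get<MyWrapper>(); // ok: MyWrapper(const Value &) exists
    // v.get<int>();                  // now a compile-time error, not a runtime stack overflow
    return w.payload == "hello" ? 0 : 1;
}
```

The payoff is that a bad `get<T>()` now fails at compile time instead of crashing at run time.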
+template <typename T, typename = std::enable_if_t<std::is_constructible_v<T, Value>>> +inline T constructFromValue(const Value & val) +{ + return T(val); +} +} + +template <typename T> +inline T Value::get() const +{ + return details::constructFromValue(*this); +} inline std::ostream & operator<< (std::ostream & ostr, const Value & x) diff --git a/cmake/autogenerated_versions.txt b/cmake/autogenerated_versions.txt index 27586821af2..6ca3999ff7f 100644 --- a/cmake/autogenerated_versions.txt +++ b/cmake/autogenerated_versions.txt @@ -1,9 +1,9 @@ # This strings autochanged from release_lib.sh: -SET(VERSION_REVISION 54439) +SET(VERSION_REVISION 54440) SET(VERSION_MAJOR 20) -SET(VERSION_MINOR 9) +SET(VERSION_MINOR 10) SET(VERSION_PATCH 1) -SET(VERSION_GITHASH 0586f0d555f7481b394afc55bbb29738cd573a1c) -SET(VERSION_DESCRIBE v20.9.1.1-prestable) -SET(VERSION_STRING 20.9.1.1) +SET(VERSION_GITHASH 11a247d2f42010c1a17bf678c3e00a4bc89b23f8) +SET(VERSION_DESCRIBE v20.10.1.1-prestable) +SET(VERSION_STRING 20.10.1.1) # end of autochange diff --git a/cmake/sanitize.cmake b/cmake/sanitize.cmake index 32443ed78c3..7c7e9c388a0 100644 --- a/cmake/sanitize.cmake +++ b/cmake/sanitize.cmake @@ -36,7 +36,15 @@ if (SANITIZE) endif () elseif (SANITIZE STREQUAL "thread") - set (TSAN_FLAGS "-fsanitize=thread -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt") + set (TSAN_FLAGS "-fsanitize=thread") + if (COMPILER_CLANG) + set (TSAN_FLAGS "${TSAN_FLAGS} -fsanitize-blacklist=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt") + else() + message (WARNING "TSAN suppressions were not passed to the compiler (since the compiler is not clang)") + message (WARNING "Use the following command to pass them manually:") + message (WARNING " export TSAN_OPTIONS=\"$TSAN_OPTIONS suppressions=${CMAKE_SOURCE_DIR}/tests/tsan_suppressions.txt\"") + endif() + set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SAN_FLAGS} ${TSAN_FLAGS}") set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SAN_FLAGS} ${TSAN_FLAGS}") diff --git a/cmake/warnings.cmake b/cmake/warnings.cmake index 2f78dc34079..6b26b9b95a5 100644 --- a/cmake/warnings.cmake +++ b/cmake/warnings.cmake @@ -23,7 +23,7 @@ option (WEVERYTHING "Enables -Weverything option with some exceptions. This is i # Control maximum size of stack frames. It can be important if the code is run in fibers with small stack size. # Only in release build because debug has too large stack frames.
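The two comment lines above explain the `add_warning(frame-larger-than=32768)` line that follows: ClickHouse runs parts of its code in fibers with small stacks, so oversized stack frames are treated as errors in release builds. When a function trips the warning, the usual remedy is to move large locals to the heap. A hedged sketch of that fix (illustrative code, not taken from the ClickHouse tree):

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// A 64 KiB local buffer would make this function's frame exceed the cap:
//     char buffer[64 * 1024];   // trips -Wframe-larger-than=32768
// Moving it to the heap keeps the frame a few dozen bytes, which is what
// matters when the code runs on a fiber with a deliberately small stack.
size_t copyBounded(const char * data, size_t len)
{
    constexpr size_t buf_size = 64 * 1024;
    auto buffer = std::make_unique<char[]>(buf_size);

    size_t n = len < buf_size ? len : buf_size;
    std::memcpy(buffer.get(), data, n);
    return n;
}

int main()
{
    const char msg[] = "fits easily";
    return copyBounded(msg, sizeof(msg)) == sizeof(msg) ? 0 : 1;
}
```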
if ((NOT CMAKE_BUILD_TYPE_UC STREQUAL "DEBUG") AND (NOT SANITIZE)) - add_warning(frame-larger-than=16384) + add_warning(frame-larger-than=32768) endif () if (COMPILER_CLANG) @@ -169,9 +169,16 @@ elseif (COMPILER_GCC) # Warn if vector operation is not implemented via SIMD capabilities of the architecture add_cxx_compile_options(-Wvector-operation-performance) - # XXX: gcc10 stuck with this option while compiling GatherUtils code - # (anyway there are builds with clang, that will warn) if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 10) + # XXX: gcc10 gets stuck with this option while compiling GatherUtils code + # (anyway there are builds with clang that will warn) add_cxx_compile_options(-Wno-sequence-point) + # XXX: gcc10 emits a false positive for this warning in MergeTreePartition.cpp + # inlined from 'void writeHexByteLowercase(UInt8, void*)' at ../src/Common/hex.h:39:11, + # inlined from 'DB::String DB::MergeTreePartition::getID(const DB::Block&) const' at ../src/Storages/MergeTree/MergeTreePartition.cpp:85:30: + # ../contrib/libc-headers/x86_64-linux-gnu/bits/string_fortified.h:34:33: error: writing 2 bytes into a region of size 0 [-Werror=stringop-overflow=] + # 34 | return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest)); + # For some reason (a bug in gcc?) the pragma 'GCC diagnostic ignored "-Wstringop-overflow"' doesn't help. + add_cxx_compile_options(-Wno-stringop-overflow) endif() endif () diff --git a/contrib/llvm b/contrib/llvm index 3d6c7e91676..8f24d507c1c 160000 --- a/contrib/llvm +++ b/contrib/llvm @@ -1 +1 @@ -Subproject commit 3d6c7e916760b395908f28a1c885c8334d4fa98b +Subproject commit 8f24d507c1cfeec66d27f48fe74518fd278e2d25 diff --git a/debian/changelog b/debian/changelog index c7c20ccd6d0..244b2b1fde4 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,5 +1,5 @@ -clickhouse (20.9.1.1) unstable; urgency=low +clickhouse (20.10.1.1) unstable; urgency=low * Modified source code - -- clickhouse-release Mon, 31 Aug 2020 23:07:38 +0300 + -- clickhouse-release Tue, 08 Sep 2020 17:04:39 +0300 diff --git a/docker/client/Dockerfile b/docker/client/Dockerfile index 36ca0ee107a..5ce506aafa3 100644 --- a/docker/client/Dockerfile +++ b/docker/client/Dockerfile @@ -1,7 +1,7 @@ FROM ubuntu:18.04 ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/" -ARG version=20.9.1.* +ARG version=20.10.1.* RUN apt-get update \ && apt-get install --yes --no-install-recommends \ diff --git a/docker/packager/binary/Dockerfile b/docker/packager/binary/Dockerfile index 45c35c2e0f3..03bb3b5aefa 100644 --- a/docker/packager/binary/Dockerfile +++ b/docker/packager/binary/Dockerfile @@ -32,8 +32,6 @@ RUN apt-get update \ curl \ gcc-9 \ g++-9 \ - gcc-10 \ - g++-10 \ llvm-${LLVM_VERSION} \ clang-${LLVM_VERSION} \ lld-${LLVM_VERSION} \ @@ -93,5 +91,16 @@ RUN wget -nv "https://developer.arm.com/-/media/Files/downloads/gnu-a/8.3-2019.0 # Download toolchain for FreeBSD 11.3 RUN wget -nv https://clickhouse-datasets.s3.yandex.net/toolchains/toolchains/freebsd-11.3-toolchain.tar.xz +# NOTE: For some reason we have an outdated version of gcc-10 in Ubuntu 20.04 stable. +# The current workaround is to use the latest version from the proposed repo. Remove as soon as +# gcc-10.2 appears in the stable repo.
+RUN echo 'deb http://archive.ubuntu.com/ubuntu/ focal-proposed restricted main multiverse universe' > /etc/apt/sources.list.d/proposed-repositories.list + +RUN apt-get update \ + && apt-get install gcc-10 g++-10 --yes + +RUN rm /etc/apt/sources.list.d/proposed-repositories.list && apt-get update + + COPY build.sh / CMD ["/bin/bash", "/build.sh"] diff --git a/docker/packager/binary/build.sh b/docker/packager/binary/build.sh index 72adba5d762..7c3de9aaebd 100755 --- a/docker/packager/binary/build.sh +++ b/docker/packager/binary/build.sh @@ -18,7 +18,7 @@ ccache --zero-stats ||: ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/libOpenCL.so ||: rm -f CMakeCache.txt cmake --debug-trycompile --verbose=1 -DCMAKE_VERBOSE_MAKEFILE=1 -LA -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DSANITIZE=$SANITIZER $CMAKE_FLAGS .. -ninja $NINJA_FLAGS clickhouse-bundle +ninja -j $(($(nproc) / 2)) $NINJA_FLAGS clickhouse-bundle mv ./programs/clickhouse* /output mv ./src/unit_tests_dbms /output find . -name '*.so' -print -exec mv '{}' /output \; diff --git a/docker/packager/deb/Dockerfile b/docker/packager/deb/Dockerfile index 87f4582f8e2..a3c87f13fe4 100644 --- a/docker/packager/deb/Dockerfile +++ b/docker/packager/deb/Dockerfile @@ -42,8 +42,6 @@ RUN export CODENAME="$(lsb_release --codename --short | tr 'A-Z' 'a-z')" \ # Libraries from OS are only needed to test the "unbundled" build (this is not used in production). RUN apt-get update \ && apt-get install \ - gcc-10 \ - g++-10 \ gcc-9 \ g++-9 \ clang-11 \ @@ -75,6 +73,16 @@ RUN apt-get update \ pigz \ --yes --no-install-recommends +# NOTE: For some reason we have an outdated version of gcc-10 in Ubuntu 20.04 stable. +# The current workaround is to use the latest version from the proposed repo. Remove as soon as +# gcc-10.2 appears in the stable repo.
+RUN echo 'deb http://archive.ubuntu.com/ubuntu/ focal-proposed restricted main multiverse universe' > /etc/apt/sources.list.d/proposed-repositories.list + +RUN apt-get update \ + && apt-get install gcc-10 g++-10 --yes --no-install-recommends + +RUN rm /etc/apt/sources.list.d/proposed-repositories.list && apt-get update + # This symlink required by gcc to find lld compiler RUN ln -s /usr/bin/lld-${LLVM_VERSION} /usr/bin/ld.lld diff --git a/docker/packager/packager b/docker/packager/packager index 5874bedd17a..909f20acd6d 100755 --- a/docker/packager/packager +++ b/docker/packager/packager @@ -93,7 +93,7 @@ def parse_env_variables(build_type, compiler, sanitizer, package_type, image_typ cxx = cc.replace('gcc', 'g++').replace('clang', 'clang++') - if image_type == "deb": + if image_type == "deb" or image_type == "unbundled": result.append("DEB_CC={}".format(cc)) result.append("DEB_CXX={}".format(cxx)) elif image_type == "binary": diff --git a/docker/server/Dockerfile b/docker/server/Dockerfile index c3950c58437..c15bd89b646 100644 --- a/docker/server/Dockerfile +++ b/docker/server/Dockerfile @@ -1,7 +1,7 @@ FROM ubuntu:20.04 ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/" -ARG version=20.9.1.* +ARG version=20.10.1.* ARG gosu_ver=1.10 RUN apt-get update \ diff --git a/docker/test/Dockerfile b/docker/test/Dockerfile index bb09fa1de56..ae588af2459 100644 --- a/docker/test/Dockerfile +++ b/docker/test/Dockerfile @@ -1,7 +1,7 @@ FROM ubuntu:18.04 ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/" -ARG version=20.9.1.* +ARG version=20.10.1.* RUN apt-get update && \ apt-get install -y apt-transport-https dirmngr && \ diff --git a/docker/test/fasttest/run.sh b/docker/test/fasttest/run.sh index 3317bb06043..ccbadb84f27 100755 --- a/docker/test/fasttest/run.sh +++ b/docker/test/fasttest/run.sh @@ -10,7 +10,7 @@ stage=${stage:-} # A variable to pass additional flags to CMake. # Here we explicitly default it to nothing so that bash doesn't complain about -# it being undefined. Also read it as array so that we can pass an empty list +# it being undefined. Also read it as array so that we can pass an empty list # of additional variable to cmake properly, and it doesn't generate an extra # empty parameter. 
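The comment above, together with the `read -ra FASTTEST_CMAKE_FLAGS` line that follows, makes an unset flags variable expand to zero cmake arguments rather than one empty string. The same split-on-whitespace behaviour, sketched in C++ (function and flag names are made up for illustration):

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Whitespace-split a flags string. An unset/empty input yields a vector of
// size zero, mirroring how `read -ra` turns "" into a zero-element bash
// array instead of a single empty argument that cmake would choke on.
std::vector<std::string> splitFlags(const std::string & raw)
{
    std::vector<std::string> out;
    std::istringstream stream(raw);
    std::string token;
    while (stream >> token)
        out.push_back(token);
    return out;
}

int main()
{
    std::cout << splitFlags("-DENABLE_TESTS=0 -DUSE_UNWIND=1").size() << '\n'; // 2
    std::cout << splitFlags("").size() << '\n';                                // 0, no empty parameter
}
```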
read -ra FASTTEST_CMAKE_FLAGS <<< "${FASTTEST_CMAKE_FLAGS:-}" @@ -127,6 +127,7 @@ ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-se ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/ +ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/ ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/ #ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/ diff --git a/docker/test/integration/runner/compose/docker_compose_redis.yml b/docker/test/integration/runner/compose/docker_compose_redis.yml index 2c9ace96d0c..72df99ec59b 100644 --- a/docker/test/integration/runner/compose/docker_compose_redis.yml +++ b/docker/test/integration/runner/compose/docker_compose_redis.yml @@ -5,4 +5,4 @@ services: restart: always ports: - 6380:6379 - command: redis-server --requirepass "clickhouse" + command: redis-server --requirepass "clickhouse" --databases 32 diff --git a/docker/test/performance-comparison/compare.sh b/docker/test/performance-comparison/compare.sh index 364e9994ab7..32ea74193b0 100755 --- a/docker/test/performance-comparison/compare.sh +++ b/docker/test/performance-comparison/compare.sh @@ -394,12 +394,24 @@ create table query_run_metrics_denorm engine File(TSV, 'analyze/query-run-metric order by test, query_index, metric_names, version, query_id ; +-- Filter out tests that don't have an even number of runs, to avoid breaking +-- the further calculations. This may happen if there was an error during the +-- test runs, e.g. the server died. It will be reported in test errors, so we +-- don't have to report it again. +create view broken_queries as + select test, query_index + from query_runs + group by test, query_index + having count(*) % 2 != 0 + ; + -- This is for statistical processing with eqmed.sql create table query_run_metrics_for_stats engine File( TSV, -- do not add header -- will parse with grep 'analyze/query-run-metrics-for-stats.tsv') as select test, query_index, 0 run, version, metric_values from query_run_metric_arrays + where (test, query_index) not in broken_queries order by test, query_index, run, version ; @@ -915,13 +927,15 @@ done function report_metrics { +build_log_column_definitions + rm -rf metrics ||: mkdir metrics clickhouse-local --query " create view right_async_metric_log as select * from file('right-async-metric-log.tsv', TSVWithNamesAndTypes, - 'event_date Date, event_time DateTime, name String, value Float64') + '$(cat right-async-metric-log.tsv.columns)') ; -- Use the right log as time reference because it may have higher precision. 
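Earlier in this compare.sh hunk, the new `broken_queries` view keeps the statistics honest by dropping any `(test, query_index)` pair with an odd number of recorded runs, since the old/new comparison needs paired measurements. A small C++ sketch of that parity filter (test names invented for illustration):

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
    using Key = std::pair<std::string, int>; // (test, query_index)

    // Each entry is one recorded query run; a crashed server leaves an
    // odd, unpaired count for the affected query.
    std::vector<Key> query_runs = {
        {"sort.xml", 0}, {"sort.xml", 0},                     // even: kept
        {"joins.xml", 1}, {"joins.xml", 1}, {"joins.xml", 1}, // odd: filtered out
    };

    std::map<Key, size_t> run_counts;
    for (const auto & run : query_runs)
        ++run_counts[run];

    for (const auto & [key, count] : run_counts)
        if (count % 2 == 0) // the `having count(*) % 2 != 0` filter, inverted
            std::cout << key.first << " #" << key.second << ": " << count << " runs kept\n";
}
```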
@@ -930,7 +944,7 @@ create table metrics engine File(TSV, 'metrics/metrics.tsv') as select name metric, r.event_time - min_time event_time, l.value as left, r.value as right from right_async_metric_log r asof join file('left-async-metric-log.tsv', TSVWithNamesAndTypes, - 'event_date Date, event_time DateTime, name String, value Float64') l + '$(cat left-async-metric-log.tsv.columns)') l on l.name = r.name and r.event_time <= l.event_time order by metric, event_time ; diff --git a/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml b/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml index 6f1726ab36b..bc7ddf1fbbb 100644 --- a/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml +++ b/docker/test/performance-comparison/config/config.d/perf-comparison-tweaks-config.xml @@ -1,4 +1,4 @@ - + @@ -22,4 +22,6 @@ 1000000000 10 + + true diff --git a/docker/test/performance-comparison/eqmed.sql b/docker/test/performance-comparison/eqmed.sql index f7f8d6ac40d..139f0758798 100644 --- a/docker/test/performance-comparison/eqmed.sql +++ b/docker/test/performance-comparison/eqmed.sql @@ -8,7 +8,7 @@ select from ( -- quantiles of randomization distributions - select quantileExactForEach(0.999)( + select quantileExactForEach(0.99)( arrayMap(x, y -> abs(x - y), metrics_by_label[1], metrics_by_label[2]) as d ) threshold ---- uncomment to see what the distribution is really like @@ -33,7 +33,7 @@ from -- strip the query away before the join -- it might be several kB long; (select metrics, run, version from table) no_query, -- duplicate input measurements into many virtual runs - numbers(1, 100000) nn + numbers(1, 10000) nn -- for each virtual run, randomly reorder measurements order by virtual_run, rand() ) virtual_runs diff --git a/docker/test/performance-comparison/perf.py b/docker/test/performance-comparison/perf.py index e1476d9aeb4..05e89c9e44c 100755 --- a/docker/test/performance-comparison/perf.py +++ b/docker/test/performance-comparison/perf.py @@ -20,7 +20,7 @@ parser = argparse.ArgumentParser(description='Run performance test.') parser.add_argument('file', metavar='FILE', type=argparse.FileType('r', encoding='utf-8'), nargs=1, help='test description file') parser.add_argument('--host', nargs='*', default=['localhost'], help="Server hostname(s). Corresponds to '--port' options.") parser.add_argument('--port', nargs='*', default=[9000], help="Server port(s). Corresponds to '--host' options.") -parser.add_argument('--runs', type=int, default=int(os.environ.get('CHPC_RUNS', 13)), help='Number of query runs per server. Defaults to CHPC_RUNS environment variable.') +parser.add_argument('--runs', type=int, default=int(os.environ.get('CHPC_RUNS', 7)), help='Number of query runs per server. 
Defaults to CHPC_RUNS environment variable.') parser.add_argument('--long', action='store_true', help='Do not skip the tests tagged as long.') parser.add_argument('--print-queries', action='store_true', help='Print test queries and exit.') parser.add_argument('--print-settings', action='store_true', help='Print test settings and exit.') diff --git a/docker/test/performance-comparison/report.py b/docker/test/performance-comparison/report.py index 1003a6d0e1a..e9e2ac68c1e 100755 --- a/docker/test/performance-comparison/report.py +++ b/docker/test/performance-comparison/report.py @@ -372,7 +372,7 @@ if args.report == 'main': 'New, s', # 1 'Ratio of speedup (-) or slowdown (+)', # 2 'Relative difference (new − old) / old', # 3 - 'p < 0.001 threshold', # 4 + 'p < 0.01 threshold', # 4 # Failed # 5 'Test', # 6 '#', # 7 @@ -416,7 +416,7 @@ if args.report == 'main': 'Old, s', #0 'New, s', #1 'Relative difference (new - old)/old', #2 - 'p < 0.001 threshold', #3 + 'p < 0.01 threshold', #3 # Failed #4 'Test', #5 '#', #6 @@ -470,12 +470,13 @@ if args.report == 'main': text = tableStart('Test times') text += tableHeader(columns) - nominal_runs = 13 # FIXME pass this as an argument + nominal_runs = 7 # FIXME pass this as an argument total_runs = (nominal_runs + 1) * 2 # one prewarm run, two servers + allowed_average_run_time = allowed_single_run_time + 60 / total_runs; # some allowance for fill/create queries attrs = ['' for c in columns] for r in rows: anchor = f'{currentTableAnchor()}.{r[0]}' - if float(r[6]) > 1.5 * total_runs: + if float(r[6]) > allowed_average_run_time * total_runs: # FIXME should be 15s max -- investigate parallel_insert slow_average_tests += 1 attrs[6] = f'style="background: {color_bad}"' @@ -649,7 +650,7 @@ elif args.report == 'all-queries': 'New, s', #3 'Ratio of speedup (-) or slowdown (+)', #4 'Relative difference (new − old) / old', #5 - 'p < 0.001 threshold', #6 + 'p < 0.01 threshold', #6 'Test', #7 '#', #8 'Query', #9 diff --git a/docker/test/stateless/run.sh b/docker/test/stateless/run.sh index 2ff15ca9c6a..4a9ad891883 100755 --- a/docker/test/stateless/run.sh +++ b/docker/test/stateless/run.sh @@ -24,6 +24,7 @@ ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-se ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/ +ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/ ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/ ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/ diff --git a/docker/test/stateless_unbundled/run.sh b/docker/test/stateless_unbundled/run.sh index 2ff15ca9c6a..4a9ad891883 100755 --- a/docker/test/stateless_unbundled/run.sh +++ b/docker/test/stateless_unbundled/run.sh @@ -24,6 +24,7 @@ ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-se ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/ +ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/ ln -s 
/usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/ ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/ ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/ diff --git a/docker/test/stateless_with_coverage/run.sh b/docker/test/stateless_with_coverage/run.sh index 64317ee62fd..c3ccb18659b 100755 --- a/docker/test/stateless_with_coverage/run.sh +++ b/docker/test/stateless_with_coverage/run.sh @@ -57,6 +57,7 @@ ln -s /usr/share/clickhouse-test/config/access_management.xml /etc/clickhouse-se ln -s /usr/share/clickhouse-test/config/ints_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/strings_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/decimals_dictionary.xml /etc/clickhouse-server/ +ln -s /usr/share/clickhouse-test/config/executable_dictionary.xml /etc/clickhouse-server/ ln -s /usr/share/clickhouse-test/config/macros.xml /etc/clickhouse-server/config.d/ ln -s /usr/share/clickhouse-test/config/disks.xml /etc/clickhouse-server/config.d/ ln -s /usr/share/clickhouse-test/config/secure_ports.xml /etc/clickhouse-server/config.d/ diff --git a/docker/test/stress/stress b/docker/test/stress/stress index e8675da1546..60db5ec465c 100755 --- a/docker/test/stress/stress +++ b/docker/test/stress/stress @@ -28,7 +28,7 @@ def get_options(i): options = "" if 0 < i: options += " --order=random" - if i == 1: + if i % 2 == 1: options += " --atomic-db-engine" return options diff --git a/docs/en/interfaces/formats.md b/docs/en/interfaces/formats.md index 9d3965b4a9c..bfe5b6218e4 100644 --- a/docs/en/interfaces/formats.md +++ b/docs/en/interfaces/formats.md @@ -10,42 +10,51 @@ results of a `SELECT`, and to perform `INSERT`s into a file-backed table. 
The supported formats are: -| Format | Input | Output | -|-----------------------------------------------------------------|-------|--------| -| [TabSeparated](#tabseparated) | ✔ | ✔ | -| [TabSeparatedRaw](#tabseparatedraw) | ✔ | ✔ | -| [TabSeparatedWithNames](#tabseparatedwithnames) | ✔ | ✔ | -| [TabSeparatedWithNamesAndTypes](#tabseparatedwithnamesandtypes) | ✔ | ✔ | -| [Template](#format-template) | ✔ | ✔ | -| [TemplateIgnoreSpaces](#templateignorespaces) | ✔ | ✗ | -| [CSV](#csv) | ✔ | ✔ | -| [CSVWithNames](#csvwithnames) | ✔ | ✔ | -| [CustomSeparated](#format-customseparated) | ✔ | ✔ | -| [Values](#data-format-values) | ✔ | ✔ | -| [Vertical](#vertical) | ✗ | ✔ | -| [VerticalRaw](#verticalraw) | ✗ | ✔ | -| [JSON](#json) | ✗ | ✔ | -| [JSONCompact](#jsoncompact) | ✗ | ✔ | -| [JSONEachRow](#jsoneachrow) | ✔ | ✔ | -| [TSKV](#tskv) | ✔ | ✔ | -| [Pretty](#pretty) | ✗ | ✔ | -| [PrettyCompact](#prettycompact) | ✗ | ✔ | -| [PrettyCompactMonoBlock](#prettycompactmonoblock) | ✗ | ✔ | -| [PrettyNoEscapes](#prettynoescapes) | ✗ | ✔ | -| [PrettySpace](#prettyspace) | ✗ | ✔ | -| [Protobuf](#protobuf) | ✔ | ✔ | -| [Avro](#data-format-avro) | ✔ | ✔ | -| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ | -| [Parquet](#data-format-parquet) | ✔ | ✔ | -| [Arrow](#data-format-arrow) | ✔ | ✔ | -| [ArrowStream](#data-format-arrow-stream) | ✔ | ✔ | -| [ORC](#data-format-orc) | ✔ | ✗ | -| [RowBinary](#rowbinary) | ✔ | ✔ | -| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ | -| [Native](#native) | ✔ | ✔ | -| [Null](#null) | ✗ | ✔ | -| [XML](#xml) | ✗ | ✔ | -| [CapnProto](#capnproto) | ✔ | ✗ | +| Format | Input | Output | +|-----------------------------------------------------------------------------------------|-------|--------| +| [TabSeparated](#tabseparated) | ✔ | ✔ | +| [TabSeparatedRaw](#tabseparatedraw) | ✔ | ✔ | +| [TabSeparatedWithNames](#tabseparatedwithnames) | ✔ | ✔ | +| [TabSeparatedWithNamesAndTypes](#tabseparatedwithnamesandtypes) | ✔ | ✔ | +| [Template](#format-template) | ✔ | ✔ | +| [TemplateIgnoreSpaces](#templateignorespaces) | ✔ | ✗ | +| [CSV](#csv) | ✔ | ✔ | +| [CSVWithNames](#csvwithnames) | ✔ | ✔ | +| [CustomSeparated](#format-customseparated) | ✔ | ✔ | +| [Values](#data-format-values) | ✔ | ✔ | +| [Vertical](#vertical) | ✗ | ✔ | +| [VerticalRaw](#verticalraw) | ✗ | ✔ | +| [JSON](#json) | ✗ | ✔ | +| [JSONString](#jsonstring) | ✗ | ✔ | +| [JSONCompact](#jsoncompact) | ✗ | ✔ | +| [JSONCompactString](#jsoncompactstring) | ✗ | ✔ | +| [JSONEachRow](#jsoneachrow) | ✔ | ✔ | +| [JSONEachRowWithProgress](#jsoneachrowwithprogress) | ✗ | ✔ | +| [JSONStringEachRow](#jsonstringeachrow) | ✔ | ✔ | +| [JSONStringEachRowWithProgress](#jsonstringeachrowwithprogress) | ✗ | ✔ | +| [JSONCompactEachRow](#jsoncompacteachrow) | ✔ | ✔ | +| [JSONCompactEachRowWithNamesAndTypes](#jsoncompacteachrowwithnamesandtypes) | ✔ | ✔ | +| [JSONCompactStringEachRow](#jsoncompactstringeachrow) | ✔ | ✔ | +| [JSONCompactStringEachRowWithNamesAndTypes](#jsoncompactstringeachrowwithnamesandtypes) | ✔ | ✔ | +| [TSKV](#tskv) | ✔ | ✔ | +| [Pretty](#pretty) | ✗ | ✔ | +| [PrettyCompact](#prettycompact) | ✗ | ✔ | +| [PrettyCompactMonoBlock](#prettycompactmonoblock) | ✗ | ✔ | +| [PrettyNoEscapes](#prettynoescapes) | ✗ | ✔ | +| [PrettySpace](#prettyspace) | ✗ | ✔ | +| [Protobuf](#protobuf) | ✔ | ✔ | +| [Avro](#data-format-avro) | ✔ | ✔ | +| [AvroConfluent](#data-format-avro-confluent) | ✔ | ✗ | +| [Parquet](#data-format-parquet) | ✔ | ✔ | +| [Arrow](#data-format-arrow) | ✔ | ✔ | +| 
[ArrowStream](#data-format-arrow-stream) | ✔ | ✔ | +| [ORC](#data-format-orc) | ✔ | ✗ | +| [RowBinary](#rowbinary) | ✔ | ✔ | +| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ | +| [Native](#native) | ✔ | ✔ | +| [Null](#null) | ✗ | ✔ | +| [XML](#xml) | ✗ | ✔ | +| [CapnProto](#capnproto) | ✔ | ✗ | You can control some format processing parameters with the ClickHouse settings. For more information read the [Settings](../operations/settings/settings.md) section. @@ -392,62 +401,41 @@ SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase WITH TOTA "meta": [ { - "name": "SearchPhrase", + "name": "'hello'", "type": "String" }, { - "name": "c", + "name": "multiply(42, number)", "type": "UInt64" + }, + { + "name": "range(5)", + "type": "Array(UInt8)" } ], "data": [ { - "SearchPhrase": "", - "c": "8267016" + "'hello'": "hello", + "multiply(42, number)": "0", + "range(5)": [0,1,2,3,4] }, { - "SearchPhrase": "bathroom interior design", - "c": "2166" + "'hello'": "hello", + "multiply(42, number)": "42", + "range(5)": [0,1,2,3,4] }, { - "SearchPhrase": "yandex", - "c": "1655" - }, - { - "SearchPhrase": "spring 2014 fashion", - "c": "1549" - }, - { - "SearchPhrase": "freeform photos", - "c": "1480" + "'hello'": "hello", + "multiply(42, number)": "84", + "range(5)": [0,1,2,3,4] } ], - "totals": - { - "SearchPhrase": "", - "c": "8873898" - }, + "rows": 3, - "extremes": - { - "min": - { - "SearchPhrase": "", - "c": "1480" - }, - "max": - { - "SearchPhrase": "", - "c": "8267016" - } - }, - - "rows": 5, - - "rows_before_limit_at_least": 141137 + "rows_before_limit_at_least": 3 } ``` @@ -468,63 +456,165 @@ ClickHouse supports [NULL](../sql-reference/syntax.md), which is displayed as `n See also the [JSONEachRow](#jsoneachrow) format. +## JSONString {#jsonstring} + +Differs from JSON only in that data fields are output in strings, not in typed json values. + +Example: + +```json +{ + "meta": + [ + { + "name": "'hello'", + "type": "String" + }, + { + "name": "multiply(42, number)", + "type": "UInt64" + }, + { + "name": "range(5)", + "type": "Array(UInt8)" + } + ], + + "data": + [ + { + "'hello'": "hello", + "multiply(42, number)": "0", + "range(5)": "[0,1,2,3,4]" + }, + { + "'hello'": "hello", + "multiply(42, number)": "42", + "range(5)": "[0,1,2,3,4]" + }, + { + "'hello'": "hello", + "multiply(42, number)": "84", + "range(5)": "[0,1,2,3,4]" + } + ], + + "rows": 3, + + "rows_before_limit_at_least": 3 +} +``` + ## JSONCompact {#jsoncompact} +## JSONCompactString {#jsoncompactstring} Differs from JSON only in that data rows are output in arrays, not in objects. Example: ``` json +// JSONCompact { "meta": [ { - "name": "SearchPhrase", + "name": "'hello'", "type": "String" }, { - "name": "c", + "name": "multiply(42, number)", "type": "UInt64" + }, + { + "name": "range(5)", + "type": "Array(UInt8)" } ], "data": [ - ["", "8267016"], - ["bathroom interior design", "2166"], - ["yandex", "1655"], - ["fashion trends spring 2014", "1549"], - ["freeform photo", "1480"] + ["hello", "0", [0,1,2,3,4]], + ["hello", "42", [0,1,2,3,4]], + ["hello", "84", [0,1,2,3,4]] ], - "totals": ["","8873898"], + "rows": 3, - "extremes": - { - "min": ["","1480"], - "max": ["","8267016"] - }, - - "rows": 5, - - "rows_before_limit_at_least": 141137 + "rows_before_limit_at_least": 3 } ``` -This format is only appropriate for outputting a query result, but not for parsing (retrieving data to insert in a table). -See also the `JSONEachRow` format. 
+```json
+// JSONCompactString
+{
+    "meta":
+    [
+        {
+            "name": "'hello'",
+            "type": "String"
+        },
+        {
+            "name": "multiply(42, number)",
+            "type": "UInt64"
+        },
+        {
+            "name": "range(5)",
+            "type": "Array(UInt8)"
+        }
+    ],
-## JSONEachRow {#jsoneachrow}
+    "data":
+    [
+        ["hello", "0", "[0,1,2,3,4]"],
+        ["hello", "42", "[0,1,2,3,4]"],
+        ["hello", "84", "[0,1,2,3,4]"]
+    ],
-When using this format, ClickHouse outputs rows as separated, newline-delimited JSON objects, but the data as a whole is not valid JSON.
+    "rows": 3,
-``` json
-{"SearchPhrase":"curtain designs","count()":"1064"}
-{"SearchPhrase":"baku","count()":"1000"}
-{"SearchPhrase":"","count()":"8267016"}
+    "rows_before_limit_at_least": 3
+}
```
-When inserting the data, you should provide a separate JSON object for each row.
+## JSONEachRow {#jsoneachrow}
+## JSONStringEachRow {#jsonstringeachrow}
+## JSONCompactEachRow {#jsoncompacteachrow}
+## JSONCompactStringEachRow {#jsoncompactstringeachrow}
+
+When using these formats, ClickHouse outputs rows as separate, newline-delimited JSON values, but the data as a whole is not valid JSON.
+
+``` json
+{"some_int":42,"some_str":"hello","some_tuple":[1,"a"]} // JSONEachRow
+[42,"hello",[1,"a"]] // JSONCompactEachRow
+["42","hello","(1,'a')"] // JSONCompactStringEachRow
+```
+
+When inserting the data, you should provide a separate JSON value for each row.
+
+## JSONEachRowWithProgress {#jsoneachrowwithprogress}
+## JSONStringEachRowWithProgress {#jsonstringeachrowwithprogress}
+
+Differs from JSONEachRow/JSONStringEachRow in that ClickHouse will also yield progress information as JSON objects.
+
+```json
+{"row":{"'hello'":"hello","multiply(42, number)":"0","range(5)":[0,1,2,3,4]}}
+{"row":{"'hello'":"hello","multiply(42, number)":"42","range(5)":[0,1,2,3,4]}}
+{"row":{"'hello'":"hello","multiply(42, number)":"84","range(5)":[0,1,2,3,4]}}
+{"progress":{"read_rows":"3","read_bytes":"24","written_rows":"0","written_bytes":"0","total_rows_to_read":"3"}}
+```
+
+## JSONCompactEachRowWithNamesAndTypes {#jsoncompacteachrowwithnamesandtypes}
+## JSONCompactStringEachRowWithNamesAndTypes {#jsoncompactstringeachrowwithnamesandtypes}
+
+Differs from JSONCompactEachRow/JSONCompactStringEachRow in that the column names and types are written as the first two rows.
+
+```json
+["'hello'", "multiply(42, number)", "range(5)"]
+["String", "UInt64", "Array(UInt8)"]
+["hello", "0", [0,1,2,3,4]]
+["hello", "42", [0,1,2,3,4]]
+["hello", "84", [0,1,2,3,4]]
+```

### Inserting Data {#inserting-data}

diff --git a/docs/en/operations/system-tables/asynchronous_metric_log.md b/docs/en/operations/system-tables/asynchronous_metric_log.md
index 6b1d71e1ca6..75607cc30b0 100644
--- a/docs/en/operations/system-tables/asynchronous_metric_log.md
+++ b/docs/en/operations/system-tables/asynchronous_metric_log.md
@@ -6,6 +6,7 @@ Columns:

- `event_date` ([Date](../../sql-reference/data-types/date.md)) — Event date.
- `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Event time.
+- `event_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Event time with microseconds resolution.
- `name` ([String](../../sql-reference/data-types/string.md)) — Metric name.
- `value` ([Float64](../../sql-reference/data-types/float.md)) — Metric value.
@@ -16,18 +17,18 @@ SELECT * FROM system.asynchronous_metric_log LIMIT 10
```

``` text
-┌─event_date─┬──────────event_time─┬─name─────────────────────────────────────┬────value─┐
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.arenas.all.pmuzzy │ 0 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.arenas.all.pdirty │ 4214 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.background_thread.run_intervals │ 0 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.background_thread.num_runs │ 0 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.retained │ 17657856 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.mapped │ 71471104 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.resident │ 61538304 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.metadata │ 6199264 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.allocated │ 38074336 │
-│ 2020-06-22 │ 2020-06-22 06:57:30 │ jemalloc.epoch │ 2 │
-└────────────┴─────────────────────┴──────────────────────────────────────────┴──────────┘
+┌─event_date─┬──────────event_time─┬────event_time_microseconds─┬─name─────────────────────────────────────┬─────value─┐
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ CPUFrequencyMHz_0 │ 2120.9 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.arenas.all.pmuzzy │ 743 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.arenas.all.pdirty │ 26288 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.background_thread.run_intervals │ 0 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.background_thread.num_runs │ 0 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.retained │ 60694528 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.mapped │ 303161344 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.resident │ 260931584 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.metadata │ 12079488 │
+│ 2020-09-05 │ 2020-09-05 15:56:30 │ 2020-09-05 15:56:30.025227 │ jemalloc.allocated │ 133756128 │
+└────────────┴─────────────────────┴────────────────────────────┴──────────────────────────────────────────┴───────────┘
```

**See Also**

diff --git a/docs/en/operations/system-tables/merges.md b/docs/en/operations/system-tables/merges.md
index fb98a2b9e34..3e712e2962c 100644
--- a/docs/en/operations/system-tables/merges.md
+++ b/docs/en/operations/system-tables/merges.md
@@ -10,12 +10,16 @@ Columns:
- `progress` (Float64) — The percentage of completed work from 0 to 1.
- `num_parts` (UInt64) — The number of pieces to be merged.
- `result_part_name` (String) — The name of the part that will be formed as the result of merging.
-- `is_mutation` (UInt8) - 1 if this process is a part mutation.
+- `is_mutation` (UInt8) — 1 if this process is a part mutation.
- `total_size_bytes_compressed` (UInt64) — The total size of the compressed data in the merged chunks.
- `total_size_marks` (UInt64) — The total number of marks in the merged parts.
- `bytes_read_uncompressed` (UInt64) — Number of bytes read, uncompressed.
- `rows_read` (UInt64) — Number of rows read.
- `bytes_written_uncompressed` (UInt64) — Number of bytes written, uncompressed.
- `rows_written` (UInt64) — Number of rows written.
+- `memory_usage` (UInt64) — Memory consumption of the merge process.
+- `thread_id` (UInt64) — Thread ID of the merge process.
+- `merge_type` — The type of the current merge. Empty if it's a mutation.
+- `merge_algorithm` — The algorithm used in the current merge. Empty if it's a mutation.

[Original article](https://clickhouse.tech/docs/en/operations/system_tables/merges)

diff --git a/docs/en/operations/system-tables/metric_log.md b/docs/en/operations/system-tables/metric_log.md
index 9ccf61291d2..063fe81923b 100644
--- a/docs/en/operations/system-tables/metric_log.md
+++ b/docs/en/operations/system-tables/metric_log.md
@@ -23,28 +23,28 @@ SELECT * FROM system.metric_log LIMIT 1 FORMAT Vertical;
```

``` text
Row 1:
──────
-event_date: 2020-02-18
-event_time: 2020-02-18 07:15:33
-milliseconds: 554
-ProfileEvent_Query: 0
-ProfileEvent_SelectQuery: 0
-ProfileEvent_InsertQuery: 0
-ProfileEvent_FileOpen: 0
-ProfileEvent_Seek: 0
-ProfileEvent_ReadBufferFromFileDescriptorRead: 1
-ProfileEvent_ReadBufferFromFileDescriptorReadFailed: 0
-ProfileEvent_ReadBufferFromFileDescriptorReadBytes: 0
-ProfileEvent_WriteBufferFromFileDescriptorWrite: 1
-ProfileEvent_WriteBufferFromFileDescriptorWriteFailed: 0
-ProfileEvent_WriteBufferFromFileDescriptorWriteBytes: 56
+event_date: 2020-09-05
+event_time: 2020-09-05 16:22:33
+event_time_microseconds: 2020-09-05 16:22:33.196807
+milliseconds: 196
+ProfileEvent_Query: 0
+ProfileEvent_SelectQuery: 0
+ProfileEvent_InsertQuery: 0
+ProfileEvent_FailedQuery: 0
+ProfileEvent_FailedSelectQuery: 0
...
-CurrentMetric_Query: 0
-CurrentMetric_Merge: 0
-CurrentMetric_PartMutation: 0
-CurrentMetric_ReplicatedFetch: 0
-CurrentMetric_ReplicatedSend: 0
-CurrentMetric_ReplicatedChecks: 0
...
+CurrentMetric_Revision: 54439
+CurrentMetric_VersionInteger: 20009001
+CurrentMetric_RWLockWaitingReaders: 0
+CurrentMetric_RWLockWaitingWriters: 0
+CurrentMetric_RWLockActiveReaders: 0
+CurrentMetric_RWLockActiveWriters: 0
+CurrentMetric_GlobalThread: 74
+CurrentMetric_GlobalThreadActive: 26
+CurrentMetric_LocalThread: 0
+CurrentMetric_LocalThreadActive: 0
+CurrentMetric_DistributedFilesToInsert: 0
```

**See also**

diff --git a/docs/en/operations/system-tables/stack_trace.md b/docs/en/operations/system-tables/stack_trace.md
index b1714a93a20..44b13047cc3 100644
--- a/docs/en/operations/system-tables/stack_trace.md
+++ b/docs/en/operations/system-tables/stack_trace.md
@@ -82,8 +82,8 @@ res: /lib/x86_64-linux-gnu/libc-2.27.so

- [Introspection Functions](../../sql-reference/functions/introspection.md) — Which introspection functions are available and how to use them.
- [system.trace_log](../system-tables/trace_log.md) — Contains stack traces collected by the sampling query profiler.
-- [arrayMap](../../sql-reference/functions/higher-order-functions.md#higher_order_functions-array-map) — Description and usage example of the `arrayMap` function.
-- [arrayFilter](../../sql-reference/functions/higher-order-functions.md#higher_order_functions-array-filter) — Description and usage example of the `arrayFilter` function.
+- [arrayMap](../../sql-reference/functions/array-functions.md#array-map) — Description and usage example of the `arrayMap` function.
+- [arrayFilter](../../sql-reference/functions/array-functions.md#array-filter) — Description and usage example of the `arrayFilter` function.
[Original article](https://clickhouse.tech/docs/en/operations/system-tables/stack_trace) diff --git a/docs/en/operations/utilities/clickhouse-benchmark.md b/docs/en/operations/utilities/clickhouse-benchmark.md index ab67ca197dd..f948630b7bb 100644 --- a/docs/en/operations/utilities/clickhouse-benchmark.md +++ b/docs/en/operations/utilities/clickhouse-benchmark.md @@ -38,7 +38,7 @@ clickhouse-benchmark [keys] < queries_file - `-d N`, `--delay=N` — Interval in seconds between intermediate reports (set 0 to disable reports). Default value: 1. - `-h WORD`, `--host=WORD` — Server host. Default value: `localhost`. For the [comparison mode](#clickhouse-benchmark-comparison-mode) you can use multiple `-h` keys. - `-p N`, `--port=N` — Server port. Default value: 9000. For the [comparison mode](#clickhouse-benchmark-comparison-mode) you can use multiple `-p` keys. -- `-i N`, `--iterations=N` — Total number of queries. Default value: 0. +- `-i N`, `--iterations=N` — Total number of queries. Default value: 0 (repeat forever). - `-r`, `--randomize` — Random order of queries execution if there is more then one input query. - `-s`, `--secure` — Using TLS connection. - `-t N`, `--timelimit=N` — Time limit in seconds. `clickhouse-benchmark` stops sending queries when the specified time limit is reached. Default value: 0 (time limit disabled). diff --git a/docs/en/sql-reference/data-types/tuple.md b/docs/en/sql-reference/data-types/tuple.md index 60adb942925..e396006d957 100644 --- a/docs/en/sql-reference/data-types/tuple.md +++ b/docs/en/sql-reference/data-types/tuple.md @@ -7,7 +7,7 @@ toc_title: Tuple(T1, T2, ...) A tuple of elements, each having an individual [type](../../sql-reference/data-types/index.md#data_types). -Tuples are used for temporary column grouping. Columns can be grouped when an IN expression is used in a query, and for specifying certain formal parameters of lambda functions. For more information, see the sections [IN operators](../../sql-reference/operators/in.md) and [Higher order functions](../../sql-reference/functions/higher-order-functions.md). +Tuples are used for temporary column grouping. Columns can be grouped when an IN expression is used in a query, and for specifying certain formal parameters of lambda functions. For more information, see the sections [IN operators](../../sql-reference/operators/in.md) and [Higher order functions](../../sql-reference/functions/index.md#higher-order-functions). Tuples can be the result of a query. In this case, for text formats other than JSON, values are comma-separated in brackets. In JSON formats, tuples are output as arrays (in square brackets). 
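+
+A minimal, hypothetical illustration (the framing of the result below assumes a `PrettyCompact`-style text output; other formats render the tuple differently, as described above):
+
+``` sql
+SELECT tuple(1, 'a') AS t, toTypeName(t) AS type;
+```
+
+``` text
+┌─t───────┬─type─────────────────┐
+│ (1,'a') │ Tuple(UInt8, String) │
+└─────────┴──────────────────────┘
+```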
diff --git a/docs/en/sql-reference/functions/arithmetic-functions.md b/docs/en/sql-reference/functions/arithmetic-functions.md
index 5d89d6d335b..c4b151f59ce 100644
--- a/docs/en/sql-reference/functions/arithmetic-functions.md
+++ b/docs/en/sql-reference/functions/arithmetic-functions.md
@@ -1,5 +1,5 @@
---
-toc_priority: 35
+toc_priority: 34
toc_title: Arithmetic
---

diff --git a/docs/en/sql-reference/functions/array-functions.md b/docs/en/sql-reference/functions/array-functions.md
index 91ecc963b1f..82700a109b5 100644
--- a/docs/en/sql-reference/functions/array-functions.md
+++ b/docs/en/sql-reference/functions/array-functions.md
@@ -1,9 +1,9 @@
---
-toc_priority: 46
+toc_priority: 35
toc_title: Arrays
---

-# Functions for Working with Arrays {#functions-for-working-with-arrays}
+# Array Functions {#functions-for-working-with-arrays}

## empty {#function-empty}

@@ -241,6 +241,12 @@ SELECT indexOf([1, 3, NULL, NULL], NULL)

Elements set to `NULL` are handled as normal values.

+## arrayCount(\[func,\] arr1, …) {#array-count}
+
+Returns the number of elements in the `arr` array for which `func` returns something other than 0. If `func` is not specified, it returns the number of non-zero elements in the array.
+
+Note that `arrayCount` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
+
## countEqual(arr, x) {#countequalarr-x}

Returns the number of elements in the array equal to x. Equivalent to arrayCount (elem -\> elem = x, arr).

@@ -568,7 +574,7 @@ SELECT arraySort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]);
- `NaN` values are right before `NULL`.
- `Inf` values are right before `NaN`.

-Note that `arraySort` is a [higher-order function](../../sql-reference/functions/higher-order-functions.md). You can pass a lambda function to it as the first argument. In this case, sorting order is determined by the result of the lambda function applied to the elements of the array.
+Note that `arraySort` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument. In this case, sorting order is determined by the result of the lambda function applied to the elements of the array.

Let’s consider the following example:

@@ -668,7 +674,7 @@ SELECT arrayReverseSort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]) as res;
- `NaN` values are right before `NULL`.
- `-Inf` values are right before `NaN`.

-Note that the `arrayReverseSort` is a [higher-order function](../../sql-reference/functions/higher-order-functions.md). You can pass a lambda function to it as the first argument. Example is shown below.
+Note that `arrayReverseSort` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument. An example is shown below.

``` sql
SELECT arrayReverseSort((x) -> -x, [1, 2, 3]) as res;

@@ -1120,7 +1126,205 @@ Result:

``` text
┌─arrayAUC([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])─┐
│ 0.75 │
-└────────────────────────────────────────---──┘
+└───────────────────────────────────────────────┘
```

+## arrayMap(func, arr1, …) {#array-map}
+
+Returns an array obtained by applying the `func` function to each element of the `arr` array.
+
+Examples:
+
+``` sql
+SELECT arrayMap(x -> (x + 2), [1, 2, 3]) as res;
+```
+
+``` text
+┌─res─────┐
+│ [3,4,5] │
+└─────────┘
+```
+
+The following example shows how to create a tuple of elements from different arrays:
+
+``` sql
+SELECT arrayMap((x, y) -> (x, y), [1, 2, 3], [4, 5, 6]) AS res
+```
+
+``` text
+┌─res─────────────────┐
+│ [(1,4),(2,5),(3,6)] │
+└─────────────────────┘
+```
+
+Note that `arrayMap` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arrayFilter(func, arr1, …) {#array-filter}
+
+Returns an array containing only the elements in `arr1` for which `func` returns something other than 0.
+
+Examples:
+
+``` sql
+SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res
+```
+
+``` text
+┌─res───────────┐
+│ ['abc World'] │
+└───────────────┘
+```
+
+``` sql
+SELECT
+    arrayFilter(
+        (i, x) -> x LIKE '%World%',
+        arrayEnumerate(arr),
+        ['Hello', 'abc World'] AS arr)
+    AS res
+```
+
+``` text
+┌─res─┐
+│ [2] │
+└─────┘
+```
+
+Note that `arrayFilter` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arrayFill(func, arr1, …) {#array-fill}
+
+Scan through `arr1` from the first element to the last element and replace `arr1[i]` by `arr1[i - 1]` if `func` returns 0. The first element of `arr1` will not be replaced.
+
+Examples:
+
+``` sql
+SELECT arrayFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res
+```
+
+``` text
+┌─res──────────────────────────────┐
+│ [1,1,3,11,12,12,12,5,6,14,14,14] │
+└──────────────────────────────────┘
+```
+
+Note that `arrayFill` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arrayReverseFill(func, arr1, …) {#array-reverse-fill}
+
+Scan through `arr1` from the last element to the first element and replace `arr1[i]` by `arr1[i + 1]` if `func` returns 0. The last element of `arr1` will not be replaced.
+
+Examples:
+
+``` sql
+SELECT arrayReverseFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res
+```
+
+``` text
+┌─res────────────────────────────────┐
+│ [1,3,3,11,12,5,5,5,6,14,NULL,NULL] │
+└────────────────────────────────────┘
+```
+
+Note that `arrayReverseFill` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arraySplit(func, arr1, …) {#array-split}
+
+Split `arr1` into multiple arrays. When `func` returns something other than 0, the array will be split on the left hand side of the element. The array will not be split before the first element.
+
+Examples:
+
+``` sql
+SELECT arraySplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res
+```
+
+``` text
+┌─res─────────────┐
+│ [[1,2,3],[4,5]] │
+└─────────────────┘
+```
+
+Note that `arraySplit` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arrayReverseSplit(func, arr1, …) {#array-reverse-split}
+
+Split `arr1` into multiple arrays.
When `func` returns something other than 0, the array will be split on the right hand side of the element. The array will not be split after the last element.
+
+Examples:
+
+``` sql
+SELECT arrayReverseSplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res
+```
+
+``` text
+┌─res───────────────┐
+│ [[1],[2,3,4],[5]] │
+└───────────────────┘
+```
+
+Note that `arrayReverseSplit` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arrayExists(\[func,\] arr1, …) {#arrayexistsfunc-arr1}
+
+Returns 1 if there is at least one element in `arr` for which `func` returns something other than 0. Otherwise, it returns 0.
+
+Note that `arrayExists` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
+
+## arrayAll(\[func,\] arr1, …) {#arrayallfunc-arr1}
+
+Returns 1 if `func` returns something other than 0 for all the elements in `arr`. Otherwise, it returns 0.
+
+Note that `arrayAll` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
+
+## arrayFirst(func, arr1, …) {#array-first}
+
+Returns the first element in the `arr1` array for which `func` returns something other than 0.
+
+Note that `arrayFirst` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arrayFirstIndex(func, arr1, …) {#array-first-index}
+
+Returns the index of the first element in the `arr1` array for which `func` returns something other than 0.
+
+Note that `arrayFirstIndex` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You must pass a lambda function to it as the first argument, and it can’t be omitted.
+
+## arraySum(\[func,\] arr1, …) {#array-sum}
+
+Returns the sum of the `func` values. If the function is omitted, it just returns the sum of the array elements.
+
+Note that `arraySum` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
+
+## arrayCumSum(\[func,\] arr1, …) {#arraycumsumfunc-arr1}
+
+Returns an array of partial sums of elements in the source array (a running sum). If the `func` function is specified, then the values of the array elements are converted by this function before summing.
+
+Example:
+
+``` sql
+SELECT arrayCumSum([1, 1, 1, 1]) AS res
+```
+
+``` text
+┌─res──────────┐
+│ [1, 2, 3, 4] │
+└──────────────┘
+```
+
+Note that `arrayCumSum` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
+
+## arrayCumSumNonNegative(arr) {#arraycumsumnonnegativearr}
+
+Same as `arrayCumSum`, returns an array of partial sums of elements in the source array (a running sum). Unlike `arrayCumSum`, whenever the running sum drops below zero, it is replaced with zero and the subsequent calculation continues from zero.
For example:
+
+``` sql
+SELECT arrayCumSumNonNegative([1, 1, -4, 1]) AS res
+```
+
+``` text
+┌─res───────┐
+│ [1,2,0,1] │
+└───────────┘
+```
+Note that `arrayCumSumNonNegative` is a [higher-order function](../../sql-reference/functions/index.md#higher-order-functions). You can pass a lambda function to it as the first argument.
+
[Original article](https://clickhouse.tech/docs/en/query_language/functions/array_functions/)

diff --git a/docs/en/sql-reference/functions/ext-dict-functions.md b/docs/en/sql-reference/functions/ext-dict-functions.md
index 49b1c2dda2c..e0ecdd74fad 100644
--- a/docs/en/sql-reference/functions/ext-dict-functions.md
+++ b/docs/en/sql-reference/functions/ext-dict-functions.md
@@ -3,6 +3,9 @@ toc_priority: 58
toc_title: External Dictionaries
---

+!!! attention "Attention"
+    `dict_name` parameter must be fully qualified for dictionaries created with DDL queries. E.g. `<database>.<dict_name>`.
+
# Functions for Working with External Dictionaries {#ext_dict_functions}

For information on connecting and configuring external dictionaries, see [External dictionaries](../../sql-reference/dictionaries/external-dictionaries/external-dicts.md).

diff --git a/docs/en/sql-reference/functions/higher-order-functions.md b/docs/en/sql-reference/functions/higher-order-functions.md
deleted file mode 100644
index 484bdaa12e6..00000000000
--- a/docs/en/sql-reference/functions/higher-order-functions.md
+++ /dev/null
@@ -1,262 +0,0 @@
----
-toc_priority: 57
-toc_title: Higher-Order
----
-
-# Higher-order Functions {#higher-order-functions}
-
-## `->` operator, lambda(params, expr) function {#operator-lambdaparams-expr-function}
-
-Allows describing a lambda function for passing to a higher-order function. The left side of the arrow has a formal parameter, which is any ID, or multiple formal parameters – any IDs in a tuple. The right side of the arrow has an expression that can use these formal parameters, as well as any table columns.
-
-Examples: `x -> 2 * x, str -> str != Referer.`
-
-Higher-order functions can only accept lambda functions as their functional argument.
-
-A lambda function that accepts multiple arguments can be passed to a higher-order function. In this case, the higher-order function is passed several arrays of identical length that these arguments will correspond to.
-
-For some functions, such as [arrayCount](#higher_order_functions-array-count) or [arraySum](#higher_order_functions-array-count), the first argument (the lambda function) can be omitted. In this case, identical mapping is assumed.
-
-A lambda function can’t be omitted for the following functions:
-
-- [arrayMap](#higher_order_functions-array-map)
-- [arrayFilter](#higher_order_functions-array-filter)
-- [arrayFill](#higher_order_functions-array-fill)
-- [arrayReverseFill](#higher_order_functions-array-reverse-fill)
-- [arraySplit](#higher_order_functions-array-split)
-- [arrayReverseSplit](#higher_order_functions-array-reverse-split)
-- [arrayFirst](#higher_order_functions-array-first)
-- [arrayFirstIndex](#higher_order_functions-array-first-index)
-
-### arrayMap(func, arr1, …) {#higher_order_functions-array-map}
-
-Returns an array obtained from the original application of the `func` function to each element in the `arr` array.
- -Examples: - -``` sql -SELECT arrayMap(x -> (x + 2), [1, 2, 3]) as res; -``` - -``` text -┌─res─────┐ -│ [3,4,5] │ -└─────────┘ -``` - -The following example shows how to create a tuple of elements from different arrays: - -``` sql -SELECT arrayMap((x, y) -> (x, y), [1, 2, 3], [4, 5, 6]) AS res -``` - -``` text -┌─res─────────────────┐ -│ [(1,4),(2,5),(3,6)] │ -└─────────────────────┘ -``` - -Note that the first argument (lambda function) can’t be omitted in the `arrayMap` function. - -### arrayFilter(func, arr1, …) {#higher_order_functions-array-filter} - -Returns an array containing only the elements in `arr1` for which `func` returns something other than 0. - -Examples: - -``` sql -SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res -``` - -``` text -┌─res───────────┐ -│ ['abc World'] │ -└───────────────┘ -``` - -``` sql -SELECT - arrayFilter( - (i, x) -> x LIKE '%World%', - arrayEnumerate(arr), - ['Hello', 'abc World'] AS arr) - AS res -``` - -``` text -┌─res─┐ -│ [2] │ -└─────┘ -``` - -Note that the first argument (lambda function) can’t be omitted in the `arrayFilter` function. - -### arrayFill(func, arr1, …) {#higher_order_functions-array-fill} - -Scan through `arr1` from the first element to the last element and replace `arr1[i]` by `arr1[i - 1]` if `func` returns 0. The first element of `arr1` will not be replaced. - -Examples: - -``` sql -SELECT arrayFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res -``` - -``` text -┌─res──────────────────────────────┐ -│ [1,1,3,11,12,12,12,5,6,14,14,14] │ -└──────────────────────────────────┘ -``` - -Note that the first argument (lambda function) can’t be omitted in the `arrayFill` function. - -### arrayReverseFill(func, arr1, …) {#higher_order_functions-array-reverse-fill} - -Scan through `arr1` from the last element to the first element and replace `arr1[i]` by `arr1[i + 1]` if `func` returns 0. The last element of `arr1` will not be replaced. - -Examples: - -``` sql -SELECT arrayReverseFill(x -> not isNull(x), [1, null, 3, 11, 12, null, null, 5, 6, 14, null, null]) AS res -``` - -``` text -┌─res────────────────────────────────┐ -│ [1,3,3,11,12,5,5,5,6,14,NULL,NULL] │ -└────────────────────────────────────┘ -``` - -Note that the first argument (lambda function) can’t be omitted in the `arrayReverseFill` function. - -### arraySplit(func, arr1, …) {#higher_order_functions-array-split} - -Split `arr1` into multiple arrays. When `func` returns something other than 0, the array will be split on the left hand side of the element. The array will not be split before the first element. - -Examples: - -``` sql -SELECT arraySplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res -``` - -``` text -┌─res─────────────┐ -│ [[1,2,3],[4,5]] │ -└─────────────────┘ -``` - -Note that the first argument (lambda function) can’t be omitted in the `arraySplit` function. - -### arrayReverseSplit(func, arr1, …) {#higher_order_functions-array-reverse-split} - -Split `arr1` into multiple arrays. When `func` returns something other than 0, the array will be split on the right hand side of the element. The array will not be split after the last element. - -Examples: - -``` sql -SELECT arrayReverseSplit((x, y) -> y, [1, 2, 3, 4, 5], [1, 0, 0, 1, 0]) AS res -``` - -``` text -┌─res───────────────┐ -│ [[1],[2,3,4],[5]] │ -└───────────────────┘ -``` - -Note that the first argument (lambda function) can’t be omitted in the `arraySplit` function. 
- -### arrayCount(\[func,\] arr1, …) {#higher_order_functions-array-count} - -Returns the number of elements in the arr array for which func returns something other than 0. If ‘func’ is not specified, it returns the number of non-zero elements in the array. - -### arrayExists(\[func,\] arr1, …) {#arrayexistsfunc-arr1} - -Returns 1 if there is at least one element in ‘arr’ for which ‘func’ returns something other than 0. Otherwise, it returns 0. - -### arrayAll(\[func,\] arr1, …) {#arrayallfunc-arr1} - -Returns 1 if ‘func’ returns something other than 0 for all the elements in ‘arr’. Otherwise, it returns 0. - -### arraySum(\[func,\] arr1, …) {#higher-order-functions-array-sum} - -Returns the sum of the ‘func’ values. If the function is omitted, it just returns the sum of the array elements. - -### arrayFirst(func, arr1, …) {#higher_order_functions-array-first} - -Returns the first element in the ‘arr1’ array for which ‘func’ returns something other than 0. - -Note that the first argument (lambda function) can’t be omitted in the `arrayFirst` function. - -### arrayFirstIndex(func, arr1, …) {#higher_order_functions-array-first-index} - -Returns the index of the first element in the ‘arr1’ array for which ‘func’ returns something other than 0. - -Note that the first argument (lambda function) can’t be omitted in the `arrayFirstIndex` function. - -### arrayCumSum(\[func,\] arr1, …) {#arraycumsumfunc-arr1} - -Returns an array of partial sums of elements in the source array (a running sum). If the `func` function is specified, then the values of the array elements are converted by this function before summing. - -Example: - -``` sql -SELECT arrayCumSum([1, 1, 1, 1]) AS res -``` - -``` text -┌─res──────────┐ -│ [1, 2, 3, 4] │ -└──────────────┘ -``` - -### arrayCumSumNonNegative(arr) {#arraycumsumnonnegativearr} - -Same as `arrayCumSum`, returns an array of partial sums of elements in the source array (a running sum). Different `arrayCumSum`, when then returned value contains a value less than zero, the value is replace with zero and the subsequent calculation is performed with zero parameters. For example: - -``` sql -SELECT arrayCumSumNonNegative([1, 1, -4, 1]) AS res -``` - -``` text -┌─res───────┐ -│ [1,2,0,1] │ -└───────────┘ -``` - -### arraySort(\[func,\] arr1, …) {#arraysortfunc-arr1} - -Returns an array as result of sorting the elements of `arr1` in ascending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays) - -The [Schwartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform) is used to improve sorting efficiency. - -Example: - -``` sql -SELECT arraySort((x, y) -> y, ['hello', 'world'], [2, 1]); -``` - -``` text -┌─res────────────────┐ -│ ['world', 'hello'] │ -└────────────────────┘ -``` - -For more information about the `arraySort` method, see the [Functions for Working With Arrays](../../sql-reference/functions/array-functions.md#array_functions-sort) section. - -### arrayReverseSort(\[func,\] arr1, …) {#arrayreversesortfunc-arr1} - -Returns an array as result of sorting the elements of `arr1` in descending order. If the `func` function is specified, sorting order is determined by the result of the function `func` applied to the elements of array (arrays). 
-
-Example:
-
-``` sql
-SELECT arrayReverseSort((x, y) -> y, ['hello', 'world'], [2, 1]) as res;
-```
-
-``` text
-┌─res───────────────┐
-│ ['hello','world'] │
-└───────────────────┘
-```
-
-For more information about the `arrayReverseSort` method, see the [Functions for Working With Arrays](../../sql-reference/functions/array-functions.md#array_functions-reverse-sort) section.
-
-[Original article](https://clickhouse.tech/docs/en/query_language/functions/higher_order_functions/)

diff --git a/docs/en/sql-reference/functions/index.md b/docs/en/sql-reference/functions/index.md
index 65514eff673..1a0b9d83b5f 100644
--- a/docs/en/sql-reference/functions/index.md
+++ b/docs/en/sql-reference/functions/index.md
@@ -44,6 +44,21 @@ Functions have the following behaviors:

Functions can’t change the values of their arguments – any changes are returned as the result. Thus, the result of calculating separate functions does not depend on the order in which the functions are written in the query.

+## Higher-order functions, `->` operator and lambda(params, expr) function {#higher-order-functions}
+
+Higher-order functions can only accept lambda functions as their functional argument. To pass a lambda function to a higher-order function, use the `->` operator. The left side of the arrow has a formal parameter, which is any ID, or multiple formal parameters – any IDs in a tuple. The right side of the arrow has an expression that can use these formal parameters, as well as any table columns.
+
+Examples:
+
+```
+x -> 2 * x
+str -> str != Referer
+```
+
+A lambda function that accepts multiple arguments can also be passed to a higher-order function. In this case, the higher-order function is passed several arrays of identical length that these arguments will correspond to.
+
+For some functions, the first argument (the lambda function) can be omitted. In this case, identical mapping is assumed.
+
## Error Handling {#error-handling}

Some functions might throw an exception if the data is invalid. In this case, the query is canceled and an error text is returned to the client. For distributed processing, when an exception occurs on one of the servers, the other servers also attempt to abort the query.

diff --git a/docs/en/sql-reference/functions/introspection.md b/docs/en/sql-reference/functions/introspection.md
index 6848f74da1f..1fd39c704c5 100644
--- a/docs/en/sql-reference/functions/introspection.md
+++ b/docs/en/sql-reference/functions/introspection.md
@@ -98,7 +98,7 @@ LIMIT 1
\G
```

-The [arrayMap](../../sql-reference/functions/higher-order-functions.md#higher_order_functions-array-map) function allows to process each individual element of the `trace` array by the `addressToLine` function. The result of this processing you see in the `trace_source_code_lines` column of output.
+The [arrayMap](../../sql-reference/functions/array-functions.md#array-map) function allows processing each individual element of the `trace` array by the `addressToLine` function. You can see the result of this processing in the `trace_source_code_lines` column of the output.

``` text
Row 1:

@@ -184,7 +184,7 @@ LIMIT 1
\G
```

-The [arrayMap](../../sql-reference/functions/higher-order-functions.md#higher_order_functions-array-map) function allows to process each individual element of the `trace` array by the `addressToSymbols` function. The result of this processing you see in the `trace_symbols` column of output.
+The [arrayMap](../../sql-reference/functions/array-functions.md#array-map) function allows processing each individual element of the `trace` array by the `addressToSymbols` function. You can see the result of this processing in the `trace_symbols` column of the output.

``` text
Row 1:

@@ -281,7 +281,7 @@ LIMIT 1
\G
```

-The [arrayMap](../../sql-reference/functions/higher-order-functions.md#higher_order_functions-array-map) function allows to process each individual element of the `trace` array by the `demangle` function. The result of this processing you see in the `trace_functions` column of output.
+The [arrayMap](../../sql-reference/functions/array-functions.md#array-map) function allows processing each individual element of the `trace` array by the `demangle` function. You can see the result of this processing in the `trace_functions` column of the output.

``` text
Row 1:

diff --git a/docs/en/sql-reference/functions/other-functions.md b/docs/en/sql-reference/functions/other-functions.md
index 05247b6db7d..1c059e9f97b 100644
--- a/docs/en/sql-reference/functions/other-functions.md
+++ b/docs/en/sql-reference/functions/other-functions.md
@@ -515,6 +515,29 @@ SELECT
└────────────────┴────────────┘
```

+## formatReadableQuantity(x) {#formatreadablequantityx}
+
+Accepts a number. Returns a rounded number with a suffix (thousand, million, billion, etc.) as a string.
+
+It is useful for presenting big numbers in a human-readable form.
+
+Example:
+
+``` sql
+SELECT
+    arrayJoin([1024, 1234 * 1000, (4567 * 1000) * 1000, 98765432101234]) AS number,
+    formatReadableQuantity(number) AS number_for_humans
+```
+
+``` text
+┌─────────number─┬─number_for_humans─┐
+│ 1024 │ 1.02 thousand │
+│ 1234000 │ 1.23 million │
+│ 4567000000 │ 4.57 billion │
+│ 98765432101234 │ 98.77 trillion │
+└────────────────┴───────────────────┘
+```
+
## least(a, b) {#leasta-b}

Returns the smallest value from a and b.

diff --git a/docs/en/sql-reference/functions/tuple-map-functions.md b/docs/en/sql-reference/functions/tuple-map-functions.md
index 343f45135eb..f826b810d23 100644
--- a/docs/en/sql-reference/functions/tuple-map-functions.md
+++ b/docs/en/sql-reference/functions/tuple-map-functions.md
@@ -46,3 +46,25 @@ SELECT mapSubtract(([toUInt8(1), 2], [toInt32(1), 1]), ([toUInt8(1), 2], [toInt3
│ ([1,2],[-1,0]) │ Tuple(Array(UInt8), Array(Int64)) │
└────────────────┴───────────────────────────────────┘
````
+
+## mapPopulateSeries {#function-mappopulateseries}
+
+Syntax: `mapPopulateSeries(keys : Array(<IntegerType>), values : Array(<IntegerType>)[, max : <IntegerType>])`
+
+Generates a map, where keys are a series of numbers, from the minimum to the maximum key (or the `max` argument if it is specified) taken from the `keys` array with a step size of one,
+and corresponding values taken from the `values` array. If a value is not specified for a key, the default value is used in the resulting map.
+For repeated keys, only the first value (in order of appearance) gets associated with the key.
+
+The number of elements in `keys` and `values` must be the same for each row.
+
+Returns a tuple of two arrays: keys in sorted order, and the values corresponding to those keys.
+
+``` sql
+select mapPopulateSeries([1,2,4], [11,22,44], 5) as res, toTypeName(res) as type;
+```
+
+``` text
+┌─res──────────────────────────┬─type──────────────────────────────┐
+│ ([1,2,3,4,5],[11,22,0,44,0]) │ Tuple(Array(UInt8), Array(UInt8)) │
+└──────────────────────────────┴───────────────────────────────────┘
+```

diff --git a/docs/es/operations/backup.md b/docs/es/operations/backup.md
index f1e5b3d3e09..a6297070663 100644
--- a/docs/es/operations/backup.md
+++ b/docs/es/operations/backup.md
@@ -1,20 +1,18 @@
---
-machine_translated: true
-machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
toc_priority: 49
toc_title: Copia de seguridad de datos
---

# Copia de seguridad de datos {#data-backup}

-Mientras [replicación](../engines/table-engines/mergetree-family/replication.md) provides protection from hardware failures, it does not protect against human errors: accidental deletion of data, deletion of the wrong table or a table on the wrong cluster, and software bugs that result in incorrect data processing or data corruption. In many cases mistakes like these will affect all replicas. ClickHouse has built-in safeguards to prevent some types of mistakes — for example, by default [no puede simplemente eliminar tablas con un motor similar a MergeTree que contenga más de 50 Gb de datos](https://github.com/ClickHouse/ClickHouse/blob/v18.14.18-stable/programs/server/config.xml#L322-L330). Sin embargo, estas garantías no cubren todos los casos posibles y pueden eludirse.
+Mientras que la [replicación](../engines/table-engines/mergetree-family/replication.md) proporciona protección contra fallos de hardware, no protege de errores humanos: el borrado accidental de datos, eliminar la tabla equivocada o una tabla en el clúster equivocado, y bugs de software que dan como resultado un procesado incorrecto de los datos o la corrupción de los datos. En muchos casos, errores como estos afectarán a todas las réplicas. ClickHouse dispone de salvaguardas para prevenir algunos tipos de errores — por ejemplo, por defecto [no se puede simplemente eliminar tablas con un motor similar a MergeTree que contenga más de 50 Gb de datos](https://github.com/ClickHouse/ClickHouse/blob/v18.14.18-stable/programs/server/config.xml#L322-L330). Sin embargo, estas salvaguardas no cubren todos los casos posibles y pueden eludirse.

Para mitigar eficazmente los posibles errores humanos, debe preparar cuidadosamente una estrategia para realizar copias de seguridad y restaurar sus datos **previamente**.

-Cada empresa tiene diferentes recursos disponibles y requisitos comerciales, por lo que no existe una solución universal para las copias de seguridad y restauraciones de ClickHouse que se adapten a cada situación. Lo que funciona para un gigabyte de datos probablemente no funcionará para decenas de petabytes. Hay una variedad de posibles enfoques con sus propios pros y contras, que se discutirán a continuación. Es una buena idea utilizar varios enfoques en lugar de solo uno para compensar sus diversas deficiencias.
+Cada empresa tiene diferentes recursos disponibles y requisitos comerciales, por lo que no existe una solución universal para las copias de seguridad y restauraciones de ClickHouse que se adapten a cada situación. Lo que funciona para un gigabyte de datos probablemente no funcionará para decenas de petabytes. Hay una variedad de posibles enfoques con sus propios pros y contras, que se discutirán a continuación.
Es una buena idea utilizar varios enfoques en lugar de uno solo para compensar sus diversas deficiencias.

!!! note "Nota"
-    Tenga en cuenta que si realizó una copia de seguridad de algo y nunca intentó restaurarlo, es probable que la restauración no funcione correctamente cuando realmente la necesite (o al menos tomará más tiempo de lo que las empresas pueden tolerar). Por lo tanto, cualquiera que sea el enfoque de copia de seguridad que elija, asegúrese de automatizar el proceso de restauración también y practicarlo en un clúster de ClickHouse de repuesto regularmente.
+    Tenga en cuenta que si realizó una copia de seguridad de algo y nunca intentó restaurarlo, es probable que la restauración no funcione correctamente cuando realmente la necesite (o al menos tomará más tiempo de lo que las empresas pueden tolerar). Por lo tanto, cualquiera que sea el enfoque de copia de seguridad que elija, asegúrese de automatizar también el proceso de restauración y ponerlo en práctica regularmente en un clúster de ClickHouse de repuesto.

## Duplicar datos de origen en otro lugar {#duplicating-source-data-somewhere-else}

@@ -32,7 +30,7 @@ Para volúmenes de datos más pequeños, un simple `INSERT INTO ... SELECT ...`

## Manipulaciones con piezas {#manipulations-with-parts}

-ClickHouse permite usar el `ALTER TABLE ... FREEZE PARTITION ...` consulta para crear una copia local de particiones de tabla. Esto se implementa utilizando enlaces duros al `/var/lib/clickhouse/shadow/` carpeta, por lo que generalmente no consume espacio adicional en disco para datos antiguos. Las copias creadas de archivos no son manejadas por el servidor ClickHouse, por lo que puede dejarlas allí: tendrá una copia de seguridad simple que no requiere ningún sistema externo adicional, pero seguirá siendo propenso a problemas de hardware. Por esta razón, es mejor copiarlos de forma remota en otra ubicación y luego eliminar las copias locales. Los sistemas de archivos distribuidos y los almacenes de objetos siguen siendo una buena opción para esto, pero los servidores de archivos conectados normales con una capacidad lo suficientemente grande podrían funcionar también (en este caso, la transferencia ocurrirá a través del sistema de archivos de red o tal vez [rsync](https://en.wikipedia.org/wiki/Rsync)).
+ClickHouse permite usar la consulta `ALTER TABLE ... FREEZE PARTITION ...` para crear una copia local de particiones de tabla. Esto se implementa utilizando enlaces duros a la carpeta `/var/lib/clickhouse/shadow/`, por lo que generalmente no consume espacio adicional en disco para datos antiguos. Las copias creadas de archivos no son manejadas por el servidor ClickHouse, por lo que puede dejarlas allí: tendrá una copia de seguridad simple que no requiere ningún sistema externo adicional, pero seguirá siendo propenso a problemas de hardware. Por esta razón, es mejor copiarlos de forma remota en otra ubicación y luego eliminar las copias locales. Los sistemas de archivos distribuidos y los almacenes de objetos siguen siendo una buena opción para esto, pero los servidores de archivos conectados normales con una capacidad lo suficientemente grande podrían funcionar también (en este caso, la transferencia ocurrirá a través del sistema de archivos de red o tal vez [rsync](https://en.wikipedia.org/wiki/Rsync)).

Para obtener más información sobre las consultas relacionadas con las manipulaciones de particiones, consulte [Documentación de ALTER](../sql-reference/statements/alter.md#alter_manipulations-with-partitions).
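+
+Un esbozo mínimo e hipotético (se asume una tabla `visitas` particionada por mes con `toYYYYMM`):
+
+``` sql
+ALTER TABLE visitas FREEZE PARTITION 202009;
+```
+
+Tras ejecutar la consulta, la copia local aparece bajo `/var/lib/clickhouse/shadow/` y puede transferirse después a otra ubicación, por ejemplo con `rsync`, antes de eliminar la copia local.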
diff --git a/docs/ru/engines/table-engines/mergetree-family/mergetree.md b/docs/ru/engines/table-engines/mergetree-family/mergetree.md index f04fbae18ba..3c80fe663f1 100644 --- a/docs/ru/engines/table-engines/mergetree-family/mergetree.md +++ b/docs/ru/engines/table-engines/mergetree-family/mergetree.md @@ -1,3 +1,8 @@ +--- +toc_priority: 30 +toc_title: MergeTree +--- + # MergeTree {#table_engines-mergetree} Движок `MergeTree`, а также другие движки этого семейства (`*MergeTree`) — это наиболее функциональные движки таблиц ClickHouse. @@ -28,8 +33,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1, INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2 ) ENGINE = MergeTree() +ORDER BY expr [PARTITION BY expr] -[ORDER BY expr] [PRIMARY KEY expr] [SAMPLE BY expr] [TTL expr [DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'], ...] @@ -38,27 +43,42 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] Описание параметров смотрите в [описании запроса CREATE](../../../engines/table-engines/mergetree-family/mergetree.md). -!!! note "Note" +!!! note "Примечание" `INDEX` — экспериментальная возможность, смотрите [Индексы пропуска данных](#table_engine-mergetree-data_skipping-indexes). ### Секции запроса {#mergetree-query-clauses} - `ENGINE` — имя и параметры движка. `ENGINE = MergeTree()`. `MergeTree` не имеет параметров. -- `PARTITION BY` — [ключ партиционирования](custom-partitioning-key.md). Для партиционирования по месяцам используйте выражение `toYYYYMM(date_column)`, где `date_column` — столбец с датой типа [Date](../../../engines/table-engines/mergetree-family/mergetree.md). В этом случае имена партиций имеют формат `"YYYYMM"`. +- `ORDER BY` — ключ сортировки. + + Кортеж столбцов или произвольных выражений. Пример: `ORDER BY (CounterID, EventDate)`. -- `ORDER BY` — ключ сортировки. Кортеж столбцов или произвольных выражений. Пример: `ORDER BY (CounterID, EventDate)`. + ClickHouse использует ключ сортировки в качестве первичного ключа, если первичный ключ не задан в секции `PRIMARY KEY`. -- `PRIMARY KEY` — первичный ключ, если он [отличается от ключа сортировки](#pervichnyi-kliuch-otlichnyi-ot-kliucha-sortirovki). По умолчанию первичный ключ совпадает с ключом сортировки (который задаётся секцией `ORDER BY`.) Поэтому в большинстве случаев секцию `PRIMARY KEY` отдельно указывать не нужно. + Чтобы отключить сортировку, используйте синтаксис `ORDER BY tuple()`. Смотрите [выбор первичного ключа](#vybor-pervichnogo-kliucha). -- `SAMPLE BY` — выражение для сэмплирования. Если используется выражение для сэмплирования, то первичный ключ должен содержать его. Пример: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`. +- `PARTITION BY` — [ключ партиционирования](custom-partitioning-key.md). Необязательный параметр. -- `TTL` — список правил, определяющих длительности хранения строк, а также задающих правила перемещения частей на определённые тома или диски. Выражение должно возвращать столбец `Date` или `DateTime`. Пример: `TTL date + INTERVAL 1 DAY`. - - Тип правила `DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'` указывает действие, которое будет выполнено с частью, удаление строк (прореживание), перемещение (при выполнении условия для всех строк части) на определённый диск (`TO DISK 'xxx'`) или том (`TO VOLUME 'xxx'`). - - Поведение по умолчанию соответствует удалению строк (`DELETE`). В списке правил может быть указано только одно выражение с поведением `DELETE`. 
- - Дополнительные сведения смотрите в разделе [TTL для столбцов и таблиц](#table_engine-mergetree-ttl)
+    Для партиционирования по месяцам используйте выражение `toYYYYMM(date_column)`, где `date_column` — столбец с датой типа [Date](../../../engines/table-engines/mergetree-family/mergetree.md). В этом случае имена партиций имеют формат `"YYYYMM"`.
+
+- `PRIMARY KEY` — первичный ключ, если он [отличается от ключа сортировки](#pervichnyi-kliuch-otlichnyi-ot-kliucha-sortirovki). Необязательный параметр.
+
+    По умолчанию первичный ключ совпадает с ключом сортировки (который задаётся секцией `ORDER BY`.) Поэтому в большинстве случаев секцию `PRIMARY KEY` отдельно указывать не нужно.
+
+- `SAMPLE BY` — выражение для сэмплирования. Необязательный параметр.
+
+    Если используется выражение для сэмплирования, то первичный ключ должен содержать его. Пример: `SAMPLE BY intHash32(UserID) ORDER BY (CounterID, EventDate, intHash32(UserID))`.
+
+- `TTL` — список правил, определяющих длительности хранения строк, а также задающих правила перемещения частей на определённые тома или диски. Необязательный параметр.
+
+    Выражение должно возвращать столбец `Date` или `DateTime`. Пример: `TTL date + INTERVAL 1 DAY`.
+
+    Тип правила `DELETE|TO DISK 'xxx'|TO VOLUME 'xxx'` указывает действие, которое будет выполнено с частью: удаление строк (прореживание), перемещение (при выполнении условия для всех строк части) на определённый диск (`TO DISK 'xxx'`) или том (`TO VOLUME 'xxx'`). Поведение по умолчанию соответствует удалению строк (`DELETE`). В списке правил может быть указано только одно выражение с поведением `DELETE`.
+
+    Дополнительные сведения смотрите в разделе [TTL для столбцов и таблиц](#table_engine-mergetree-ttl)
+
+- `SETTINGS` — дополнительные параметры, регулирующие поведение `MergeTree` (необязательные):

    - `index_granularity` — максимальное количество строк данных между засечками индекса. По умолчанию — 8192. Смотрите [Хранение данных](#mergetree-data-storage).
    - `index_granularity_bytes` — максимальный размер гранул данных в байтах. По умолчанию — 10Mb. Чтобы ограничить размер гранул только количеством строк, установите значение 0 (не рекомендовано). Смотрите [Хранение данных](#mergetree-data-storage).

@@ -180,6 +200,14 @@ ClickHouse не требует уникального первичного кл

Длинный первичный ключ будет негативно влиять на производительность вставки и потребление памяти, однако на производительность ClickHouse при запросах `SELECT` лишние столбцы в первичном ключе не влияют.

+Вы можете создать таблицу без первичного ключа, используя синтаксис `ORDER BY tuple()`. В этом случае ClickHouse хранит данные в порядке вставки. Если вы хотите сохранить порядок строк при вставке данных с помощью запросов `INSERT ... SELECT`, установите [max\_insert\_threads = 1](../../../operations/settings/settings.md#settings-max-insert-threads).
+
+Чтобы выбрать данные в первоначальном порядке, используйте
+[однопоточные](../../../operations/settings/settings.md#settings-max_threads) запросы `SELECT`.
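+
+Минимальный гипотетический набросок (имя таблицы `events` выбрано только для примера):
+
+``` sql
+CREATE TABLE events (message String) ENGINE = MergeTree() ORDER BY tuple();
+```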
+
+
+
+
### Первичный ключ, отличный от ключа сортировки {#pervichnyi-kliuch-otlichnyi-ot-kliucha-sortirovki}

Существует возможность задать первичный ключ (выражение, значения которого будут записаны в индексный файл для

diff --git a/docs/ru/interfaces/formats.md b/docs/ru/interfaces/formats.md
index 054f75e8da8..04bca115974 100644
--- a/docs/ru/interfaces/formats.md
+++ b/docs/ru/interfaces/formats.md
@@ -28,6 +28,8 @@ ClickHouse может принимать (`INSERT`) и отдавать (`SELECT
| [PrettySpace](#prettyspace) | ✗ | ✔ |
| [Protobuf](#protobuf) | ✔ | ✔ |
| [Parquet](#data-format-parquet) | ✔ | ✔ |
+| [Arrow](#data-format-arrow) | ✔ | ✔ |
+| [ArrowStream](#data-format-arrow-stream) | ✔ | ✔ |
| [ORC](#data-format-orc) | ✔ | ✗ |
| [RowBinary](#rowbinary) | ✔ | ✔ |
| [RowBinaryWithNamesAndTypes](#rowbinarywithnamesandtypes) | ✔ | ✔ |

@@ -947,6 +949,12 @@ ClickHouse пишет и читает сообщения `Protocol Buffers` в

## Avro {#data-format-avro}

+[Apache Avro](https://avro.apache.org/) — это ориентированный на строки фреймворк для сериализации данных. Разработан в рамках проекта Apache Hadoop.
+
+В ClickHouse формат Avro поддерживает чтение и запись [файлов данных Avro](https://avro.apache.org/docs/current/spec.html#Object+Container+Files).
+
+[Логические типы Avro](https://avro.apache.org/docs/current/spec.html#Logical+Types)
+
## AvroConfluent {#data-format-avro-confluent}

Для формата `AvroConfluent` ClickHouse поддерживает декодирование сообщений `Avro` с одним объектом. Такие сообщения используются с [Kafka] (http://kafka.apache.org/) и реестром схем [Confluent](https://docs.confluent.io/current/schema-registry/index.html).

@@ -996,7 +1004,7 @@ SELECT * FROM topic1_stream;

## Parquet {#data-format-parquet}

-[Apache Parquet](http://parquet.apache.org/) — формат поколоночного хранения данных, который распространён в экосистеме Hadoop. Для формата `Parquet` ClickHouse поддерживает операции чтения и записи.
+[Apache Parquet](https://parquet.apache.org/) — формат поколоночного хранения данных, который распространён в экосистеме Hadoop. Для формата `Parquet` ClickHouse поддерживает операции чтения и записи.

### Соответствие типов данных {#sootvetstvie-tipov-dannykh}

@@ -1042,6 +1050,16 @@ $ clickhouse-client --query="SELECT * FROM {some_table} FORMAT Parquet" > {some_

Для обмена данными с экосистемой Hadoop можно использовать движки таблиц [HDFS](../engines/table-engines/integrations/hdfs.md).

+## Arrow {#data-format-arrow}
+
+[Apache Arrow](https://arrow.apache.org/) поставляется с двумя встроенными поколоночными форматами хранения. ClickHouse поддерживает операции чтения и записи для этих форматов.
+
+`Arrow` — это формат Apache Arrow «file mode». Он предназначен для произвольного доступа к данным в памяти.
+
+## ArrowStream {#data-format-arrow-stream}
+
+`ArrowStream` — это формат Apache Arrow «stream mode». Он предназначен для потоковой обработки данных в памяти.
+
## ORC {#data-format-orc}

[Apache ORC](https://orc.apache.org/) - это column-oriented формат данных, распространённый в экосистеме Hadoop. Вы можете только вставлять данные этого формата в ClickHouse.

diff --git a/docs/ru/interfaces/third-party/gui.md b/docs/ru/interfaces/third-party/gui.md
index a872e35ce0b..f7eaa5cc77f 100644
--- a/docs/ru/interfaces/third-party/gui.md
+++ b/docs/ru/interfaces/third-party/gui.md
@@ -93,6 +93,10 @@

[cickhouse-plantuml](https://pypi.org/project/clickhouse-plantuml/) — скрипт, генерирующий [PlantUML](https://plantuml.com/) диаграммы схем таблиц.
+### xeus-clickhouse {#xeus-clickhouse}
+
+[xeus-clickhouse](https://github.com/wangfenjin/xeus-clickhouse) — это ядро Jupyter для ClickHouse, которое поддерживает запросы к данным ClickHouse с использованием SQL прямо из Jupyter.
+
 ## Коммерческие {#kommercheskie}

 ### DataGrip {#datagrip}
diff --git a/docs/ru/operations/system-tables/index.md b/docs/ru/operations/system-tables/index.md
index 95715cd84c4..6fa989d3d0d 100644
--- a/docs/ru/operations/system-tables/index.md
+++ b/docs/ru/operations/system-tables/index.md
@@ -7,10 +7,38 @@ toc_title: Системные таблицы

 ## Введение {#system-tables-introduction}

-Системные таблицы используются для реализации части функциональности системы, а также предоставляют доступ к информации о работе системы.
-Вы не можете удалить системную таблицу (хотя можете сделать DETACH).
-Для системных таблиц нет файлов с данными на диске и файлов с метаданными. Сервер создаёт все системные таблицы при старте.
-В системные таблицы нельзя записывать данные - можно только читать.
-Системные таблицы расположены в базе данных system.
+Системные таблицы содержат информацию о:
+
+- Состоянии сервера, процессов и окружения.
+- Внутренних процессах сервера.
+
+Системные таблицы:
+
+- Находятся в базе данных `system`.
+- Доступны только для чтения данных.
+- Не могут быть удалены или изменены, но их можно отсоединить.
+
+Системные таблицы `metric_log`, `query_log`, `query_thread_log` и `trace_log` хранят данные в файловой системе. Остальные системные таблицы хранят свои данные в оперативной памяти. Сервер ClickHouse создает такие системные таблицы при запуске.
+
+### Источники системных показателей
+
+Для сбора системных показателей сервер ClickHouse использует:
+
+- Возможности `CAP_NET_ADMIN`.
+- [procfs](https://ru.wikipedia.org/wiki/Procfs) (только Linux).
+
+**procfs**
+
+Если для сервера ClickHouse не включена возможность `CAP_NET_ADMIN`, он пытается обратиться к `ProcfsMetricsProvider`. `ProcfsMetricsProvider` позволяет собирать системные показатели для каждого запроса (для CPU и I/O).
+
+Если procfs поддерживается и включена в системе, то сервер ClickHouse собирает следующие системные показатели:
+
+- `OSCPUVirtualTimeMicroseconds`
+- `OSCPUWaitMicroseconds`
+- `OSIOWaitMicroseconds`
+- `OSReadChars`
+- `OSWriteChars`
+- `OSReadBytes`
+- `OSWriteBytes`

 [Оригинальная статья](https://clickhouse.tech/docs/ru/operations/system-tables/)
diff --git a/docs/ru/operations/system-tables/stack_trace.md b/docs/ru/operations/system-tables/stack_trace.md
index 966a07633d8..0689e15c35c 100644
--- a/docs/ru/operations/system-tables/stack_trace.md
+++ b/docs/ru/operations/system-tables/stack_trace.md
@@ -82,7 +82,7 @@ res: /lib/x86_64-linux-gnu/libc-2.27.so

 - [Функции интроспекции](../../sql-reference/functions/introspection.md) — Что такое функции интроспекции и как их использовать.
 - [system.trace_log](../../operations/system-tables/trace_log.md#system_tables-trace_log) — Содержит трассировки стека, собранные профилировщиком выборочных запросов.
-- [arrayMap](../../sql-reference/functions/higher-order-functions.md#higher_order_functions-array-map) — Описание и пример использования функции `arrayMap`.
-- [arrayFilter](../../sql-reference/functions/higher-order-functions.md#higher_order_functions-array-filter) — Описание и пример использования функции `arrayFilter`.
+- [arrayMap](../../sql-reference/functions/array-functions.md#array-map) — Описание и пример использования функции `arrayMap`.
+- [arrayFilter](../../sql-reference/functions/array-functions.md#array-filter) — Описание и пример использования функции `arrayFilter`. [Оригинальная статья](https://clickhouse.tech/docs/ru/operations/system_tables/stack_trace) diff --git a/docs/ru/sql-reference/data-types/simpleaggregatefunction.md b/docs/ru/sql-reference/data-types/simpleaggregatefunction.md index d36dc87e8ba..52f0412a177 100644 --- a/docs/ru/sql-reference/data-types/simpleaggregatefunction.md +++ b/docs/ru/sql-reference/data-types/simpleaggregatefunction.md @@ -9,6 +9,7 @@ The following aggregate functions are supported: - [`min`](../../sql-reference/aggregate-functions/reference/min.md#agg_function-min) - [`max`](../../sql-reference/aggregate-functions/reference/max.md#agg_function-max) - [`sum`](../../sql-reference/aggregate-functions/reference/sum.md#agg_function-sum) +- [`sumWithOverflow`](../../sql-reference/aggregate-functions/reference/sumwithoverflow.md#sumwithoverflowx) - [`groupBitAnd`](../../sql-reference/aggregate-functions/reference/groupbitand.md#groupbitand) - [`groupBitOr`](../../sql-reference/aggregate-functions/reference/groupbitor.md#groupbitor) - [`groupBitXor`](../../sql-reference/aggregate-functions/reference/groupbitxor.md#groupbitxor) diff --git a/docs/ru/sql-reference/data-types/tuple.md b/docs/ru/sql-reference/data-types/tuple.md index cb8130f28a3..e2a1450b47f 100644 --- a/docs/ru/sql-reference/data-types/tuple.md +++ b/docs/ru/sql-reference/data-types/tuple.md @@ -7,7 +7,7 @@ toc_title: Tuple(T1, T2, ...) Кортеж из элементов любого [типа](index.md#data_types). Элементы кортежа могут быть одного или разных типов. -Кортежи используются для временной группировки столбцов. Столбцы могут группироваться при использовании выражения IN в запросе, а также для указания нескольких формальных параметров лямбда-функций. Подробнее смотрите разделы [Операторы IN](../../sql-reference/data-types/tuple.md), [Функции высшего порядка](../../sql-reference/functions/higher-order-functions.md#higher-order-functions). +Кортежи используются для временной группировки столбцов. Столбцы могут группироваться при использовании выражения IN в запросе, а также для указания нескольких формальных параметров лямбда-функций. Подробнее смотрите разделы [Операторы IN](../../sql-reference/data-types/tuple.md), [Функции высшего порядка](../../sql-reference/functions/index.md#higher-order-functions). Кортежи могут быть результатом запроса. В этом случае, в текстовых форматах кроме JSON, значения выводятся в круглых скобках через запятую. В форматах JSON, кортежи выводятся в виде массивов (в квадратных скобках). diff --git a/docs/ru/sql-reference/functions/array-functions.md b/docs/ru/sql-reference/functions/array-functions.md index cb1d179be47..91c0443c85d 100644 --- a/docs/ru/sql-reference/functions/array-functions.md +++ b/docs/ru/sql-reference/functions/array-functions.md @@ -1,4 +1,4 @@ -# Функции по работе с массивами {#funktsii-po-rabote-s-massivami} +# Массивы {#functions-for-working-with-arrays} ## empty {#function-empty} @@ -186,6 +186,13 @@ SELECT indexOf([1, 3, NULL, NULL], NULL) Элементы, равные `NULL`, обрабатываются как обычные значения. +## arrayCount(\[func,\] arr1, …) {#array-count} + +Возвращает количество элементов массива `arr`, для которых функция `func` возвращает не 0. Если `func` не указана - возвращает количество ненулевых элементов массива. 
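Условный пример для иллюстрации (результат детерминирован для приведённого массива):

``` sql
SELECT arrayCount(x -> x % 2 = 0, [1, 2, 3, 4]) AS res;
```

``` text
┌─res─┐
│   2 │
└─────┘
```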
+ +Функция `arrayCount` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию. + + ## countEqual(arr, x) {#countequalarr-x} Возвращает количество элементов массива, равных x. Эквивалентно arrayCount(elem -\> elem = x, arr). @@ -513,7 +520,7 @@ SELECT arraySort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]); - Значения `NaN` идут перед `NULL`. - Значения `Inf` идут перед `NaN`. -Функция `arraySort` является [функцией высшего порядка](higher-order-functions.md) — в качестве первого аргумента ей можно передать лямбда-функцию. В этом случае порядок сортировки определяется результатом применения лямбда-функции на элементы массива. +Функция `arraySort` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию. В этом случае порядок сортировки определяется результатом применения лямбда-функции на элементы массива. Рассмотрим пример: @@ -613,7 +620,7 @@ SELECT arrayReverseSort([1, nan, 2, NULL, 3, nan, -4, NULL, inf, -inf]) as res; - Значения `NaN` идут перед `NULL`. - Значения `-Inf` идут перед `NaN`. -Функция `arrayReverseSort` является [функцией высшего порядка](higher-order-functions.md). Вы можете передать ей в качестве первого аргумента лямбда-функцию. Например: +Функция `arrayReverseSort` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию. Например: ``` sql SELECT arrayReverseSort((x) -> -x, [1, 2, 3]) as res; @@ -1036,6 +1043,116 @@ SELECT arrayZip(['a', 'b', 'c'], [5, 2, 1]) └──────────────────────────────────────┘ ``` +## arrayMap(func, arr1, …) {#array-map} + +Возвращает массив, полученный на основе результатов применения функции `func` к каждому элементу массива `arr`. + +Примеры: + +``` sql +SELECT arrayMap(x -> (x + 2), [1, 2, 3]) as res; +``` + +``` text +┌─res─────┐ +│ [3,4,5] │ +└─────────┘ +``` + +Следующий пример показывает, как создать кортежи из элементов разных массивов: + +``` sql +SELECT arrayMap((x, y) -> (x, y), [1, 2, 3], [4, 5, 6]) AS res +``` + +``` text +┌─res─────────────────┐ +│ [(1,4),(2,5),(3,6)] │ +└─────────────────────┘ +``` + +Функция `arrayMap` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей нужно передать лямбда-функцию, и этот аргумент не может быть опущен. + +## arrayFilter(func, arr1, …) {#array-filter} + +Возвращает массив, содержащий только те элементы массива `arr1`, для которых функция `func` возвращает не 0. + +Примеры: + +``` sql +SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res +``` + +``` text +┌─res───────────┐ +│ ['abc World'] │ +└───────────────┘ +``` + +``` sql +SELECT + arrayFilter( + (i, x) -> x LIKE '%World%', + arrayEnumerate(arr), + ['Hello', 'abc World'] AS arr) + AS res +``` + +``` text +┌─res─┐ +│ [2] │ +└─────┘ +``` + +Функция `arrayFilter` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей нужно передать лямбда-функцию, и этот аргумент не может быть опущен. + +## arrayExists(\[func,\] arr1, …) {#arrayexistsfunc-arr1} + +Возвращает 1, если существует хотя бы один элемент массива `arr`, для которого функция func возвращает не 0. Иначе возвращает 0. 
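Условный пример для иллюстрации:

``` sql
SELECT arrayExists(x -> x > 10, [3, 7, 12]) AS res;
```

``` text
┌─res─┐
│   1 │
└─────┘
```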
+
+Функция `arrayExists` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию.
+
+## arrayAll(\[func,\] arr1, …) {#arrayallfunc-arr1}
+
+Возвращает 1, если для всех элементов массива `arr` функция `func` возвращает не 0. Иначе возвращает 0.
+
+Функция `arrayAll` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию.
+
+## arrayFirst(func, arr1, …) {#array-first}
+
+Возвращает первый элемент массива `arr1`, для которого функция `func` возвращает не 0.
+
+Функция `arrayFirst` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей нужно передать лямбда-функцию, и этот аргумент не может быть опущен.
+
+## arrayFirstIndex(func, arr1, …) {#array-first-index}
+
+Возвращает индекс первого элемента массива `arr1`, для которого функция `func` возвращает не 0.
+
+Функция `arrayFirstIndex` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей нужно передать лямбда-функцию, и этот аргумент не может быть опущен.
+
+## arraySum(\[func,\] arr1, …) {#array-sum}
+
+Возвращает сумму значений функции `func`. Если функция не указана, то просто возвращает сумму элементов массива.
+
+Функция `arraySum` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию.
+
+## arrayCumSum(\[func,\] arr1, …) {#arraycumsumfunc-arr1}
+
+Возвращает массив из частичных сумм элементов исходного массива (сумма с накоплением). Если указана функция `func`, то значения элементов массива преобразуются этой функцией перед суммированием.
+
+Функция `arrayCumSum` является [функцией высшего порядка](../../sql-reference/functions/index.md#higher-order-functions) — в качестве первого аргумента ей можно передать лямбда-функцию.
+
+Пример:
+
+``` sql
+SELECT arrayCumSum([1, 1, 1, 1]) AS res
+```
+
+``` text
+┌─res──────────┐
+│ [1, 2, 3, 4] │
+└──────────────┘
+```
+
 ## arrayAUC {#arrayauc}

 Вычисляет площадь под кривой.
diff --git a/docs/ru/sql-reference/functions/higher-order-functions.md b/docs/ru/sql-reference/functions/higher-order-functions.md
deleted file mode 100644
index cd3dee5b1a7..00000000000
--- a/docs/ru/sql-reference/functions/higher-order-functions.md
+++ /dev/null
@@ -1,167 +0,0 @@
-# Функции высшего порядка {#higher-order-functions}
-
-## Оператор `->`, функция lambda(params, expr) {#operator-funktsiia-lambdaparams-expr}
-
-Позволяет описать лямбда-функцию для передачи в функцию высшего порядка. Слева от стрелочки стоит формальный параметр - произвольный идентификатор, или несколько формальных параметров - произвольные идентификаторы в кортеже. Справа от стрелочки стоит выражение, в котором могут использоваться эти формальные параметры, а также любые столбцы таблицы.
-
-Примеры: `x -> 2 * x, str -> str != Referer.`
-
-Функции высшего порядка, в качестве своего функционального аргумента могут принимать только лямбда-функции.
-
-В функции высшего порядка может быть передана лямбда-функция, принимающая несколько аргументов. В этом случае, в функцию высшего порядка передаётся несколько массивов одинаковых длин, которым эти аргументы будут соответствовать.
- -Для некоторых функций, например [arrayCount](#higher_order_functions-array-count) или [arraySum](#higher_order_functions-array-sum), первый аргумент (лямбда-функция) может отсутствовать. В этом случае, подразумевается тождественное отображение. - -Для функций, перечисленных ниже, лямбда-функцию должна быть указана всегда: - -- [arrayMap](#higher_order_functions-array-map) -- [arrayFilter](#higher_order_functions-array-filter) -- [arrayFirst](#higher_order_functions-array-first) -- [arrayFirstIndex](#higher_order_functions-array-first-index) - -### arrayMap(func, arr1, …) {#higher_order_functions-array-map} - -Вернуть массив, полученный на основе результатов применения функции `func` к каждому элементу массива `arr`. - -Примеры: - -``` sql -SELECT arrayMap(x -> (x + 2), [1, 2, 3]) as res; -``` - -``` text -┌─res─────┐ -│ [3,4,5] │ -└─────────┘ -``` - -Следующий пример показывает, как создать кортежи из элементов разных массивов: - -``` sql -SELECT arrayMap((x, y) -> (x, y), [1, 2, 3], [4, 5, 6]) AS res -``` - -``` text -┌─res─────────────────┐ -│ [(1,4),(2,5),(3,6)] │ -└─────────────────────┘ -``` - -Обратите внимание, что у функции `arrayMap` первый аргумент (лямбда-функция) не может быть опущен. - -### arrayFilter(func, arr1, …) {#higher_order_functions-array-filter} - -Вернуть массив, содержащий только те элементы массива `arr1`, для которых функция `func` возвращает не 0. - -Примеры: - -``` sql -SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res -``` - -``` text -┌─res───────────┐ -│ ['abc World'] │ -└───────────────┘ -``` - -``` sql -SELECT - arrayFilter( - (i, x) -> x LIKE '%World%', - arrayEnumerate(arr), - ['Hello', 'abc World'] AS arr) - AS res -``` - -``` text -┌─res─┐ -│ [2] │ -└─────┘ -``` - -Обратите внимание, что у функции `arrayFilter` первый аргумент (лямбда-функция) не может быть опущен. - -### arrayCount(\[func,\] arr1, …) {#higher_order_functions-array-count} - -Вернуть количество элементов массива `arr`, для которых функция func возвращает не 0. Если func не указана - вернуть количество ненулевых элементов массива. - -### arrayExists(\[func,\] arr1, …) {#arrayexistsfunc-arr1} - -Вернуть 1, если существует хотя бы один элемент массива `arr`, для которого функция func возвращает не 0. Иначе вернуть 0. - -### arrayAll(\[func,\] arr1, …) {#arrayallfunc-arr1} - -Вернуть 1, если для всех элементов массива `arr`, функция `func` возвращает не 0. Иначе вернуть 0. - -### arraySum(\[func,\] arr1, …) {#higher_order_functions-array-sum} - -Вернуть сумму значений функции `func`. Если функция не указана - просто вернуть сумму элементов массива. - -### arrayFirst(func, arr1, …) {#higher_order_functions-array-first} - -Вернуть первый элемент массива `arr1`, для которого функция func возвращает не 0. - -Обратите внимание, что у функции `arrayFirst` первый аргумент (лямбда-функция) не может быть опущен. - -### arrayFirstIndex(func, arr1, …) {#higher_order_functions-array-first-index} - -Вернуть индекс первого элемента массива `arr1`, для которого функция func возвращает не 0. - -Обратите внимание, что у функции `arrayFirstFilter` первый аргумент (лямбда-функция) не может быть опущен. - -### arrayCumSum(\[func,\] arr1, …) {#arraycumsumfunc-arr1} - -Возвращает массив из частичных сумм элементов исходного массива (сумма с накоплением). Если указана функция `func`, то значения элементов массива преобразуются этой функцией перед суммированием. 
- -Пример: - -``` sql -SELECT arrayCumSum([1, 1, 1, 1]) AS res -``` - -``` text -┌─res──────────┐ -│ [1, 2, 3, 4] │ -└──────────────┘ -``` - -### arraySort(\[func,\] arr1, …) {#arraysortfunc-arr1} - -Возвращает отсортированный в восходящем порядке массив `arr1`. Если задана функция `func`, то порядок сортировки определяется результатом применения функции `func` на элементы массива (массивов). - -Для улучшения эффективности сортировки применяется [Преобразование Шварца](https://ru.wikipedia.org/wiki/%D0%9F%D1%80%D0%B5%D0%BE%D0%B1%D1%80%D0%B0%D0%B7%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D0%B5_%D0%A8%D0%B2%D0%B0%D1%80%D1%86%D0%B0). - -Пример: - -``` sql -SELECT arraySort((x, y) -> y, ['hello', 'world'], [2, 1]); -``` - -``` text -┌─res────────────────┐ -│ ['world', 'hello'] │ -└────────────────────┘ -``` - -Подробная информация о методе `arraySort` приведена в разделе [Функции по работе с массивами](array-functions.md#array_functions-sort). - -### arrayReverseSort(\[func,\] arr1, …) {#arrayreversesortfunc-arr1} - -Возвращает отсортированный в нисходящем порядке массив `arr1`. Если задана функция `func`, то порядок сортировки определяется результатом применения функции `func` на элементы массива (массивов). - -Пример: - -``` sql -SELECT arrayReverseSort((x, y) -> y, ['hello', 'world'], [2, 1]) as res; -``` - -``` text -┌─res───────────────┐ -│ ['hello','world'] │ -└───────────────────┘ -``` - -Подробная информация о методе `arrayReverseSort` приведена в разделе [Функции по работе с массивами](array-functions.md#array_functions-reverse-sort). - -[Оригинальная статья](https://clickhouse.tech/docs/ru/query_language/functions/higher_order_functions/) diff --git a/docs/ru/sql-reference/functions/index.md b/docs/ru/sql-reference/functions/index.md index 06d3d892cf9..9c1c0c5ca9d 100644 --- a/docs/ru/sql-reference/functions/index.md +++ b/docs/ru/sql-reference/functions/index.md @@ -38,6 +38,20 @@ Функции не могут поменять значения своих аргументов - любые изменения возвращаются в качестве результата. Соответственно, от порядка записи функций в запросе, результат вычислений отдельных функций не зависит. +## Функции высшего порядка, оператор `->` и функция lambda(params, expr) {#higher-order-functions} + +Функции высшего порядка, в качестве своего функционального аргумента могут принимать только лямбда-функции. Чтобы передать лямбда-функцию в функцию высшего порядка, используйте оператор `->`. Слева от стрелочки стоит формальный параметр — произвольный идентификатор, или несколько формальных параметров — произвольные идентификаторы в кортеже. Справа от стрелочки стоит выражение, в котором могут использоваться эти формальные параметры, а также любые столбцы таблицы. + +Примеры: +``` +x -> 2 * x +str -> str != Referer +``` + +В функции высшего порядка может быть передана лямбда-функция, принимающая несколько аргументов. В этом случае в функцию высшего порядка передаётся несколько массивов одинаковой длины, которым эти аргументы будут соответствовать. + +Для некоторых функций первый аргумент (лямбда-функция) может отсутствовать. В этом случае подразумевается тождественное отображение. + ## Обработка ошибок {#obrabotka-oshibok} Некоторые функции могут кидать исключения в случае ошибочных данных. В этом случае, выполнение запроса прерывается, и текст ошибки выводится клиенту. При распределённой обработке запроса, при возникновении исключения на одном из серверов, на другие серверы пытается отправиться просьба тоже прервать выполнение запроса. 
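К разделу о функциях высшего порядка выше: условный пример передачи лямбда-функции с двумя аргументами (массивы должны быть одинаковой длины):

``` sql
SELECT arrayMap((x, y) -> x + y, [1, 2, 3], [10, 20, 30]) AS res;
```

``` text
┌─res────────┐
│ [11,22,33] │
└────────────┘
```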
diff --git a/docs/ru/sql-reference/functions/introspection.md b/docs/ru/sql-reference/functions/introspection.md
index 9c6a0711ec9..655c4be8318 100644
--- a/docs/ru/sql-reference/functions/introspection.md
+++ b/docs/ru/sql-reference/functions/introspection.md
@@ -93,7 +93,7 @@ LIMIT 1
 \G
 ```

-Функция [arrayMap](higher-order-functions.md#higher_order_functions-array-map) позволяет обрабатывать каждый отдельный элемент массива `trace` с помощью функции `addressToLine`. Результат этой обработки вы видите в виде `trace_source_code_lines` колонки выходных данных.
+Функция [arrayMap](../../sql-reference/functions/array-functions.md#array-map) позволяет обрабатывать каждый отдельный элемент массива `trace` с помощью функции `addressToLine`. Результат этой обработки вы видите в виде `trace_source_code_lines` колонки выходных данных.

 ``` text
 Row 1:
@@ -179,7 +179,7 @@ LIMIT 1
 \G
 ```

-То [arrayMap](higher-order-functions.md#higher_order_functions-array-map) функция позволяет обрабатывать каждый отдельный элемент системы. `trace` массив по типу `addressToSymbols` функция. Результат этой обработки вы видите в виде `trace_symbols` колонка выходных данных.
+Функция [arrayMap](../../sql-reference/functions/array-functions.md#array-map) позволяет обрабатывать каждый отдельный элемент массива `trace` с помощью функции `addressToSymbols`. Результат этой обработки вы видите в виде `trace_symbols` колонки выходных данных.

 ``` text
 Row 1:
@@ -276,7 +276,7 @@ LIMIT 1
 \G
 ```

-Функция [arrayMap](higher-order-functions.md#higher_order_functions-array-map) позволяет обрабатывать каждый отдельный элемент массива `trace` с помощью функции `demangle`.
+Функция [arrayMap](../../sql-reference/functions/array-functions.md#array-map) позволяет обрабатывать каждый отдельный элемент массива `trace` с помощью функции `demangle`.

 ``` text
 Row 1:
diff --git a/docs/ru/sql-reference/functions/other-functions.md b/docs/ru/sql-reference/functions/other-functions.md
index 468e15e7d57..7b9dacf21cd 100644
--- a/docs/ru/sql-reference/functions/other-functions.md
+++ b/docs/ru/sql-reference/functions/other-functions.md
@@ -508,6 +508,29 @@ SELECT
 └────────────────┴────────────┘
 ```

+## formatReadableQuantity(x) {#formatreadablequantityx}
+
+Принимает число. Возвращает округленное число с суффиксом (thousand, million, billion и т.д.) в виде строки.
+
+Облегчает визуальное восприятие больших чисел живым человеком.
+
+Пример:
+
+``` sql
+SELECT
+    arrayJoin([1024, 1234 * 1000, (4567 * 1000) * 1000, 98765432101234]) AS number,
+    formatReadableQuantity(number) AS number_for_humans
+```
+
+``` text
+┌─────────number─┬─number_for_humans─┐
+│           1024 │ 1.02 thousand     │
+│        1234000 │ 1.23 million      │
+│     4567000000 │ 4.57 billion      │
+│ 98765432101234 │ 98.77 trillion    │
+└────────────────┴───────────────────┘
+```
+
 ## least(a, b) {#leasta-b}

 Возвращает наименьшее значение из a и b.
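Условный пример:

``` sql
SELECT least(5, 2) AS res;
```

``` text
┌─res─┐
│   2 │
└─────┘
```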
diff --git a/docs/ru/sql-reference/functions/random-functions.md b/docs/ru/sql-reference/functions/random-functions.md
index b425505b69d..4aaaef5cb5d 100644
--- a/docs/ru/sql-reference/functions/random-functions.md
+++ b/docs/ru/sql-reference/functions/random-functions.md
@@ -55,4 +55,50 @@ FROM numbers(3)
 └────────────┴────────────┴──────────────┴────────────────┴─────────────────┴──────────────────────┘
 ```

+# Случайные функции для работы со строками {#random-functions-for-working-with-strings}
+
+## randomString {#random-string}
+
+## randomFixedString {#random-fixed-string}
+
+## randomPrintableASCII {#random-printable-ascii}
+
+## randomStringUTF8 {#random-string-utf8}
+
+## fuzzBits {#fuzzbits}
+
+**Синтаксис**
+
+``` sql
+fuzzBits([s], [prob])
+```
+
+Инвертирует каждый бит `s` с вероятностью `prob`.
+
+**Параметры**
+
+- `s` — `String` или `FixedString`.
+- `prob` — константа типа `Float32/64`.
+
+**Возвращаемое значение**
+
+Изменённая случайным образом строка того же типа, что и `s`.
+
+**Пример**
+
+Запрос:
+
+``` sql
+SELECT fuzzBits(materialize('abacaba'), 0.1)
+FROM numbers(3)
+```
+
+Результат:
+
+``` text
+┌─fuzzBits(materialize('abacaba'), 0.1)─┐
+│ abaaaja                               │
+│ a*cjab+                               │
+│ aeca2A                                │
+└───────────────────────────────────────┘
+```
+
 [Оригинальная статья](https://clickhouse.tech/docs/ru/query_language/functions/random_functions/)
diff --git a/docs/ru/sql-reference/statements/create/view.md b/docs/ru/sql-reference/statements/create/view.md
index 36a7a3c51e2..caa3d04659e 100644
--- a/docs/ru/sql-reference/statements/create/view.md
+++ b/docs/ru/sql-reference/statements/create/view.md
@@ -5,13 +5,15 @@ toc_title: Представление

 # CREATE VIEW {#create-view}

-``` sql
-CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
-```
-
 Создаёт представление. Представления бывают двух видов - обычные и материализованные (MATERIALIZED).

-Обычные представления не хранят никаких данных, а всего лишь производят чтение из другой таблицы. То есть, обычное представление - не более чем сохранённый запрос. При чтении из представления, этот сохранённый запрос, используется в качестве подзапроса в секции FROM.
+## Обычные представления {#normal}
+
+``` sql
+CREATE [OR REPLACE] VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] AS SELECT ...
+```
+
+Обычные представления не хранят никаких данных, а просто выполняют чтение из другой таблицы при каждом обращении. Другими словами, обычное представление — не более чем сохранённый запрос. При чтении из представления этот сохранённый запрос используется как подзапрос в секции [FROM](../../../sql-reference/statements/select/from.md).

 Для примера, пусть вы создали представление:
@@ -31,15 +33,24 @@ SELECT a, b, c FROM view
 SELECT a, b, c FROM (SELECT ...)
 ```

-Материализованные (MATERIALIZED) представления хранят данные, преобразованные соответствующим запросом SELECT.
+## Материализованные представления {#materialized}

-При создании материализованного представления без использования `TO [db].[table]`, нужно обязательно указать ENGINE - движок таблицы для хранения данных.
+``` sql
+CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db.]table_name [ON CLUSTER] [TO[db.]name] [ENGINE = engine] [POPULATE] AS SELECT ...
+```
+
+Материализованные (MATERIALIZED) представления хранят данные, преобразованные соответствующим запросом [SELECT](../../../sql-reference/statements/select/index.md).
+
+При создании материализованного представления без использования `TO [db].[table]` нужно обязательно указать `ENGINE` — движок таблицы для хранения данных.
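Схематичный пример для иллюстрации (имена `daily_totals`, `source_table` и столбца `event_time` условные):

``` sql
-- Материализованное представление с явно указанным движком хранения.
CREATE MATERIALIZED VIEW daily_totals
ENGINE = SummingMergeTree()
ORDER BY date
AS SELECT
    toDate(event_time) AS date,
    count() AS events
FROM source_table
GROUP BY date;
```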
 При создании материализованного представления с использованием `TO [db].[table]` нельзя указывать `POPULATE`.

 Материализованное представление устроено следующим образом: при вставке данных в таблицу, указанную в SELECT-е, кусок вставляемых данных преобразуется этим запросом SELECT, и полученный результат вставляется в представление.

-Если указано POPULATE, то при создании представления, в него будут вставлены имеющиеся данные таблицы, как если бы был сделан запрос `CREATE TABLE ... AS SELECT ...` . Иначе, представление будет содержать только данные, вставляемые в таблицу после создания представления. Не рекомендуется использовать POPULATE, так как вставляемые в таблицу данные во время создания представления, не попадут в него.
+!!! important "Важно"
+    Материализованные представления в ClickHouse больше похожи на триггеры `after insert`. Если в запросе материализованного представления есть агрегирование, оно применяется только к вставляемому блоку записей. Любые изменения существующих данных исходной таблицы (например, обновление, удаление, удаление раздела и т.д.) не изменяют материализованное представление.
+
+Если указано `POPULATE`, то при создании представления в него будут вставлены имеющиеся данные таблицы, как если бы был сделан запрос `CREATE TABLE ... AS SELECT ...`. Иначе представление будет содержать только данные, вставляемые в таблицу после создания представления. Не рекомендуется использовать `POPULATE`, так как данные, вставляемые в таблицу во время создания представления, не попадут в него.

 Запрос `SELECT` может содержать `DISTINCT`, `GROUP BY`, `ORDER BY`, `LIMIT`… Следует иметь в виду, что соответствующие преобразования будут выполняться независимо, на каждый блок вставляемых данных. Например, при наличии `GROUP BY`, данные будут агрегироваться при вставке, но только в рамках одной пачки вставляемых данных. Далее, данные не будут доагрегированы. Исключение - использование ENGINE, производящего агрегацию данных самостоятельно, например, `SummingMergeTree`.

@@ -50,4 +61,4 @@

 Отсутствует отдельный запрос для удаления представлений. Чтобы удалить представление, следует использовать `DROP TABLE`.

 [Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/statements/create/view)
-
\ No newline at end of file
+
diff --git a/docs/ru/sql-reference/statements/drop.md b/docs/ru/sql-reference/statements/drop.md
index 4bfd53b1d47..22e553cfdac 100644
--- a/docs/ru/sql-reference/statements/drop.md
+++ b/docs/ru/sql-reference/statements/drop.md
@@ -5,18 +5,35 @@ toc_title: DROP

 # DROP {#drop}

-Запрос имеет два вида: `DROP DATABASE` и `DROP TABLE`.
+Удаляет существующий объект.
+Если указано `IF EXISTS`, то запрос не выдаёт ошибку, если объекта не существует.
+
+## DROP DATABASE {#drop-database}

 ``` sql
 DROP DATABASE [IF EXISTS] db [ON CLUSTER cluster]
 ```

+Удаляет все таблицы в базе данных db, затем удаляет саму базу данных db.
+
+## DROP TABLE {#drop-table}
+
 ``` sql
 DROP [TEMPORARY] TABLE [IF EXISTS] [db.]name [ON CLUSTER cluster]
 ```

 Удаляет таблицу.
-Если указано `IF EXISTS` - не выдавать ошибку, если таблица не существует или база данных не существует.
+
+## DROP DICTIONARY {#drop-dictionary}
+
+``` sql
+DROP DICTIONARY [IF EXISTS] [db.]name
+```
+
+Удаляет словарь.
+

 ## DROP USER {#drop-user-statement}

 DROP USER [IF EXISTS] name [,...] [ON CLUSTER cluster_name]
 ```

@@ -41,6 +58,7 @@ DROP ROLE [IF EXISTS] name [,...] [ON CLUSTER cluster_name]
 ```

+
 ## DROP ROW POLICY {#drop-row-policy-statement}

 Удаляет политику доступа к строкам.
@@ -80,5 +98,13 @@
 ```

+## DROP VIEW {#drop-view}

-[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/statements/drop/)
\ No newline at end of file
+``` sql
+DROP VIEW [IF EXISTS] [db.]name [ON CLUSTER cluster]
+```
+
+Удаляет представление. Представления могут быть удалены и командой `DROP TABLE`, но команда `DROP VIEW` проверяет, что `[db.]name` является представлением.
+
+
+[Оригинальная статья](https://clickhouse.tech/docs/ru/sql-reference/statements/drop/)
diff --git a/docs/zh/getting-started/tutorial.md b/docs/zh/getting-started/tutorial.md
index 38d5a586806..43c7ed0ec59 100644
--- a/docs/zh/getting-started/tutorial.md
+++ b/docs/zh/getting-started/tutorial.md
@@ -1,6 +1,4 @@
 ---
-machine_translated: true
-machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
 toc_priority: 12
 toc_title: "\u6559\u7A0B"
 ---
@@ -9,27 +7,27 @@ toc_title: "\u6559\u7A0B"

 ## 从本教程中可以期待什么? {#what-to-expect-from-this-tutorial}

-通过本教程,您将学习如何设置一个简单的ClickHouse集群。 它会很小,但容错和可扩展。 然后,我们将使用其中一个示例数据集来填充数据并执行一些演示查询。
+通过本教程,您将学习如何设置一个简单的ClickHouse集群。 它会很小,但却是容错和可扩展的。 然后,我们将使用其中一个示例数据集来填充数据并执行一些演示查询。

 ## 单节点设置 {#single-node-setup}

-为了推迟分布式环境的复杂性,我们将首先在单个服务器或虚拟机上部署ClickHouse。 ClickHouse通常是从安装 [黛布](install.md#install-from-deb-packages) 或 [rpm](install.md#from-rpm-packages) 包,但也有 [替代办法](install.md#from-docker-image) 对于不支持它们的操作系统。
+为了推迟分布式环境的复杂性,我们将首先在单个服务器或虚拟机上部署ClickHouse。 ClickHouse通常是从[deb](install.md#install-from-deb-packages) 或 [rpm](install.md#from-rpm-packages) 包安装,但对于不支持它们的操作系统也有 [替代方法](install.md#from-docker-image) 。

-例如,您选择了 `deb` 包和执行:
+例如,您选择了从 `deb` 包安装,执行:

 ``` bash
 {% include 'install/deb.sh' %}
 ```

-我们在安装的软件包中有什么:
+在我们安装的软件中包含这些包:

-- `clickhouse-client` 包包含 [ツ环板clientョツ嘉ッツ偲](../interfaces/cli.md) 应用程序,交互式ClickHouse控制台客户端。
-- `clickhouse-common` 包包含一个ClickHouse可执行文件。
-- `clickhouse-server` 包包含要作为服务器运行ClickHouse的配置文件。
+- `clickhouse-client` 包,包含 [clickhouse-client](../interfaces/cli.md) 应用程序,它是交互式ClickHouse控制台客户端。
+- `clickhouse-common` 包,包含一个ClickHouse可执行文件。
+- `clickhouse-server` 包,包含要作为服务端运行的ClickHouse配置文件。

-服务器配置文件位于 `/etc/clickhouse-server/`. 在进一步讨论之前,请注意 `<path>` 元素in `config.xml`. Path确定数据存储的位置,因此应该位于磁盘容量较大的卷上;默认值为 `/var/lib/clickhouse/`. 如果你想调整配置,直接编辑并不方便 `config.xml` 文件,考虑到它可能会在未来的软件包更新中被重写。 复盖配置元素的推荐方法是创建 [在配置文件。d目录](../operations/configuration-files.md) 它作为 "patches" 要配置。xml
+服务端配置文件位于 `/etc/clickhouse-server/`。 在进一步讨论之前,请注意 `config.xml` 文件中的 `<path>` 元素。 Path决定了数据存储的位置,因此该位置应该位于磁盘容量较大的卷上;默认值为 `/var/lib/clickhouse/`。 如果你想调整配置,考虑到它可能会在未来的软件包更新中被重写,直接编辑 `config.xml` 文件并不方便。 推荐的方法是在[配置文件](../operations/configuration-files.md)目录创建文件,作为config.xml文件的"补丁",用以复写配置元素。

-你可能已经注意到了, `clickhouse-server` 安装包后不会自动启动。 它也不会在更新后自动重新启动。 您启动服务器的方式取决于您的init系统,通常情况下,它是:
+你可能已经注意到了, `clickhouse-server` 安装后不会自动启动。 它也不会在更新后自动重新启动。 您启动服务端的方式取决于您的init系统,通常情况下是这样:

 ``` bash
 sudo service clickhouse-server start
 ```

@@ -41,13 +39,13 @@ sudo service clickhouse-server start
 sudo /etc/init.d/clickhouse-server start
 ```

-服务器日志的默认位置是 `/var/log/clickhouse-server/`. 服务器已准备好处理客户端连接一旦它记录 `Ready for connections` 消息
+服务端日志的默认位置是 `/var/log/clickhouse-server/`。当服务端在日志中记录 `Ready for connections` 消息,即表示服务端已准备好处理客户端连接。

-一旦 `clickhouse-server` 正在运行我们可以利用 `clickhouse-client` 连接到服务器并运行一些测试查询,如 `SELECT "Hello, world!";`.
+一旦 `clickhouse-server` 启动并运行,我们可以利用 `clickhouse-client` 连接到服务端,并运行一些测试查询,如 `SELECT "Hello, world!";`.
-Clickhouse-客户端的快速提示
+Clickhouse-client的快速提示

 交互模式:

diff --git a/docs/zh/guides/apply-catboost-model.md b/docs/zh/guides/apply-catboost-model.md
index be21c372307..3657a947ad2 100644
--- a/docs/zh/guides/apply-catboost-model.md
+++ b/docs/zh/guides/apply-catboost-model.md
@@ -15,7 +15,7 @@ toc_title: "\u5E94\u7528CatBoost\u6A21\u578B"

 1. [创建表](#create-table).
 2. [将数据插入到表中](#insert-data-to-table).
-3. [碌莽禄into拢Integrate010-68520682\](#integrate-catboost-into-clickhouse) (可选步骤)。
+3. [将CatBoost集成到ClickHouse中](#integrate-catboost-into-clickhouse) (可选步骤)。
 4. [从SQL运行模型推理](#run-model-inference).

 有关训练CatBoost模型的详细信息,请参阅 [培训和应用模型](https://catboost.ai/docs/features/training.html#training).

@@ -119,12 +119,12 @@ FROM amazon_train
 +-------+
 ```

-## 3. 碌莽禄into拢Integrate010-68520682\ {#integrate-catboost-into-clickhouse}
+## 3. 将CatBoost集成到ClickHouse中 {#integrate-catboost-into-clickhouse}

 !!! note "注"
     **可选步骤。** Docker映像包含运行CatBoost和ClickHouse所需的所有内容。

-碌莽禄to拢integrate010-68520682\:
+将CatBoost集成到ClickHouse的步骤:

 **1.** 构建评估库。

diff --git a/docs/zh/sql-reference/table-functions/remote.md b/docs/zh/sql-reference/table-functions/remote.md
index 1125353e2fa..a7fa228cbbd 100644
--- a/docs/zh/sql-reference/table-functions/remote.md
+++ b/docs/zh/sql-reference/table-functions/remote.md
@@ -1,13 +1,6 @@
----
-machine_translated: true
-machine_translated_rev: 72537a2d527c63c07aa5d2361a8829f3895cf2bd
-toc_priority: 40
-toc_title: "\u8FDC\u7A0B"
----
-
 # 远程,远程安全 {#remote-remotesecure}

-允许您访问远程服务器,而无需创建 `Distributed` 桌子
+允许您访问远程服务器,而无需创建 `Distributed` 表。

 签名:

 ``` sql
 remote('addresses_expr', db, table[, 'user'[, 'password']])
 remote('addresses_expr', db.table[, 'user'[, 'password']])
 remoteSecure('addresses_expr', db, table[, 'user'[, 'password']])
 remoteSecure('addresses_expr', db.table[, 'user'[, 'password']])
 ```

-`addresses_expr` – An expression that generates addresses of remote servers. This may be just one server address. The server address is `host:port`,或者只是 `host`. 主机可以指定为服务器名称,也可以指定为IPv4或IPv6地址。 IPv6地址在方括号中指定。 端口是远程服务器上的TCP端口。 如果省略端口,它使用 `tcp_port` 从服务器的配置文件(默认情况下,9000)。
+`addresses_expr` – 代表远程服务器地址的一个表达式。可以只是单个服务器地址。 服务器地址可以是 `host:port` 或 `host`。`host` 可以指定为服务器域名,或是IPv4或IPv6地址。IPv6地址在方括号中指定。`port` 是远程服务器上的TCP端口。 如果省略端口,则使用服务器配置文件中的 `tcp_port`(默认为9000)。

 !!! important "重要事项"
-    IPv6地址需要该端口。
+    IPv6地址需要指定端口。

 例:

 ```
 example01-01-1
 example01-02-1
 localhost
 [2a02:6b8:0:1111::11]:9000
 ```

-多个地址可以用逗号分隔。 在这种情况下,ClickHouse将使用分布式处理,因此它将将查询发送到所有指定的地址(如具有不同数据的分片)。
+多个地址可以用逗号分隔。在这种情况下,ClickHouse将使用分布式处理,因此它会将查询发送到所有指定的地址(如同发送到具有不同数据的分片)。

 示例:

 ```
 example01-01-1,example01-02-1
 ```

 ```
 example01-{01..02}-1
 ```

 如果您有多对大括号,它会生成相应集合的直接乘积。

-大括号中的地址和部分地址可以用管道符号(\|)分隔。 在这种情况下,相应的地址集被解释为副本,并且查询将被发送到第一个正常副本。 但是,副本将按照当前设置的顺序进行迭代 [load\_balancing](../../operations/settings/settings.md) 设置。
+大括号中的地址和部分地址可以用管道符号(\|)分隔。 在这种情况下,相应的地址集被解释为副本,并且查询将被发送到第一个正常副本。 但是,副本将按照当前[load\_balancing](../../operations/settings/settings.md)设置的顺序进行迭代。

 示例:

 ```
 example01-{01..02}-{1|2}
 ```

 此示例指定两个分片,每个分片都有两个副本。

-生成的地址数由常量限制。 现在这是1000个地址。
+生成的地址数由常量限制。目前这是1000个地址。

-使用 `remote` 表函数比创建一个不太优化 `Distributed` 表,因为在这种情况下,服务器连接被重新建立为每个请求。 此外,如果设置了主机名,则会解析这些名称,并且在使用各种副本时不会计算错误。 在处理大量查询时,始终创建 `Distributed` 表的时间提前,不要使用 `remote` 表功能。
+使用 `remote` 表函数不如提前创建一个 `Distributed` 表优化,因为在这种情况下,将为每个请求重新建立服务器连接。此外,如果设置了主机名,则会解析这些名称,并且在使用各种副本时不会计算错误。 在处理大量查询时,始终优先提前创建 `Distributed` 表,不要使用 `remote` 表函数。

 该 `remote` 表函数可以在以下情况下是有用的:

 - 访问特定服务器进行数据比较、调试和测试。
-- 查询之间的各种ClickHouse群集用于研究目的。
-- 手动发出的罕见分布式请求。
+- 出于研究目的,在多个ClickHouse集群之间执行查询。
+- 手动发出的不频繁分布式请求。
 - 每次重新定义服务器集的分布式请求。

-如果未指定用户, `default` 被使用。
+如果未指定用户,将会使用 `default`。
 如果未指定密码,则使用空密码。

-`remoteSecure` -相同 `remote` but with secured connection.
Default port — [tcp\_port\_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure) 从配置或9440. +`remoteSecure` - 与 `remote` 相同,但是会使用加密链接。默认端口为配置文件中的[tcp\_port\_secure](../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-tcp_port_secure),或9440。 [原始文章](https://clickhouse.tech/docs/en/query_language/table_functions/remote/) diff --git a/programs/CMakeLists.txt b/programs/CMakeLists.txt index 89220251cda..ae4a72ef62a 100644 --- a/programs/CMakeLists.txt +++ b/programs/CMakeLists.txt @@ -16,6 +16,7 @@ option (ENABLE_CLICKHOUSE_COMPRESSOR "Enable clickhouse-compressor" ${ENABLE_CLI option (ENABLE_CLICKHOUSE_COPIER "Enable clickhouse-copier" ${ENABLE_CLICKHOUSE_ALL}) option (ENABLE_CLICKHOUSE_FORMAT "Enable clickhouse-format" ${ENABLE_CLICKHOUSE_ALL}) option (ENABLE_CLICKHOUSE_OBFUSCATOR "Enable clickhouse-obfuscator" ${ENABLE_CLICKHOUSE_ALL}) +option (ENABLE_CLICKHOUSE_GIT_IMPORT "Enable clickhouse-git-import" ${ENABLE_CLICKHOUSE_ALL}) option (ENABLE_CLICKHOUSE_ODBC_BRIDGE "Enable clickhouse-odbc-bridge" ${ENABLE_CLICKHOUSE_ALL}) if (CLICKHOUSE_SPLIT_BINARY) @@ -91,21 +92,22 @@ add_subdirectory (copier) add_subdirectory (format) add_subdirectory (obfuscator) add_subdirectory (install) +add_subdirectory (git-import) if (ENABLE_CLICKHOUSE_ODBC_BRIDGE) add_subdirectory (odbc-bridge) endif () if (CLICKHOUSE_ONE_SHARED) - add_library(clickhouse-lib SHARED ${CLICKHOUSE_SERVER_SOURCES} ${CLICKHOUSE_CLIENT_SOURCES} ${CLICKHOUSE_LOCAL_SOURCES} ${CLICKHOUSE_BENCHMARK_SOURCES} ${CLICKHOUSE_COPIER_SOURCES} ${CLICKHOUSE_EXTRACT_FROM_CONFIG_SOURCES} ${CLICKHOUSE_COMPRESSOR_SOURCES} ${CLICKHOUSE_FORMAT_SOURCES} ${CLICKHOUSE_OBFUSCATOR_SOURCES} ${CLICKHOUSE_ODBC_BRIDGE_SOURCES}) - target_link_libraries(clickhouse-lib ${CLICKHOUSE_SERVER_LINK} ${CLICKHOUSE_CLIENT_LINK} ${CLICKHOUSE_LOCAL_LINK} ${CLICKHOUSE_BENCHMARK_LINK} ${CLICKHOUSE_COPIER_LINK} ${CLICKHOUSE_EXTRACT_FROM_CONFIG_LINK} ${CLICKHOUSE_COMPRESSOR_LINK} ${CLICKHOUSE_FORMAT_LINK} ${CLICKHOUSE_OBFUSCATOR_LINK} ${CLICKHOUSE_ODBC_BRIDGE_LINK}) - target_include_directories(clickhouse-lib ${CLICKHOUSE_SERVER_INCLUDE} ${CLICKHOUSE_CLIENT_INCLUDE} ${CLICKHOUSE_LOCAL_INCLUDE} ${CLICKHOUSE_BENCHMARK_INCLUDE} ${CLICKHOUSE_COPIER_INCLUDE} ${CLICKHOUSE_EXTRACT_FROM_CONFIG_INCLUDE} ${CLICKHOUSE_COMPRESSOR_INCLUDE} ${CLICKHOUSE_FORMAT_INCLUDE} ${CLICKHOUSE_OBFUSCATOR_INCLUDE} ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE}) + add_library(clickhouse-lib SHARED ${CLICKHOUSE_SERVER_SOURCES} ${CLICKHOUSE_CLIENT_SOURCES} ${CLICKHOUSE_LOCAL_SOURCES} ${CLICKHOUSE_BENCHMARK_SOURCES} ${CLICKHOUSE_COPIER_SOURCES} ${CLICKHOUSE_EXTRACT_FROM_CONFIG_SOURCES} ${CLICKHOUSE_COMPRESSOR_SOURCES} ${CLICKHOUSE_FORMAT_SOURCES} ${CLICKHOUSE_OBFUSCATOR_SOURCES} ${CLICKHOUSE_GIT_IMPORT_SOURCES} ${CLICKHOUSE_ODBC_BRIDGE_SOURCES}) + target_link_libraries(clickhouse-lib ${CLICKHOUSE_SERVER_LINK} ${CLICKHOUSE_CLIENT_LINK} ${CLICKHOUSE_LOCAL_LINK} ${CLICKHOUSE_BENCHMARK_LINK} ${CLICKHOUSE_COPIER_LINK} ${CLICKHOUSE_EXTRACT_FROM_CONFIG_LINK} ${CLICKHOUSE_COMPRESSOR_LINK} ${CLICKHOUSE_FORMAT_LINK} ${CLICKHOUSE_OBFUSCATOR_LINK} ${CLICKHOUSE_GIT_IMPORT_LINK} ${CLICKHOUSE_ODBC_BRIDGE_LINK}) + target_include_directories(clickhouse-lib ${CLICKHOUSE_SERVER_INCLUDE} ${CLICKHOUSE_CLIENT_INCLUDE} ${CLICKHOUSE_LOCAL_INCLUDE} ${CLICKHOUSE_BENCHMARK_INCLUDE} ${CLICKHOUSE_COPIER_INCLUDE} ${CLICKHOUSE_EXTRACT_FROM_CONFIG_INCLUDE} ${CLICKHOUSE_COMPRESSOR_INCLUDE} ${CLICKHOUSE_FORMAT_INCLUDE} ${CLICKHOUSE_OBFUSCATOR_INCLUDE} 
${CLICKHOUSE_GIT_IMPORT_INCLUDE} ${CLICKHOUSE_ODBC_BRIDGE_INCLUDE}) set_target_properties(clickhouse-lib PROPERTIES SOVERSION ${VERSION_MAJOR}.${VERSION_MINOR} VERSION ${VERSION_SO} OUTPUT_NAME clickhouse DEBUG_POSTFIX "") install (TARGETS clickhouse-lib LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT clickhouse) endif() if (CLICKHOUSE_SPLIT_BINARY) - set (CLICKHOUSE_ALL_TARGETS clickhouse-server clickhouse-client clickhouse-local clickhouse-benchmark clickhouse-extract-from-config clickhouse-compressor clickhouse-format clickhouse-obfuscator clickhouse-copier) + set (CLICKHOUSE_ALL_TARGETS clickhouse-server clickhouse-client clickhouse-local clickhouse-benchmark clickhouse-extract-from-config clickhouse-compressor clickhouse-format clickhouse-obfuscator clickhouse-git-import clickhouse-copier) if (ENABLE_CLICKHOUSE_ODBC_BRIDGE) list (APPEND CLICKHOUSE_ALL_TARGETS clickhouse-odbc-bridge) @@ -149,6 +151,9 @@ else () if (ENABLE_CLICKHOUSE_OBFUSCATOR) clickhouse_target_link_split_lib(clickhouse obfuscator) endif () + if (ENABLE_CLICKHOUSE_GIT_IMPORT) + clickhouse_target_link_split_lib(clickhouse git-import) + endif () if (ENABLE_CLICKHOUSE_INSTALL) clickhouse_target_link_split_lib(clickhouse install) endif () @@ -199,6 +204,11 @@ else () install (FILES ${CMAKE_CURRENT_BINARY_DIR}/clickhouse-obfuscator DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) list(APPEND CLICKHOUSE_BUNDLE clickhouse-obfuscator) endif () + if (ENABLE_CLICKHOUSE_GIT_IMPORT) + add_custom_target (clickhouse-git-import ALL COMMAND ${CMAKE_COMMAND} -E create_symlink clickhouse clickhouse-git-import DEPENDS clickhouse) + install (FILES ${CMAKE_CURRENT_BINARY_DIR}/clickhouse-git-import DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse) + list(APPEND CLICKHOUSE_BUNDLE clickhouse-git-import) + endif () if(ENABLE_CLICKHOUSE_ODBC_BRIDGE) list(APPEND CLICKHOUSE_BUNDLE clickhouse-odbc-bridge) endif() diff --git a/programs/client/Client.cpp b/programs/client/Client.cpp index c9701950dc5..0c2aca2b3c8 100644 --- a/programs/client/Client.cpp +++ b/programs/client/Client.cpp @@ -866,6 +866,8 @@ private: // will exit. The ping() would be the best match here, but it's // private, probably for a good reason that the protocol doesn't allow // pings at any possible moment. + // Don't forget to reset the default database which might have changed. + connection->setDefaultDatabase(""); connection->forceConnected(connection_parameters.timeouts); if (text.size() > 4 * 1024) @@ -900,74 +902,151 @@ private: return processMultiQuery(text); } - bool processMultiQuery(const String & text) + bool processMultiQuery(const String & all_queries_text) { const bool test_mode = config().has("testmode"); { /// disable logs if expects errors - TestHint test_hint(test_mode, text); + TestHint test_hint(test_mode, all_queries_text); if (test_hint.clientError() || test_hint.serverError()) processTextAsSingleQuery("SET send_logs_level = 'none'"); } /// Several queries separated by ';'. /// INSERT data is ended by the end of line, not ';'. + /// An exception is VALUES format where we also support semicolon in + /// addition to end of line. 
-            const char * begin = text.data();
-            const char * end = begin + text.size();
+        const char * this_query_begin = all_queries_text.data();
+        const char * all_queries_end = all_queries_text.data() + all_queries_text.size();

-        while (begin < end)
+        while (this_query_begin < all_queries_end)
         {
-            const char * pos = begin;
-            ASTPtr orig_ast = parseQuery(pos, end, true);
+            // Use the token iterator to skip any whitespace, semicolons and
+            // comments at the beginning of the query. An example from regression
+            // tests:
+            //      insert into table t values ('invalid'); -- { serverError 469 }
+            //      select 1
+            // Here the test hint comment gets parsed as part of the second query.
+            // We parse the `INSERT VALUES` up to the semicolon, and the rest
+            // looks like a two-line query:
+            //      -- { serverError 469 }
+            //      select 1
+            // and we expect it to fail with error 469, but this hint is actually
+            // for the previous query. Test hints should go after the query, so
+            // we can fix this by skipping leading comments. Token iterator skips
+            // comments and whitespace by itself, so we only have to check for
+            // semicolons.
+            // The code block is to limit visibility of `tokens` because we have
+            // another such variable further down the code, and get warnings for
+            // that.
+            {
+                Tokens tokens(this_query_begin, all_queries_end);
+                IParser::Pos token_iterator(tokens,
+                    context.getSettingsRef().max_parser_depth);
+                while (token_iterator->type == TokenType::Semicolon
+                        && token_iterator.isValid())
+                {
+                    ++token_iterator;
+                }
+                this_query_begin = token_iterator->begin;
+                if (this_query_begin >= all_queries_end)
+                {
+                    break;
+                }
+            }

-            if (!orig_ast)
+            // Try to parse the query.
+            const char * this_query_end = this_query_begin;
+            try
+            {
+                parsed_query = parseQuery(this_query_end, all_queries_end, true);
+            }
+            catch (Exception & e)
+            {
+                if (!test_mode)
+                    throw;
+
+                /// Try to find a test hint for the syntax error
+                const char * end_of_line = find_first_symbols<'\n'>(this_query_begin, all_queries_end);
+                TestHint hint(true, String(this_query_end, end_of_line - this_query_end));
+                if (hint.serverError()) /// Syntax errors are considered as client errors
+                    throw;
+                if (hint.clientError() != e.code())
+                {
+                    if (hint.clientError())
+                        e.addMessage("\nExpected client error: " + std::to_string(hint.clientError()));
+                    throw;
+                }
+
+                /// It's an expected syntax error, skip the line
+                this_query_begin = end_of_line;
+                continue;
+            }
+
+            if (!parsed_query)
             {
                 if (ignore_error)
                 {
-                    Tokens tokens(begin, end);
+                    Tokens tokens(this_query_begin, all_queries_end);
                     IParser::Pos token_iterator(tokens, context.getSettingsRef().max_parser_depth);
                     while (token_iterator->type != TokenType::Semicolon && token_iterator.isValid())
                         ++token_iterator;
-                    begin = token_iterator->end;
+                    this_query_begin = token_iterator->end;

                     continue;
                 }
                 return true;
             }

-            auto * insert = orig_ast->as<ASTInsertQuery>();
-
-            if (insert && insert->data)
+            // INSERT queries may have the inserted data in the query text
+            // that follows the query itself, e.g. "insert into t format CSV 1;2".
+            // They need special handling. First of all, here we find where the
+            // inserted data ends. In multi-query mode, it is delimited by a
+            // newline.
+            // The VALUES format needs even more handling -- we also allow the
+            // data to be delimited by semicolon. This case is handled later by
+            // the format parser itself.
+ auto * insert_ast = parsed_query->as(); + if (insert_ast && insert_ast->data) { - pos = find_first_symbols<'\n'>(insert->data, end); - insert->end = pos; + this_query_end = find_first_symbols<'\n'>(insert_ast->data, all_queries_end); + insert_ast->end = this_query_end; + query_to_send = all_queries_text.substr( + this_query_begin - all_queries_text.data(), + insert_ast->data - this_query_begin); + } + else + { + query_to_send = all_queries_text.substr( + this_query_begin - all_queries_text.data(), + this_query_end - this_query_begin); } - String str = text.substr(begin - text.data(), pos - begin); + // full_query is the query + inline INSERT data. + full_query = all_queries_text.substr( + this_query_begin - all_queries_text.data(), + this_query_end - this_query_begin); - begin = pos; - while (isWhitespaceASCII(*begin) || *begin == ';') - ++begin; - - TestHint test_hint(test_mode, str); + // Look for the hint in the text of query + insert data, if any. + // e.g. insert into t format CSV 'a' -- { serverError 123 }. + TestHint test_hint(test_mode, full_query); expected_client_error = test_hint.clientError(); expected_server_error = test_hint.serverError(); try { - auto ast_to_process = orig_ast; - if (insert && insert->data) + processParsedSingleQuery(); + + if (insert_ast && insert_ast->data) { - ast_to_process = nullptr; - processTextAsSingleQuery(str); - } - else - { - parsed_query = ast_to_process; - full_query = str; - query_to_send = str; - processParsedSingleQuery(); + // For VALUES format: use the end of inline data as reported + // by the format parser (it is saved in sendData()). This + // allows us to handle queries like: + // insert into t values (1); select 1 + //, where the inline data is delimited by semicolon and not + // by a newline. + this_query_end = parsed_query->as()->end; } } catch (...) @@ -975,7 +1054,7 @@ private: last_exception_received_from_server = std::make_unique(getCurrentExceptionMessage(true), getCurrentExceptionCode()); actual_client_error = last_exception_received_from_server->code(); if (!ignore_error && (!actual_client_error || actual_client_error != expected_client_error)) - std::cerr << "Error on processing query: " << str << std::endl << last_exception_received_from_server->message(); + std::cerr << "Error on processing query: " << full_query << std::endl << last_exception_received_from_server->message(); received_exception_from_server = true; } @@ -989,6 +1068,8 @@ private: else return false; } + + this_query_begin = this_query_end; } return true; @@ -1103,7 +1184,9 @@ private: { last_exception_received_from_server = std::make_unique(getCurrentExceptionMessage(true), getCurrentExceptionCode()); received_exception_from_server = true; - std::cerr << "Error on processing query: " << ast_to_process->formatForErrorMessage() << std::endl << last_exception_received_from_server->message(); + fmt::print(stderr, "Error on processing query '{}': {}\n", + ast_to_process->formatForErrorMessage(), + last_exception_received_from_server->message()); } if (!connection->isConnected()) @@ -1411,7 +1494,7 @@ private: void sendData(Block & sample, const ColumnsDescription & columns_description) { /// If INSERT data must be sent. - const auto * parsed_insert_query = parsed_query->as(); + auto * parsed_insert_query = parsed_query->as(); if (!parsed_insert_query) return; @@ -1420,6 +1503,9 @@ private: /// Send data contained in the query. 
ReadBufferFromMemory data_in(parsed_insert_query->data, parsed_insert_query->end - parsed_insert_query->data); sendDataFrom(data_in, sample, columns_description); + // Remember where the data ended. We use this info later to determine + // where the next query begins. + parsed_insert_query->end = data_in.buffer().begin() + data_in.count(); } else if (!is_interactive) { diff --git a/programs/config_tools.h.in b/programs/config_tools.h.in index 11386aca60e..7cb5a6d883a 100644 --- a/programs/config_tools.h.in +++ b/programs/config_tools.h.in @@ -12,5 +12,6 @@ #cmakedefine01 ENABLE_CLICKHOUSE_COMPRESSOR #cmakedefine01 ENABLE_CLICKHOUSE_FORMAT #cmakedefine01 ENABLE_CLICKHOUSE_OBFUSCATOR +#cmakedefine01 ENABLE_CLICKHOUSE_GIT_IMPORT #cmakedefine01 ENABLE_CLICKHOUSE_INSTALL #cmakedefine01 ENABLE_CLICKHOUSE_ODBC_BRIDGE diff --git a/programs/git-import/CMakeLists.txt b/programs/git-import/CMakeLists.txt new file mode 100644 index 00000000000..279bb35a272 --- /dev/null +++ b/programs/git-import/CMakeLists.txt @@ -0,0 +1,10 @@ +set (CLICKHOUSE_GIT_IMPORT_SOURCES git-import.cpp) + +set (CLICKHOUSE_GIT_IMPORT_LINK + PRIVATE + boost::program_options + dbms +) + +clickhouse_program_add(git-import) + diff --git a/programs/git-import/clickhouse-git-import.cpp b/programs/git-import/clickhouse-git-import.cpp new file mode 100644 index 00000000000..cfa06306604 --- /dev/null +++ b/programs/git-import/clickhouse-git-import.cpp @@ -0,0 +1,2 @@ +int mainEntryClickHouseGitImport(int argc, char ** argv); +int main(int argc_, char ** argv_) { return mainEntryClickHouseGitImport(argc_, argv_); } diff --git a/programs/git-import/git-import.cpp b/programs/git-import/git-import.cpp new file mode 100644 index 00000000000..7cdd77b4b7c --- /dev/null +++ b/programs/git-import/git-import.cpp @@ -0,0 +1,1235 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + + +static constexpr auto documentation = R"( +A tool to extract information from Git repository for analytics. + +It dumps the data for the following tables: +- commits - commits with statistics; +- file_changes - files changed in every commit with the info about the change and statistics; +- line_changes - every changed line in every changed file in every commit with full info about the line and the information about previous change of this line. + +The largest and the most important table is "line_changes". 
+
+It allows answering questions like:
+- list files with the maximum number of authors;
+- show me the oldest lines of code in the repository;
+- show me the files with the longest history;
+- list favorite files for an author;
+- list the largest files with the lowest number of authors;
+- on what weekday the code has the highest chance to stay in the repository;
+- the distribution of code age across the repository;
+- files sorted by average code age;
+- quickly show a file with blame info (rough);
+- commits and lines of code distribution by time; by weekday, by author; for specific subdirectories;
+- show history for every subdirectory, file, line of file, the number of changes (lines and commits) across time; how the number of contributors was changed across time;
+- list files with the most modifications;
+- list files that were rewritten the most number of times or by the most authors;
+- what is the percentage of code removal by other authors, across authors;
+- the matrix of authors that shows which authors tend to rewrite other authors' code;
+- what is the worst time to write code, in the sense that the code has the highest chance of being rewritten;
+- the average time before code will be rewritten and the median (half-life of code decay);
+- comments/code percentage change in time / by author / by location;
+- who tends to write more tests / cpp code / comments.
+
+The data is intended for analytical purposes. It can be imprecise for many reasons, but it should be good enough for its purpose.
+
+The data is not intended to provide any conclusions for managers; it is especially contraindicated for any kind of "performance review". Instead you can spend multiple days looking at various interesting statistics.
+
+Run this tool inside your git repository. It will create .tsv files that can be loaded into ClickHouse (or into another DBMS if you dare).
+
+The tool can process large enough repositories in a reasonable time.
+It has been tested on:
+- ClickHouse: 31 seconds; 3 million rows;
+- LLVM: 8 minutes; 62 million rows;
+- Linux: 12 minutes; 85 million rows;
+- Chromium: 67 minutes; 343 million rows;
+(the numbers as of Sep 2020)
+
+
+Prepare the database by executing the following queries:
+
+DROP DATABASE IF EXISTS git;
+CREATE DATABASE git;
+
+CREATE TABLE git.commits
+(
+    hash String,
+    author LowCardinality(String),
+    time DateTime,
+    message String,
+    files_added UInt32,
+    files_deleted UInt32,
+    files_renamed UInt32,
+    files_modified UInt32,
+    lines_added UInt32,
+    lines_deleted UInt32,
+    hunks_added UInt32,
+    hunks_removed UInt32,
+    hunks_changed UInt32
+) ENGINE = MergeTree ORDER BY time;
+
+CREATE TABLE git.file_changes
+(
+    change_type Enum('Add' = 1, 'Delete' = 2, 'Modify' = 3, 'Rename' = 4, 'Copy' = 5, 'Type' = 6),
+    path LowCardinality(String),
+    old_path LowCardinality(String),
+    file_extension LowCardinality(String),
+    lines_added UInt32,
+    lines_deleted UInt32,
+    hunks_added UInt32,
+    hunks_removed UInt32,
+    hunks_changed UInt32,
+
+    commit_hash String,
+    author LowCardinality(String),
+    time DateTime,
+    commit_message String,
+    commit_files_added UInt32,
+    commit_files_deleted UInt32,
+    commit_files_renamed UInt32,
+    commit_files_modified UInt32,
+    commit_lines_added UInt32,
+    commit_lines_deleted UInt32,
+    commit_hunks_added UInt32,
+    commit_hunks_removed UInt32,
+    commit_hunks_changed UInt32
+) ENGINE = MergeTree ORDER BY time;
+
+CREATE TABLE git.line_changes
+(
+    sign Int8,
+    line_number_old UInt32,
+    line_number_new UInt32,
+    hunk_num UInt32,
+    hunk_start_line_number_old UInt32,
+    hunk_start_line_number_new UInt32,
+    hunk_lines_added UInt32,
+    hunk_lines_deleted UInt32,
+    hunk_context LowCardinality(String),
+    line LowCardinality(String),
+    indent UInt8,
+    line_type Enum('Empty' = 0, 'Comment' = 1, 'Punct' = 2, 'Code' = 3),
+
+    prev_commit_hash String,
+    prev_author LowCardinality(String),
+    prev_time DateTime,
+
+    file_change_type Enum('Add' = 1, 'Delete' = 2, 'Modify' = 3, 'Rename' = 4, 'Copy' = 5, 'Type' = 6),
+    path LowCardinality(String),
+    old_path LowCardinality(String),
+    file_extension LowCardinality(String),
+    file_lines_added UInt32,
+    file_lines_deleted UInt32,
+    file_hunks_added UInt32,
+    file_hunks_removed UInt32,
+    file_hunks_changed UInt32,
+
+    commit_hash String,
+    author LowCardinality(String),
+    time DateTime,
+    commit_message String,
+    commit_files_added UInt32,
+    commit_files_deleted UInt32,
+    commit_files_renamed UInt32,
+    commit_files_modified UInt32,
+    commit_lines_added UInt32,
+    commit_lines_deleted UInt32,
+    commit_hunks_added UInt32,
+    commit_hunks_removed UInt32,
+    commit_hunks_changed UInt32
+) ENGINE = MergeTree ORDER BY time;
+
+Run the tool.
+
+Then insert the data with the following commands:
+
+clickhouse-client --query "INSERT INTO git.commits FORMAT TSV" < commits.tsv
+clickhouse-client --query "INSERT INTO git.file_changes FORMAT TSV" < file_changes.tsv
+clickhouse-client --query "INSERT INTO git.line_changes FORMAT TSV" < line_changes.tsv
+
+)";
+
+namespace po = boost::program_options;
+
+namespace DB
+{
+
+namespace ErrorCodes
+{
+    extern const int INCORRECT_DATA;
+}
+
+
+struct Commit
+{
+    std::string hash;
+    std::string author;
+    LocalDateTime time{};
+    std::string message;
+    uint32_t files_added{};
+    uint32_t files_deleted{};
+    uint32_t files_renamed{};
+    uint32_t files_modified{};
+    uint32_t lines_added{};
+    uint32_t lines_deleted{};
+    uint32_t hunks_added{};
+    uint32_t hunks_removed{};
+    uint32_t hunks_changed{};
+
+    void writeTextWithoutNewline(WriteBuffer & out) const
+    {
+        writeText(hash, out);
+        writeChar('\t', out);
+        writeText(author, out);
+        writeChar('\t', out);
+        writeText(time, out);
+        writeChar('\t', out);
+        writeText(message, out);
+        writeChar('\t', out);
+        writeText(files_added, out);
+        writeChar('\t', out);
+        writeText(files_deleted, out);
+        writeChar('\t', out);
+        writeText(files_renamed, out);
+        writeChar('\t', out);
+        writeText(files_modified, out);
+        writeChar('\t', out);
+        writeText(lines_added, out);
+        writeChar('\t', out);
+        writeText(lines_deleted, out);
+        writeChar('\t', out);
+        writeText(hunks_added, out);
+        writeChar('\t', out);
+        writeText(hunks_removed, out);
+        writeChar('\t', out);
+        writeText(hunks_changed, out);
+    }
+};
+
+
+enum class FileChangeType
+{
+    Add,
+    Delete,
+    Modify,
+    Rename,
+    Copy,
+    Type,
+};
+
+void writeText(FileChangeType type, WriteBuffer & out)
+{
+    switch (type)
+    {
+        case FileChangeType::Add: writeString("Add", out); break;
+        case FileChangeType::Delete: writeString("Delete", out); break;
+        case FileChangeType::Modify: writeString("Modify", out); break;
+        case FileChangeType::Rename: writeString("Rename", out); break;
+        case FileChangeType::Copy: writeString("Copy", out); break;
+        case FileChangeType::Type: writeString("Type", out); break;
+    }
+}
+
+struct FileChange
+{
+    FileChangeType change_type{};
+    std::string path;
+    std::string old_path;
+    std::string file_extension;
+    uint32_t lines_added{};
+    uint32_t lines_deleted{};
+    uint32_t hunks_added{};
+    uint32_t hunks_removed{};
+    uint32_t hunks_changed{};
+
+    void writeTextWithoutNewline(WriteBuffer & out) const
+    {
+        writeText(change_type, out);
+        writeChar('\t', out);
+        writeText(path, out);
+        writeChar('\t', out);
+        writeText(old_path, out);
+        writeChar('\t', out);
+        writeText(file_extension, out);
+        writeChar('\t', out);
+        writeText(lines_added, out);
+        writeChar('\t', out);
+        writeText(lines_deleted, out);
+        writeChar('\t', out);
+        writeText(hunks_added, out);
+        writeChar('\t', out);
+        writeText(hunks_removed, out);
+        writeChar('\t', out);
+        writeText(hunks_changed, out);
+    }
+};
+
+
+enum class LineType
+{
+    Empty,
+    Comment,
+    Punct,
+    Code,
+};
+
+void writeText(LineType type, WriteBuffer & out)
+{
+    switch (type)
+    {
+        case LineType::Empty: writeString("Empty", out); break;
+        case LineType::Comment: writeString("Comment", out); break;
+        case LineType::Punct: writeString("Punct", out); break;
+        case LineType::Code: writeString("Code", out); break;
+    }
+}
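+
+/// Note: the writeTextWithoutNewline methods above (and below) emit the columns
+/// in exactly the same order as the CREATE TABLE statements in the documentation;
+/// the positional TSV import relies on that order.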
+
+struct LineChange
+{
+    int8_t sign{}; /// 1 if added, -1 if deleted
+    uint32_t line_number_old{};
+    uint32_t line_number_new{};
+    uint32_t hunk_num{}; /// ordinal number of the hunk in the diff, starting with 0
+    uint32_t hunk_start_line_number_old{};
+    uint32_t hunk_start_line_number_new{};
+    uint32_t hunk_lines_added{};
+    uint32_t hunk_lines_deleted{};
+    std::string hunk_context; /// The context (like a line with a function name) as it is calculated by git
+    std::string line; /// Line content without leading whitespace
+    uint8_t indent{}; /// The number of leading spaces; a tab counts as 4 spaces
+    LineType line_type{};
+    /// Information from the history (blame).
+    std::string prev_commit_hash;
+    std::string prev_author;
+    LocalDateTime prev_time{};
+
+    /** Classify a line as empty / code / comment / single punctuation char.
+      * Very rough and mostly suitable for our C++ style.
+      */
+    void setLineInfo(std::string full_line)
+    {
+        uint32_t num_spaces = 0;
+
+        const char * pos = full_line.data();
+        const char * end = pos + full_line.size();
+
+        while (pos < end)
+        {
+            if (*pos == ' ')
+                ++num_spaces;
+            else if (*pos == '\t')
+                num_spaces += 4;
+            else
+                break;
+            ++pos;
+        }
+
+        indent = std::min(255U, num_spaces); /// clamped to fit into uint8_t
+        line.assign(pos, end);
+
+        if (pos == end)
+        {
+            line_type = LineType::Empty;
+        }
+        else if (pos + 1 < end
+            && ((pos[0] == '/' && (pos[1] == '/' || pos[1] == '*'))
+                || (pos[0] == '*' && pos[1] == ' ')  /// This is not precise.
+                || (pos[0] == '#' && pos[1] == ' ')))
+        {
+            line_type = LineType::Comment;
+        }
+        else
+        {
+            while (pos < end)
+            {
+                if (isAlphaNumericASCII(*pos))
+                {
+                    line_type = LineType::Code;
+                    break;
+                }
+                ++pos;
+            }
+            if (pos == end)
+                line_type = LineType::Punct;
+        }
+    }
+
+    void writeTextWithoutNewline(WriteBuffer & out) const
+    {
+        writeText(sign, out);
+        writeChar('\t', out);
+        writeText(line_number_old, out);
+        writeChar('\t', out);
+        writeText(line_number_new, out);
+        writeChar('\t', out);
+        writeText(hunk_num, out);
+        writeChar('\t', out);
+        writeText(hunk_start_line_number_old, out);
+        writeChar('\t', out);
+        writeText(hunk_start_line_number_new, out);
+        writeChar('\t', out);
+        writeText(hunk_lines_added, out);
+        writeChar('\t', out);
+        writeText(hunk_lines_deleted, out);
+        writeChar('\t', out);
+        writeText(hunk_context, out);
+        writeChar('\t', out);
+        writeText(line, out);
+        writeChar('\t', out);
+        writeText(indent, out);
+        writeChar('\t', out);
+        writeText(line_type, out);
+        writeChar('\t', out);
+        writeText(prev_commit_hash, out);
+        writeChar('\t', out);
+        writeText(prev_author, out);
+        writeChar('\t', out);
+        writeText(prev_time, out);
+    }
+};
+
+using LineChanges = std::vector<LineChange>;
+
+struct FileDiff
+{
+    explicit FileDiff(FileChange file_change_) : file_change(file_change_) {}
+
+    FileChange file_change;
+    LineChanges line_changes;
+};
+
+using CommitDiff = std::map<std::string /* path */, FileDiff>;
+
+
+/** Parsing helpers */
+
+void skipUntilWhitespace(ReadBuffer & buf)
+{
+    while (!buf.eof())
+    {
+        char * next_pos = find_first_symbols<'\t', '\n', ' '>(buf.position(), buf.buffer().end());
+        buf.position() = next_pos;
+
+        if (!buf.hasPendingData())
+            continue;
+
+        if (*buf.position() == '\t' || *buf.position() == '\n' || *buf.position() == ' ')
+            return;
+    }
+}
+
+void skipUntilNextLine(ReadBuffer & buf)
+{
+    while (!buf.eof())
+    {
+        char * next_pos = find_first_symbols<'\n'>(buf.position(), buf.buffer().end());
+        buf.position() = next_pos;
+
+        if (!buf.hasPendingData())
+            continue;
+
+        if (*buf.position() == '\n')
+        {
+            ++buf.position();
+            return;
+        }
+    }
+}
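+
+/// Like skipUntilNextLine, but also collects the skipped characters into `s`
+/// (the terminating '\n' is consumed but not stored).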
+void readStringUntilNextLine(std::string & s, ReadBuffer & buf)
+{
+    s.clear();
+    while (!buf.eof())
+    {
+        char * next_pos = find_first_symbols<'\n'>(buf.position(), buf.buffer().end());
+        s.append(buf.position(), next_pos - buf.position());
+        buf.position() = next_pos;
+
+        if (!buf.hasPendingData())
+            continue;
+
+        if (*buf.position() == '\n')
+        {
+            ++buf.position();
+            return;
+        }
+    }
+}
+
+
+/** Writes the resulting tables to files that can be imported into ClickHouse.
+  */
+struct ResultWriter
+{
+    WriteBufferFromFile commits{"commits.tsv"};
+    WriteBufferFromFile file_changes{"file_changes.tsv"};
+    WriteBufferFromFile line_changes{"line_changes.tsv"};
+
+    void appendCommit(const Commit & commit, const CommitDiff & files)
+    {
+        /// commits table
+        {
+            auto & out = commits;
+
+            commit.writeTextWithoutNewline(out);
+            writeChar('\n', out);
+        }
+
+        for (const auto & elem : files)
+        {
+            const FileChange & file_change = elem.second.file_change;
+
+            /// file_changes table
+            {
+                auto & out = file_changes;
+
+                file_change.writeTextWithoutNewline(out);
+                writeChar('\t', out);
+                commit.writeTextWithoutNewline(out);
+                writeChar('\n', out);
+            }
+
+            /// line_changes table
+            for (const auto & line_change : elem.second.line_changes)
+            {
+                auto & out = line_changes;
+
+                line_change.writeTextWithoutNewline(out);
+                writeChar('\t', out);
+                file_change.writeTextWithoutNewline(out);
+                writeChar('\t', out);
+                commit.writeTextWithoutNewline(out);
+                writeChar('\n', out);
+            }
+        }
+    }
+};
+
+
+/** See the description in "main".
+  */
+struct Options
+{
+    bool skip_commits_without_parents = true;
+    bool skip_commits_with_duplicate_diffs = true;
+    size_t threads = 1;
+    std::optional<re2::RE2> skip_paths;
+    std::optional<re2::RE2> skip_commits_with_messages;
+    std::unordered_set<std::string> skip_commits;
+    std::optional<size_t> diff_size_limit;
+    std::string stop_after_commit;
+
+    explicit Options(const po::variables_map & options)
+    {
+        skip_commits_without_parents = options["skip-commits-without-parents"].as<bool>();
+        skip_commits_with_duplicate_diffs = options["skip-commits-with-duplicate-diffs"].as<bool>();
+        threads = options["threads"].as<size_t>();
+        if (options.count("skip-paths"))
+        {
+            skip_paths.emplace(options["skip-paths"].as<std::string>());
+        }
+        if (options.count("skip-commits-with-messages"))
+        {
+            skip_commits_with_messages.emplace(options["skip-commits-with-messages"].as<std::string>());
+        }
+        if (options.count("skip-commit"))
+        {
+            auto vec = options["skip-commit"].as<std::vector<std::string>>();
+            skip_commits.insert(vec.begin(), vec.end());
+        }
+        if (options.count("diff-size-limit"))
+        {
+            diff_size_limit = options["diff-size-limit"].as<size_t>();
+        }
+        if (options.count("stop-after-commit"))
+        {
+            stop_after_commit = options["stop-after-commit"].as<std::string>();
+        }
+    }
+};
+
+
+/** A rough snapshot of the repository calculated by application of diffs. It's used to calculate blame info.
+  * Represented by a list of lines. For every line it contains information about the commit that modified this line the last time.
+  *
+  * Note that there are many cases when this info may become incorrect.
+  * The first reason is that git history is non-linear but we form this snapshot by application of commit diffs in some order
+  * that cannot give us correct results even theoretically.
+  * The second reason is that we don't process merge commits. But merge commits may contain differences for conflict resolution.
+  *
+  * We expect that the information will be mostly correct for the purpose of analytics.
+  * So, it can provide the expected "blame" info for most of the lines.
+  */
+struct FileBlame
+{
+    using Lines = std::list<Commit>;
+    Lines lines;
+
+    /// We walk through this list adding or removing lines.
+    Lines::iterator it;
+    size_t current_idx = 1;
+
+    FileBlame()
+    {
+        it = lines.begin();
+    }
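+
+    /// Keeping a std::list with a persistent cursor makes sequential updates cheap:
+    /// line changes are processed in order of line numbers, so `walk` (below)
+    /// usually moves the iterator only by a few steps.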
+ FileBlame & operator=(const FileBlame & rhs) + { + lines = rhs.lines; + it = lines.begin(); + current_idx = 1; + return *this; + } + + FileBlame(const FileBlame & rhs) + { + *this = rhs; + } + + /// Move iterator to requested line or stop at the end. + void walk(uint32_t num) + { + while (current_idx < num && it != lines.end()) + { + ++current_idx; + ++it; + } + while (current_idx > num) + { + --current_idx; + --it; + } + } + + const Commit * find(uint32_t num) + { + walk(num); + +// std::cerr << "current_idx: " << current_idx << ", num: " << num << "\n"; + + if (current_idx == num && it != lines.end()) + return &*it; + return {}; + } + + void addLine(uint32_t num, Commit commit) + { + walk(num); + + /// If the inserted line is over the end of file, we insert empty lines before it. + while (it == lines.end() && current_idx < num) + { + lines.emplace_back(); + ++current_idx; + } + + it = lines.insert(it, commit); + } + + void removeLine(uint32_t num) + { +// std::cerr << "Removing line " << num << ", current_idx: " << current_idx << "\n"; + + walk(num); + + if (current_idx == num && it != lines.end()) + it = lines.erase(it); + } +}; + +/// All files with their blame info. When file is renamed, we also rename it in snapshot. +using Snapshot = std::map; + + +/** Enrich the line changes data with the history info from the snapshot + * - the author, time and commit of the previous change to every found line (blame). + * And update the snapshot. + */ +void updateSnapshot(Snapshot & snapshot, const Commit & commit, CommitDiff & file_changes) +{ + /// Renames and copies. + for (auto & elem : file_changes) + { + auto & file = elem.second.file_change; + if (file.path != file.old_path) + snapshot[file.path] = snapshot[file.old_path]; + } + + for (auto & elem : file_changes) + { +// std::cerr << elem.first << "\n"; + + FileBlame & file_snapshot = snapshot[elem.first]; + std::unordered_map deleted_lines; + + /// Obtain blame info from previous state of the snapshot + + for (auto & line_change : elem.second.line_changes) + { + if (line_change.sign == -1) + { + if (const Commit * prev_commit = file_snapshot.find(line_change.line_number_old); + prev_commit && prev_commit->time <= commit.time) + { + line_change.prev_commit_hash = prev_commit->hash; + line_change.prev_author = prev_commit->author; + line_change.prev_time = prev_commit->time; + deleted_lines[line_change.line_number_old] = *prev_commit; + } + else + { + // std::cerr << "Did not find line " << line_change.line_number_old << " from file " << elem.first << ": " << line_change.line << "\n"; + } + } + else if (line_change.sign == 1) + { + uint32_t this_line_in_prev_commit = line_change.hunk_start_line_number_old + + (line_change.line_number_new - line_change.hunk_start_line_number_new); + + if (deleted_lines.count(this_line_in_prev_commit)) + { + const auto & prev_commit = deleted_lines[this_line_in_prev_commit]; + if (prev_commit.time <= commit.time) + { + line_change.prev_commit_hash = prev_commit.hash; + line_change.prev_author = prev_commit.author; + line_change.prev_time = prev_commit.time; + } + } + } + } + + /// Update the snapshot + + for (const auto & line_change : elem.second.line_changes) + { + if (line_change.sign == -1) + { + file_snapshot.removeLine(line_change.line_number_new); + } + else if (line_change.sign == 1) + { + file_snapshot.addLine(line_change.line_number_new, commit); + } + } + } +} + + +/** Deduplication of commits with identical diffs. 
+
+
+/** Deduplication of commits with identical diffs.
+  */
+using DiffHashes = std::unordered_set<UInt128, UInt128Hash>;
+
+UInt128 diffHash(const CommitDiff & file_changes)
+{
+    SipHash hasher;
+
+    for (const auto & elem : file_changes)
+    {
+        hasher.update(elem.second.file_change.change_type);
+        hasher.update(elem.second.file_change.old_path.size());
+        hasher.update(elem.second.file_change.old_path);
+        hasher.update(elem.second.file_change.path.size());
+        hasher.update(elem.second.file_change.path);
+
+        hasher.update(elem.second.line_changes.size());
+        for (const auto & line_change : elem.second.line_changes)
+        {
+            hasher.update(line_change.sign);
+            hasher.update(line_change.line_number_old);
+            hasher.update(line_change.line_number_new);
+            hasher.update(line_change.indent);
+            hasher.update(line_change.line.size());
+            hasher.update(line_change.line);
+        }
+    }
+
+    UInt128 hash_of_diff;
+    hasher.get128(hash_of_diff.low, hash_of_diff.high);
+
+    return hash_of_diff;
+}
+
+
+/** File changes in the form
+  * :100644 100644 b90fe6bb94 3ffe4c380f M  src/Storages/MergeTree/MergeTreeDataMergerMutator.cpp
+  * :100644 100644 828dedf6b5 828dedf6b5 R100  dbms/src/Functions/GeoUtils.h  dbms/src/Functions/PolygonUtils.h
+  * according to the output of 'git show --raw'.
+  */
+void processFileChanges(
+    ReadBuffer & in,
+    const Options & options,
+    Commit & commit,
+    CommitDiff & file_changes)
+{
+    while (checkChar(':', in))
+    {
+        FileChange file_change;
+
+        /// We don't care about file modes and content hashes.
+        for (size_t i = 0; i < 4; ++i)
+        {
+            skipUntilWhitespace(in);
+            skipWhitespaceIfAny(in);
+        }
+
+        char change_type;
+        readChar(change_type, in);
+
+        /// For rename and copy there is a number called "score". We ignore it.
+        int score;
+
+        switch (change_type)
+        {
+            case 'A':
+                file_change.change_type = FileChangeType::Add;
+                ++commit.files_added;
+                break;
+            case 'D':
+                file_change.change_type = FileChangeType::Delete;
+                ++commit.files_deleted;
+                break;
+            case 'M':
+                file_change.change_type = FileChangeType::Modify;
+                ++commit.files_modified;
+                break;
+            case 'R':
+                file_change.change_type = FileChangeType::Rename;
+                ++commit.files_renamed;
+                readText(score, in);
+                break;
+            case 'C':
+                file_change.change_type = FileChangeType::Copy;
+                readText(score, in);
+                break;
+            case 'T':
+                file_change.change_type = FileChangeType::Type;
+                break;
+            default:
+                throw Exception(ErrorCodes::INCORRECT_DATA, "Unexpected file change type: {}", change_type);
+        }
+
+        skipWhitespaceIfAny(in);
+
+        if (change_type == 'R' || change_type == 'C')
+        {
+            readText(file_change.old_path, in);
+            skipWhitespaceIfAny(in);
+            readText(file_change.path, in);
+        }
+        else
+        {
+            readText(file_change.path, in);
+        }
+
+        file_change.file_extension = std::filesystem::path(file_change.path).extension();
+        /// It gives us the extension in the form of '.cpp'. There is a reason for it but we remove the initial dot for simplicity.
+        if (!file_change.file_extension.empty() && file_change.file_extension.front() == '.')
+            file_change.file_extension = file_change.file_extension.substr(1, std::string::npos);
+
+        assertChar('\n', in);
+
+        if (!(options.skip_paths && re2::RE2::PartialMatch(file_change.path, *options.skip_paths)))
+        {
+            file_changes.emplace(
+                file_change.path,
+                FileDiff(file_change));
+        }
+    }
+}
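+
+/// For example, the raw line
+///   :100644 100644 828dedf6b5 828dedf6b5 R100  dbms/src/Functions/GeoUtils.h  dbms/src/Functions/PolygonUtils.h
+/// is parsed as a Rename with score 100, old_path "dbms/src/Functions/GeoUtils.h",
+/// path "dbms/src/Functions/PolygonUtils.h" and file_extension "h".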
+
+
+/** Process the list of diffs for every file from the result of "git show".
+  * Caveats:
+  * - changes in binary files can be ignored;
+  * - if a line content begins with '+' or '-' it will be skipped;
+  *   this means that if you store diffs in a repository and "git show" displays a diff-of-diff for you,
+  *   it won't be processed correctly;
+  * - we expect some specific format of the diff; but it may actually depend on the git config;
+  * - non-ASCII file names are not processed correctly (they will not be found and will be ignored).
+  */
+void processDiffs(
+    ReadBuffer & in,
+    std::optional<size_t> size_limit,
+    Commit & commit,
+    CommitDiff & file_changes)
+{
+    std::string old_file_path;
+    std::string new_file_path;
+    FileDiff * file_change_and_line_changes = nullptr;
+    LineChange line_change;
+
+    /// Diffs for every file are in the form
+    /// --- a/src/Storages/StorageReplicatedMergeTree.cpp
+    /// +++ b/src/Storages/StorageReplicatedMergeTree.cpp
+    /// @@ -1387,2 +1387 @@ bool StorageReplicatedMergeTree::tryExecuteMerge(const LogEntry & entry)
+    /// -            table_lock, entry.create_time, reserved_space, entry.deduplicate,
+    /// -            entry.force_ttl);
+    /// +            table_lock, entry.create_time, reserved_space, entry.deduplicate);
+
+    size_t diff_size = 0;
+    while (!in.eof())
+    {
+        if (checkString("@@ ", in))
+        {
+            if (!file_change_and_line_changes)
+            {
+                auto file_name = new_file_path.empty() ? old_file_path : new_file_path;
+                auto it = file_changes.find(file_name);
+                if (file_changes.end() != it)
+                    file_change_and_line_changes = &it->second;
+            }
+
+            if (file_change_and_line_changes)
+            {
+                uint32_t old_lines = 1;
+                uint32_t new_lines = 1;
+
+                assertChar('-', in);
+                readText(line_change.hunk_start_line_number_old, in);
+                if (checkChar(',', in))
+                    readText(old_lines, in);
+
+                assertString(" +", in);
+                readText(line_change.hunk_start_line_number_new, in);
+                if (checkChar(',', in))
+                    readText(new_lines, in);
+
+                /// This is needed to simplify the logic of updating the snapshot:
+                /// when all lines are removed we can treat it as repeated removal of the line with number 1.
+                if (line_change.hunk_start_line_number_new == 0)
+                    line_change.hunk_start_line_number_new = 1;
+
+                assertString(" @@", in);
+                if (checkChar(' ', in))
+                    readStringUntilNextLine(line_change.hunk_context, in);
+                else
+                    assertChar('\n', in);
+
+                line_change.hunk_lines_added = new_lines;
+                line_change.hunk_lines_deleted = old_lines;
+
+                ++line_change.hunk_num;
+                line_change.line_number_old = line_change.hunk_start_line_number_old;
+                line_change.line_number_new = line_change.hunk_start_line_number_new;
+
+                if (old_lines && new_lines)
+                {
+                    ++commit.hunks_changed;
+                    ++file_change_and_line_changes->file_change.hunks_changed;
+                }
+                else if (old_lines)
+                {
+                    ++commit.hunks_removed;
+                    ++file_change_and_line_changes->file_change.hunks_removed;
+                }
+                else if (new_lines)
+                {
+                    ++commit.hunks_added;
+                    ++file_change_and_line_changes->file_change.hunks_added;
+                }
+            }
+        }
+        else if (checkChar('-', in))
+        {
+            if (checkString("-- ", in))
+            {
+                if (checkString("a/", in))
+                {
+                    readStringUntilNextLine(old_file_path, in);
+                    line_change = LineChange{};
+                    file_change_and_line_changes = nullptr;
+                }
+                else if (checkString("/dev/null", in))
+                {
+                    old_file_path.clear();
+                    assertChar('\n', in);
+                    line_change = LineChange{};
+                    file_change_and_line_changes = nullptr;
+                }
+                else
+                    skipUntilNextLine(in); /// Actually it can be a line of the diff. Skip it for simplicity.
+            }
+            else
+            {
+                ++diff_size;
+                if (file_change_and_line_changes)
+                {
+                    ++commit.lines_deleted;
+                    ++file_change_and_line_changes->file_change.lines_deleted;
+
+                    line_change.sign = -1;
+                    readStringUntilNextLine(line_change.line, in);
+                    line_change.setLineInfo(line_change.line);
+
+                    file_change_and_line_changes->line_changes.push_back(line_change);
+                    ++line_change.line_number_old;
+                }
+            }
+        }
+        else if (checkChar('+', in))
+        {
+            if (checkString("++ ", in))
+            {
+                if (checkString("b/", in))
+                {
+                    readStringUntilNextLine(new_file_path, in);
+                    line_change = LineChange{};
+                    file_change_and_line_changes = nullptr;
+                }
+                else if (checkString("/dev/null", in))
+                {
+                    new_file_path.clear();
+                    assertChar('\n', in);
+                    line_change = LineChange{};
+                    file_change_and_line_changes = nullptr;
+                }
+                else
+                    skipUntilNextLine(in); /// Actually it can be a line of the diff. Skip it for simplicity.
+            }
+            else
+            {
+                ++diff_size;
+                if (file_change_and_line_changes)
+                {
+                    ++commit.lines_added;
+                    ++file_change_and_line_changes->file_change.lines_added;
+
+                    line_change.sign = 1;
+                    readStringUntilNextLine(line_change.line, in);
+                    line_change.setLineInfo(line_change.line);
+
+                    file_change_and_line_changes->line_changes.push_back(line_change);
+                    ++line_change.line_number_new;
+                }
+            }
+        }
+        else
+        {
+            /// Unknown lines are ignored.
+            skipUntilNextLine(in);
+        }
+
+        if (size_limit && diff_size > *size_limit)
+        {
+            return;
+        }
+    }
+}
+
+
+/** Process the "git show" result for a single commit. Append the result to tables.
+  */
+void processCommit(
+    ReadBuffer & in,
+    const Options & options,
+    size_t commit_num,
+    size_t total_commits,
+    std::string hash,
+    Snapshot & snapshot,
+    DiffHashes & diff_hashes,
+    ResultWriter & result)
+{
+    Commit commit;
+    commit.hash = hash;
+
+    time_t commit_time;
+    readText(commit_time, in);
+    commit.time = commit_time;
+    assertChar('\0', in);
+    readNullTerminated(commit.author, in);
+    std::string parent_hash;
+    readNullTerminated(parent_hash, in);
+    readNullTerminated(commit.message, in);
+
+    if (options.skip_commits_with_messages && re2::RE2::PartialMatch(commit.message, *options.skip_commits_with_messages))
+        return;
+
+    std::string message_to_print = commit.message;
+    std::replace_if(message_to_print.begin(), message_to_print.end(), [](char c){ return std::iscntrl(c); }, ' ');
+
+    std::cerr << fmt::format("{}% {} {} {}\n",
+        commit_num * 100 / total_commits, toString(commit.time), hash, message_to_print);
+
+    if (options.skip_commits_without_parents && commit_num != 0 && parent_hash.empty())
+    {
+        std::cerr << "Warning: skipping commit without parents\n";
+        return;
+    }
+
+    if (!in.eof())
+        assertChar('\n', in);
+
+    CommitDiff file_changes;
+    processFileChanges(in, options, commit, file_changes);
+
+    if (!in.eof())
+    {
+        assertChar('\n', in);
+        processDiffs(in, commit_num != 0 ? options.diff_size_limit : std::nullopt, commit, file_changes);
+    }
+
+    /// Skip commits with too large diffs.
+    if (options.diff_size_limit && commit_num != 0 && commit.lines_added + commit.lines_deleted > *options.diff_size_limit)
+        return;
+
+    /// Calculate hash of diff and skip duplicates
+    if (options.skip_commits_with_duplicate_diffs && !diff_hashes.insert(diffHash(file_changes)).second)
+        return;
+
+    /// Update snapshot and blame info
+    updateSnapshot(snapshot, commit, file_changes);
+
+    /// Write the result
+    result.appendCommit(commit, file_changes);
+}
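+
+/// Note: `processLog` (below) keeps `git show` running for several commits ahead
+/// of the one currently being parsed, so the child processes produce their output
+/// while the parent is busy parsing.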
+
+
+/** Runs a child process and allows reading its output.
+  * Multiple processes can be run for parallel processing.
+  */
+auto gitShow(const std::string & hash)
+{
+    std::string command = fmt::format(
+        "git show --raw --pretty='format:%ct%x00%aN%x00%P%x00%s%x00' --patch --unified=0 {}",
+        hash);
+
+    return ShellCommand::execute(command);
+}
+
+
+/** Obtain the list of commits and process them.
+  */
+void processLog(const Options & options)
+{
+    ResultWriter result;
+
+    std::string command = "git log --reverse --no-merges --pretty=%H";
+    fmt::print("{}\n", command);
+    auto git_log = ShellCommand::execute(command);
+
+    /// Collect hashes in memory. This is inefficient but allows us to display beautiful progress.
+    /// The number of commits is on the order of single millions for the largest repositories,
+    /// so we don't care about the potential waste of ~100 MB of memory.
+
+    std::vector<std::string> hashes;
+
+    auto & in = git_log->out;
+    while (!in.eof())
+    {
+        std::string hash;
+        readString(hash, in);
+        assertChar('\n', in);
+
+        if (!options.skip_commits.count(hash))
+            hashes.emplace_back(std::move(hash));
+    }
+
+    size_t num_commits = hashes.size();
+    fmt::print("Total {} commits to process.\n", num_commits);
+
+    /// Will run multiple processes in parallel.
+    size_t num_threads = options.threads;
+    if (num_threads == 0)
+        throw Exception("num-threads cannot be zero", ErrorCodes::INCORRECT_DATA);
+
+    std::vector<std::unique_ptr<ShellCommand>> show_commands(num_threads);
+    for (size_t i = 0; i < num_commits && i < num_threads; ++i)
+        show_commands[i] = gitShow(hashes[i]);
+
+    Snapshot snapshot;
+    DiffHashes diff_hashes;
+
+    for (size_t i = 0; i < num_commits; ++i)
+    {
+        processCommit(show_commands[i % num_threads]->out, options, i, num_commits, hashes[i], snapshot, diff_hashes, result);
+
+        if (!options.stop_after_commit.empty() && hashes[i] == options.stop_after_commit)
+            break;
+
+        if (i + num_threads < num_commits)
+            show_commands[i % num_threads] = gitShow(hashes[i + num_threads]);
+    }
+}
+
+
+}
+
+int mainEntryClickHouseGitImport(int argc, char ** argv)
+try
+{
+    using namespace DB;
+
+    po::options_description desc("Allowed options", getTerminalWidth());
+    desc.add_options()
+        ("help,h", "produce help message")
+        ("skip-commits-without-parents", po::value<bool>()->default_value(true),
+            "Skip commits without parents (except the initial commit)."
+            " These commits are usually erroneous but they can make sense in very rare cases.")
+        ("skip-commits-with-duplicate-diffs", po::value<bool>()->default_value(true),
+            "Skip commits with duplicate diffs."
+            " These commits are usually the results of cherry-pick or merge after rebase.")
+        ("skip-commit", po::value<std::vector<std::string>>(),
+            "Skip the commit with the specified hash. The option can be specified multiple times.")
+        ("skip-paths", po::value<std::string>(),
+            "Skip paths that match a regular expression (re2 syntax).")
+        ("skip-commits-with-messages", po::value<std::string>(),
+            "Skip commits whose messages match a regular expression (re2 syntax).")
+        ("diff-size-limit", po::value<size_t>()->default_value(100000),
+            "Skip commits whose diff size (number of added + removed lines) is larger than the specified threshold."
+            " Does not apply to the initial commit.")
+        ("stop-after-commit", po::value<std::string>(),
+            "Stop processing after the specified commit hash.")
+        ("threads", po::value<size_t>()->default_value(std::thread::hardware_concurrency()),
+            "Number of concurrent git subprocesses to spawn")
+        ;
+
+    po::variables_map options;
+    po::store(boost::program_options::parse_command_line(argc, argv, desc), options);
+
+    if (options.count("help"))
+    {
+        std::cout << documentation << '\n'
+            << "Usage: " << argv[0] << '\n'
+            << desc << '\n'
+            << "\nExample:\n"
+            << "\nclickhouse git-import --skip-paths 'generated\\.cpp|^(contrib|docs?|website|libs/(libcityhash|liblz4|libdivide|libvectorclass|libdouble-conversion|libcpuid|libzstd|libfarmhash|libmetrohash|libpoco|libwidechar_width))/' --skip-commits-with-messages '^Merge branch '\n";
+        return 1;
+    }
+
+    processLog(Options(options));
+    return 0;
+}
+catch (...)
+{
+    std::cerr << DB::getCurrentExceptionMessage(true) << '\n';
+    throw;
+}
diff --git a/programs/install/Install.cpp b/programs/install/Install.cpp
index 7b7ab149447..bd60fbb63ba 100644
--- a/programs/install/Install.cpp
+++ b/programs/install/Install.cpp
@@ -205,6 +205,7 @@ int mainEntryClickHouseInstall(int argc, char ** argv)
         "clickhouse-benchmark",
         "clickhouse-copier",
         "clickhouse-obfuscator",
+        "clickhouse-git-import",
         "clickhouse-compressor",
         "clickhouse-format",
         "clickhouse-extract-from-config"
diff --git a/programs/main.cpp b/programs/main.cpp
index 3df5f9f683b..b91bd732f21 100644
--- a/programs/main.cpp
+++ b/programs/main.cpp
@@ -46,6 +46,9 @@ int mainEntryClickHouseClusterCopier(int argc, char ** argv);
 #if ENABLE_CLICKHOUSE_OBFUSCATOR
 int mainEntryClickHouseObfuscator(int argc, char ** argv);
 #endif
+#if ENABLE_CLICKHOUSE_GIT_IMPORT
+int mainEntryClickHouseGitImport(int argc, char ** argv);
+#endif
 #if ENABLE_CLICKHOUSE_INSTALL
 int mainEntryClickHouseInstall(int argc, char ** argv);
 int mainEntryClickHouseStart(int argc, char ** argv);
@@ -91,6 +94,9 @@ std::pair clickhouse_applications[] =
 #if ENABLE_CLICKHOUSE_OBFUSCATOR
     {"obfuscator", mainEntryClickHouseObfuscator},
 #endif
+#if ENABLE_CLICKHOUSE_GIT_IMPORT
+    {"git-import", mainEntryClickHouseGitImport},
+#endif
 #if ENABLE_CLICKHOUSE_INSTALL
     {"install", mainEntryClickHouseInstall},
     {"start", mainEntryClickHouseStart},
diff --git a/programs/odbc-bridge/ODBCBlockInputStream.cpp b/programs/odbc-bridge/ODBCBlockInputStream.cpp
index 1316ff8f4c6..00ca89bd887 100644
--- a/programs/odbc-bridge/ODBCBlockInputStream.cpp
+++ b/programs/odbc-bridge/ODBCBlockInputStream.cpp
@@ -15,6 +15,7 @@ namespace DB
 namespace ErrorCodes
 {
     extern const int NUMBER_OF_COLUMNS_DOESNT_MATCH;
+    extern const int UNKNOWN_TYPE;
 }
@@ -86,6 +87,8 @@ namespace
             case ValueType::vtUUID:
                 assert_cast(column).insert(parse(value.convert()));
                 break;
+            default:
+                throw Exception("Unsupported value type", ErrorCodes::UNKNOWN_TYPE);
         }
     }
diff --git a/programs/odbc-bridge/ODBCBlockOutputStream.cpp b/programs/odbc-bridge/ODBCBlockOutputStream.cpp
index b5bffc58c55..82ca861ea67 100644
--- a/programs/odbc-bridge/ODBCBlockOutputStream.cpp
+++ b/programs/odbc-bridge/ODBCBlockOutputStream.cpp
@@ -13,6 +13,11 @@ namespace DB
 {
+namespace ErrorCodes
+{
+    extern const int UNKNOWN_TYPE;
+}
+
 namespace
 {
     using ValueType = ExternalResultDescription::ValueType;
@@ -79,6 +84,9 @@ namespace
                 return Poco::Dynamic::Var(std::to_string(LocalDateTime(time_t(field.get())))).convert();
             case ValueType::vtUUID:
                 return Poco::Dynamic::Var(UUID(field.get()).toUnderType().toHexString()).convert();
+            default:
+                throw Exception("Unsupported value
type", ErrorCodes::UNKNOWN_TYPE); + } __builtin_unreachable(); } diff --git a/programs/server/Server.cpp b/programs/server/Server.cpp index f24ba444203..56778b8dd69 100644 --- a/programs/server/Server.cpp +++ b/programs/server/Server.cpp @@ -32,6 +32,7 @@ #include #include #include +#include #include #include #include @@ -305,6 +306,13 @@ int Server::main(const std::vector & /*args*/) /// After full config loaded { + if (config().getBool("remap_executable", false)) + { + LOG_DEBUG(log, "Will remap executable in memory."); + remapExecutable(); + LOG_DEBUG(log, "The code in memory has been successfully remapped."); + } + if (config().getBool("mlock_executable", false)) { if (hasLinuxCapability(CAP_IPC_LOCK)) diff --git a/programs/server/config.d/access_control.xml b/programs/server/config.d/access_control.xml new file mode 100644 index 00000000000..6567c39f171 --- /dev/null +++ b/programs/server/config.d/access_control.xml @@ -0,0 +1,13 @@ + + + + + + users.xml + + + + access/ + + + diff --git a/programs/server/config.xml b/programs/server/config.xml index af01e880dc2..77b59abd891 100644 --- a/programs/server/config.xml +++ b/programs/server/config.xml @@ -212,8 +212,17 @@ /var/lib/clickhouse/user_files/ - - /var/lib/clickhouse/access/ + + + + + users.xml + + + + /var/lib/clickhouse/access/ + + @@ -256,9 +265,6 @@ --> - - users.xml - default @@ -296,6 +302,9 @@ --> true + + false + diff --git a/src/Access/AccessControlManager.cpp b/src/Access/AccessControlManager.cpp index 1fa26c85354..41137867213 100644 --- a/src/Access/AccessControlManager.cpp +++ b/src/Access/AccessControlManager.cpp @@ -181,6 +181,15 @@ void AccessControlManager::addUsersConfigStorage( const String & preprocessed_dir_, const zkutil::GetZooKeeper & get_zookeeper_function_) { + auto storages = getStoragesPtr(); + for (const auto & storage : *storages) + { + if (auto users_config_storage = typeid_cast>(storage)) + { + if (users_config_storage->getStoragePath() == users_config_path_) + return; + } + } auto check_setting_name_function = [this](const std::string_view & setting_name) { checkSettingNameIsAllowed(setting_name); }; auto new_storage = std::make_shared(storage_name_, check_setting_name_function); new_storage->load(users_config_path_, include_from_path_, preprocessed_dir_, get_zookeeper_function_); @@ -210,17 +219,36 @@ void AccessControlManager::startPeriodicReloadingUsersConfigs() void AccessControlManager::addDiskStorage(const String & directory_, bool readonly_) { - addStorage(std::make_shared(directory_, readonly_)); + addDiskStorage(DiskAccessStorage::STORAGE_TYPE, directory_, readonly_); } void AccessControlManager::addDiskStorage(const String & storage_name_, const String & directory_, bool readonly_) { + auto storages = getStoragesPtr(); + for (const auto & storage : *storages) + { + if (auto disk_storage = typeid_cast>(storage)) + { + if (disk_storage->isStoragePathEqual(directory_)) + { + if (readonly_) + disk_storage->setReadOnly(readonly_); + return; + } + } + } addStorage(std::make_shared(storage_name_, directory_, readonly_)); } void AccessControlManager::addMemoryStorage(const String & storage_name_) { + auto storages = getStoragesPtr(); + for (const auto & storage : *storages) + { + if (auto memory_storage = typeid_cast>(storage)) + return; + } addStorage(std::make_shared(storage_name_)); } diff --git a/src/Access/AccessFlags.h b/src/Access/AccessFlags.h index 3cb92b6b855..049140586ea 100644 --- a/src/Access/AccessFlags.h +++ b/src/Access/AccessFlags.h @@ -1,7 +1,7 @@ #pragma once #include -#include 
+#include #include #include #include diff --git a/src/Access/AccessRights.h b/src/Access/AccessRights.h index 8e150070f53..c610795ab45 100644 --- a/src/Access/AccessRights.h +++ b/src/Access/AccessRights.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Access/AccessType.h b/src/Access/AccessType.h index dae86e62434..11896f628d9 100644 --- a/src/Access/AccessType.h +++ b/src/Access/AccessType.h @@ -1,13 +1,17 @@ #pragma once -#include +#include #include #include #include +#include namespace DB { + +using Strings = std::vector; + /// Represents an access type which can be granted on databases, tables, columns, etc. enum class AccessType { diff --git a/src/Access/AllowedClientHosts.h b/src/Access/AllowedClientHosts.h index 2baafb2e04a..615782d75a2 100644 --- a/src/Access/AllowedClientHosts.h +++ b/src/Access/AllowedClientHosts.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include @@ -11,6 +11,9 @@ namespace DB { + +using Strings = std::vector; + /// Represents lists of hosts an user is allowed to connect to server from. class AllowedClientHosts { diff --git a/src/Access/Authentication.h b/src/Access/Authentication.h index 35ff0fa1d32..38714339221 100644 --- a/src/Access/Authentication.h +++ b/src/Access/Authentication.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Access/DiskAccessStorage.cpp b/src/Access/DiskAccessStorage.cpp index fc80859885d..9965e54df7e 100644 --- a/src/Access/DiskAccessStorage.cpp +++ b/src/Access/DiskAccessStorage.cpp @@ -218,6 +218,16 @@ namespace } + /// Converts a path to an absolute path and append it with a separator. + String makeDirectoryPathCanonical(const String & directory_path) + { + auto canonical_directory_path = std::filesystem::weakly_canonical(directory_path); + if (canonical_directory_path.has_filename()) + canonical_directory_path += std::filesystem::path::preferred_separator; + return canonical_directory_path; + } + + /// Calculates the path to a file named .sql for saving an access entity. 
String getEntityFilePath(const String & directory_path, const UUID & id) { @@ -298,22 +308,17 @@ DiskAccessStorage::DiskAccessStorage(const String & directory_path_, bool readon { } - DiskAccessStorage::DiskAccessStorage(const String & storage_name_, const String & directory_path_, bool readonly_) : IAccessStorage(storage_name_) { - auto canonical_directory_path = std::filesystem::weakly_canonical(directory_path_); - if (canonical_directory_path.has_filename()) - canonical_directory_path += std::filesystem::path::preferred_separator; + directory_path = makeDirectoryPathCanonical(directory_path_); + readonly = readonly_; std::error_code create_dir_error_code; - std::filesystem::create_directories(canonical_directory_path, create_dir_error_code); + std::filesystem::create_directories(directory_path, create_dir_error_code); - if (!std::filesystem::exists(canonical_directory_path) || !std::filesystem::is_directory(canonical_directory_path) || create_dir_error_code) - throw Exception("Couldn't create directory " + canonical_directory_path.string() + " reason: '" + create_dir_error_code.message() + "'", ErrorCodes::DIRECTORY_DOESNT_EXIST); - - directory_path = canonical_directory_path; - readonly = readonly_; + if (!std::filesystem::exists(directory_path) || !std::filesystem::is_directory(directory_path) || create_dir_error_code) + throw Exception("Couldn't create directory " + directory_path + " reason: '" + create_dir_error_code.message() + "'", ErrorCodes::DIRECTORY_DOESNT_EXIST); bool should_rebuild_lists = std::filesystem::exists(getNeedRebuildListsMarkFilePath(directory_path)); if (!should_rebuild_lists) @@ -337,6 +342,12 @@ DiskAccessStorage::~DiskAccessStorage() } +bool DiskAccessStorage::isStoragePathEqual(const String & directory_path_) const +{ + return getStoragePath() == makeDirectoryPathCanonical(directory_path_); +} + + void DiskAccessStorage::clear() { entries_by_id.clear(); @@ -426,33 +437,41 @@ bool DiskAccessStorage::writeLists() void DiskAccessStorage::scheduleWriteLists(EntityType type) { if (failed_to_write_lists) - return; + return; /// We don't try to write list files after the first fail. + /// The next restart of the server will invoke rebuilding of the list files. - bool already_scheduled = !types_of_lists_to_write.empty(); types_of_lists_to_write.insert(type); - if (already_scheduled) - return; + if (lists_writing_thread_is_waiting) + return; /// If the lists' writing thread is still waiting we can update `types_of_lists_to_write` easily, + /// without restarting that thread. + + if (lists_writing_thread.joinable()) + lists_writing_thread.join(); /// Create the 'need_rebuild_lists.mark' file. /// This file will be used later to find out if writing lists is successful or not. std::ofstream{getNeedRebuildListsMarkFilePath(directory_path)}; - startListsWritingThread(); + lists_writing_thread = ThreadFromGlobalPool{&DiskAccessStorage::listsWritingThreadFunc, this}; + lists_writing_thread_is_waiting = true; } -void DiskAccessStorage::startListsWritingThread() +void DiskAccessStorage::listsWritingThreadFunc() { - if (lists_writing_thread.joinable()) + std::unique_lock lock{mutex}; + { - if (!lists_writing_thread_exited) - return; - lists_writing_thread.detach(); + /// It's better not to write the lists files too often, that's why we need + /// the following timeout. 
+ const auto timeout = std::chrono::minutes(1); + SCOPE_EXIT({ lists_writing_thread_is_waiting = false; }); + if (lists_writing_thread_should_exit.wait_for(lock, timeout) != std::cv_status::timeout) + return; /// The destructor requires us to exit. } - lists_writing_thread_exited = false; - lists_writing_thread = ThreadFromGlobalPool{&DiskAccessStorage::listsWritingThreadFunc, this}; + writeLists(); } @@ -466,21 +485,6 @@ void DiskAccessStorage::stopListsWritingThread() } -void DiskAccessStorage::listsWritingThreadFunc() -{ - std::unique_lock lock{mutex}; - SCOPE_EXIT({ lists_writing_thread_exited = true; }); - - /// It's better not to write the lists files too often, that's why we need - /// the following timeout. - const auto timeout = std::chrono::minutes(1); - if (lists_writing_thread_should_exit.wait_for(lock, timeout) != std::cv_status::timeout) - return; /// The destructor requires us to exit. - - writeLists(); -} - - /// Reads and parses all the ".sql" files from a specified directory /// and then saves the files "users.list", "roles.list", etc. to the same directory. bool DiskAccessStorage::rebuildLists() diff --git a/src/Access/DiskAccessStorage.h b/src/Access/DiskAccessStorage.h index 11eb1c3b1ad..f6bef078aba 100644 --- a/src/Access/DiskAccessStorage.h +++ b/src/Access/DiskAccessStorage.h @@ -18,7 +18,11 @@ public: ~DiskAccessStorage() override; const char * getStorageType() const override { return STORAGE_TYPE; } + String getStoragePath() const override { return directory_path; } + bool isStoragePathEqual(const String & directory_path_) const; + + void setReadOnly(bool readonly_) { readonly = readonly_; } bool isStorageReadOnly() const override { return readonly; } private: @@ -42,9 +46,8 @@ private: void scheduleWriteLists(EntityType type); bool rebuildLists(); - void startListsWritingThread(); - void stopListsWritingThread(); void listsWritingThreadFunc(); + void stopListsWritingThread(); void insertNoLock(const UUID & id, const AccessEntityPtr & new_entity, bool replace_if_exists, Notifications & notifications); void removeNoLock(const UUID & id, Notifications & notifications); @@ -67,14 +70,14 @@ private: void prepareNotifications(const UUID & id, const Entry & entry, bool remove, Notifications & notifications) const; String directory_path; - bool readonly; + std::atomic readonly; std::unordered_map entries_by_id; std::unordered_map entries_by_name_and_type[static_cast(EntityType::MAX)]; boost::container::flat_set types_of_lists_to_write; bool failed_to_write_lists = false; /// Whether writing of the list files has been failed since the recent restart of the server. ThreadFromGlobalPool lists_writing_thread; /// List files are written in a separate thread. std::condition_variable lists_writing_thread_should_exit; /// Signals `lists_writing_thread` to exit. 
- std::atomic lists_writing_thread_exited = false; + bool lists_writing_thread_is_waiting = false; mutable std::list handlers_by_type[static_cast(EntityType::MAX)]; mutable std::mutex mutex; }; diff --git a/src/Access/EnabledRowPolicies.h b/src/Access/EnabledRowPolicies.h index b92939afb03..0ca4f16fcf1 100644 --- a/src/Access/EnabledRowPolicies.h +++ b/src/Access/EnabledRowPolicies.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include #include diff --git a/src/Access/EnabledSettings.h b/src/Access/EnabledSettings.h index cc30e4481fc..80635ca4542 100644 --- a/src/Access/EnabledSettings.h +++ b/src/Access/EnabledSettings.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Access/ExternalAuthenticators.h b/src/Access/ExternalAuthenticators.h index 54af87604a6..7484996c472 100644 --- a/src/Access/ExternalAuthenticators.h +++ b/src/Access/ExternalAuthenticators.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include diff --git a/src/Access/IAccessEntity.h b/src/Access/IAccessEntity.h index 68e14c99982..18b450bff5c 100644 --- a/src/Access/IAccessEntity.h +++ b/src/Access/IAccessEntity.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Access/IAccessStorage.h b/src/Access/IAccessStorage.h index 7851f8c9b6b..d91927e79d9 100644 --- a/src/Access/IAccessStorage.h +++ b/src/Access/IAccessStorage.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include #include diff --git a/src/Access/LDAPClient.h b/src/Access/LDAPClient.h index 5aad2ed3061..b117ed9a026 100644 --- a/src/Access/LDAPClient.h +++ b/src/Access/LDAPClient.h @@ -5,7 +5,7 @@ #endif #include -#include +#include #if USE_LDAP # include diff --git a/src/Access/LDAPParams.h b/src/Access/LDAPParams.h index 0d7c7dd17cd..2168ce45203 100644 --- a/src/Access/LDAPParams.h +++ b/src/Access/LDAPParams.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include diff --git a/src/Access/SettingsProfilesCache.h b/src/Access/SettingsProfilesCache.h index 42dd05df351..ef3cfa51665 100644 --- a/src/Access/SettingsProfilesCache.h +++ b/src/Access/SettingsProfilesCache.h @@ -2,7 +2,7 @@ #include #include -#include +#include #include #include #include diff --git a/src/AggregateFunctions/AggregateFunctionArray.cpp b/src/AggregateFunctions/AggregateFunctionArray.cpp index 7fe4f1f448b..d0f17da5aa4 100644 --- a/src/AggregateFunctions/AggregateFunctionArray.cpp +++ b/src/AggregateFunctions/AggregateFunctionArray.cpp @@ -12,6 +12,9 @@ namespace ErrorCodes extern const int ILLEGAL_TYPE_OF_ARGUMENT; } +namespace +{ + class AggregateFunctionCombinatorArray final : public IAggregateFunctionCombinator { public: @@ -45,6 +48,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorArray(AggregateFunctionCombinatorFactory & factory) { factory.registerCombinator(std::make_shared()); diff --git a/src/AggregateFunctions/AggregateFunctionDistinct.cpp b/src/AggregateFunctions/AggregateFunctionDistinct.cpp index 4d89e8fb199..8ad37f49797 100644 --- a/src/AggregateFunctions/AggregateFunctionDistinct.cpp +++ b/src/AggregateFunctions/AggregateFunctionDistinct.cpp @@ -6,12 +6,14 @@ namespace DB { - namespace ErrorCodes { extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; } +namespace +{ + class AggregateFunctionCombinatorDistinct final : public IAggregateFunctionCombinator { public: @@ -56,6 +58,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorDistinct(AggregateFunctionCombinatorFactory & factory) { 
factory.registerCombinator(std::make_shared()); diff --git a/src/AggregateFunctions/AggregateFunctionForEach.cpp b/src/AggregateFunctions/AggregateFunctionForEach.cpp index 693bc6839fa..6e0365fc04b 100644 --- a/src/AggregateFunctions/AggregateFunctionForEach.cpp +++ b/src/AggregateFunctions/AggregateFunctionForEach.cpp @@ -12,6 +12,9 @@ namespace ErrorCodes extern const int ILLEGAL_TYPE_OF_ARGUMENT; } +namespace +{ + class AggregateFunctionCombinatorForEach final : public IAggregateFunctionCombinator { public: @@ -42,6 +45,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorForEach(AggregateFunctionCombinatorFactory & factory) { factory.registerCombinator(std::make_shared()); diff --git a/src/AggregateFunctions/AggregateFunctionMerge.cpp b/src/AggregateFunctions/AggregateFunctionMerge.cpp index 2ce3f0e11f6..17157d21bd1 100644 --- a/src/AggregateFunctions/AggregateFunctionMerge.cpp +++ b/src/AggregateFunctions/AggregateFunctionMerge.cpp @@ -13,6 +13,9 @@ namespace ErrorCodes extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; } +namespace +{ + class AggregateFunctionCombinatorMerge final : public IAggregateFunctionCombinator { public: @@ -55,6 +58,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorMerge(AggregateFunctionCombinatorFactory & factory) { factory.registerCombinator(std::make_shared()); diff --git a/src/AggregateFunctions/AggregateFunctionNull.cpp b/src/AggregateFunctions/AggregateFunctionNull.cpp index c88d1e7f24c..f584ae1f34c 100644 --- a/src/AggregateFunctions/AggregateFunctionNull.cpp +++ b/src/AggregateFunctions/AggregateFunctionNull.cpp @@ -15,6 +15,9 @@ namespace ErrorCodes extern const int ILLEGAL_TYPE_OF_ARGUMENT; } +namespace +{ + class AggregateFunctionCombinatorNull final : public IAggregateFunctionCombinator { public: @@ -119,6 +122,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorNull(AggregateFunctionCombinatorFactory & factory) { factory.registerCombinator(std::make_shared()); diff --git a/src/AggregateFunctions/AggregateFunctionOrFill.cpp b/src/AggregateFunctions/AggregateFunctionOrFill.cpp index ce8fc8d9ca5..af107e26ca9 100644 --- a/src/AggregateFunctions/AggregateFunctionOrFill.cpp +++ b/src/AggregateFunctions/AggregateFunctionOrFill.cpp @@ -6,6 +6,8 @@ namespace DB { +namespace +{ template class AggregateFunctionCombinatorOrFill final : public IAggregateFunctionCombinator @@ -32,6 +34,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorOrFill(AggregateFunctionCombinatorFactory & factory) { factory.registerCombinator(std::make_shared>()); diff --git a/src/AggregateFunctions/AggregateFunctionRankCorrelation.h b/src/AggregateFunctions/AggregateFunctionRankCorrelation.h index 379a8332f09..15057940ebd 100644 --- a/src/AggregateFunctions/AggregateFunctionRankCorrelation.h +++ b/src/AggregateFunctions/AggregateFunctionRankCorrelation.h @@ -6,7 +6,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/AggregateFunctions/AggregateFunctionResample.cpp b/src/AggregateFunctions/AggregateFunctionResample.cpp index 389c9048918..b81fb442f27 100644 --- a/src/AggregateFunctions/AggregateFunctionResample.cpp +++ b/src/AggregateFunctions/AggregateFunctionResample.cpp @@ -13,6 +13,9 @@ namespace ErrorCodes extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH; } +namespace +{ + class AggregateFunctionCombinatorResample final : public IAggregateFunctionCombinator { public: @@ -93,6 +96,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorResample(AggregateFunctionCombinatorFactory & 
factory) { factory.registerCombinator(std::make_shared()); diff --git a/src/AggregateFunctions/AggregateFunctionResample.h b/src/AggregateFunctions/AggregateFunctionResample.h index 92fa8fbb2a5..c1528686785 100644 --- a/src/AggregateFunctions/AggregateFunctionResample.h +++ b/src/AggregateFunctions/AggregateFunctionResample.h @@ -4,6 +4,7 @@ #include #include #include +#include namespace DB @@ -60,7 +61,18 @@ public: if (end < begin) total = 0; else - total = (end - begin + step - 1) / step; + { + Key dif; + size_t sum; + if (common::subOverflow(end, begin, dif) + || common::addOverflow(static_cast(dif), step, sum)) + { + throw Exception("Overflow in internal computations in function " + getName() + + ". Too large arguments", ErrorCodes::ARGUMENT_OUT_OF_BOUND); + } + + total = (sum - 1) / step; // total = (end - begin + step - 1) / step + } if (total > MAX_ELEMENTS) throw Exception("The range given in function " diff --git a/src/AggregateFunctions/AggregateFunctionState.cpp b/src/AggregateFunctions/AggregateFunctionState.cpp index 9d1c677c0ff..348d8ba44dd 100644 --- a/src/AggregateFunctions/AggregateFunctionState.cpp +++ b/src/AggregateFunctions/AggregateFunctionState.cpp @@ -13,6 +13,9 @@ namespace ErrorCodes extern const int BAD_ARGUMENTS; } +namespace +{ + class AggregateFunctionCombinatorState final : public IAggregateFunctionCombinator { public: @@ -33,6 +36,8 @@ public: } }; +} + void registerAggregateFunctionCombinatorState(AggregateFunctionCombinatorFactory & factory) { factory.registerCombinator(std::make_shared()); diff --git a/src/AggregateFunctions/IAggregateFunction.h b/src/AggregateFunctions/IAggregateFunction.h index 7e6b7abbd28..b9656c31fa3 100644 --- a/src/AggregateFunctions/IAggregateFunction.h +++ b/src/AggregateFunctions/IAggregateFunction.h @@ -5,7 +5,7 @@ #include #include -#include +#include #include #include #include diff --git a/src/AggregateFunctions/QuantileExact.h b/src/AggregateFunctions/QuantileExact.h index da0f644721b..3f5a0907126 100644 --- a/src/AggregateFunctions/QuantileExact.h +++ b/src/AggregateFunctions/QuantileExact.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include #include diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 843dd8c2615..b6e8c395b26 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -117,6 +117,10 @@ endif () add_library(clickhouse_common_io ${clickhouse_common_io_headers} ${clickhouse_common_io_sources}) +if (SPLIT_SHARED_LIBRARIES) + target_compile_definitions(clickhouse_common_io PRIVATE SPLIT_SHARED_LIBRARIES) +endif () + add_library (clickhouse_malloc OBJECT Common/malloc.cpp) set_source_files_properties(Common/malloc.cpp PROPERTIES COMPILE_FLAGS "-fno-builtin") diff --git a/src/Columns/ColumnLowCardinality.h b/src/Columns/ColumnLowCardinality.h index e3b879d6dd5..0aeda4567fd 100644 --- a/src/Columns/ColumnLowCardinality.h +++ b/src/Columns/ColumnLowCardinality.h @@ -170,7 +170,12 @@ public: size_t sizeOfValueIfFixed() const override { return getDictionary().sizeOfValueIfFixed(); } bool isNumeric() const override { return getDictionary().isNumeric(); } bool lowCardinality() const override { return true; } - bool isNullable() const override { return isColumnNullable(*dictionary.getColumnUniquePtr()); } + + /** + * Checks if the dictionary column is Nullable(T). + * So LC(Nullable(T)) would return true, LC(U) -- false. 
+ */ + bool nestedIsNullable() const { return isColumnNullable(*dictionary.getColumnUnique().getNestedColumn()); } const IColumnUnique & getDictionary() const { return dictionary.getColumnUnique(); } const ColumnPtr & getDictionaryPtr() const { return dictionary.getColumnUniquePtr(); } diff --git a/src/Columns/ColumnVector.h b/src/Columns/ColumnVector.h index 1090de556a0..55ab67d6214 100644 --- a/src/Columns/ColumnVector.h +++ b/src/Columns/ColumnVector.h @@ -7,6 +7,7 @@ #include #include #include +#include namespace DB @@ -130,7 +131,7 @@ public: void insertFrom(const IColumn & src, size_t n) override { - data.push_back(static_cast(src).getData()[n]); + data.push_back(assert_cast(src).getData()[n]); } void insertData(const char * pos, size_t) override @@ -205,14 +206,14 @@ public: /// This method implemented in header because it could be possibly devirtualized. int compareAt(size_t n, size_t m, const IColumn & rhs_, int nan_direction_hint) const override { - return CompareHelper::compare(data[n], static_cast(rhs_).data[m], nan_direction_hint); + return CompareHelper::compare(data[n], assert_cast(rhs_).data[m], nan_direction_hint); } void compareColumn(const IColumn & rhs, size_t rhs_row_num, PaddedPODArray * row_indexes, PaddedPODArray & compare_results, int direction, int nan_direction_hint) const override { - return this->template doCompareColumn(static_cast(rhs), rhs_row_num, row_indexes, + return this->template doCompareColumn(assert_cast(rhs), rhs_row_num, row_indexes, compare_results, direction, nan_direction_hint); } diff --git a/src/Columns/ColumnsNumber.h b/src/Columns/ColumnsNumber.h index c206b37a588..96ce2bd6d6f 100644 --- a/src/Columns/ColumnsNumber.h +++ b/src/Columns/ColumnsNumber.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include diff --git a/src/Columns/ya.make b/src/Columns/ya.make index 78c0e1b992d..910c479c2a9 100644 --- a/src/Columns/ya.make +++ b/src/Columns/ya.make @@ -2,8 +2,6 @@ LIBRARY() ADDINCL( - contrib/libs/icu/common - contrib/libs/icu/i18n contrib/libs/pdqsort ) diff --git a/src/Common/BitonicSort.h b/src/Common/BitonicSort.h index 6bf10ebe835..8140687c040 100644 --- a/src/Common/BitonicSort.h +++ b/src/Common/BitonicSort.h @@ -12,7 +12,7 @@ #endif #include -#include +#include #include #include #include diff --git a/src/Common/Config/AbstractConfigurationComparison.h b/src/Common/Config/AbstractConfigurationComparison.h index f0d126a578a..f825ad4e53d 100644 --- a/src/Common/Config/AbstractConfigurationComparison.h +++ b/src/Common/Config/AbstractConfigurationComparison.h @@ -1,6 +1,6 @@ #pragma once -#include +#include namespace Poco::Util { diff --git a/src/Common/CpuId.h b/src/Common/CpuId.h index 1548ff6cc40..2db247173a6 100644 --- a/src/Common/CpuId.h +++ b/src/Common/CpuId.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #if defined(__x86_64__) || defined(__i386__) #include diff --git a/src/Common/CurrentMetrics.h b/src/Common/CurrentMetrics.h index 09accf96010..eabeca7a0e9 100644 --- a/src/Common/CurrentMetrics.h +++ b/src/Common/CurrentMetrics.h @@ -4,7 +4,7 @@ #include #include #include -#include +#include /** Allows to count number of simultaneously happening processes or current value of some metric. * - for high-level profiling. 
diff --git a/src/Common/DNSResolver.cpp b/src/Common/DNSResolver.cpp
index d61982f3406..9059d2838bb 100644
--- a/src/Common/DNSResolver.cpp
+++ b/src/Common/DNSResolver.cpp
@@ -3,7 +3,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
diff --git a/src/Common/DNSResolver.h b/src/Common/DNSResolver.h
index 7dbc2852d43..57c28188f58 100644
--- a/src/Common/DNSResolver.h
+++ b/src/Common/DNSResolver.h
@@ -2,7 +2,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
diff --git a/src/Common/ErrorCodes.cpp b/src/Common/ErrorCodes.cpp
index 297192e650b..85da23fb303 100644
--- a/src/Common/ErrorCodes.cpp
+++ b/src/Common/ErrorCodes.cpp
@@ -281,7 +281,7 @@ namespace ErrorCodes
     extern const int DICTIONARY_IS_EMPTY = 281;
     extern const int INCORRECT_INDEX = 282;
     extern const int UNKNOWN_DISTRIBUTED_PRODUCT_MODE = 283;
-    extern const int UNKNOWN_GLOBAL_SUBQUERIES_METHOD = 284;
+    extern const int WRONG_GLOBAL_SUBQUERY = 284;
     extern const int TOO_FEW_LIVE_REPLICAS = 285;
     extern const int UNSATISFIED_QUORUM_FOR_PREVIOUS_WRITE = 286;
     extern const int UNKNOWN_FORMAT_VERSION = 287;
@@ -507,6 +507,7 @@ namespace ErrorCodes
     extern const int CANNOT_DECLARE_RABBITMQ_EXCHANGE = 540;
     extern const int CANNOT_CREATE_RABBITMQ_QUEUE_BINDING = 541;
     extern const int CANNOT_REMOVE_RABBITMQ_EXCHANGE = 542;
+    extern const int UNKNOWN_MYSQL_DATATYPES_SUPPORT_LEVEL = 543;
 
     extern const int KEEPER_EXCEPTION = 999;
     extern const int POCO_EXCEPTION = 1000;
diff --git a/src/Common/ExternalLoaderStatus.h b/src/Common/ExternalLoaderStatus.h
index 44536198b82..d8852eb6152 100644
--- a/src/Common/ExternalLoaderStatus.h
+++ b/src/Common/ExternalLoaderStatus.h
@@ -3,7 +3,7 @@
 #include
 #include
 #include
-#include
+#include
 
 namespace DB
 {
diff --git a/src/Common/FileSyncGuard.h b/src/Common/FileSyncGuard.h
new file mode 100644
index 00000000000..6451f6ebf36
--- /dev/null
+++ b/src/Common/FileSyncGuard.h
@@ -0,0 +1,41 @@
+#pragma once
+
+#include
+
+namespace DB
+{
+
+/// Helper class that receives a file descriptor and does fsync for it in the destructor.
+/// It's used to keep the descriptor open while doing some operations with it, and to do fsync at the end.
+/// Guarantees of the sequence 'close-reopen-fsync' may depend on the kernel version.
+/// Source: linux-fsdevel mailing list, https://marc.info/?l=linux-fsdevel&m=152535409207496
+class FileSyncGuard
+{
+public:
+    /// NOTE: If you have an already opened descriptor, it's preferred to use
+    /// this constructor instead of the constructor with a path.
+    FileSyncGuard(const DiskPtr & disk_, int fd_) : disk(disk_), fd(fd_) {}
+
+    FileSyncGuard(const DiskPtr & disk_, const String & path)
+        : disk(disk_), fd(disk_->open(path, O_RDWR)) {}
+
+    ~FileSyncGuard()
+    {
+        try
+        {
+            disk->sync(fd);
+            disk->close(fd);
+        }
+        catch (...)
+ { + tryLogCurrentException(__PRETTY_FUNCTION__); + } + } + +private: + DiskPtr disk; + int fd = -1; +}; + +} + diff --git a/src/Common/HashTable/Hash.h b/src/Common/HashTable/Hash.h index c561933ab80..abd1a69545f 100644 --- a/src/Common/HashTable/Hash.h +++ b/src/Common/HashTable/Hash.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Common/HashTable/HashTable.h b/src/Common/HashTable/HashTable.h index 5c8e7917eb0..baad5d40764 100644 --- a/src/Common/HashTable/HashTable.h +++ b/src/Common/HashTable/HashTable.h @@ -9,7 +9,7 @@ #include #include -#include +#include #include #include diff --git a/src/Common/IFactoryWithAliases.h b/src/Common/IFactoryWithAliases.h index 994b2c1a02c..11ebf31db33 100644 --- a/src/Common/IFactoryWithAliases.h +++ b/src/Common/IFactoryWithAliases.h @@ -2,7 +2,7 @@ #include #include -#include +#include #include #include diff --git a/src/Common/IntervalKind.h b/src/Common/IntervalKind.h index 91c3eb14043..a086d0d2b0c 100644 --- a/src/Common/IntervalKind.h +++ b/src/Common/IntervalKind.h @@ -1,6 +1,6 @@ #pragma once -#include +#include namespace DB diff --git a/src/Common/Macros.cpp b/src/Common/Macros.cpp index 7b5a896015b..a4981fa5be3 100644 --- a/src/Common/Macros.cpp +++ b/src/Common/Macros.cpp @@ -68,8 +68,14 @@ String Macros::expand(const String & s, res += database_name; else if (macro_name == "table" && !table_name.empty()) res += table_name; - else if (macro_name == "uuid" && uuid != UUIDHelpers::Nil) + else if (macro_name == "uuid") + { + if (uuid == UUIDHelpers::Nil) + throw Exception("Macro 'uuid' and empty arguments of ReplicatedMergeTree " + "are supported only for ON CLUSTER queries with Atomic database engine", + ErrorCodes::SYNTAX_ERROR); res += toString(uuid); + } else throw Exception("No macro '" + macro_name + "' in config while processing substitutions in '" + s + "' at '" diff --git a/src/Common/Macros.h b/src/Common/Macros.h index cee133b0ccb..bcd6075782e 100644 --- a/src/Common/Macros.h +++ b/src/Common/Macros.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Common/MemoryTracker.cpp b/src/Common/MemoryTracker.cpp index 9d073cf8dd8..5d51fc9f301 100644 --- a/src/Common/MemoryTracker.cpp +++ b/src/Common/MemoryTracker.cpp @@ -6,6 +6,7 @@ #include #include #include +#include #include #include @@ -22,6 +23,10 @@ namespace DB } } +namespace ProfileEvents +{ + extern const Event QueryMemoryLimitExceeded; +} static constexpr size_t log_peak_memory_usage_every = 1ULL << 30; @@ -104,6 +109,7 @@ void MemoryTracker::alloc(Int64 size) /// Prevent recursion. Exception::ctor -> std::string -> new[] -> MemoryTracker::alloc auto untrack_lock = blocker.cancel(); // NOLINT + ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded); std::stringstream message; message << "Memory tracker"; if (const auto * description = description_ptr.load(std::memory_order_relaxed)) @@ -136,6 +142,7 @@ void MemoryTracker::alloc(Int64 size) /// Prevent recursion. 
Exception::ctor -> std::string -> new[] -> MemoryTracker::alloc auto no_track = blocker.cancel(); // NOLINT + ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded); std::stringstream message; message << "Memory limit"; if (const auto * description = description_ptr.load(std::memory_order_relaxed)) diff --git a/src/Common/NaNUtils.h b/src/Common/NaNUtils.h index 7d727fb7793..3b393fad41e 100644 --- a/src/Common/NaNUtils.h +++ b/src/Common/NaNUtils.h @@ -4,7 +4,7 @@ #include #include -#include +#include /// To be sure, that this function is zero-cost for non-floating point types. diff --git a/src/Common/NamePrompter.h b/src/Common/NamePrompter.h index a52a5f3775e..5f7832c4423 100644 --- a/src/Common/NamePrompter.h +++ b/src/Common/NamePrompter.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Common/OpenSSLHelpers.h b/src/Common/OpenSSLHelpers.h index e77fc3037c1..2560664de9e 100644 --- a/src/Common/OpenSSLHelpers.h +++ b/src/Common/OpenSSLHelpers.h @@ -5,7 +5,7 @@ #endif #if USE_SSL -# include +# include namespace DB diff --git a/src/Common/PoolWithFailoverBase.h b/src/Common/PoolWithFailoverBase.h index f206278fbda..a328e15e4e5 100644 --- a/src/Common/PoolWithFailoverBase.h +++ b/src/Common/PoolWithFailoverBase.h @@ -7,7 +7,6 @@ #include #include #include -#include #include #include #include diff --git a/src/Common/ProfileEvents.cpp b/src/Common/ProfileEvents.cpp index 475e073d253..486cb7e1a6e 100644 --- a/src/Common/ProfileEvents.cpp +++ b/src/Common/ProfileEvents.cpp @@ -233,6 +233,7 @@ M(S3WriteRequestsErrors, "Number of non-throttling errors in POST, DELETE, PUT and PATCH requests to S3 storage.") \ M(S3WriteRequestsThrottling, "Number of 429 and 503 errors in POST, DELETE, PUT and PATCH requests to S3 storage.") \ M(S3WriteRequestsRedirects, "Number of redirects in POST, DELETE, PUT and PATCH requests to S3 storage.") \ + M(QueryMemoryLimitExceeded, "Number of times when memory limit exceeded for query.") \ namespace ProfileEvents diff --git a/src/Common/QueryProfiler.h b/src/Common/QueryProfiler.h index 44eeebbf10a..8e2d09e0be2 100644 --- a/src/Common/QueryProfiler.h +++ b/src/Common/QueryProfiler.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Common/RWLock.h b/src/Common/RWLock.h index ad0a3f139fc..952c8049a0f 100644 --- a/src/Common/RWLock.h +++ b/src/Common/RWLock.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Common/RadixSort.h b/src/Common/RadixSort.h index cbb8badab4a..22e93a2c324 100644 --- a/src/Common/RadixSort.h +++ b/src/Common/RadixSort.h @@ -13,7 +13,7 @@ #include #include -#include +#include #include diff --git a/src/Common/ShellCommand.cpp b/src/Common/ShellCommand.cpp index 53ab2301a0a..bbb8801f190 100644 --- a/src/Common/ShellCommand.cpp +++ b/src/Common/ShellCommand.cpp @@ -57,7 +57,16 @@ ShellCommand::~ShellCommand() LOG_WARNING(getLogger(), "Cannot kill shell command pid {} errno '{}'", pid, errnoToString(retcode)); } else if (!wait_called) - tryWait(); + { + try + { + tryWait(); + } + catch (...) 
+ { + tryLogCurrentException(getLogger()); + } + } } void ShellCommand::logCommand(const char * filename, char * const argv[]) @@ -74,7 +83,8 @@ void ShellCommand::logCommand(const char * filename, char * const argv[]) LOG_TRACE(ShellCommand::getLogger(), "Will start shell command '{}' with arguments {}", filename, args.str()); } -std::unique_ptr ShellCommand::executeImpl(const char * filename, char * const argv[], bool pipe_stdin_only, bool terminate_in_destructor) +std::unique_ptr ShellCommand::executeImpl( + const char * filename, char * const argv[], bool pipe_stdin_only, bool terminate_in_destructor) { logCommand(filename, argv); @@ -130,7 +140,8 @@ std::unique_ptr ShellCommand::executeImpl(const char * filename, c _exit(int(ReturnCodes::CANNOT_EXEC)); } - std::unique_ptr res(new ShellCommand(pid, pipe_stdin.fds_rw[1], pipe_stdout.fds_rw[0], pipe_stderr.fds_rw[0], terminate_in_destructor)); + std::unique_ptr res(new ShellCommand( + pid, pipe_stdin.fds_rw[1], pipe_stdout.fds_rw[0], pipe_stderr.fds_rw[0], terminate_in_destructor)); LOG_TRACE(getLogger(), "Started shell command '{}' with pid {}", filename, pid); @@ -143,7 +154,8 @@ std::unique_ptr ShellCommand::executeImpl(const char * filename, c } -std::unique_ptr ShellCommand::execute(const std::string & command, bool pipe_stdin_only, bool terminate_in_destructor) +std::unique_ptr ShellCommand::execute( + const std::string & command, bool pipe_stdin_only, bool terminate_in_destructor) { /// Arguments in non-constant chunks of memory (as required for `execv`). /// Moreover, their copying must be done before calling `vfork`, so after `vfork` do a minimum of things. @@ -157,7 +169,8 @@ std::unique_ptr ShellCommand::execute(const std::string & command, } -std::unique_ptr ShellCommand::executeDirect(const std::string & path, const std::vector & arguments, bool terminate_in_destructor) +std::unique_ptr ShellCommand::executeDirect( + const std::string & path, const std::vector & arguments, bool terminate_in_destructor) { size_t argv_sum_size = path.size() + 1; for (const auto & arg : arguments) @@ -186,6 +199,10 @@ int ShellCommand::tryWait() { wait_called = true; + in.close(); + out.close(); + err.close(); + LOG_TRACE(getLogger(), "Will wait for shell command pid {}", pid); int status = 0; diff --git a/src/Common/StatusInfo.h b/src/Common/StatusInfo.h index 89365f0634f..de92bb838ba 100644 --- a/src/Common/StatusInfo.h +++ b/src/Common/StatusInfo.h @@ -4,7 +4,8 @@ #include #include #include -#include +#include +#include #include #include diff --git a/src/Common/TaskStatsInfoGetter.cpp b/src/Common/TaskStatsInfoGetter.cpp index 40b92917343..92978a0ad8c 100644 --- a/src/Common/TaskStatsInfoGetter.cpp +++ b/src/Common/TaskStatsInfoGetter.cpp @@ -1,6 +1,6 @@ #include "TaskStatsInfoGetter.h" #include -#include +#include #include diff --git a/src/Common/TaskStatsInfoGetter.h b/src/Common/TaskStatsInfoGetter.h index 6865c64dc38..00ecf91c475 100644 --- a/src/Common/TaskStatsInfoGetter.h +++ b/src/Common/TaskStatsInfoGetter.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include struct taskstats; diff --git a/src/Common/ThreadProfileEvents.h b/src/Common/ThreadProfileEvents.h index 6bec7b38db5..69db595b426 100644 --- a/src/Common/ThreadProfileEvents.h +++ b/src/Common/ThreadProfileEvents.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Common/UTF8Helpers.h b/src/Common/UTF8Helpers.h index 129a745afe2..e795b6846b2 100644 --- a/src/Common/UTF8Helpers.h +++ b/src/Common/UTF8Helpers.h @@ -1,6 
+1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Common/UnicodeBar.h b/src/Common/UnicodeBar.h index 13c39f680aa..9a5bcecbd62 100644 --- a/src/Common/UnicodeBar.h +++ b/src/Common/UnicodeBar.h @@ -3,7 +3,7 @@ #include #include #include -#include +#include #define UNICODE_BAR_CHAR_SIZE (strlen("█")) diff --git a/src/Common/Volnitsky.h b/src/Common/Volnitsky.h index af97dbdae13..a1fa83b4f33 100644 --- a/src/Common/Volnitsky.h +++ b/src/Common/Volnitsky.h @@ -4,7 +4,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/src/Common/ZooKeeper/IKeeper.h b/src/Common/ZooKeeper/IKeeper.h index 409c3838147..9d4a2ebb16a 100644 --- a/src/Common/ZooKeeper/IKeeper.h +++ b/src/Common/ZooKeeper/IKeeper.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Common/ZooKeeper/TestKeeper.cpp b/src/Common/ZooKeeper/TestKeeper.cpp index 1b203d92fb8..4f7beadef5f 100644 --- a/src/Common/ZooKeeper/TestKeeper.cpp +++ b/src/Common/ZooKeeper/TestKeeper.cpp @@ -1,7 +1,7 @@ #include #include #include -#include +#include #include #include diff --git a/src/Common/ZooKeeper/ZooKeeperImpl.h b/src/Common/ZooKeeper/ZooKeeperImpl.h index 305ee46d58a..085b0e9856a 100644 --- a/src/Common/ZooKeeper/ZooKeeperImpl.h +++ b/src/Common/ZooKeeper/ZooKeeperImpl.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Common/createHardLink.h b/src/Common/createHardLink.h index 8f8e5c27d9f..c2b01cf817b 100644 --- a/src/Common/createHardLink.h +++ b/src/Common/createHardLink.h @@ -1,6 +1,6 @@ #pragma once -#include +#include namespace DB { diff --git a/src/Common/filesystemHelpers.h b/src/Common/filesystemHelpers.h index 80a1cf10cb4..f97f91d2647 100644 --- a/src/Common/filesystemHelpers.h +++ b/src/Common/filesystemHelpers.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Common/intExp.h b/src/Common/intExp.h index 8a52015c54a..bc977a41d33 100644 --- a/src/Common/intExp.h +++ b/src/Common/intExp.h @@ -3,7 +3,7 @@ #include #include -#include +#include // Also defined in Core/Defines.h #if !defined(NO_SANITIZE_UNDEFINED) diff --git a/src/Common/isLocalAddress.cpp b/src/Common/isLocalAddress.cpp index 3e81ecd935c..8da281e3051 100644 --- a/src/Common/isLocalAddress.cpp +++ b/src/Common/isLocalAddress.cpp @@ -1,7 +1,7 @@ #include #include -#include +#include #include #include #include diff --git a/src/Common/oclBasics.h b/src/Common/oclBasics.h index 7c977830e82..a3e7636af1b 100644 --- a/src/Common/oclBasics.h +++ b/src/Common/oclBasics.h @@ -14,7 +14,7 @@ #endif #include -#include +#include #include diff --git a/src/Common/parseRemoteDescription.h b/src/Common/parseRemoteDescription.h index cbc73380628..6ba0bb4737f 100644 --- a/src/Common/parseRemoteDescription.h +++ b/src/Common/parseRemoteDescription.h @@ -1,5 +1,5 @@ #pragma once -#include +#include #include namespace DB { diff --git a/src/Common/quoteString.h b/src/Common/quoteString.h index 426034e4803..3d395a35b03 100644 --- a/src/Common/quoteString.h +++ b/src/Common/quoteString.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include diff --git a/src/Common/randomSeed.cpp b/src/Common/randomSeed.cpp index 4d466d283c9..8ad624febdd 100644 --- a/src/Common/randomSeed.cpp +++ b/src/Common/randomSeed.cpp @@ -4,7 +4,7 @@ #include #include #include -#include +#include namespace DB diff --git a/src/Common/randomSeed.h b/src/Common/randomSeed.h index e2b8310f79c..4f04e4b974a 100644 --- a/src/Common/randomSeed.h +++ 
b/src/Common/randomSeed.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include /** Returns a number suitable as seed for PRNG. Use clock_gettime, pid and so on. */ DB::UInt64 randomSeed(); diff --git a/src/Common/remapExecutable.cpp b/src/Common/remapExecutable.cpp new file mode 100644 index 00000000000..13bce459022 --- /dev/null +++ b/src/Common/remapExecutable.cpp @@ -0,0 +1,201 @@ +#if defined(__linux__) && defined(__amd64__) && defined(__SSE2__) && !defined(SANITIZER) && defined(NDEBUG) && !defined(SPLIT_SHARED_LIBRARIES) + +#include +#include +#include + +#include + +#include + +#include +#include +#include +#include +#include + +#include "remapExecutable.h" + + +namespace DB +{ + +namespace ErrorCodes +{ + extern const int LOGICAL_ERROR; + extern const int CANNOT_ALLOCATE_MEMORY; +} + + +namespace +{ + +uintptr_t readAddressHex(DB::ReadBuffer & in) +{ + uintptr_t res = 0; + while (!in.eof()) + { + if (isHexDigit(*in.position())) + { + res *= 16; + res += unhex(*in.position()); + ++in.position(); + } + else + break; + } + return res; +} + + +/** Find the address and size of the mapped memory region pointed by ptr. + */ +std::pair getMappedArea(void * ptr) +{ + using namespace DB; + + uintptr_t uintptr = reinterpret_cast(ptr); + ReadBufferFromFile in("/proc/self/maps"); + + while (!in.eof()) + { + uintptr_t begin = readAddressHex(in); + assertChar('-', in); + uintptr_t end = readAddressHex(in); + skipToNextLineOrEOF(in); + + if (begin <= uintptr && uintptr < end) + return {reinterpret_cast(begin), end - begin}; + } + + throw Exception("Cannot find mapped area for pointer", ErrorCodes::LOGICAL_ERROR); +} + + +__attribute__((__noinline__)) int64_t our_syscall(...) +{ + __asm__ __volatile__ (R"( + movq %%rdi,%%rax; + movq %%rsi,%%rdi; + movq %%rdx,%%rsi; + movq %%rcx,%%rdx; + movq %%r8,%%r10; + movq %%r9,%%r8; + movq 8(%%rsp),%%r9; + syscall; + ret + )" : : : "memory"); + return 0; +} + + +__attribute__((__noinline__)) void remapToHugeStep3(void * scratch, size_t size, size_t offset) +{ + /// The function should not use the stack, otherwise various optimizations, including "omit-frame-pointer" may break the code. + + /// Unmap the scratch area. + our_syscall(SYS_munmap, scratch, size); + + /** The return address of this function is pointing to scratch area (because it was called from there). + * But the scratch area no longer exists. We should correct the return address by subtracting the offset. + */ + __asm__ __volatile__("subq %0, 8(%%rsp)" : : "r"(offset) : "memory"); +} + + +__attribute__((__noinline__)) void remapToHugeStep2(void * begin, size_t size, void * scratch) +{ + /** Unmap old memory region with the code of our program. + * Our instruction pointer is located inside scratch area and this function can execute after old code is unmapped. + * But it cannot call any other functions because they are not available at usual addresses + * - that's why we have to use "our_syscall" function and a substitution for memcpy. + * (Relative addressing may continue to work but we should not assume that). + */ + + int64_t offset = reinterpret_cast(scratch) - reinterpret_cast(begin); + int64_t (*syscall_func)(...) = reinterpret_cast(reinterpret_cast(our_syscall) + offset); + + int64_t munmap_res = syscall_func(SYS_munmap, begin, size); + if (munmap_res != 0) + return; + + /// Map new anonymous memory region in place of old region with code. 
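+    /// MAP_FIXED makes the anonymous mapping replace the old one at exactly the
+    /// same address, so instruction addresses (including return addresses already
+    /// on the stack) stay numerically valid and work again once the code is
+    /// copied back below.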
+ + int64_t mmap_res = syscall_func(SYS_mmap, begin, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0); + if (-1 == mmap_res) + syscall_func(SYS_exit, 1); + + /// As the memory region is anonymous, we can do madvise with MADV_HUGEPAGE. + + syscall_func(SYS_madvise, begin, size, MADV_HUGEPAGE); + + /// Copy the code from scratch area to the old memory location. + + { + __m128i * __restrict dst = reinterpret_cast<__m128i *>(begin); + const __m128i * __restrict src = reinterpret_cast(scratch); + const __m128i * __restrict src_end = reinterpret_cast(reinterpret_cast(scratch) + size); + while (src < src_end) + { + _mm_storeu_si128(dst, _mm_loadu_si128(src)); + + ++dst; + ++src; + } + } + + /// Make the memory area with the code executable and non-writable. + + syscall_func(SYS_mprotect, begin, size, PROT_READ | PROT_EXEC); + + /** Step 3 function should unmap the scratch area. + * The currently executed code is located in the scratch area and cannot be removed here. + * We have to call another function and use its address from the original location (not in scratch area). + * To do it, we obtain its pointer and call by pointer. + */ + + void(* volatile step3)(void*, size_t, size_t) = remapToHugeStep3; + step3(scratch, size, offset); +} + + +__attribute__((__noinline__)) void remapToHugeStep1(void * begin, size_t size) +{ + /// Allocate scratch area and copy the code there. + + void * scratch = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (MAP_FAILED == scratch) + throwFromErrno(fmt::format("Cannot mmap {} bytes", size), ErrorCodes::CANNOT_ALLOCATE_MEMORY); + + memcpy(scratch, begin, size); + + /// Offset to the scratch area from previous location. + + int64_t offset = reinterpret_cast(scratch) - reinterpret_cast(begin); + + /// Jump to the next function inside the scratch area. + + reinterpret_cast(reinterpret_cast(remapToHugeStep2) + offset)(begin, size, scratch); +} + +} + + +void remapExecutable() +{ + auto [begin, size] = getMappedArea(reinterpret_cast(remapExecutable)); + remapToHugeStep1(begin, size); +} + +} + +#else + +namespace DB +{ + +void remapExecutable() {} + +} + +#endif diff --git a/src/Common/remapExecutable.h b/src/Common/remapExecutable.h new file mode 100644 index 00000000000..7acb61f13bd --- /dev/null +++ b/src/Common/remapExecutable.h @@ -0,0 +1,7 @@ +namespace DB +{ + +/// This function tries to reallocate the code of the running program in a more efficient way. 
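+/// It does so by copying the machine code to a scratch mapping, replacing the
+/// original region with anonymous memory marked MADV_HUGEPAGE, copying the code
+/// back and restoring PROT_READ | PROT_EXEC (see remapExecutable.cpp above).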
+void remapExecutable(); + +} diff --git a/src/Common/tests/CMakeLists.txt b/src/Common/tests/CMakeLists.txt index f6c232cdd22..8de9424e044 100644 --- a/src/Common/tests/CMakeLists.txt +++ b/src/Common/tests/CMakeLists.txt @@ -84,3 +84,6 @@ target_link_libraries (procfs_metrics_provider_perf PRIVATE clickhouse_common_io add_executable (average average.cpp) target_link_libraries (average PRIVATE clickhouse_common_io) + +add_executable (shell_command_inout shell_command_inout.cpp) +target_link_libraries (shell_command_inout PRIVATE clickhouse_common_io) diff --git a/src/Common/tests/average.cpp b/src/Common/tests/average.cpp index 900e99ee752..5f3b13af8e8 100644 --- a/src/Common/tests/average.cpp +++ b/src/Common/tests/average.cpp @@ -3,7 +3,7 @@ #include -#include +#include #include #include #include diff --git a/src/Common/tests/gtest_shell_command.cpp b/src/Common/tests/gtest_shell_command.cpp index 057a4d22648..4d578422962 100644 --- a/src/Common/tests/gtest_shell_command.cpp +++ b/src/Common/tests/gtest_shell_command.cpp @@ -1,5 +1,5 @@ #include -#include +#include #include #include #include diff --git a/src/Common/tests/integer_hash_tables_and_hashes.cpp b/src/Common/tests/integer_hash_tables_and_hashes.cpp index 5b090fa6e4e..f5d9150a6ad 100644 --- a/src/Common/tests/integer_hash_tables_and_hashes.cpp +++ b/src/Common/tests/integer_hash_tables_and_hashes.cpp @@ -12,7 +12,7 @@ //#define DBMS_HASH_MAP_COUNT_COLLISIONS //#define DBMS_HASH_MAP_DEBUG_RESIZES -#include +#include #include #include #include diff --git a/src/Common/tests/pod_array.cpp b/src/Common/tests/pod_array.cpp index 6e9634ba3cf..7ebf2670271 100644 --- a/src/Common/tests/pod_array.cpp +++ b/src/Common/tests/pod_array.cpp @@ -1,5 +1,5 @@ #include -#include +#include #include #define ASSERT_CHECK(cond, res) \ diff --git a/src/Common/tests/shell_command_inout.cpp b/src/Common/tests/shell_command_inout.cpp new file mode 100644 index 00000000000..615700cd042 --- /dev/null +++ b/src/Common/tests/shell_command_inout.cpp @@ -0,0 +1,47 @@ +#include + +#include +#include + +#include +#include +#include + +/** This example shows how we can proxy stdin to ShellCommand and obtain stdout in streaming fashion. */ + +int main(int argc, char ** argv) +try +{ + using namespace DB; + + if (argc < 2) + { + std::cerr << "Usage: shell_command_inout 'command...' < in > out\n"; + return 1; + } + + auto command = ShellCommand::execute(argv[1]); + + ReadBufferFromFileDescriptor in(STDIN_FILENO); + WriteBufferFromFileDescriptor out(STDOUT_FILENO); + WriteBufferFromFileDescriptor err(STDERR_FILENO); + + /// Background thread sends data and foreground thread receives result. + + std::thread thread([&] + { + copyData(in, command->in); + command->in.close(); + }); + + copyData(command->out, out); + copyData(command->err, err); + + thread.join(); + return 0; +} +catch (...) 
+{ + std::cerr << DB::getCurrentExceptionMessage(true) << '\n'; + throw; +} diff --git a/src/Common/ya.make b/src/Common/ya.make index d9a7a2ce4de..72f1fa42756 100644 --- a/src/Common/ya.make +++ b/src/Common/ya.make @@ -74,6 +74,7 @@ SRCS( QueryProfiler.cpp quoteString.cpp randomSeed.cpp + remapExecutable.cpp RemoteHostFilter.cpp renameat2.cpp RWLock.cpp diff --git a/src/Compression/CachedCompressedReadBuffer.cpp b/src/Compression/CachedCompressedReadBuffer.cpp index 1b083c004c0..3fb45ab0948 100644 --- a/src/Compression/CachedCompressedReadBuffer.cpp +++ b/src/Compression/CachedCompressedReadBuffer.cpp @@ -72,9 +72,10 @@ bool CachedCompressedReadBuffer::nextImpl() } CachedCompressedReadBuffer::CachedCompressedReadBuffer( - const std::string & path_, std::function()> file_in_creator_, UncompressedCache * cache_) + const std::string & path_, std::function()> file_in_creator_, UncompressedCache * cache_, bool allow_different_codecs_) : ReadBuffer(nullptr, 0), file_in_creator(std::move(file_in_creator_)), cache(cache_), path(path_), file_pos(0) { + allow_different_codecs = allow_different_codecs_; } void CachedCompressedReadBuffer::seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block) diff --git a/src/Compression/CachedCompressedReadBuffer.h b/src/Compression/CachedCompressedReadBuffer.h index 88bcec8197d..c2338f6f841 100644 --- a/src/Compression/CachedCompressedReadBuffer.h +++ b/src/Compression/CachedCompressedReadBuffer.h @@ -38,7 +38,7 @@ private: clockid_t clock_type {}; public: - CachedCompressedReadBuffer(const std::string & path, std::function()> file_in_creator, UncompressedCache * cache_); + CachedCompressedReadBuffer(const std::string & path, std::function()> file_in_creator, UncompressedCache * cache_, bool allow_different_codecs_ = false); void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block); diff --git a/src/Compression/CompressedReadBufferBase.cpp b/src/Compression/CompressedReadBufferBase.cpp index a05b5cd7f64..be2f697e1b3 100644 --- a/src/Compression/CompressedReadBufferBase.cpp +++ b/src/Compression/CompressedReadBufferBase.cpp @@ -105,13 +105,24 @@ size_t CompressedReadBufferBase::readCompressedData(size_t & size_decompressed, uint8_t method = ICompressionCodec::readMethod(own_compressed_buffer.data()); if (!codec) + { codec = CompressionCodecFactory::instance().get(method); + } else if (method != codec->getMethodByte()) - throw Exception("Data compressed with different methods, given method byte 0x" - + getHexUIntLowercase(method) - + ", previous method byte 0x" - + getHexUIntLowercase(codec->getMethodByte()), - ErrorCodes::CANNOT_DECOMPRESS); + { + if (allow_different_codecs) + { + codec = CompressionCodecFactory::instance().get(method); + } + else + { + throw Exception("Data compressed with different methods, given method byte 0x" + + getHexUIntLowercase(method) + + ", previous method byte 0x" + + getHexUIntLowercase(codec->getMethodByte()), + ErrorCodes::CANNOT_DECOMPRESS); + } + } size_compressed_without_checksum = ICompressionCodec::readCompressedBlockSize(own_compressed_buffer.data()); size_decompressed = ICompressionCodec::readDecompressedBlockSize(own_compressed_buffer.data()); @@ -163,21 +174,32 @@ void CompressedReadBufferBase::decompress(char * to, size_t size_decompressed, s uint8_t method = ICompressionCodec::readMethod(compressed_buffer); if (!codec) + { codec = CompressionCodecFactory::instance().get(method); + } else if (codec->getMethodByte() != method) - throw Exception("Data compressed with different 
methods, given method byte " - + getHexUIntLowercase(method) - + ", previous method byte " - + getHexUIntLowercase(codec->getMethodByte()), - ErrorCodes::CANNOT_DECOMPRESS); + { + if (allow_different_codecs) + { + codec = CompressionCodecFactory::instance().get(method); + } + else + { + throw Exception("Data compressed with different methods, given method byte " + + getHexUIntLowercase(method) + + ", previous method byte " + + getHexUIntLowercase(codec->getMethodByte()), + ErrorCodes::CANNOT_DECOMPRESS); + } + } codec->decompress(compressed_buffer, size_compressed_without_checksum, to); } /// 'compressed_in' could be initialized lazily, but before first call of 'readCompressedData'. -CompressedReadBufferBase::CompressedReadBufferBase(ReadBuffer * in) - : compressed_in(in), own_compressed_buffer(0) +CompressedReadBufferBase::CompressedReadBufferBase(ReadBuffer * in, bool allow_different_codecs_) + : compressed_in(in), own_compressed_buffer(0), allow_different_codecs(allow_different_codecs_) { } diff --git a/src/Compression/CompressedReadBufferBase.h b/src/Compression/CompressedReadBufferBase.h index f44140dcd04..71dc5274d5b 100644 --- a/src/Compression/CompressedReadBufferBase.h +++ b/src/Compression/CompressedReadBufferBase.h @@ -26,6 +26,9 @@ protected: /// Don't checksum on decompressing. bool disable_checksum = false; + /// Allow reading data, compressed by different codecs from one file. + bool allow_different_codecs; + /// Read compressed data into compressed_buffer. Get size of decompressed data from block header. Checksum if need. /// Returns number of compressed bytes read. size_t readCompressedData(size_t & size_decompressed, size_t & size_compressed_without_checksum); @@ -34,7 +37,7 @@ protected: public: /// 'compressed_in' could be initialized lazily, but before first call of 'readCompressedData'. - CompressedReadBufferBase(ReadBuffer * in = nullptr); + CompressedReadBufferBase(ReadBuffer * in = nullptr, bool allow_different_codecs_ = false); ~CompressedReadBufferBase(); /** Disable checksums. 
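The `allow_different_codecs` flag threaded through these constructors lets a single compressed stream interleave blocks produced by different codecs: each block header carries a method byte, and the reader now re-resolves the codec per block instead of failing. A condensed sketch of that decision as a free function (hypothetical name; the real logic lives inline in `readCompressedData` and `decompress` above):

```cpp
/// Sketch only. 'factory.get(method)' mirrors CompressionCodecFactory::instance().get(method)
/// from the hunks above; the exception stands in for the CANNOT_DECOMPRESS error.
CompressionCodecPtr resolveBlockCodec(
    CompressionCodecPtr current, uint8_t method,
    CompressionCodecFactory & factory, bool allow_different_codecs)
{
    if (!current)
        return factory.get(method);         /// first block determines the codec
    if (method == current->getMethodByte())
        return current;                     /// same codec as the previous block
    if (allow_different_codecs)
        return factory.get(method);         /// mixed-codec stream: re-resolve
    throw std::runtime_error("Data compressed with different methods");
}
```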
diff --git a/src/Compression/CompressedReadBufferFromFile.cpp b/src/Compression/CompressedReadBufferFromFile.cpp index 8d6a42eacd3..f3fa2d6bc10 100644 --- a/src/Compression/CompressedReadBufferFromFile.cpp +++ b/src/Compression/CompressedReadBufferFromFile.cpp @@ -36,20 +36,22 @@ bool CompressedReadBufferFromFile::nextImpl() return true; } -CompressedReadBufferFromFile::CompressedReadBufferFromFile(std::unique_ptr buf) +CompressedReadBufferFromFile::CompressedReadBufferFromFile(std::unique_ptr buf, bool allow_different_codecs_) : BufferWithOwnMemory(0), p_file_in(std::move(buf)), file_in(*p_file_in) { compressed_in = &file_in; + allow_different_codecs = allow_different_codecs_; } CompressedReadBufferFromFile::CompressedReadBufferFromFile( - const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, size_t buf_size) + const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, size_t buf_size, bool allow_different_codecs_) : BufferWithOwnMemory(0) , p_file_in(createReadBufferFromFileBase(path, estimated_size, aio_threshold, mmap_threshold, buf_size)) , file_in(*p_file_in) { compressed_in = &file_in; + allow_different_codecs = allow_different_codecs_; } diff --git a/src/Compression/CompressedReadBufferFromFile.h b/src/Compression/CompressedReadBufferFromFile.h index 1729490f606..166b2595ef9 100644 --- a/src/Compression/CompressedReadBufferFromFile.h +++ b/src/Compression/CompressedReadBufferFromFile.h @@ -28,10 +28,11 @@ private: bool nextImpl() override; public: - CompressedReadBufferFromFile(std::unique_ptr buf); + CompressedReadBufferFromFile(std::unique_ptr buf, bool allow_different_codecs_ = false); CompressedReadBufferFromFile( - const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE); + const std::string & path, size_t estimated_size, size_t aio_threshold, size_t mmap_threshold, + size_t buf_size = DBMS_DEFAULT_BUFFER_SIZE, bool allow_different_codecs_ = false); void seek(size_t offset_in_compressed_file, size_t offset_in_decompressed_block); diff --git a/src/Compression/CompressedWriteBuffer.cpp b/src/Compression/CompressedWriteBuffer.cpp index 092da9e4364..02f418dcdf7 100644 --- a/src/Compression/CompressedWriteBuffer.cpp +++ b/src/Compression/CompressedWriteBuffer.cpp @@ -2,7 +2,7 @@ #include #include -#include +#include #include "CompressedWriteBuffer.h" #include diff --git a/src/Compression/CompressionCodecDelta.cpp b/src/Compression/CompressionCodecDelta.cpp index 51bd19f646b..ecb7c36b205 100644 --- a/src/Compression/CompressionCodecDelta.cpp +++ b/src/Compression/CompressionCodecDelta.cpp @@ -36,6 +36,11 @@ ASTPtr CompressionCodecDelta::getCodecDesc() const return makeASTFunction("Delta", literal); } +void CompressionCodecDelta::updateHash(SipHash & hash) const +{ + getCodecDesc()->updateTreeHash(hash); +} + namespace { diff --git a/src/Compression/CompressionCodecDelta.h b/src/Compression/CompressionCodecDelta.h index 5c3979e063e..a192fab051a 100644 --- a/src/Compression/CompressionCodecDelta.h +++ b/src/Compression/CompressionCodecDelta.h @@ -14,7 +14,10 @@ public: ASTPtr getCodecDesc() const override; + void updateHash(SipHash & hash) const override; + protected: + UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; void doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size) const override; diff --git 
a/src/Compression/CompressionCodecDoubleDelta.cpp b/src/Compression/CompressionCodecDoubleDelta.cpp index 157e2df1a3f..dd2e95a916d 100644 --- a/src/Compression/CompressionCodecDoubleDelta.cpp +++ b/src/Compression/CompressionCodecDoubleDelta.cpp @@ -339,6 +339,12 @@ ASTPtr CompressionCodecDoubleDelta::getCodecDesc() const return std::make_shared("DoubleDelta"); } +void CompressionCodecDoubleDelta::updateHash(SipHash & hash) const +{ + getCodecDesc()->updateTreeHash(hash); + hash.update(data_bytes_size); +} + UInt32 CompressionCodecDoubleDelta::getMaxCompressedDataSize(UInt32 uncompressed_size) const { const auto result = 2 // common header diff --git a/src/Compression/CompressionCodecDoubleDelta.h b/src/Compression/CompressionCodecDoubleDelta.h index a2690d24414..30ef086077d 100644 --- a/src/Compression/CompressionCodecDoubleDelta.h +++ b/src/Compression/CompressionCodecDoubleDelta.h @@ -100,7 +100,10 @@ public: ASTPtr getCodecDesc() const override; + void updateHash(SipHash & hash) const override; + protected: + UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; void doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size) const override; diff --git a/src/Compression/CompressionCodecGorilla.cpp b/src/Compression/CompressionCodecGorilla.cpp index 042835f4a32..3d08734fe91 100644 --- a/src/Compression/CompressionCodecGorilla.cpp +++ b/src/Compression/CompressionCodecGorilla.cpp @@ -254,6 +254,12 @@ ASTPtr CompressionCodecGorilla::getCodecDesc() const return std::make_shared("Gorilla"); } +void CompressionCodecGorilla::updateHash(SipHash & hash) const +{ + getCodecDesc()->updateTreeHash(hash); + hash.update(data_bytes_size); +} + UInt32 CompressionCodecGorilla::getMaxCompressedDataSize(UInt32 uncompressed_size) const { const auto result = 2 // common header diff --git a/src/Compression/CompressionCodecGorilla.h b/src/Compression/CompressionCodecGorilla.h index 523add0700f..df0f329dc31 100644 --- a/src/Compression/CompressionCodecGorilla.h +++ b/src/Compression/CompressionCodecGorilla.h @@ -97,7 +97,10 @@ public: ASTPtr getCodecDesc() const override; + void updateHash(SipHash & hash) const override; + protected: + UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; void doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size) const override; diff --git a/src/Compression/CompressionCodecLZ4.cpp b/src/Compression/CompressionCodecLZ4.cpp index cf3622cd702..1370349d68d 100644 --- a/src/Compression/CompressionCodecLZ4.cpp +++ b/src/Compression/CompressionCodecLZ4.cpp @@ -35,6 +35,11 @@ ASTPtr CompressionCodecLZ4::getCodecDesc() const return std::make_shared("LZ4"); } +void CompressionCodecLZ4::updateHash(SipHash & hash) const +{ + getCodecDesc()->updateTreeHash(hash); +} + UInt32 CompressionCodecLZ4::getMaxCompressedDataSize(UInt32 uncompressed_size) const { return LZ4_COMPRESSBOUND(uncompressed_size); diff --git a/src/Compression/CompressionCodecLZ4.h b/src/Compression/CompressionCodecLZ4.h index 2f19af08185..229e25481e6 100644 --- a/src/Compression/CompressionCodecLZ4.h +++ b/src/Compression/CompressionCodecLZ4.h @@ -18,6 +18,8 @@ public: UInt32 getAdditionalSizeAtTheEndOfBuffer() const override { return LZ4::ADDITIONAL_BYTES_AT_END_OF_BUFFER; } + void updateHash(SipHash & hash) const override; + protected: UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; diff --git 
a/src/Compression/CompressionCodecMultiple.cpp b/src/Compression/CompressionCodecMultiple.cpp index 868df90825e..77f0fc132fe 100644 --- a/src/Compression/CompressionCodecMultiple.cpp +++ b/src/Compression/CompressionCodecMultiple.cpp @@ -37,6 +37,12 @@ ASTPtr CompressionCodecMultiple::getCodecDesc() const return result; } +void CompressionCodecMultiple::updateHash(SipHash & hash) const +{ + for (const auto & codec : codecs) + codec->updateHash(hash); +} + UInt32 CompressionCodecMultiple::getMaxCompressedDataSize(UInt32 uncompressed_size) const { UInt32 compressed_size = uncompressed_size; diff --git a/src/Compression/CompressionCodecMultiple.h b/src/Compression/CompressionCodecMultiple.h index cd50d3250e3..6bac189bdf7 100644 --- a/src/Compression/CompressionCodecMultiple.h +++ b/src/Compression/CompressionCodecMultiple.h @@ -19,7 +19,10 @@ public: static std::vector getCodecsBytesFromData(const char * source); + void updateHash(SipHash & hash) const override; + protected: + UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; void doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 decompressed_size) const override; diff --git a/src/Compression/CompressionCodecNone.cpp b/src/Compression/CompressionCodecNone.cpp index 50c19b2b547..f727c4b4860 100644 --- a/src/Compression/CompressionCodecNone.cpp +++ b/src/Compression/CompressionCodecNone.cpp @@ -17,6 +17,11 @@ ASTPtr CompressionCodecNone::getCodecDesc() const return std::make_shared("NONE"); } +void CompressionCodecNone::updateHash(SipHash & hash) const +{ + getCodecDesc()->updateTreeHash(hash); +} + UInt32 CompressionCodecNone::doCompressData(const char * source, UInt32 source_size, char * dest) const { memcpy(dest, source, source_size); diff --git a/src/Compression/CompressionCodecNone.h b/src/Compression/CompressionCodecNone.h index ed604063198..370ef301694 100644 --- a/src/Compression/CompressionCodecNone.h +++ b/src/Compression/CompressionCodecNone.h @@ -15,7 +15,10 @@ public: ASTPtr getCodecDesc() const override; + void updateHash(SipHash & hash) const override; + protected: + UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; void doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size) const override; diff --git a/src/Compression/CompressionCodecT64.cpp b/src/Compression/CompressionCodecT64.cpp index 16462e50ebd..30972a5fe1f 100644 --- a/src/Compression/CompressionCodecT64.cpp +++ b/src/Compression/CompressionCodecT64.cpp @@ -646,6 +646,13 @@ ASTPtr CompressionCodecT64::getCodecDesc() const return makeASTFunction("T64", literal); } +void CompressionCodecT64::updateHash(SipHash & hash) const +{ + getCodecDesc()->updateTreeHash(hash); + hash.update(type_idx); + hash.update(variant); +} + void registerCodecT64(CompressionCodecFactory & factory) { auto reg_func = [&](const ASTPtr & arguments, DataTypePtr type) -> CompressionCodecPtr diff --git a/src/Compression/CompressionCodecT64.h b/src/Compression/CompressionCodecT64.h index 11efbea0955..06c34ba0a4a 100644 --- a/src/Compression/CompressionCodecT64.h +++ b/src/Compression/CompressionCodecT64.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include @@ -35,6 +35,8 @@ public: ASTPtr getCodecDesc() const override; + void updateHash(SipHash & hash) const override; + protected: UInt32 doCompressData(const char * src, UInt32 src_size, char * dst) const override; void doDecompressData(const char * src, UInt32 src_size, char * dst, UInt32 
uncompressed_size) const override; diff --git a/src/Compression/CompressionCodecZSTD.cpp b/src/Compression/CompressionCodecZSTD.cpp index ab48580533e..3b317884fec 100644 --- a/src/Compression/CompressionCodecZSTD.cpp +++ b/src/Compression/CompressionCodecZSTD.cpp @@ -32,6 +32,11 @@ ASTPtr CompressionCodecZSTD::getCodecDesc() const return makeASTFunction("ZSTD", literal); } +void CompressionCodecZSTD::updateHash(SipHash & hash) const +{ + getCodecDesc()->updateTreeHash(hash); +} + UInt32 CompressionCodecZSTD::getMaxCompressedDataSize(UInt32 uncompressed_size) const { return ZSTD_compressBound(uncompressed_size); diff --git a/src/Compression/CompressionCodecZSTD.h b/src/Compression/CompressionCodecZSTD.h index 2ad893083c3..3bfb6bb1d4d 100644 --- a/src/Compression/CompressionCodecZSTD.h +++ b/src/Compression/CompressionCodecZSTD.h @@ -21,7 +21,10 @@ public: UInt32 getMaxCompressedDataSize(UInt32 uncompressed_size) const override; + void updateHash(SipHash & hash) const override; + protected: + UInt32 doCompressData(const char * source, UInt32 source_size, char * dest) const override; void doDecompressData(const char * source, UInt32 source_size, char * dest, UInt32 uncompressed_size) const override; diff --git a/src/Compression/ICompressionCodec.cpp b/src/Compression/ICompressionCodec.cpp index 4aafc298658..5de015b2680 100644 --- a/src/Compression/ICompressionCodec.cpp +++ b/src/Compression/ICompressionCodec.cpp @@ -35,6 +35,13 @@ ASTPtr ICompressionCodec::getFullCodecDesc() const return result; } +UInt64 ICompressionCodec::getHash() const +{ + SipHash hash; + updateHash(hash); + return hash.get64(); +} + UInt32 ICompressionCodec::compress(const char * source, UInt32 source_size, char * dest) const { assert(source != nullptr && dest != nullptr); diff --git a/src/Compression/ICompressionCodec.h b/src/Compression/ICompressionCodec.h index fa1f73ce4dd..8d7d3fc800c 100644 --- a/src/Compression/ICompressionCodec.h +++ b/src/Compression/ICompressionCodec.h @@ -3,8 +3,9 @@ #include #include #include -#include +#include #include +#include namespace DB @@ -36,6 +37,10 @@ public: /// "CODEC(LZ4,LZ4HC(5))" ASTPtr getFullCodecDesc() const; + /// Hash, that depends on codec ast and optional parameters like data type + virtual void updateHash(SipHash & hash) const = 0; + UInt64 getHash() const; + /// Compressed bytes from uncompressed source to dest. 
Dest should preallocate memory UInt32 compress(const char * source, UInt32 source_size, char * dest) const; diff --git a/src/Compression/tests/gtest_compressionCodec.cpp b/src/Compression/tests/gtest_compressionCodec.cpp index 4677efce5da..e9470536ae8 100644 --- a/src/Compression/tests/gtest_compressionCodec.cpp +++ b/src/Compression/tests/gtest_compressionCodec.cpp @@ -2,7 +2,7 @@ #include #include -#include +#include #include #include #include diff --git a/src/Core/BlockInfo.cpp b/src/Core/BlockInfo.cpp index 78ee165bad1..9f88513cd3c 100644 --- a/src/Core/BlockInfo.cpp +++ b/src/Core/BlockInfo.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include #include diff --git a/src/Core/BlockInfo.h b/src/Core/BlockInfo.h index 886ecd96ef4..c8dd1576b22 100644 --- a/src/Core/BlockInfo.h +++ b/src/Core/BlockInfo.h @@ -2,7 +2,7 @@ #include -#include +#include namespace DB diff --git a/src/Core/DecimalComparison.h b/src/Core/DecimalComparison.h index 93992029634..b9ae2a1fe79 100644 --- a/src/Core/DecimalComparison.h +++ b/src/Core/DecimalComparison.h @@ -129,7 +129,7 @@ private: Shift shift; if (decimal0 && decimal1) { - auto result_type = decimalResultType(*decimal0, *decimal1, false, false); + auto result_type = decimalResultType(*decimal0, *decimal1); shift.a = static_cast(result_type.scaleFactorFor(*decimal0, false).value); shift.b = static_cast(result_type.scaleFactorFor(*decimal1, false).value); } diff --git a/src/Core/DecimalFunctions.h b/src/Core/DecimalFunctions.h index b821d29dd0d..cd5a2b5a670 100644 --- a/src/Core/DecimalFunctions.h +++ b/src/Core/DecimalFunctions.h @@ -1,5 +1,4 @@ #pragma once -// Moved Decimal-related functions out from Core/Types.h to reduce compilation time. #include #include diff --git a/src/Core/ExternalResultDescription.cpp b/src/Core/ExternalResultDescription.cpp index 5ed34764909..941ee003c94 100644 --- a/src/Core/ExternalResultDescription.cpp +++ b/src/Core/ExternalResultDescription.cpp @@ -1,9 +1,11 @@ #include "ExternalResultDescription.h" #include #include +#include #include #include #include +#include #include #include #include @@ -64,6 +66,14 @@ void ExternalResultDescription::init(const Block & sample_block_) types.emplace_back(ValueType::vtString, is_nullable); else if (typeid_cast(type)) types.emplace_back(ValueType::vtString, is_nullable); + else if (typeid_cast(type)) + types.emplace_back(ValueType::vtDateTime64, is_nullable); + else if (typeid_cast *>(type)) + types.emplace_back(ValueType::vtDecimal32, is_nullable); + else if (typeid_cast *>(type)) + types.emplace_back(ValueType::vtDecimal64, is_nullable); + else if (typeid_cast *>(type)) + types.emplace_back(ValueType::vtDecimal128, is_nullable); else throw Exception{"Unsupported type " + type->getName(), ErrorCodes::UNKNOWN_TYPE}; } diff --git a/src/Core/ExternalResultDescription.h b/src/Core/ExternalResultDescription.h index 0bd77afa628..29294fcf2c8 100644 --- a/src/Core/ExternalResultDescription.h +++ b/src/Core/ExternalResultDescription.h @@ -26,6 +26,10 @@ struct ExternalResultDescription vtDate, vtDateTime, vtUUID, + vtDateTime64, + vtDecimal32, + vtDecimal64, + vtDecimal128 }; Block sample_block; diff --git a/src/Core/MultiEnum.h b/src/Core/MultiEnum.h new file mode 100644 index 00000000000..ddfc5b13e86 --- /dev/null +++ b/src/Core/MultiEnum.h @@ -0,0 +1,99 @@ +#pragma once + +#include +#include + +// Wrapper around enum that can have multiple values (or none) set at once. 
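+// Each enum value occupies one bit of the storage type. For example, with a
+// hypothetical 'enum class Color : UInt8 { RED, GREEN, BLUE };':
+//     MultiEnum<Color> flags{Color::RED, Color::BLUE};  // getValue() == 0b101
+//     flags.isSet(Color::GREEN);                        // false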
+template +struct MultiEnum +{ + using StorageType = StorageTypeT; + using EnumType = EnumTypeT; + + MultiEnum() = default; + + template ...>>> + explicit MultiEnum(EnumValues ... v) + : MultiEnum((toBitFlag(v) | ... | 0u)) + {} + + template >> + explicit MultiEnum(ValueType v) + : bitset(v) + { + static_assert(std::is_unsigned_v); + static_assert(std::is_unsigned_v && std::is_integral_v); + } + + MultiEnum(const MultiEnum & other) = default; + MultiEnum & operator=(const MultiEnum & other) = default; + + bool isSet(EnumType value) const + { + return bitset & toBitFlag(value); + } + + void set(EnumType value) + { + bitset |= toBitFlag(value); + } + + void unSet(EnumType value) + { + bitset &= ~(toBitFlag(value)); + } + + void reset() + { + bitset = 0; + } + + StorageType getValue() const + { + return bitset; + } + + template >> + void setValue(ValueType new_value) + { + // Can't set value from any enum avoid confusion + static_assert(!std::is_enum_v); + bitset = new_value; + } + + bool operator==(const MultiEnum & other) const + { + return bitset == other.bitset; + } + + template >> + bool operator==(ValueType other) const + { + // Shouldn't be comparable with any enum to avoid confusion + static_assert(!std::is_enum_v); + return bitset == other; + } + + template + bool operator!=(U && other) const + { + return !(*this == other); + } + + template >> + friend bool operator==(ValueType left, MultiEnum right) + { + return right.operator==(left); + } + + template + friend bool operator!=(L left, MultiEnum right) + { + return !(right.operator==(left)); + } + +private: + StorageType bitset = 0; + + static StorageType toBitFlag(EnumType v) { return StorageType{1} << static_cast(v); } +}; diff --git a/src/Core/MySQL/Authentication.h b/src/Core/MySQL/Authentication.h index 3874655e523..e1b7c174139 100644 --- a/src/Core/MySQL/Authentication.h +++ b/src/Core/MySQL/Authentication.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Core/MySQL/IMySQLReadPacket.cpp b/src/Core/MySQL/IMySQLReadPacket.cpp index 8fc8855c8a4..5f6bbc7bceb 100644 --- a/src/Core/MySQL/IMySQLReadPacket.cpp +++ b/src/Core/MySQL/IMySQLReadPacket.cpp @@ -50,21 +50,22 @@ uint64_t readLengthEncodedNumber(ReadBuffer & buffer) uint64_t buf = 0; buffer.readStrict(c); auto cc = static_cast(c); - if (cc < 0xfc) + switch (cc) { - return cc; - } - else if (cc < 0xfd) - { - buffer.readStrict(reinterpret_cast(&buf), 2); - } - else if (cc < 0xfe) - { - buffer.readStrict(reinterpret_cast(&buf), 3); - } - else - { - buffer.readStrict(reinterpret_cast(&buf), 8); + /// NULL + case 0xfb: + break; + case 0xfc: + buffer.readStrict(reinterpret_cast(&buf), 2); + break; + case 0xfd: + buffer.readStrict(reinterpret_cast(&buf), 3); + break; + case 0xfe: + buffer.readStrict(reinterpret_cast(&buf), 8); + break; + default: + return cc; } return buf; } diff --git a/src/Core/MySQL/MySQLClient.h b/src/Core/MySQL/MySQLClient.h index 3fb86b35833..a31794acc42 100644 --- a/src/Core/MySQL/MySQLClient.h +++ b/src/Core/MySQL/MySQLClient.h @@ -1,5 +1,5 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Core/MySQL/MySQLReplication.cpp b/src/Core/MySQL/MySQLReplication.cpp index 42d077260f8..e7f113ba7af 100644 --- a/src/Core/MySQL/MySQLReplication.cpp +++ b/src/Core/MySQL/MySQLReplication.cpp @@ -171,7 +171,7 @@ namespace MySQLReplication /// Ignore MySQL 8.0 optional metadata fields. 
/// https://mysqlhighavailability.com/more-metadata-is-written-into-binary-log/
-        payload.ignore(payload.available() - CHECKSUM_CRC32_SIGNATURE_LENGTH);
+        payload.ignoreAll();
     }

     /// Types that do not used in the binlog event:
@@ -221,6 +221,7 @@
             }
             case MYSQL_TYPE_NEWDECIMAL:
             case MYSQL_TYPE_STRING: {
+                /// Big-Endian
                 auto b0 = UInt16(meta[pos] << 8);
                 auto b1 = UInt8(meta[pos + 1]);
                 column_meta.emplace_back(UInt16(b0 + b1));
@@ -231,6 +232,7 @@
             case MYSQL_TYPE_BIT:
             case MYSQL_TYPE_VARCHAR:
             case MYSQL_TYPE_VAR_STRING: {
+                /// Little-Endian
                 auto b0 = UInt8(meta[pos]);
                 auto b1 = UInt16(meta[pos + 1] << 8);
                 column_meta.emplace_back(UInt16(b0 + b1));
@@ -911,7 +913,7 @@
                     break;
                 }
             }
-            payload.tryIgnore(CHECKSUM_CRC32_SIGNATURE_LENGTH);
+            payload.ignoreAll();
         }
     }
diff --git a/src/Core/MySQL/MySQLReplication.h b/src/Core/MySQL/MySQLReplication.h
index b63b103e87a..ad5e53ed200 100644
--- a/src/Core/MySQL/MySQLReplication.h
+++ b/src/Core/MySQL/MySQLReplication.h
@@ -2,7 +2,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
diff --git a/src/Core/Protocol.h b/src/Core/Protocol.h
index bc97e5d47d4..15630d0a6f8 100644
--- a/src/Core/Protocol.h
+++ b/src/Core/Protocol.h
@@ -1,6 +1,6 @@
 #pragma once
-#include
+#include
 namespace DB
diff --git a/src/Core/QueryProcessingStage.h b/src/Core/QueryProcessingStage.h
index 658b504fc2c..b1ed4709df2 100644
--- a/src/Core/QueryProcessingStage.h
+++ b/src/Core/QueryProcessingStage.h
@@ -1,6 +1,6 @@
 #pragma once
-#include
+#include
 namespace DB
diff --git a/src/Core/Settings.h b/src/Core/Settings.h
index d367297f900..b39c223a5e9 100644
--- a/src/Core/Settings.h
+++ b/src/Core/Settings.h
@@ -382,6 +382,7 @@ class IColumn;
     M(Bool, alter_partition_verbose_result, false, "Output information about affected parts. Currently works only for FREEZE and ATTACH commands.", 0) \
     M(Bool, allow_experimental_database_materialize_mysql, false, "Allow to create database with Engine=MaterializeMySQL(...).", 0) \
     M(Bool, system_events_show_zero_values, false, "Include all metrics, even with zero values", 0) \
+    M(MySQLDataTypesSupport, mysql_datatypes_support_level, 0, "Which MySQL types should be converted to corresponding ClickHouse types (rather than being represented as String). Can be empty or any combination of 'decimal' or 'datetime64'. When empty, MySQL's DECIMAL and DATETIME/TIMESTAMP with non-zero precision are seen as String on ClickHouse's side.", 0) \
     \
     /** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. */ \
     \
@@ -439,6 +440,7 @@ class IColumn;
     M(String, output_format_avro_codec, "", "Compression codec used for output. Possible values: 'null', 'deflate', 'snappy'.", 0) \
     M(UInt64, output_format_avro_sync_interval, 16 * 1024, "Sync interval in bytes.", 0) \
     M(Bool, output_format_tsv_crlf_end_of_line, false, "If it is set true, end of line in TSV format will be \\r\\n instead of \\n.", 0) \
+    M(String, output_format_tsv_null_representation, "\\N", "Custom NULL representation in TSV format", 0) \
     \
     M(UInt64, input_format_allow_errors_num, 0, "Maximum absolute amount of errors while reading text formats (like CSV, TSV). In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.", 0) \
     M(Float, input_format_allow_errors_ratio, 0, "Maximum relative amount of errors while reading text formats (like CSV, TSV).
In case of error, if at least absolute or relative amount of errors is lower than corresponding value, will skip until next line and continue.", 0) \
diff --git a/src/Core/SettingsEnums.cpp b/src/Core/SettingsEnums.cpp
index 1a03f5f4578..c0d2906e2fc 100644
--- a/src/Core/SettingsEnums.cpp
+++ b/src/Core/SettingsEnums.cpp
@@ -11,6 +11,7 @@ namespace ErrorCodes
     extern const int UNKNOWN_DISTRIBUTED_PRODUCT_MODE;
     extern const int UNKNOWN_JOIN;
     extern const int BAD_ARGUMENTS;
+    extern const int UNKNOWN_MYSQL_DATATYPES_SUPPORT_LEVEL;
 }
@@ -91,4 +92,8 @@ IMPLEMENT_SETTING_ENUM_WITH_RENAME(DefaultDatabaseEngine, ErrorCodes::BAD_ARGUME
     {{"Ordinary", DefaultDatabaseEngine::Ordinary}, {"Atomic", DefaultDatabaseEngine::Atomic}})
+IMPLEMENT_SETTING_MULTI_ENUM(MySQLDataTypesSupport, ErrorCodes::UNKNOWN_MYSQL_DATATYPES_SUPPORT_LEVEL,
+    {{"decimal", MySQLDataTypesSupport::DECIMAL},
+     {"datetime64", MySQLDataTypesSupport::DATETIME64}})
+
 }
diff --git a/src/Core/SettingsEnums.h b/src/Core/SettingsEnums.h
index 16ebef87e01..7ed5ffb0c35 100644
--- a/src/Core/SettingsEnums.h
+++ b/src/Core/SettingsEnums.h
@@ -126,4 +126,15 @@ enum class DefaultDatabaseEngine
 };
 DECLARE_SETTING_ENUM(DefaultDatabaseEngine)
+
+
+enum class MySQLDataTypesSupport
+{
+    DECIMAL,    // convert MySQL's decimal and number to ClickHouse Decimal when applicable
+    DATETIME64, // convert MySQL's DATETIME and TIMESTAMP to ClickHouse DateTime64 if precision is > 0 or range is greater than for DateTime.
+    // ENUM
+};
+
+DECLARE_SETTING_MULTI_ENUM(MySQLDataTypesSupport)
+
 }
diff --git a/src/Core/SettingsFields.h b/src/Core/SettingsFields.h
index ca774336f88..1a5676bd8a8 100644
--- a/src/Core/SettingsFields.h
+++ b/src/Core/SettingsFields.h
@@ -2,11 +2,13 @@
 #include
 #include
-#include
+#include
 #include
+#include
 #include
 #include
 #include
+#include
 namespace DB
@@ -328,6 +330,113 @@ void SettingFieldEnum::readBinary(ReadBuffer & in)
         throw Exception(msg, ERROR_CODE_FOR_UNEXPECTED_NAME); \
     }
+// Mostly like SettingFieldEnum, but can have multiple enum values (or none) set at once.
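+// The value is serialized as a comma-separated list of enum names, e.g.
+// "decimal,datetime64" for MySQLDataTypesSupport; parsing accepts any mix of
+// commas and spaces as separators (see parseValueFromString below).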
+template +struct SettingFieldMultiEnum +{ + using EnumType = Enum; + using ValueType = MultiEnum; + using StorageType = typename ValueType::StorageType; + + ValueType value; + bool changed = false; + + explicit SettingFieldMultiEnum(ValueType v = ValueType{}) : value{v} {} + explicit SettingFieldMultiEnum(EnumType e) : value{e} {} + explicit SettingFieldMultiEnum(StorageType s) : value(s) {} + explicit SettingFieldMultiEnum(const Field & f) : value(parseValueFromString(f.safeGet())) {} + + operator ValueType() const { return value; } + explicit operator StorageType() const { return value.getValue(); } + explicit operator Field() const { return toString(); } + + SettingFieldMultiEnum & operator= (StorageType x) { changed = x != value.getValue(); value.setValue(x); return *this; } + SettingFieldMultiEnum & operator= (ValueType x) { changed = !(x == value); value = x; return *this; } + SettingFieldMultiEnum & operator= (const Field & x) { parseFromString(x.safeGet()); return *this; } + + String toString() const + { + static const String separator = ","; + String result; + for (StorageType i = 0; i < Traits::getEnumSize(); ++i) + { + const auto v = static_cast(i); + if (value.isSet(v)) + { + result += Traits::toString(v); + result += separator; + } + } + + if (result.size() > 0) + result.erase(result.size() - separator.size()); + + return result; + } + void parseFromString(const String & str) { *this = parseValueFromString(str); } + + void writeBinary(WriteBuffer & out) const; + void readBinary(ReadBuffer & in); + +private: + static ValueType parseValueFromString(const std::string_view str) + { + static const String separators=", "; + + ValueType result; + + //to avoid allocating memory on substr() + const std::string_view str_view{str}; + + auto value_start = str_view.find_first_not_of(separators); + while (value_start != std::string::npos) + { + auto value_end = str_view.find_first_of(separators, value_start + 1); + if (value_end == std::string::npos) + value_end = str_view.size(); + + result.set(Traits::fromString(str_view.substr(value_start, value_end - value_start))); + value_start = str_view.find_first_not_of(separators, value_end); + } + + return result; + } +}; + +template +void SettingFieldMultiEnum::writeBinary(WriteBuffer & out) const +{ + SettingFieldEnumHelpers::writeBinary(toString(), out); +} + +template +void SettingFieldMultiEnum::readBinary(ReadBuffer & in) +{ + parseFromString(SettingFieldEnumHelpers::readBinary(in)); +} + +#define DECLARE_SETTING_MULTI_ENUM(ENUM_TYPE) \ + DECLARE_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, ENUM_TYPE) + +#define DECLARE_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, NEW_NAME) \ + struct SettingField##NEW_NAME##Traits \ + { \ + using EnumType = ENUM_TYPE; \ + static size_t getEnumSize(); \ + static const String & toString(EnumType value); \ + static EnumType fromString(const std::string_view & str); \ + }; \ + \ + using SettingField##NEW_NAME = SettingFieldMultiEnum; + +#define IMPLEMENT_SETTING_MULTI_ENUM(ENUM_TYPE, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \ + IMPLEMENT_SETTING_MULTI_ENUM_WITH_RENAME(ENUM_TYPE, ERROR_CODE_FOR_UNEXPECTED_NAME, __VA_ARGS__) + +#define IMPLEMENT_SETTING_MULTI_ENUM_WITH_RENAME(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, ...) \ + IMPLEMENT_SETTING_ENUM_WITH_RENAME(NEW_NAME, ERROR_CODE_FOR_UNEXPECTED_NAME, __VA_ARGS__)\ + size_t SettingField##NEW_NAME##Traits::getEnumSize() {\ + return std::initializer_list> __VA_ARGS__ .size();\ + } /// Can keep a value of any type. Used for user-defined settings. 
struct SettingFieldCustom
diff --git a/src/Core/Types.h b/src/Core/Types.h
index c23ac4a1379..3157598adc0 100644
--- a/src/Core/Types.h
+++ b/src/Core/Types.h
@@ -3,7 +3,7 @@
 #include
 #include
 #include
-#include
+#include
 namespace DB
@@ -13,6 +13,11 @@ namespace DB
 struct Null {};
+/// Ignore strange gcc warning https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55776
+#if !__clang__
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wshadow"
+#endif
 /// @note Except explicitly described you should not assume on TypeIndex numbers and/or their orders in this enum.
 enum class TypeIndex
 {
@@ -52,27 +57,15 @@
     AggregateFunction,
     LowCardinality,
 };
+#if !__clang__
+#pragma GCC diagnostic pop
+#endif
-/// defined in common/types.h
-using UInt8 = ::UInt8;
-using UInt16 = ::UInt16;
-using UInt32 = ::UInt32;
-using UInt64 = ::UInt64;
+/// Other int defines are in common/types.h
 using UInt256 = ::wUInt256;
-
-using Int8 = ::Int8;
-using Int16 = ::Int16;
-using Int32 = ::Int32;
-using Int64 = ::Int64;
 using Int128 = ::Int128;
 using Int256 = ::wInt256;
-using Float32 = float;
-using Float64 = double;
-
-using String = std::string;
-
-
 /** Note that for types not used in DB, IsNumber is false. */
 template constexpr bool IsNumber = false;
diff --git a/src/Core/tests/gtest_multienum.cpp b/src/Core/tests/gtest_multienum.cpp
new file mode 100644
index 00000000000..91cee6b316a
--- /dev/null
+++ b/src/Core/tests/gtest_multienum.cpp
@@ -0,0 +1,158 @@
+#include
+
+#include
+#include
+#include
+
+namespace
+{
+
+using namespace DB;
+enum class TestEnum : UInt8
+{
+    // name represents which bit is going to be set
+    ZERO,
+    ONE,
+    TWO,
+    THREE,
+    FOUR,
+    FIVE
+};
+}
+
+GTEST_TEST(MultiEnum, WithDefault)
+{
+    MultiEnum multi_enum;
+    ASSERT_EQ(0, multi_enum.getValue());
+    ASSERT_EQ(0, multi_enum);
+
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::ZERO));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::ONE));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::TWO));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::THREE));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::FOUR));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::FIVE));
+}
+
+GTEST_TEST(MultiEnum, WithEnum)
+{
+    MultiEnum multi_enum(TestEnum::FOUR);
+    ASSERT_EQ(16, multi_enum.getValue());
+    ASSERT_EQ(16, multi_enum);
+
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::ZERO));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::ONE));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::TWO));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::THREE));
+    ASSERT_TRUE(multi_enum.isSet(TestEnum::FOUR));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::FIVE));
+}
+
+GTEST_TEST(MultiEnum, WithValue)
+{
+    const MultiEnum multi_enum(13u); // 1 | (1 << 2) | (1 << 3)
+
+    ASSERT_TRUE(multi_enum.isSet(TestEnum::ZERO));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::ONE));
+    ASSERT_TRUE(multi_enum.isSet(TestEnum::TWO));
+    ASSERT_TRUE(multi_enum.isSet(TestEnum::THREE));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::FOUR));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::FIVE));
+}
+
+GTEST_TEST(MultiEnum, WithMany)
+{
+    MultiEnum multi_enum{TestEnum::ONE, TestEnum::FIVE};
+    ASSERT_EQ(1 << 1 | 1 << 5, multi_enum.getValue());
+    ASSERT_EQ(1 << 1 | 1 << 5, multi_enum);
+
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::ZERO));
+    ASSERT_TRUE(multi_enum.isSet(TestEnum::ONE));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::TWO));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::THREE));
+    ASSERT_FALSE(multi_enum.isSet(TestEnum::FOUR));
+    ASSERT_TRUE(multi_enum.isSet(TestEnum::FIVE));
+}
+
+GTEST_TEST(MultiEnum, WithCopyConstructor)
+{
+    MultiEnum<TestEnum> multi_enum{multi_enum_source};
+
+    ASSERT_EQ(1 << 1 | 1 << 5, multi_enum.getValue());
+}
+
+GTEST_TEST(MultiEnum, SetAndUnSet)
+{
+    MultiEnum<TestEnum> multi_enum;
+    multi_enum.set(TestEnum::ONE);
+    ASSERT_EQ(1 << 1, multi_enum);
+
+    multi_enum.set(TestEnum::TWO);
+    ASSERT_EQ(1 << 1 | (1 << 2), multi_enum);
+
+    multi_enum.unSet(TestEnum::ONE);
+    ASSERT_EQ(1 << 2, multi_enum);
+}
+
+GTEST_TEST(MultiEnum, SetValueOnDifferentTypes)
+{
+    MultiEnum<TestEnum> multi_enum;
+
+    multi_enum.setValue(static_cast<UInt8>(1));
+    ASSERT_EQ(1, multi_enum);
+
+    multi_enum.setValue(static_cast<UInt16>(2));
+    ASSERT_EQ(2, multi_enum);
+
+    multi_enum.setValue(static_cast<UInt32>(3));
+    ASSERT_EQ(3, multi_enum);
+
+    multi_enum.setValue(static_cast<UInt64>(4));
+    ASSERT_EQ(4, multi_enum);
+}
+
+// shouldn't compile
+//GTEST_TEST(MultiEnum, WithOtherEnumType)
+//{
+//    MultiEnum<TestEnum> multi_enum;
+
+//    enum FOO {BAR, FOOBAR};
+//    MultiEnum<TestEnum> multi_enum2(BAR);
+//    MultiEnum<TestEnum> multi_enum3(BAR, FOOBAR);
+//    multi_enum.setValue(FOO::BAR);
+//    multi_enum == FOO::BAR;
+//    FOO::BAR == multi_enum;
+//}
+
+GTEST_TEST(MultiEnum, SetSameValueMultipleTimes)
+{
+    // Setting same value is idempotent.
+    MultiEnum<TestEnum> multi_enum;
+    multi_enum.set(TestEnum::ONE);
+    ASSERT_EQ(1 << 1, multi_enum);
+
+    multi_enum.set(TestEnum::ONE);
+    ASSERT_EQ(1 << 1, multi_enum);
+}
+
+GTEST_TEST(MultiEnum, UnSetValuesThatWerentSet)
+{
+    // Unsetting values that weren't set shouldn't change other flags nor aggregate value.
+    MultiEnum<TestEnum> multi_enum{TestEnum::ONE, TestEnum::THREE};
+    multi_enum.unSet(TestEnum::TWO);
+    ASSERT_EQ(1 << 1 | 1 << 3, multi_enum);
+
+    multi_enum.unSet(TestEnum::FOUR);
+    ASSERT_EQ(1 << 1 | 1 << 3, multi_enum);
+
+    multi_enum.unSet(TestEnum::FIVE);
+    ASSERT_EQ(1 << 1 | 1 << 3, multi_enum);
+}
+
+GTEST_TEST(MultiEnum, Reset)
+{
+    MultiEnum<TestEnum> multi_enum{TestEnum::ONE, TestEnum::THREE};
+    multi_enum.reset();
+    ASSERT_EQ(0, multi_enum);
+}
diff --git a/src/Core/tests/gtest_settings.cpp b/src/Core/tests/gtest_settings.cpp
new file mode 100644
index 00000000000..8833d86c397
--- /dev/null
+++ b/src/Core/tests/gtest_settings.cpp
@@ -0,0 +1,146 @@
+#include
+
+#include
+#include
+#include
+
+namespace
+{
+using namespace DB;
+using SettingMySQLDataTypesSupport = SettingFieldMultiEnum<MySQLDataTypesSupport, SettingFieldMySQLDataTypesSupportTraits>;
+}
+
+namespace DB
+{
+
+template <typename Enum, typename Traits>
+bool operator== (const SettingFieldMultiEnum<Enum, Traits> & setting, const Field & f)
+{
+    return Field(setting) == f;
+}
+
+template <typename Enum, typename Traits>
+bool operator== (const Field & f, const SettingFieldMultiEnum<Enum, Traits> & setting)
+{
+    return f == Field(setting);
+}
+
+}
+
+GTEST_TEST(MySQLDataTypesSupport, WithDefault)
+{
+    // Setting can be default-initialized and that means all values are unset.
+    const SettingMySQLDataTypesSupport setting;
+    ASSERT_EQ(0, setting.value.getValue());
+    ASSERT_EQ("", setting.toString());
+    ASSERT_EQ(setting, Field(""));
+
+    ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL));
+    ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64));
+}
+
+GTEST_TEST(SettingMySQLDataTypesSupport, WithDECIMAL)
+{
+    // Setting can be initialized with MySQLDataTypesSupport::DECIMAL
+    // and this value can be obtained in various forms with getters.
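//    Editor's note - illustrative, not part of this diff: MySQLDataTypesSupport
//    enumerators denote bit positions, so the raw storage values exercised by
//    these tests are
//        DECIMAL              -> 1 << 0 == 1
//        DATETIME64           -> 1 << 1 == 2
//        DECIMAL | DATETIME64 ->           3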
+ const SettingMySQLDataTypesSupport setting(MySQLDataTypesSupport::DECIMAL); + ASSERT_EQ(1, setting.value.getValue()); + + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + + ASSERT_EQ("decimal", setting.toString()); + ASSERT_EQ(Field("decimal"), setting); +} + +GTEST_TEST(SettingMySQLDataTypesSupport, With1) +{ + // Setting can be initialized with int value corresponding to DECIMAL + // and rest of the test is the same as for that value. + const SettingMySQLDataTypesSupport setting(1u); + ASSERT_EQ(1, setting.value.getValue()); + + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + + ASSERT_EQ("decimal", setting.toString()); + ASSERT_EQ(Field("decimal"), setting); +} + +GTEST_TEST(SettingMySQLDataTypesSupport, WithMultipleValues) +{ + // Setting can be initialized with int value corresponding to (DECIMAL | DATETIME64) + const SettingMySQLDataTypesSupport setting(3u); + ASSERT_EQ(3, setting.value.getValue()); + + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + + ASSERT_EQ("decimal,datetime64", setting.toString()); + ASSERT_EQ(Field("decimal,datetime64"), setting); +} + +GTEST_TEST(SettingMySQLDataTypesSupport, SetString) +{ + SettingMySQLDataTypesSupport setting; + setting = String("decimal"); + ASSERT_TRUE(setting.changed); + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + ASSERT_EQ("decimal", setting.toString()); + ASSERT_EQ(Field("decimal"), setting); + + setting = "datetime64,decimal"; + ASSERT_TRUE(setting.changed); + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + ASSERT_EQ("decimal,datetime64", setting.toString()); + ASSERT_EQ(Field("decimal,datetime64"), setting); + + // comma with spaces + setting = " datetime64 , decimal "; + ASSERT_FALSE(setting.changed); // false since value is the same as previous one. 
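//    Editor's note - illustrative, not part of this diff: parseValueFromString()
//    in SettingsFields.h treats both ',' and ' ' as separators and skips empty
//    tokens, which is why the padded string above parses to the same bit mask
//    and leaves `changed` false.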
+ ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + ASSERT_EQ("decimal,datetime64", setting.toString()); + ASSERT_EQ(Field("decimal,datetime64"), setting); + + setting = String(",,,,,,,, ,decimal"); + ASSERT_TRUE(setting.changed); + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + ASSERT_EQ("decimal", setting.toString()); + ASSERT_EQ(Field("decimal"), setting); + + setting = String(",decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,decimal,"); + ASSERT_FALSE(setting.changed); //since previous value was DECIMAL + ASSERT_TRUE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + ASSERT_EQ("decimal", setting.toString()); + ASSERT_EQ(Field("decimal"), setting); + + setting = String(""); + ASSERT_TRUE(setting.changed); + ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DECIMAL)); + ASSERT_FALSE(setting.value.isSet(MySQLDataTypesSupport::DATETIME64)); + ASSERT_EQ("", setting.toString()); + ASSERT_EQ(Field(""), setting); +} + +GTEST_TEST(SettingMySQLDataTypesSupport, SetInvalidString) +{ + // Setting can be initialized with int value corresponding to (DECIMAL | DATETIME64) + SettingMySQLDataTypesSupport setting; + EXPECT_THROW(setting = String("FOOBAR"), Exception); + ASSERT_FALSE(setting.changed); + ASSERT_EQ(0, setting.value.getValue()); + + EXPECT_THROW(setting = String("decimal,datetime64,123"), Exception); + ASSERT_FALSE(setting.changed); + ASSERT_EQ(0, setting.value.getValue()); + + EXPECT_NO_THROW(setting = String(", ")); + ASSERT_FALSE(setting.changed); + ASSERT_EQ(0, setting.value.getValue()); +} + diff --git a/src/Core/tests/mysql_protocol.cpp b/src/Core/tests/mysql_protocol.cpp index acae8603c40..6cad095fc85 100644 --- a/src/Core/tests/mysql_protocol.cpp +++ b/src/Core/tests/mysql_protocol.cpp @@ -283,6 +283,7 @@ int main(int argc, char ** argv) } { + /// mysql_protocol --host=172.17.0.3 --user=root --password=123 --db=sbtest try { boost::program_options::options_description desc("Allowed options"); diff --git a/src/DataStreams/BlockStreamProfileInfo.h b/src/DataStreams/BlockStreamProfileInfo.h index 5f75cf9ddea..d068db89641 100644 --- a/src/DataStreams/BlockStreamProfileInfo.h +++ b/src/DataStreams/BlockStreamProfileInfo.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/DataStreams/ExecutionSpeedLimits.h b/src/DataStreams/ExecutionSpeedLimits.h index 8f098bfd6b4..9ab58e12cf4 100644 --- a/src/DataStreams/ExecutionSpeedLimits.h +++ b/src/DataStreams/ExecutionSpeedLimits.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include namespace DB diff --git a/src/DataStreams/ExpressionBlockInputStream.cpp b/src/DataStreams/ExpressionBlockInputStream.cpp index 9673395a21a..4840a6263f6 100644 --- a/src/DataStreams/ExpressionBlockInputStream.cpp +++ b/src/DataStreams/ExpressionBlockInputStream.cpp @@ -18,7 +18,7 @@ String ExpressionBlockInputStream::getName() const { return "Expression"; } Block ExpressionBlockInputStream::getTotals() { totals = children.back()->getTotals(); - expression->executeOnTotals(totals); + expression->execute(totals); return totals; } @@ -30,14 +30,6 @@ Block ExpressionBlockInputStream::getHeader() const Block ExpressionBlockInputStream::readImpl() { - if (!initialized) - { - if (expression->resultIsAlwaysEmpty()) - return {}; - - initialized = true; - } 
- Block res = children.back()->read(); if (res) expression->execute(res); diff --git a/src/DataStreams/ExpressionBlockInputStream.h b/src/DataStreams/ExpressionBlockInputStream.h index 62141a060af..fae54fbcfbf 100644 --- a/src/DataStreams/ExpressionBlockInputStream.h +++ b/src/DataStreams/ExpressionBlockInputStream.h @@ -25,7 +25,6 @@ public: Block getHeader() const override; protected: - bool initialized = false; ExpressionActionsPtr expression; Block readImpl() override; diff --git a/src/DataStreams/FilterBlockInputStream.cpp b/src/DataStreams/FilterBlockInputStream.cpp index b4b00083d7f..83b36c97db7 100644 --- a/src/DataStreams/FilterBlockInputStream.cpp +++ b/src/DataStreams/FilterBlockInputStream.cpp @@ -54,7 +54,7 @@ String FilterBlockInputStream::getName() const { return "Filter"; } Block FilterBlockInputStream::getTotals() { totals = children.back()->getTotals(); - expression->executeOnTotals(totals); + expression->execute(totals); return totals; } diff --git a/src/DataStreams/MarkInCompressedFile.h b/src/DataStreams/MarkInCompressedFile.h index 62886ffad57..94ff5414762 100644 --- a/src/DataStreams/MarkInCompressedFile.h +++ b/src/DataStreams/MarkInCompressedFile.h @@ -2,7 +2,7 @@ #include -#include +#include #include #include diff --git a/src/DataStreams/MongoDBBlockInputStream.cpp b/src/DataStreams/MongoDBBlockInputStream.cpp index 7865f854547..25abdd909c4 100644 --- a/src/DataStreams/MongoDBBlockInputStream.cpp +++ b/src/DataStreams/MongoDBBlockInputStream.cpp @@ -37,6 +37,7 @@ namespace ErrorCodes extern const int TYPE_MISMATCH; extern const int MONGODB_CANNOT_AUTHENTICATE; extern const int NOT_FOUND_COLUMN_IN_BLOCK; + extern const int UNKNOWN_TYPE; } @@ -298,6 +299,8 @@ namespace ErrorCodes::TYPE_MISMATCH}; break; } + default: + throw Exception("Value of unsupported type:" + column.getName(), ErrorCodes::UNKNOWN_TYPE); } } diff --git a/src/DataStreams/NativeBlockOutputStream.h b/src/DataStreams/NativeBlockOutputStream.h index 720a779ec5e..64ccd267634 100644 --- a/src/DataStreams/NativeBlockOutputStream.h +++ b/src/DataStreams/NativeBlockOutputStream.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include namespace DB diff --git a/src/DataStreams/TTLBlockInputStream.cpp b/src/DataStreams/TTLBlockInputStream.cpp index 6d80e784c03..85d9c7fead2 100644 --- a/src/DataStreams/TTLBlockInputStream.cpp +++ b/src/DataStreams/TTLBlockInputStream.cpp @@ -134,6 +134,7 @@ Block TTLBlockInputStream::readImpl() removeValuesWithExpiredColumnTTL(block); updateMovesTTL(block); + updateRecompressionTTL(block); return block; } @@ -369,13 +370,12 @@ void TTLBlockInputStream::removeValuesWithExpiredColumnTTL(Block & block) block.erase(column); } -void TTLBlockInputStream::updateMovesTTL(Block & block) +void TTLBlockInputStream::updateTTLWithDescriptions(Block & block, const TTLDescriptions & descriptions, TTLInfoMap & ttl_info_map) { std::vector columns_to_remove; - for (const auto & ttl_entry : metadata_snapshot->getMoveTTLs()) + for (const auto & ttl_entry : descriptions) { - auto & new_ttl_info = new_ttl_infos.moves_ttl[ttl_entry.result_column]; - + auto & new_ttl_info = ttl_info_map[ttl_entry.result_column]; if (!block.has(ttl_entry.result_column)) { columns_to_remove.push_back(ttl_entry.result_column); @@ -395,6 +395,16 @@ void TTLBlockInputStream::updateMovesTTL(Block & block) block.erase(column); } +void TTLBlockInputStream::updateMovesTTL(Block & block) +{ + updateTTLWithDescriptions(block, metadata_snapshot->getMoveTTLs(), new_ttl_infos.moves_ttl); +} + +void 
TTLBlockInputStream::updateRecompressionTTL(Block & block) +{ + updateTTLWithDescriptions(block, metadata_snapshot->getRecompressionTTLs(), new_ttl_infos.recompression_ttl); +} + UInt32 TTLBlockInputStream::getTimestampByIndex(const IColumn * column, size_t ind) { if (const ColumnUInt16 * column_date = typeid_cast(column)) diff --git a/src/DataStreams/TTLBlockInputStream.h b/src/DataStreams/TTLBlockInputStream.h index 3f37f35426c..1d3b69f61c5 100644 --- a/src/DataStreams/TTLBlockInputStream.h +++ b/src/DataStreams/TTLBlockInputStream.h @@ -4,6 +4,7 @@ #include #include #include +#include #include @@ -75,9 +76,16 @@ private: /// Finalize agg_result into result_columns void finalizeAggregates(MutableColumns & result_columns); + /// Execute description expressions on block and update ttl's in + /// ttl_info_map with expression results. + void updateTTLWithDescriptions(Block & block, const TTLDescriptions & descriptions, TTLInfoMap & ttl_info_map); + /// Updates TTL for moves void updateMovesTTL(Block & block); + /// Update values for recompression TTL using data from block. + void updateRecompressionTTL(Block & block); + UInt32 getTimestampByIndex(const IColumn * column, size_t ind); bool isTTLExpired(time_t ttl) const; }; diff --git a/src/DataTypes/DataTypeDecimalBase.h b/src/DataTypes/DataTypeDecimalBase.h index 265d58d69e1..c5669ab735a 100644 --- a/src/DataTypes/DataTypeDecimalBase.h +++ b/src/DataTypes/DataTypeDecimalBase.h @@ -156,38 +156,31 @@ protected: }; -template typename DecimalType> -typename std::enable_if_t<(sizeof(T) >= sizeof(U)), DecimalType> -inline decimalResultType(const DecimalType & tx, const DecimalType & ty, bool is_multiply, bool is_divide) +template typename DecimalType> +inline auto decimalResultType(const DecimalType & tx, const DecimalType & ty) { - UInt32 scale = (tx.getScale() > ty.getScale() ? tx.getScale() : ty.getScale()); - if (is_multiply) + UInt32 scale{}; + if constexpr (is_multiply) scale = tx.getScale() + ty.getScale(); - else if (is_divide) + else if constexpr (is_division) scale = tx.getScale(); - return DecimalType(DecimalUtils::maxPrecision(), scale); + else + scale = (tx.getScale() > ty.getScale() ? tx.getScale() : ty.getScale()); + + if constexpr (sizeof(T) < sizeof(U)) + return DecimalType(DecimalUtils::maxPrecision(), scale); + else + return DecimalType(DecimalUtils::maxPrecision(), scale); } -template typename DecimalType> -typename std::enable_if_t<(sizeof(T) < sizeof(U)), const DecimalType> -inline decimalResultType(const DecimalType & tx, const DecimalType & ty, bool is_multiply, bool is_divide) -{ - UInt32 scale = (tx.getScale() > ty.getScale() ? 
tx.getScale() : ty.getScale()); - if (is_multiply) - scale = tx.getScale() * ty.getScale(); - else if (is_divide) - scale = tx.getScale(); - return DecimalType(DecimalUtils::maxPrecision(), scale); -} - -template typename DecimalType> -inline const DecimalType decimalResultType(const DecimalType & tx, const DataTypeNumber &, bool, bool) +template typename DecimalType> +inline const DecimalType decimalResultType(const DecimalType & tx, const DataTypeNumber &) { return DecimalType(DecimalUtils::maxPrecision(), tx.getScale()); } -template typename DecimalType> -inline const DecimalType decimalResultType(const DataTypeNumber &, const DecimalType & ty, bool, bool) +template typename DecimalType> +inline const DecimalType decimalResultType(const DataTypeNumber &, const DecimalType & ty) { return DecimalType(DecimalUtils::maxPrecision(), ty.getScale()); } diff --git a/src/DataTypes/DataTypeNullable.cpp b/src/DataTypes/DataTypeNullable.cpp index 847047850fd..9c738da9f6a 100644 --- a/src/DataTypes/DataTypeNullable.cpp +++ b/src/DataTypes/DataTypeNullable.cpp @@ -217,7 +217,7 @@ void DataTypeNullable::serializeTextEscaped(const IColumn & column, size_t row_n const ColumnNullable & col = assert_cast(column); if (col.isNullAt(row_num)) - writeCString("\\N", ostr); + writeString(settings.tsv.null_representation, ostr); else nested_data_type->serializeAsTextEscaped(col.getNestedColumn(), row_num, ostr, settings); } @@ -308,16 +308,30 @@ ReturnType DataTypeNullable::deserializeTextQuoted(IColumn & column, ReadBuffer const DataTypePtr & nested_data_type) { return safeDeserialize(column, *nested_data_type, - [&istr] { return checkStringByFirstCharacterAndAssertTheRestCaseInsensitive("NULL", istr); }, + [&istr] + { + return checkStringByFirstCharacterAndAssertTheRestCaseInsensitive("NULL", istr); + }, [&nested_data_type, &istr, &settings] (IColumn & nested) { nested_data_type->deserializeAsTextQuoted(nested, istr, settings); }); } void DataTypeNullable::deserializeWholeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const { - safeDeserialize(column, *nested_data_type, - [&istr] { return checkStringByFirstCharacterAndAssertTheRestCaseInsensitive("NULL", istr); }, - [this, &istr, &settings] (IColumn & nested) { nested_data_type->deserializeAsWholeText(nested, istr, settings); }); + deserializeWholeText(column, istr, settings, nested_data_type); +} + +template +ReturnType DataTypeNullable::deserializeWholeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings, + const DataTypePtr & nested_data_type) +{ + return safeDeserialize(column, *nested_data_type, + [&istr] + { + return checkStringByFirstCharacterAndAssertTheRestCaseInsensitive("NULL", istr) + || checkStringByFirstCharacterAndAssertTheRest("ᴺᵁᴸᴸ", istr); + }, + [&nested_data_type, &istr, &settings] (IColumn & nested) { nested_data_type->deserializeAsWholeText(nested, istr, settings); }); } @@ -544,6 +558,7 @@ DataTypePtr removeNullable(const DataTypePtr & type) } +template bool DataTypeNullable::deserializeWholeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings, const DataTypePtr & nested); template bool DataTypeNullable::deserializeTextEscaped(IColumn & column, ReadBuffer & istr, const FormatSettings & settings, const DataTypePtr & nested); template bool DataTypeNullable::deserializeTextQuoted(IColumn & column, ReadBuffer & istr, const FormatSettings &, const DataTypePtr & nested); template bool DataTypeNullable::deserializeTextCSV(IColumn & column, ReadBuffer & istr, const 
FormatSettings & settings, const DataTypePtr & nested); diff --git a/src/DataTypes/DataTypeNullable.h b/src/DataTypes/DataTypeNullable.h index 22d403da6c4..587eecdf32e 100644 --- a/src/DataTypes/DataTypeNullable.h +++ b/src/DataTypes/DataTypeNullable.h @@ -103,6 +103,8 @@ public: /// If ReturnType is bool, check for NULL and deserialize value into non-nullable column (and return true) or insert default value of nested type (and return false) /// If ReturnType is void, deserialize Nullable(T) template + static ReturnType deserializeWholeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings, const DataTypePtr & nested); + template static ReturnType deserializeTextEscaped(IColumn & column, ReadBuffer & istr, const FormatSettings & settings, const DataTypePtr & nested); template static ReturnType deserializeTextQuoted(IColumn & column, ReadBuffer & istr, const FormatSettings &, const DataTypePtr & nested); diff --git a/src/DataTypes/convertMySQLDataType.cpp b/src/DataTypes/convertMySQLDataType.cpp index 054dc412915..a509cf8b091 100644 --- a/src/DataTypes/convertMySQLDataType.cpp +++ b/src/DataTypes/convertMySQLDataType.cpp @@ -1,12 +1,17 @@ #include "convertMySQLDataType.h" #include -#include +#include +#include +#include #include #include #include #include "DataTypeDate.h" #include "DataTypeDateTime.h" +#include "DataTypeDateTime64.h" +#include "DataTypeEnum.h" +#include "DataTypesDecimal.h" #include "DataTypeFixedString.h" #include "DataTypeNullable.h" #include "DataTypeString.h" @@ -25,52 +30,88 @@ ASTPtr dataTypeConvertToQuery(const DataTypePtr & data_type) return makeASTFunction("Nullable", dataTypeConvertToQuery(typeid_cast(data_type.get())->getNestedType())); } -DataTypePtr convertMySQLDataType(const std::string & mysql_data_type, bool is_nullable, bool is_unsigned, size_t length) +DataTypePtr convertMySQLDataType(MultiEnum type_support, + const std::string & mysql_data_type, + bool is_nullable, + bool is_unsigned, + size_t length, + size_t precision, + size_t scale) { - DataTypePtr res; - if (mysql_data_type == "tinyint") - { - if (is_unsigned) - res = std::make_shared(); - else - res = std::make_shared(); - } - else if (mysql_data_type == "smallint") - { - if (is_unsigned) - res = std::make_shared(); - else - res = std::make_shared(); - } - else if (mysql_data_type == "int" || mysql_data_type == "mediumint") - { - if (is_unsigned) - res = std::make_shared(); - else - res = std::make_shared(); - } - else if (mysql_data_type == "bigint") - { - if (is_unsigned) - res = std::make_shared(); - else - res = std::make_shared(); - } - else if (mysql_data_type == "float") - res = std::make_shared(); - else if (mysql_data_type == "double") - res = std::make_shared(); - else if (mysql_data_type == "date") - res = std::make_shared(); - else if (mysql_data_type == "datetime" || mysql_data_type == "timestamp") - res = std::make_shared(); - else if (mysql_data_type == "binary") - res = std::make_shared(length); - else + // we expect mysql_data_type to be either "basic_type" or "type_with_params(param1, param2, ...)" + auto data_type = std::string_view(mysql_data_type); + const auto param_start_pos = data_type.find("("); + const auto type_name = data_type.substr(0, param_start_pos); + + DataTypePtr res = [&]() -> DataTypePtr { + if (type_name == "tinyint") + { + if (is_unsigned) + return std::make_shared(); + else + return std::make_shared(); + } + if (type_name == "smallint") + { + if (is_unsigned) + return std::make_shared(); + else + return std::make_shared(); + } + if 
(type_name == "int" || type_name == "mediumint") + { + if (is_unsigned) + return std::make_shared(); + else + return std::make_shared(); + } + if (type_name == "bigint") + { + if (is_unsigned) + return std::make_shared(); + else + return std::make_shared(); + } + if (type_name == "float") + return std::make_shared(); + if (type_name == "double") + return std::make_shared(); + if (type_name == "date") + return std::make_shared(); + if (type_name == "binary") + return std::make_shared(length); + if (type_name == "datetime" || type_name == "timestamp") + { + if (!type_support.isSet(MySQLDataTypesSupport::DATETIME64)) + return std::make_shared(); + + if (type_name == "timestamp" && scale == 0) + { + return std::make_shared(); + } + else if (type_name == "datetime" || type_name == "timestamp") + { + return std::make_shared(scale); + } + } + + if (type_support.isSet(MySQLDataTypesSupport::DECIMAL) && (type_name == "numeric" || type_name == "decimal")) + { + if (precision <= DecimalUtils::maxPrecision()) + return std::make_shared>(precision, scale); + else if (precision <= DecimalUtils::maxPrecision()) + return std::make_shared>(precision, scale); + else if (precision <= DecimalUtils::maxPrecision()) + return std::make_shared>(precision, scale); + } + /// Also String is fallback for all unknown types. - res = std::make_shared(); + return std::make_shared(); + }(); + if (is_nullable) res = std::make_shared(res); + return res; } diff --git a/src/DataTypes/convertMySQLDataType.h b/src/DataTypes/convertMySQLDataType.h index 54477afb385..f1c4a73d6f7 100644 --- a/src/DataTypes/convertMySQLDataType.h +++ b/src/DataTypes/convertMySQLDataType.h @@ -1,17 +1,20 @@ #pragma once #include +#include #include #include "IDataType.h" namespace DB { +enum class MySQLDataTypesSupport; + /// Convert data type to query. for example /// DataTypeUInt8 -> ASTIdentifier(UInt8) /// DataTypeNullable(DataTypeUInt8) -> ASTFunction(ASTIdentifier(UInt8)) ASTPtr dataTypeConvertToQuery(const DataTypePtr & data_type); /// Convert MySQL type to ClickHouse data type. 
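// Editor's note - an illustrative sketch, not part of this diff: the precision
// checks in the .cpp hunk above rely on DecimalUtils::maxPrecision<T>(), which
// is 9 for Decimal32, 18 for Decimal64 and 38 for Decimal128, i.e. effectively:
//
//     if (precision <= 9)
//         return std::make_shared<DataTypeDecimal<Decimal32>>(precision, scale);
//     else if (precision <= 18)
//         return std::make_shared<DataTypeDecimal<Decimal64>>(precision, scale);
//     else if (precision <= 38)
//         return std::make_shared<DataTypeDecimal<Decimal128>>(precision, scale);
//     // anything wider falls back to DataTypeString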
-DataTypePtr convertMySQLDataType(const std::string & mysql_data_type, bool is_nullable, bool is_unsigned, size_t length); +DataTypePtr convertMySQLDataType(MultiEnum type_support, const std::string & mysql_data_type, bool is_nullable, bool is_unsigned, size_t length, size_t precision, size_t scale); } diff --git a/src/DataTypes/tests/gtest_DataType_deserializeAsText.cpp b/src/DataTypes/tests/gtest_DataType_deserializeAsText.cpp new file mode 100644 index 00000000000..48e2f0d80a0 --- /dev/null +++ b/src/DataTypes/tests/gtest_DataType_deserializeAsText.cpp @@ -0,0 +1,101 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#pragma GCC diagnostic ignored "-Wmissing-declarations" +#include + +#include +#include + +#include + +namespace std +{ + +template +inline std::ostream& operator<<(std::ostream & ostr, const std::vector & v) +{ + ostr << "["; + for (const auto & i : v) + { + ostr << i << ", "; + } + return ostr << "] (" << v.size() << ") items"; +} + +} + +using namespace DB; + +struct ParseDataTypeTestCase +{ + const char * type_name; + std::vector values; + FieldVector expected_values; +}; + +std::ostream & operator<<(std::ostream & ostr, const ParseDataTypeTestCase & test_case) +{ + return ostr << "ParseDataTypeTestCase{\"" << test_case.type_name << "\", " << test_case.values << "}"; +} + + +class ParseDataTypeTest : public ::testing::TestWithParam +{ +public: + void SetUp() override + { + const auto & p = GetParam(); + + data_type = DataTypeFactory::instance().get(p.type_name); + } + + DataTypePtr data_type; +}; + +TEST_P(ParseDataTypeTest, parseStringValue) +{ + const auto & p = GetParam(); + + auto col = data_type->createColumn(); + for (const auto & value : p.values) + { + ReadBuffer buffer(const_cast(value.data()), value.size(), 0); + data_type->deserializeAsWholeText(*col, buffer, FormatSettings{}); + } + + ASSERT_EQ(p.expected_values.size(), col->size()) << "Actual items: " << *col; + for (size_t i = 0; i < col->size(); ++i) + { + ASSERT_EQ(p.expected_values[i], (*col)[i]); + } +} + + +INSTANTIATE_TEST_SUITE_P(ParseDecimal, + ParseDataTypeTest, + ::testing::ValuesIn( + std::initializer_list{ + { + "Decimal(8, 0)", + {"0", "5", "8", "-5", "-8", "12345678", "-12345678"}, + + std::initializer_list{ + DecimalField(0, 0), + DecimalField(5, 0), + DecimalField(8, 0), + DecimalField(-5, 0), + DecimalField(-8, 0), + DecimalField(12345678, 0), + DecimalField(-12345678, 0) + } + } + } + ) +); diff --git a/src/Databases/DatabasesCommon.h b/src/Databases/DatabasesCommon.h index 4c7ec1ec637..5e1e555a524 100644 --- a/src/Databases/DatabasesCommon.h +++ b/src/Databases/DatabasesCommon.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Databases/IDatabase.h b/src/Databases/IDatabase.h index d82755a7bc8..b28bd5fd599 100644 --- a/src/Databases/IDatabase.h +++ b/src/Databases/IDatabase.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include #include diff --git a/src/Databases/MySQL/DatabaseConnectionMySQL.cpp b/src/Databases/MySQL/DatabaseConnectionMySQL.cpp index 0d944e215a0..9c94014bf23 100644 --- a/src/Databases/MySQL/DatabaseConnectionMySQL.cpp +++ b/src/Databases/MySQL/DatabaseConnectionMySQL.cpp @@ -10,6 +10,7 @@ # include # include # include +# include # include # include # include @@ -43,31 +44,14 @@ constexpr static const auto suffix = ".remove_flag"; static constexpr const std::chrono::seconds cleaner_sleep_time{30}; static const std::chrono::seconds lock_acquire_timeout{10}; -static String 
toQueryStringWithQuote(const std::vector & quote_list) -{ - WriteBufferFromOwnString quote_list_query; - quote_list_query << "("; - - for (size_t index = 0; index < quote_list.size(); ++index) - { - if (index) - quote_list_query << ","; - - quote_list_query << quote << quote_list[index]; - } - - quote_list_query << ")"; - return quote_list_query.str(); -} - -DatabaseConnectionMySQL::DatabaseConnectionMySQL( - const Context & global_context_, const String & database_name_, const String & metadata_path_, +DatabaseConnectionMySQL::DatabaseConnectionMySQL(const Context & context, const String & database_name_, const String & metadata_path_, const ASTStorage * database_engine_define_, const String & database_name_in_mysql_, mysqlxx::Pool && pool) : IDatabase(database_name_) - , global_context(global_context_.getGlobalContext()) + , global_context(context.getGlobalContext()) , metadata_path(metadata_path_) , database_engine_define(database_engine_define_->clone()) , database_name_in_mysql(database_name_in_mysql_) + , mysql_datatypes_support_level(context.getQueryContext().getSettingsRef().mysql_datatypes_support_level) , mysql_pool(std::move(pool)) { empty(); /// test database is works fine. @@ -78,7 +62,7 @@ bool DatabaseConnectionMySQL::empty() const { std::lock_guard lock(mutex); - fetchTablesIntoLocalCache(); + fetchTablesIntoLocalCache(global_context); if (local_tables_cache.empty()) return true; @@ -90,12 +74,12 @@ bool DatabaseConnectionMySQL::empty() const return true; } -DatabaseTablesIteratorPtr DatabaseConnectionMySQL::getTablesIterator(const Context &, const FilterByNameFunction & filter_by_table_name) +DatabaseTablesIteratorPtr DatabaseConnectionMySQL::getTablesIterator(const Context & context, const FilterByNameFunction & filter_by_table_name) { Tables tables; std::lock_guard lock(mutex); - fetchTablesIntoLocalCache(); + fetchTablesIntoLocalCache(context); for (const auto & [table_name, modify_time_and_storage] : local_tables_cache) if (!remove_or_detach_tables.count(table_name) && (!filter_by_table_name || filter_by_table_name(table_name))) @@ -109,11 +93,11 @@ bool DatabaseConnectionMySQL::isTableExist(const String & name, const Context & return bool(tryGetTable(name, context)); } -StoragePtr DatabaseConnectionMySQL::tryGetTable(const String & mysql_table_name, const Context &) const +StoragePtr DatabaseConnectionMySQL::tryGetTable(const String & mysql_table_name, const Context & context) const { std::lock_guard lock(mutex); - fetchTablesIntoLocalCache(); + fetchTablesIntoLocalCache(context); if (!remove_or_detach_tables.count(mysql_table_name) && local_tables_cache.find(mysql_table_name) != local_tables_cache.end()) return local_tables_cache[mysql_table_name].second; @@ -157,11 +141,11 @@ static ASTPtr getCreateQueryFromStorage(const StoragePtr & storage, const ASTPtr return create_table_query; } -ASTPtr DatabaseConnectionMySQL::getCreateTableQueryImpl(const String & table_name, const Context &, bool throw_on_error) const +ASTPtr DatabaseConnectionMySQL::getCreateTableQueryImpl(const String & table_name, const Context & context, bool throw_on_error) const { std::lock_guard lock(mutex); - fetchTablesIntoLocalCache(); + fetchTablesIntoLocalCache(context); if (local_tables_cache.find(table_name) == local_tables_cache.end()) { @@ -178,7 +162,7 @@ time_t DatabaseConnectionMySQL::getObjectMetadataModificationTime(const String & { std::lock_guard lock(mutex); - fetchTablesIntoLocalCache(); + fetchTablesIntoLocalCache(global_context); if (local_tables_cache.find(table_name) == 
local_tables_cache.end()) throw Exception("MySQL table " + database_name_in_mysql + "." + table_name + " doesn't exist.", ErrorCodes::UNKNOWN_TABLE); @@ -194,12 +178,12 @@ ASTPtr DatabaseConnectionMySQL::getCreateDatabaseQuery() const return create_query; } -void DatabaseConnectionMySQL::fetchTablesIntoLocalCache() const +void DatabaseConnectionMySQL::fetchTablesIntoLocalCache(const Context & context) const { const auto & tables_with_modification_time = fetchTablesWithModificationTime(); destroyLocalCacheExtraTables(tables_with_modification_time); - fetchLatestTablesStructureIntoCache(tables_with_modification_time); + fetchLatestTablesStructureIntoCache(tables_with_modification_time, context); } void DatabaseConnectionMySQL::destroyLocalCacheExtraTables(const std::map & tables_with_modification_time) const @@ -216,7 +200,7 @@ void DatabaseConnectionMySQL::destroyLocalCacheExtraTables(const std::map &tables_modification_time) const +void DatabaseConnectionMySQL::fetchLatestTablesStructureIntoCache(const std::map &tables_modification_time, const Context & context) const { std::vector wait_update_tables_name; for (const auto & table_modification_time : tables_modification_time) @@ -228,7 +212,7 @@ void DatabaseConnectionMySQL::fetchLatestTablesStructureIntoCache(const std::map wait_update_tables_name.emplace_back(table_modification_time.first); } - std::map tables_and_columns = fetchTablesColumnsList(wait_update_tables_name); + std::map tables_and_columns = fetchTablesColumnsList(wait_update_tables_name, context); for (const auto & table_and_columns : tables_and_columns) { @@ -280,53 +264,16 @@ std::map DatabaseConnectionMySQL::fetchTablesWithModificationTim return tables_with_modification_time; } -std::map DatabaseConnectionMySQL::fetchTablesColumnsList(const std::vector & tables_name) const +std::map DatabaseConnectionMySQL::fetchTablesColumnsList(const std::vector & tables_name, const Context & context) const { - std::map tables_and_columns; + const auto & settings = context.getSettingsRef(); - if (tables_name.empty()) - return tables_and_columns; - - Block tables_columns_sample_block - { - { std::make_shared(), "table_name" }, - { std::make_shared(), "column_name" }, - { std::make_shared(), "column_type" }, - { std::make_shared(), "is_nullable" }, - { std::make_shared(), "is_unsigned" }, - { std::make_shared(), "length" }, - }; - - WriteBufferFromOwnString query; - query << "SELECT " - " TABLE_NAME AS table_name," - " COLUMN_NAME AS column_name," - " DATA_TYPE AS column_type," - " IS_NULLABLE = 'YES' AS is_nullable," - " COLUMN_TYPE LIKE '%unsigned' AS is_unsigned," - " CHARACTER_MAXIMUM_LENGTH AS length" - " FROM INFORMATION_SCHEMA.COLUMNS" - " WHERE TABLE_SCHEMA = " << quote << database_name_in_mysql - << " AND TABLE_NAME IN " << toQueryStringWithQuote(tables_name) << " ORDER BY ORDINAL_POSITION"; - - const auto & external_table_functions_use_nulls = global_context.getSettings().external_table_functions_use_nulls; - MySQLBlockInputStream result(mysql_pool.get(), query.str(), tables_columns_sample_block, DEFAULT_BLOCK_SIZE); - while (Block block = result.read()) - { - size_t rows = block.rows(); - for (size_t i = 0; i < rows; ++i) - { - String table_name = (*block.getByPosition(0).column)[i].safeGet(); - tables_and_columns[table_name].emplace_back((*block.getByPosition(1).column)[i].safeGet(), - convertMySQLDataType( - (*block.getByPosition(2).column)[i].safeGet(), - (*block.getByPosition(3).column)[i].safeGet() && - external_table_functions_use_nulls, - 
(*block.getByPosition(4).column)[i].safeGet(), - (*block.getByPosition(5).column)[i].safeGet())); - } - } - return tables_and_columns; + return DB::fetchTablesColumnsList( + mysql_pool, + database_name_in_mysql, + tables_name, + settings.external_table_functions_use_nulls, + mysql_datatypes_support_level); } void DatabaseConnectionMySQL::shutdown() diff --git a/src/Databases/MySQL/DatabaseConnectionMySQL.h b/src/Databases/MySQL/DatabaseConnectionMySQL.h index c4fb3d5f90c..e9f72adc013 100644 --- a/src/Databases/MySQL/DatabaseConnectionMySQL.h +++ b/src/Databases/MySQL/DatabaseConnectionMySQL.h @@ -4,17 +4,27 @@ #if USE_MYSQL #include -#include -#include -#include -#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include namespace DB { class Context; +enum class MySQLDataTypesSupport; + /** Real-time access to table list and table structure from remote MySQL * It doesn't make any manipulations with filesystem. * All tables are created by calling code after real-time pull-out structure from remote MySQL @@ -25,7 +35,7 @@ public: ~DatabaseConnectionMySQL() override; DatabaseConnectionMySQL( - const Context & global_context, const String & database_name, const String & metadata_path, + const Context & context, const String & database_name, const String & metadata_path, const ASTStorage * database_engine_define, const String & database_name_in_mysql, mysqlxx::Pool && pool); String getEngineName() const override { return "MySQL"; } @@ -66,6 +76,9 @@ private: String metadata_path; ASTPtr database_engine_define; String database_name_in_mysql; + // Cache setting for later from query context upon creation, + // so column types depend on the settings set at query-level. + MultiEnum mysql_datatypes_support_level; std::atomic quit{false}; std::condition_variable cond; @@ -81,15 +94,15 @@ private: void cleanOutdatedTables(); - void fetchTablesIntoLocalCache() const; + void fetchTablesIntoLocalCache(const Context & context) const; std::map fetchTablesWithModificationTime() const; - std::map fetchTablesColumnsList(const std::vector & tables_name) const; + std::map fetchTablesColumnsList(const std::vector & tables_name, const Context & context) const; void destroyLocalCacheExtraTables(const std::map & tables_with_modification_time) const; - void fetchLatestTablesStructureIntoCache(const std::map & tables_modification_time) const; + void fetchLatestTablesStructureIntoCache(const std::map & tables_modification_time, const Context & context) const; ThreadFromGlobalPool thread; }; diff --git a/src/Databases/MySQL/FetchTablesColumnsList.cpp b/src/Databases/MySQL/FetchTablesColumnsList.cpp new file mode 100644 index 00000000000..3e25c703a1d --- /dev/null +++ b/src/Databases/MySQL/FetchTablesColumnsList.cpp @@ -0,0 +1,114 @@ +#if !defined(ARCADIA_BUILD) +# include "config_core.h" +#endif + +#if USE_MYSQL +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +namespace +{ +using namespace DB; + +String toQueryStringWithQuote(const std::vector & quote_list) +{ + WriteBufferFromOwnString quote_list_query; + quote_list_query << "("; + + for (size_t index = 0; index < quote_list.size(); ++index) + { + if (index) + quote_list_query << ","; + + quote_list_query << quote << quote_list[index]; + } + + quote_list_query << ")"; + return quote_list_query.str(); +} +} + +namespace DB +{ + +std::map fetchTablesColumnsList( + mysqlxx::Pool & pool, + const String & database_name, + const std::vector & tables_name, + 
bool external_table_functions_use_nulls, + MultiEnum type_support) +{ + std::map tables_and_columns; + + if (tables_name.empty()) + return tables_and_columns; + + Block tables_columns_sample_block + { + { std::make_shared(), "table_name" }, + { std::make_shared(), "column_name" }, + { std::make_shared(), "column_type" }, + { std::make_shared(), "is_nullable" }, + { std::make_shared(), "is_unsigned" }, + { std::make_shared(), "length" }, + { std::make_shared(), "precision" }, + { std::make_shared(), "scale" }, + }; + + WriteBufferFromOwnString query; + query << "SELECT " + " TABLE_NAME AS table_name," + " COLUMN_NAME AS column_name," + " COLUMN_TYPE AS column_type," + " IS_NULLABLE = 'YES' AS is_nullable," + " COLUMN_TYPE LIKE '%unsigned' AS is_unsigned," + " CHARACTER_MAXIMUM_LENGTH AS length," + " NUMERIC_PRECISION as ''," + " IF(ISNULL(NUMERIC_SCALE), DATETIME_PRECISION, NUMERIC_SCALE) AS scale" // we know DATETIME_PRECISION as a scale in CH + " FROM INFORMATION_SCHEMA.COLUMNS" + " WHERE TABLE_SCHEMA = " << quote << database_name + << " AND TABLE_NAME IN " << toQueryStringWithQuote(tables_name) << " ORDER BY ORDINAL_POSITION"; + + MySQLBlockInputStream result(pool.get(), query.str(), tables_columns_sample_block, DEFAULT_BLOCK_SIZE); + while (Block block = result.read()) + { + const auto & table_name_col = *block.getByPosition(0).column; + const auto & column_name_col = *block.getByPosition(1).column; + const auto & column_type_col = *block.getByPosition(2).column; + const auto & is_nullable_col = *block.getByPosition(3).column; + const auto & is_unsigned_col = *block.getByPosition(4).column; + const auto & char_max_length_col = *block.getByPosition(5).column; + const auto & precision_col = *block.getByPosition(6).column; + const auto & scale_col = *block.getByPosition(7).column; + + size_t rows = block.rows(); + for (size_t i = 0; i < rows; ++i) + { + String table_name = table_name_col[i].safeGet(); + tables_and_columns[table_name].emplace_back( + column_name_col[i].safeGet(), + convertMySQLDataType( + type_support, + column_type_col[i].safeGet(), + external_table_functions_use_nulls && is_nullable_col[i].safeGet(), + is_unsigned_col[i].safeGet(), + char_max_length_col[i].safeGet(), + precision_col[i].safeGet(), + scale_col[i].safeGet())); + } + } + return tables_and_columns; +} + +} + +#endif diff --git a/src/Databases/MySQL/FetchTablesColumnsList.h b/src/Databases/MySQL/FetchTablesColumnsList.h new file mode 100644 index 00000000000..52191c2ecb8 --- /dev/null +++ b/src/Databases/MySQL/FetchTablesColumnsList.h @@ -0,0 +1,28 @@ +#pragma once + +#include "config_core.h" +#if USE_MYSQL + +#include + +#include +#include +#include +#include + +#include +#include + +namespace DB +{ + +std::map fetchTablesColumnsList( + mysqlxx::Pool & pool, + const String & database_name, + const std::vector & tables_name, + bool external_table_functions_use_nulls, + MultiEnum type_support); + +} + +#endif diff --git a/src/Databases/MySQL/MaterializeMetadata.h b/src/Databases/MySQL/MaterializeMetadata.h index c036ea77940..5e77620e365 100644 --- a/src/Databases/MySQL/MaterializeMetadata.h +++ b/src/Databases/MySQL/MaterializeMetadata.h @@ -6,7 +6,7 @@ #if USE_MYSQL -#include +#include #include #include #include diff --git a/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp b/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp index 851ea351876..465a7cb912a 100644 --- a/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp +++ b/src/Databases/MySQL/MaterializeMySQLSyncThread.cpp @@ -195,6 +195,7 @@ void 
MaterializeMySQLSyncThread::synchronization(const String & mysql_version) } catch (...) { + client.disconnect(); tryLogCurrentException(log); getDatabase(database_name).setException(std::current_exception()); } @@ -206,6 +207,7 @@ void MaterializeMySQLSyncThread::stopSynchronization() { sync_quit = true; background_thread_pool->join(); + client.disconnect(); } } diff --git a/src/Databases/ya.make b/src/Databases/ya.make index 50b58cf3e71..726127bfe52 100644 --- a/src/Databases/ya.make +++ b/src/Databases/ya.make @@ -19,6 +19,7 @@ SRCS( DatabaseWithDictionaries.cpp MySQL/DatabaseConnectionMySQL.cpp MySQL/DatabaseMaterializeMySQL.cpp + MySQL/FetchTablesColumnsList.cpp MySQL/MaterializeMetadata.cpp MySQL/MaterializeMySQLSettings.cpp MySQL/MaterializeMySQLSyncThread.cpp diff --git a/src/Dictionaries/CassandraBlockInputStream.cpp b/src/Dictionaries/CassandraBlockInputStream.cpp index 4f6a62a0eea..721cb44a82e 100644 --- a/src/Dictionaries/CassandraBlockInputStream.cpp +++ b/src/Dictionaries/CassandraBlockInputStream.cpp @@ -19,6 +19,7 @@ namespace DB namespace ErrorCodes { extern const int TYPE_MISMATCH; + extern const int UNKNOWN_TYPE; } CassandraBlockInputStream::CassandraBlockInputStream( @@ -140,6 +141,8 @@ void CassandraBlockInputStream::insertValue(IColumn & column, ValueType type, co assert_cast(column).insert(parse(uuid_str.data(), uuid_str.size())); break; } + default: + throw Exception("Unknown type : " + std::to_string(static_cast(type)), ErrorCodes::UNKNOWN_TYPE); } } @@ -252,6 +255,8 @@ void CassandraBlockInputStream::assertTypes(const CassResultPtr & result) expected = CASS_VALUE_TYPE_UUID; expected_text = "uuid"; break; + default: + throw Exception("Unknown type : " + std::to_string(static_cast(description.types[i].first)), ErrorCodes::UNKNOWN_TYPE); } CassValueType got = cass_result_column_type(result, i); diff --git a/src/Dictionaries/ExecutableDictionarySource.cpp b/src/Dictionaries/ExecutableDictionarySource.cpp index 918cf0732ab..cc250727261 100644 --- a/src/Dictionaries/ExecutableDictionarySource.cpp +++ b/src/Dictionaries/ExecutableDictionarySource.cpp @@ -1,12 +1,13 @@ #include "ExecutableDictionarySource.h" -#include -#include +#include #include #include #include #include #include +#include +#include #include #include #include @@ -16,6 +17,7 @@ #include "DictionaryStructure.h" #include "registerDictionaries.h" + namespace DB { static const UInt64 max_block_size = 8192; @@ -31,15 +33,23 @@ namespace /// Owns ShellCommand and calls wait for it. 
class ShellCommandOwningBlockInputStream : public OwningBlockInputStream { + private: + Poco::Logger * log; public: - ShellCommandOwningBlockInputStream(const BlockInputStreamPtr & impl, std::unique_ptr own_) - : OwningBlockInputStream(std::move(impl), std::move(own_)) + ShellCommandOwningBlockInputStream(Poco::Logger * log_, const BlockInputStreamPtr & impl, std::unique_ptr command_) + : OwningBlockInputStream(std::move(impl), std::move(command_)), log(log_) { } void readSuffix() override { OwningBlockInputStream::readSuffix(); + + std::string err; + readStringUntilEOF(err, own->err); + if (!err.empty()) + LOG_ERROR(log, "Having stderr: {}", err); + own->wait(); } }; @@ -80,7 +90,7 @@ BlockInputStreamPtr ExecutableDictionarySource::loadAll() LOG_TRACE(log, "loadAll {}", toString()); auto process = ShellCommand::execute(command); auto input_stream = context.getInputFormat(format, process->out, sample_block, max_block_size); - return std::make_shared(input_stream, std::move(process)); + return std::make_shared(log, input_stream, std::move(process)); } BlockInputStreamPtr ExecutableDictionarySource::loadUpdatedAll() @@ -95,67 +105,73 @@ BlockInputStreamPtr ExecutableDictionarySource::loadUpdatedAll() LOG_TRACE(log, "loadUpdatedAll {}", command_with_update_field); auto process = ShellCommand::execute(command_with_update_field); auto input_stream = context.getInputFormat(format, process->out, sample_block, max_block_size); - return std::make_shared(input_stream, std::move(process)); + return std::make_shared(log, input_stream, std::move(process)); } namespace { - /** A stream, that also runs and waits for background thread - * (that will feed data into pipe to be read from the other side of the pipe). + /** A stream, that runs child process and sends data to its stdin in background thread, + * and receives data from its stdout. */ class BlockInputStreamWithBackgroundThread final : public IBlockInputStream { public: BlockInputStreamWithBackgroundThread( - const BlockInputStreamPtr & stream_, std::unique_ptr && command_, std::packaged_task && task_) - : stream{stream_}, command{std::move(command_)}, task(std::move(task_)), thread([this] { - task(); - command->in.close(); - }) + const Context & context, + const std::string & format, + const Block & sample_block, + const std::string & command_str, + Poco::Logger * log_, + std::function && send_data_) + : log(log_), + command(ShellCommand::execute(command_str)), + send_data(std::move(send_data_)), + thread([this] { send_data(command->in); }) { - children.push_back(stream); + stream = context.getInputFormat(format, command->out, sample_block, max_block_size); } ~BlockInputStreamWithBackgroundThread() override { if (thread.joinable()) - { - try - { - readSuffix(); - } - catch (...) - { - tryLogCurrentException(__PRETTY_FUNCTION__); - } - } + thread.join(); } - Block getHeader() const override { return stream->getHeader(); } + Block getHeader() const override + { + return stream->getHeader(); + } private: - Block readImpl() override { return stream->read(); } + Block readImpl() override + { + return stream->read(); + } + + void readPrefix() override + { + stream->readPrefix(); + } void readSuffix() override { - IBlockInputStream::readSuffix(); - if (!wait_called) - { - wait_called = true; - command->wait(); - } - thread.join(); - /// To rethrow an exception, if any. 
- task.get_future().get(); + stream->readSuffix(); + + std::string err; + readStringUntilEOF(err, command->err); + if (!err.empty()) + LOG_ERROR(log, "Having stderr: {}", err); + + command->wait(); } String getName() const override { return "WithBackgroundThread"; } + Poco::Logger * log; BlockInputStreamPtr stream; std::unique_ptr command; - std::packaged_task task; + std::function send_data; ThreadFromGlobalPool thread; - bool wait_called = false; }; } @@ -164,28 +180,29 @@ namespace BlockInputStreamPtr ExecutableDictionarySource::loadIds(const std::vector & ids) { LOG_TRACE(log, "loadIds {} size = {}", toString(), ids.size()); - auto process = ShellCommand::execute(command); - - auto output_stream = context.getOutputFormat(format, process->in, sample_block); - auto input_stream = context.getInputFormat(format, process->out, sample_block, max_block_size); return std::make_shared( - input_stream, std::move(process), std::packaged_task([output_stream, &ids]() mutable { formatIDs(output_stream, ids); })); + context, format, sample_block, command, log, + [&ids, this](WriteBufferFromFile & out) mutable + { + auto output_stream = context.getOutputFormat(format, out, sample_block); + formatIDs(output_stream, ids); + out.close(); + }); } BlockInputStreamPtr ExecutableDictionarySource::loadKeys(const Columns & key_columns, const std::vector & requested_rows) { LOG_TRACE(log, "loadKeys {} size = {}", toString(), requested_rows.size()); - auto process = ShellCommand::execute(command); - - auto output_stream = context.getOutputFormat(format, process->in, sample_block); - auto input_stream = context.getInputFormat(format, process->out, sample_block, max_block_size); return std::make_shared( - input_stream, std::move(process), std::packaged_task([output_stream, key_columns, &requested_rows, this]() mutable + context, format, sample_block, command, log, + [key_columns, &requested_rows, this](WriteBufferFromFile & out) mutable { + auto output_stream = context.getOutputFormat(format, out, sample_block); formatKeys(dict_struct, output_stream, key_columns, requested_rows); - })); + out.close(); + }); } bool ExecutableDictionarySource::isModified() const diff --git a/src/Dictionaries/PolygonDictionaryUtils.h b/src/Dictionaries/PolygonDictionaryUtils.h index 11ec28502af..cd99717f98a 100644 --- a/src/Dictionaries/PolygonDictionaryUtils.h +++ b/src/Dictionaries/PolygonDictionaryUtils.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include @@ -25,8 +25,8 @@ using Ring = IPolygonDictionary::Ring; using Box = bg::model::box; /** SlabsPolygonIndex builds index based on shooting ray down from point. - * When this ray crosses odd number of edges in single polygon, point is considered inside. - * + * When this ray crosses odd number of edges in single polygon, point is considered inside. + * * SlabsPolygonIndex divides plane into vertical slabs, separated by vertical lines going through all points. * For each slab, all edges falling in that slab are effectively stored. 
* For each find query, required slab is found with binary search, and result is computed diff --git a/src/Dictionaries/RedisBlockInputStream.cpp b/src/Dictionaries/RedisBlockInputStream.cpp index a3ee86ae1d6..a5514d14155 100644 --- a/src/Dictionaries/RedisBlockInputStream.cpp +++ b/src/Dictionaries/RedisBlockInputStream.cpp @@ -26,6 +26,7 @@ namespace DB extern const int LOGICAL_ERROR; extern const int NUMBER_OF_COLUMNS_DOESNT_MATCH; extern const int INTERNAL_REDIS_ERROR; + extern const int UNKNOWN_TYPE; } @@ -103,6 +104,8 @@ namespace DB case ValueType::vtUUID: assert_cast(column).insertValue(parse(string_value)); break; + default: + throw Exception("Value of unsupported type:" + column.getName(), ErrorCodes::UNKNOWN_TYPE); } } } diff --git a/src/Dictionaries/tests/gtest_dictionary_configuration.cpp b/src/Dictionaries/tests/gtest_dictionary_configuration.cpp index fc99a34cd42..453ce2b81f0 100644 --- a/src/Dictionaries/tests/gtest_dictionary_configuration.cpp +++ b/src/Dictionaries/tests/gtest_dictionary_configuration.cpp @@ -1,4 +1,4 @@ -#include +#include #include #include #include diff --git a/src/Disks/DiskDecorator.cpp b/src/Disks/DiskDecorator.cpp index e55534e347f..7f2ea58d7cf 100644 --- a/src/Disks/DiskDecorator.cpp +++ b/src/Disks/DiskDecorator.cpp @@ -165,4 +165,19 @@ void DiskDecorator::truncateFile(const String & path, size_t size) delegate->truncateFile(path, size); } +int DiskDecorator::open(const String & path, mode_t mode) const +{ + return delegate->open(path, mode); +} + +void DiskDecorator::close(int fd) const +{ + delegate->close(fd); +} + +void DiskDecorator::sync(int fd) const +{ + delegate->sync(fd); +} + } diff --git a/src/Disks/DiskDecorator.h b/src/Disks/DiskDecorator.h index 71bb100c576..f1ddfff4952 100644 --- a/src/Disks/DiskDecorator.h +++ b/src/Disks/DiskDecorator.h @@ -42,6 +42,9 @@ public: void setReadOnly(const String & path) override; void createHardLink(const String & src_path, const String & dst_path) override; void truncateFile(const String & path, size_t size) override; + int open(const String & path, mode_t mode) const override; + void close(int fd) const override; + void sync(int fd) const override; const String getType() const override { return delegate->getType(); } protected: diff --git a/src/Disks/DiskFactory.h b/src/Disks/DiskFactory.h index 50520381552..d41f14bd753 100644 --- a/src/Disks/DiskFactory.h +++ b/src/Disks/DiskFactory.h @@ -1,6 +1,6 @@ #pragma once -#include +#include #include #include diff --git a/src/Disks/DiskLocal.cpp b/src/Disks/DiskLocal.cpp index f9e988211da..a09ab7c5ac5 100644 --- a/src/Disks/DiskLocal.cpp +++ b/src/Disks/DiskLocal.cpp @@ -8,7 +8,7 @@ #include #include - +#include namespace DB { @@ -19,6 +19,10 @@ namespace ErrorCodes extern const int EXCESSIVE_ELEMENT_IN_CONFIG; extern const int PATH_ACCESS_DENIED; extern const int INCORRECT_DISK_INDEX; + extern const int FILE_DOESNT_EXIST; + extern const int CANNOT_OPEN_FILE; + extern const int CANNOT_FSYNC; + extern const int CANNOT_CLOSE_FILE; extern const int CANNOT_TRUNCATE_FILE; } @@ -292,6 +296,28 @@ void DiskLocal::copy(const String & from_path, const std::shared_ptr & to IDisk::copy(from_path, to_disk, to_path); /// Copy files through buffers. } +int DiskLocal::open(const String & path, mode_t mode) const +{ + String full_path = disk_path + path; + int fd = ::open(full_path.c_str(), mode); + if (-1 == fd) + throwFromErrnoWithPath("Cannot open file " + full_path, full_path, + errno == ENOENT ? 
ErrorCodes::FILE_DOESNT_EXIST : ErrorCodes::CANNOT_OPEN_FILE); + return fd; +} + +void DiskLocal::close(int fd) const +{ + if (-1 == ::close(fd)) + throw Exception("Cannot close file", ErrorCodes::CANNOT_CLOSE_FILE); +} + +void DiskLocal::sync(int fd) const +{ + if (-1 == ::fsync(fd)) + throw Exception("Cannot fsync", ErrorCodes::CANNOT_FSYNC); +} + DiskPtr DiskLocalReservation::getDisk(size_t i) const { if (i != 0) diff --git a/src/Disks/DiskLocal.h b/src/Disks/DiskLocal.h index 71c4dc0aec9..762a8502faa 100644 --- a/src/Disks/DiskLocal.h +++ b/src/Disks/DiskLocal.h @@ -99,6 +99,10 @@ public: void createHardLink(const String & src_path, const String & dst_path) override; + int open(const String & path, mode_t mode) const override; + void close(int fd) const override; + void sync(int fd) const override; + void truncateFile(const String & path, size_t size) override; const String getType() const override { return "local"; } diff --git a/src/Disks/DiskMemory.cpp b/src/Disks/DiskMemory.cpp index 96d9e22c414..d185263d48c 100644 --- a/src/Disks/DiskMemory.cpp +++ b/src/Disks/DiskMemory.cpp @@ -408,6 +408,21 @@ void DiskMemory::setReadOnly(const String &) throw Exception("Method setReadOnly is not implemented for memory disks", ErrorCodes::NOT_IMPLEMENTED); } +int DiskMemory::open(const String & /*path*/, mode_t /*mode*/) const +{ + throw Exception("Method open is not implemented for memory disks", ErrorCodes::NOT_IMPLEMENTED); +} + +void DiskMemory::close(int /*fd*/) const +{ + throw Exception("Method close is not implemented for memory disks", ErrorCodes::NOT_IMPLEMENTED); +} + +void DiskMemory::sync(int /*fd*/) const +{ + throw Exception("Method sync is not implemented for memory disks", ErrorCodes::NOT_IMPLEMENTED); +} + void DiskMemory::truncateFile(const String & path, size_t size) { std::lock_guard lock(mutex); diff --git a/src/Disks/DiskMemory.h b/src/Disks/DiskMemory.h index fc265ddef03..4d4b947098b 100644 --- a/src/Disks/DiskMemory.h +++ b/src/Disks/DiskMemory.h @@ -90,6 +90,10 @@ public: void createHardLink(const String & src_path, const String & dst_path) override; + int open(const String & path, mode_t mode) const override; + void close(int fd) const override; + void sync(int fd) const override; + void truncateFile(const String & path, size_t size) override; const String getType() const override { return "memory"; } diff --git a/src/Disks/IDisk.h b/src/Disks/IDisk.h index 53dc4999dc4..688c1dfad42 100644 --- a/src/Disks/IDisk.h +++ b/src/Disks/IDisk.h @@ -1,7 +1,7 @@ #pragma once #include -#include +#include #include #include #include @@ -177,12 +177,24 @@ public: /// Create hardlink from `src_path` to `dst_path`. virtual void createHardLink(const String & src_path, const String & dst_path) = 0; + /// Wrapper for POSIX open + virtual int open(const String & path, mode_t mode) const = 0; + + /// Wrapper for POSIX close + virtual void close(int fd) const = 0; + + /// Wrapper for POSIX fsync + virtual void sync(int fd) const = 0; + /// Truncate file to specified size. virtual void truncateFile(const String & path, size_t size); /// Return disk type - "local", "s3", etc. virtual const String getType() const = 0; + /// Invoked when Global Context is shutdown. + virtual void shutdown() { } + private: /// Returns executor to perform asynchronous operations. 
diff --git a/src/Disks/S3/DiskS3.cpp b/src/Disks/S3/DiskS3.cpp
index 5aa57518c83..6abb72efeb0 100644
--- a/src/Disks/S3/DiskS3.cpp
+++ b/src/Disks/S3/DiskS3.cpp
@@ -33,6 +33,7 @@ namespace ErrorCodes
     extern const int CANNOT_SEEK_THROUGH_FILE;
     extern const int UNKNOWN_FORMAT;
     extern const int INCORRECT_DISK_INDEX;
+    extern const int NOT_IMPLEMENTED;
 }
 namespace
 {
@@ -746,4 +747,28 @@ void DiskS3::setReadOnly(const String & path)
     Poco::File(metadata_path + path).setReadOnly(true);
 }
+int DiskS3::open(const String & /*path*/, mode_t /*mode*/) const
+{
+    throw Exception("Method open is not implemented for S3 disks", ErrorCodes::NOT_IMPLEMENTED);
+}
+
+void DiskS3::close(int /*fd*/) const
+{
+    throw Exception("Method close is not implemented for S3 disks", ErrorCodes::NOT_IMPLEMENTED);
+}
+
+void DiskS3::sync(int /*fd*/) const
+{
+    throw Exception("Method sync is not implemented for S3 disks", ErrorCodes::NOT_IMPLEMENTED);
+}
+
+void DiskS3::shutdown()
+{
+    /// This call stops any further retry attempts for ongoing S3 requests.
+    /// If an S3 request has failed and the call below has been executed, the S3 client immediately returns the last failed request's outcome.
+    /// If S3 is healthy, nothing bad happens and requests are processed as usual, without errors.
+    /// This should significantly speed up shutdown when S3 is unhealthy.
+    client->DisableRequestProcessing();
+}
+
 }
diff --git a/src/Disks/S3/DiskS3.h b/src/Disks/S3/DiskS3.h
index 34f00af6439..2d9c7f79865 100644
--- a/src/Disks/S3/DiskS3.h
+++ b/src/Disks/S3/DiskS3.h
@@ -100,8 +100,14 @@ public:
     void setReadOnly(const String & path) override;
+    int open(const String & path, mode_t mode) const override;
+    void close(int fd) const override;
+    void sync(int fd) const override;
+
     const String getType() const override { return "s3"; }
+    void shutdown() override;
+
 private:
     bool tryReserve(UInt64 bytes);
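
`IDisk::shutdown` (declared earlier with an empty default body) only pays off if the server calls it while tearing down the global context. A hedged sketch of such a call site; the `DisksMap` alias and the idea that the caller can enumerate all registered disks are assumptions, not part of this diff:

```cpp
// Hypothetical teardown step: give every disk a chance to abort in-flight
// work. For DiskS3 this ends up in client->DisableRequestProcessing(); for
// all other disks the default no-op IDisk::shutdown() runs.
#include <map>
#include <string>
#include <Disks/IDisk.h>

using DisksMap = std::map<std::string, DB::DiskPtr>;  // assumed alias

void shutdownDisks(const DisksMap & disks)
{
    for (const auto & [name, disk] : disks)
        disk->shutdown();
}
```
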
diff --git a/src/Disks/S3/ProxyConfiguration.h b/src/Disks/S3/ProxyConfiguration.h
index 62aec0e005e..32a1c8d3c45 100644
--- a/src/Disks/S3/ProxyConfiguration.h
+++ b/src/Disks/S3/ProxyConfiguration.h
@@ -1,7 +1,7 @@
 #pragma once
 #include
-#include
+#include
 #include
 #include
diff --git a/src/Disks/S3/registerDiskS3.cpp b/src/Disks/S3/registerDiskS3.cpp
index 341ada59631..fbd19ce1cd9 100644
--- a/src/Disks/S3/registerDiskS3.cpp
+++ b/src/Disks/S3/registerDiskS3.cpp
@@ -145,9 +145,12 @@ void registerDiskS3(DiskFactory & factory)
         config.getUInt64(config_prefix + ".min_bytes_for_seek", 1024 * 1024));
     /// This code is used only to check access to the corresponding disk.
-    checkWriteAccess(*s3disk);
-    checkReadAccess(name, *s3disk);
-    checkRemoveAccess(*s3disk);
+    if (!config.getBool(config_prefix + ".skip_access_check", false))
+    {
+        checkWriteAccess(*s3disk);
+        checkReadAccess(name, *s3disk);
+        checkRemoveAccess(*s3disk);
+    }
     bool cache_enabled = config.getBool(config_prefix + ".cache_enabled", true);
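
The hunk above puts the write/read/remove probes behind a new `skip_access_check` key under the disk's configuration prefix, defaulting to `false` so existing deployments keep the checks. A self-contained sketch of the lookup; the disk name `s3_main` and the surrounding XML layout are illustrative assumptions, only the key itself comes from the hunk:

```cpp
#include <sstream>
#include <string>
#include <Poco/AutoPtr.h>
#include <Poco/Util/XMLConfiguration.h>

int main()
{
    // Hypothetical storage configuration fragment for an S3 disk.
    std::istringstream xml(
        "<yandex><storage_configuration><disks><s3_main>"
        "<type>s3</type><skip_access_check>true</skip_access_check>"
        "</s3_main></disks></storage_configuration></yandex>");
    Poco::AutoPtr<Poco::Util::XMLConfiguration> config(new Poco::Util::XMLConfiguration(xml));

    // Same lookup as in registerDiskS3: the default of false keeps the checks on.
    const std::string config_prefix = "storage_configuration.disks.s3_main";
    return config->getBool(config_prefix + ".skip_access_check", false) ? 0 : 1;
}
```
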
diff --git a/src/Formats/FormatFactory.cpp b/src/Formats/FormatFactory.cpp
index 935d31d6541..522149d3cfd 100644
--- a/src/Formats/FormatFactory.cpp
+++ b/src/Formats/FormatFactory.cpp
@@ -111,6 +111,7 @@ static FormatSettings getOutputFormatSetting(const Settings & settings, const Co
     format_settings.template_settings.row_format = settings.format_template_row;
     format_settings.template_settings.row_between_delimiter = settings.format_template_rows_between_delimiter;
     format_settings.tsv.crlf_end_of_line = settings.output_format_tsv_crlf_end_of_line;
+    format_settings.tsv.null_representation = settings.output_format_tsv_null_representation;
     format_settings.write_statistics = settings.output_format_write_statistics;
     format_settings.parquet.row_group_size = settings.output_format_parquet_row_group_size;
     format_settings.schema.format_schema = settings.format_schema;
@@ -323,13 +324,86 @@ void FormatFactory::registerFileSegmentationEngine(const String & name, FileSegm
     target = std::move(file_segmentation_engine);
 }
+/// File Segmentation Engines for parallel reading
+
+void registerFileSegmentationEngineTabSeparated(FormatFactory & factory);
+void registerFileSegmentationEngineCSV(FormatFactory & factory);
+void registerFileSegmentationEngineJSONEachRow(FormatFactory & factory);
+void registerFileSegmentationEngineRegexp(FormatFactory & factory);
+void registerFileSegmentationEngineJSONAsString(FormatFactory & factory);
+
+/// Formats for both input/output.
+
+void registerInputFormatNative(FormatFactory & factory);
+void registerOutputFormatNative(FormatFactory & factory);
+
+void registerInputFormatProcessorNative(FormatFactory & factory);
+void registerOutputFormatProcessorNative(FormatFactory & factory);
+void registerInputFormatProcessorRowBinary(FormatFactory & factory);
+void registerOutputFormatProcessorRowBinary(FormatFactory & factory);
+void registerInputFormatProcessorTabSeparated(FormatFactory & factory);
+void registerOutputFormatProcessorTabSeparated(FormatFactory & factory);
+void registerInputFormatProcessorValues(FormatFactory & factory);
+void registerOutputFormatProcessorValues(FormatFactory & factory);
+void registerInputFormatProcessorCSV(FormatFactory & factory);
+void registerOutputFormatProcessorCSV(FormatFactory & factory);
+void registerInputFormatProcessorTSKV(FormatFactory & factory);
+void registerOutputFormatProcessorTSKV(FormatFactory & factory);
+void registerInputFormatProcessorJSONEachRow(FormatFactory & factory);
+void registerOutputFormatProcessorJSONEachRow(FormatFactory & factory);
+void registerInputFormatProcessorJSONCompactEachRow(FormatFactory & factory);
+void registerOutputFormatProcessorJSONCompactEachRow(FormatFactory & factory);
+void registerInputFormatProcessorProtobuf(FormatFactory & factory);
+void registerOutputFormatProcessorProtobuf(FormatFactory & factory);
+void registerInputFormatProcessorTemplate(FormatFactory & factory);
+void registerOutputFormatProcessorTemplate(FormatFactory & factory);
+void registerInputFormatProcessorMsgPack(FormatFactory & factory);
+void registerOutputFormatProcessorMsgPack(FormatFactory & factory);
+void registerInputFormatProcessorORC(FormatFactory & factory);
+void registerOutputFormatProcessorORC(FormatFactory & factory);
+void registerInputFormatProcessorParquet(FormatFactory & factory);
+void registerOutputFormatProcessorParquet(FormatFactory & factory);
+void registerInputFormatProcessorArrow(FormatFactory & factory);
+void registerOutputFormatProcessorArrow(FormatFactory & factory);
+void registerInputFormatProcessorAvro(FormatFactory & factory);
+void registerOutputFormatProcessorAvro(FormatFactory & factory);
+
+/// Output only (presentational) formats.
+
+void registerOutputFormatNull(FormatFactory & factory);
+
+void registerOutputFormatProcessorPretty(FormatFactory & factory);
+void registerOutputFormatProcessorPrettyCompact(FormatFactory & factory);
+void registerOutputFormatProcessorPrettySpace(FormatFactory & factory);
+void registerOutputFormatProcessorVertical(FormatFactory & factory);
+void registerOutputFormatProcessorJSON(FormatFactory & factory);
+void registerOutputFormatProcessorJSONCompact(FormatFactory & factory);
+void registerOutputFormatProcessorJSONEachRowWithProgress(FormatFactory & factory);
+void registerOutputFormatProcessorXML(FormatFactory & factory);
+void registerOutputFormatProcessorODBCDriver2(FormatFactory & factory);
+void registerOutputFormatProcessorNull(FormatFactory & factory);
+void registerOutputFormatProcessorMySQLWire(FormatFactory & factory);
+void registerOutputFormatProcessorMarkdown(FormatFactory & factory);
+void registerOutputFormatProcessorPostgreSQLWire(FormatFactory & factory);
+
+/// Input only formats.
+
+void registerInputFormatProcessorRegexp(FormatFactory & factory);
+void registerInputFormatProcessorJSONAsString(FormatFactory & factory);
+void registerInputFormatProcessorLineAsString(FormatFactory & factory);
+void registerInputFormatProcessorCapnProto(FormatFactory & factory);
+
 FormatFactory::FormatFactory()
 {
+    registerFileSegmentationEngineTabSeparated(*this);
+    registerFileSegmentationEngineCSV(*this);
+    registerFileSegmentationEngineJSONEachRow(*this);
+    registerFileSegmentationEngineRegexp(*this);
+    registerFileSegmentationEngineJSONAsString(*this);
+
     registerInputFormatNative(*this);
     registerOutputFormatNative(*this);
-    registerOutputFormatProcessorJSONEachRowWithProgress(*this);
-
     registerInputFormatProcessorNative(*this);
     registerOutputFormatProcessorNative(*this);
     registerInputFormatProcessorRowBinary(*this);
@@ -348,8 +422,11 @@ FormatFactory::FormatFactory()
     registerOutputFormatProcessorJSONCompactEachRow(*this);
     registerInputFormatProcessorProtobuf(*this);
     registerOutputFormatProcessorProtobuf(*this);
+    registerInputFormatProcessorTemplate(*this);
+    registerOutputFormatProcessorTemplate(*this);
+    registerInputFormatProcessorMsgPack(*this);
+    registerOutputFormatProcessorMsgPack(*this);
 #if !defined(ARCADIA_BUILD)
-    registerInputFormatProcessorCapnProto(*this);
     registerInputFormatProcessorORC(*this);
     registerOutputFormatProcessorORC(*this);
     registerInputFormatProcessorParquet(*this);
@@ -359,18 +436,6 @@ FormatFactory::FormatFactory()
     registerInputFormatProcessorAvro(*this);
     registerOutputFormatProcessorAvro(*this);
 #endif
-    registerInputFormatProcessorTemplate(*this);
-    registerOutputFormatProcessorTemplate(*this);
-    registerInputFormatProcessorRegexp(*this);
-    registerInputFormatProcessorMsgPack(*this);
-    registerOutputFormatProcessorMsgPack(*this);
-    registerInputFormatProcessorJSONAsString(*this);
-
-    registerFileSegmentationEngineTabSeparated(*this);
-    registerFileSegmentationEngineCSV(*this);
-    registerFileSegmentationEngineJSONEachRow(*this);
-    registerFileSegmentationEngineRegexp(*this);
-    registerFileSegmentationEngineJSONAsString(*this);
     registerOutputFormatNull(*this);
@@ -380,12 +445,20 @@ FormatFactory::FormatFactory()
     registerOutputFormatProcessorVertical(*this);
     registerOutputFormatProcessorJSON(*this);
     registerOutputFormatProcessorJSONCompact(*this);
+    registerOutputFormatProcessorJSONEachRowWithProgress(*this);
     registerOutputFormatProcessorXML(*this);
     registerOutputFormatProcessorODBCDriver2(*this);
     registerOutputFormatProcessorNull(*this);
     registerOutputFormatProcessorMySQLWire(*this);
     registerOutputFormatProcessorMarkdown(*this);
     registerOutputFormatProcessorPostgreSQLWire(*this);
+
+    registerInputFormatProcessorRegexp(*this);
+    registerInputFormatProcessorJSONAsString(*this);
+    registerInputFormatProcessorLineAsString(*this);
+#if !defined(ARCADIA_BUILD)
+    registerInputFormatProcessorCapnProto(*this);
+#endif
 }
 FormatFactory & FormatFactory::instance()
diff --git a/src/Formats/FormatFactory.h b/src/Formats/FormatFactory.h
index f0d2b7826a0..de53490dd3b 100644
--- a/src/Formats/FormatFactory.h
+++ b/src/Formats/FormatFactory.h
@@ -1,6 +1,6 @@
 #pragma once
-#include
+#include
 #include
 #include
 #include
@@ -141,73 +141,4 @@ private:
     const Creators & getCreators(const String & name) const;
 };
-/// Formats for both input/output.
-
-void registerInputFormatNative(FormatFactory & factory);
-void registerOutputFormatNative(FormatFactory & factory);
-
-void registerInputFormatProcessorNative(FormatFactory & factory);
-void registerOutputFormatProcessorNative(FormatFactory & factory);
-void registerInputFormatProcessorRowBinary(FormatFactory & factory);
-void registerOutputFormatProcessorRowBinary(FormatFactory & factory);
-void registerInputFormatProcessorTabSeparated(FormatFactory & factory);
-void registerOutputFormatProcessorTabSeparated(FormatFactory & factory);
-void registerInputFormatProcessorValues(FormatFactory & factory);
-void registerOutputFormatProcessorValues(FormatFactory & factory);
-void registerInputFormatProcessorCSV(FormatFactory & factory);
-void registerOutputFormatProcessorCSV(FormatFactory & factory);
-void registerInputFormatProcessorTSKV(FormatFactory & factory);
-void registerOutputFormatProcessorTSKV(FormatFactory & factory);
-void registerInputFormatProcessorJSONEachRow(FormatFactory & factory);
-void registerOutputFormatProcessorJSONEachRow(FormatFactory & factory);
-void registerInputFormatProcessorJSONCompactEachRow(FormatFactory & factory);
-void registerOutputFormatProcessorJSONCompactEachRow(FormatFactory & factory);
-void registerInputFormatProcessorParquet(FormatFactory & factory);
-void registerOutputFormatProcessorParquet(FormatFactory & factory);
-void registerInputFormatProcessorArrow(FormatFactory & factory);
-void registerOutputFormatProcessorArrow(FormatFactory & factory);
-void registerInputFormatProcessorProtobuf(FormatFactory & factory);
-void registerOutputFormatProcessorProtobuf(FormatFactory & factory);
-void registerInputFormatProcessorAvro(FormatFactory & factory);
-void registerOutputFormatProcessorAvro(FormatFactory & factory);
-void registerInputFormatProcessorTemplate(FormatFactory & factory);
-void registerOutputFormatProcessorTemplate(FormatFactory & factory);
-void registerInputFormatProcessorMsgPack(FormatFactory & factory);
-void registerOutputFormatProcessorMsgPack(FormatFactory & factory);
-void registerInputFormatProcessorORC(FormatFactory & factory);
-void registerOutputFormatProcessorORC(FormatFactory & factory);
-
-
-/// File Segmentation Engines for parallel reading
-
-void registerFileSegmentationEngineTabSeparated(FormatFactory & factory);
-void registerFileSegmentationEngineCSV(FormatFactory & factory);
-void registerFileSegmentationEngineJSONEachRow(FormatFactory & factory);
-void registerFileSegmentationEngineRegexp(FormatFactory & factory);
-void registerFileSegmentationEngineJSONAsString(FormatFactory & factory);
-
-/// Output only (presentational) formats.
-
-void registerOutputFormatNull(FormatFactory & factory);
-
-void registerOutputFormatProcessorPretty(FormatFactory & factory);
-void registerOutputFormatProcessorPrettyCompact(FormatFactory & factory);
-void registerOutputFormatProcessorPrettySpace(FormatFactory & factory);
-void registerOutputFormatProcessorPrettyASCII(FormatFactory & factory);
-void registerOutputFormatProcessorVertical(FormatFactory & factory);
-void registerOutputFormatProcessorJSON(FormatFactory & factory);
-void registerOutputFormatProcessorJSONCompact(FormatFactory & factory);
-void registerOutputFormatProcessorJSONEachRowWithProgress(FormatFactory & factory);
-void registerOutputFormatProcessorXML(FormatFactory & factory);
-void registerOutputFormatProcessorODBCDriver2(FormatFactory & factory);
-void registerOutputFormatProcessorNull(FormatFactory & factory);
-void registerOutputFormatProcessorMySQLWire(FormatFactory & factory);
-void registerOutputFormatProcessorMarkdown(FormatFactory & factory);
-void registerOutputFormatProcessorPostgreSQLWire(FormatFactory & factory);
-
-/// Input only formats.
-void registerInputFormatProcessorCapnProto(FormatFactory & factory);
-void registerInputFormatProcessorRegexp(FormatFactory & factory);
-void registerInputFormatProcessorJSONAsString(FormatFactory & factory);
-
 }
diff --git a/src/Formats/FormatSchemaInfo.h b/src/Formats/FormatSchemaInfo.h
index 7af0d56a0cf..67f1baca84b 100644
--- a/src/Formats/FormatSchemaInfo.h
+++ b/src/Formats/FormatSchemaInfo.h
@@ -1,6 +1,6 @@
 #pragma once
-#include
+#include
 namespace DB
 {
diff --git a/src/Formats/FormatSettings.h b/src/Formats/FormatSettings.h
index 299ec353f03..cd5cab8cf5a 100644
--- a/src/Formats/FormatSettings.h
+++ b/src/Formats/FormatSettings.h
@@ -1,6 +1,6 @@
 #pragma once
-#include
+#include
 namespace DB
@@ -78,6 +78,7 @@ struct FormatSettings
     {
         bool empty_as_default = false;
         bool crlf_end_of_line = false;
+        String null_representation = "\\N";
     };
     TSV tsv;
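
Together with the `FormatFactory.cpp` hunk earlier that copies `settings.output_format_tsv_null_representation` into it, the new `null_representation` field makes the TSV NULL token configurable instead of hard-coded. A minimal illustration of the default and an override (a sketch; only the field itself comes from this diff):

```cpp
#include <cassert>
#include <Formats/FormatSettings.h>

int main()
{
    DB::FormatSettings settings;
    // The default preserves the historical TSV escape for NULL.
    assert(settings.tsv.null_representation == "\\N");
    // Setting output_format_tsv_null_representation = 'NULL' in a query
    // would land here and change what the TSV writer emits for NULLs.
    settings.tsv.null_representation = "NULL";
    return 0;
}
```
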
diff --git a/src/Formats/IRowOutputStream.h b/src/Formats/IRowOutputStream.h
index 3b18603ee69..7cf6251cd0d 100644
--- a/src/Formats/IRowOutputStream.h
+++ b/src/Formats/IRowOutputStream.h
@@ -3,7 +3,7 @@
 #include
 #include
 #include
-#include
+#include
 namespace DB
diff --git a/src/Formats/MySQLBlockInputStream.cpp b/src/Formats/MySQLBlockInputStream.cpp
index 17c09cdc14d..f85680c0031 100644
--- a/src/Formats/MySQLBlockInputStream.cpp
+++ b/src/Formats/MySQLBlockInputStream.cpp
@@ -7,13 +7,15 @@
 # include
 # include
 # include
+# include
+# include
+# include
 # include
 # include
 # include
 # include
 # include "MySQLBlockInputStream.h"
-
 namespace DB
 {
 namespace ErrorCodes
@@ -39,7 +41,7 @@ namespace
 {
     using ValueType = ExternalResultDescription::ValueType;
-    void insertValue(IColumn & column, const ValueType type, const mysqlxx::Value & value)
+    void insertValue(const IDataType & data_type, IColumn & column, const ValueType type, const mysqlxx::Value & value)
     {
         switch (type)
         {
@@ -85,6 +87,15 @@ namespace
             case ValueType::vtUUID:
                 assert_cast(column).insert(parse(value.data(), value.size()));
                 break;
+            case ValueType::vtDateTime64: [[fallthrough]];
+            case ValueType::vtDecimal32: [[fallthrough]];
+            case ValueType::vtDecimal64: [[fallthrough]];
+            case ValueType::vtDecimal128:
+            {
+                ReadBuffer buffer(const_cast(value.data()), value.size(), 0);
+                data_type.deserializeAsWholeText(column, buffer, FormatSettings{});
+                break;
+            }
         }
     }
@@ -112,19 +123,21 @@ Block MySQLBlockInputStream::readImpl()
         for (const auto idx : ext::range(0, row.size()))
         {
             const auto value = row[idx];
+            const auto & sample = description.sample_block.getByPosition(idx);
             if (!value.isNull())
             {
                 if (description.types[idx].second)
                 {
                     ColumnNullable & column_nullable = assert_cast(*columns[idx]);
-                    insertValue(column_nullable.getNestedColumn(), description.types[idx].first, value);
+                    const auto & data_type = assert_cast(*sample.type);
+                    insertValue(*data_type.getNestedType(), column_nullable.getNestedColumn(), description.types[idx].first, value);
                     column_nullable.getNullMapData().emplace_back(0);
                 }
                 else
-                    insertValue(*columns[idx], description.types[idx].first, value);
+                    insertValue(*sample.type, *columns[idx], description.types[idx].first, value);
             }
             else
-                insertDefaultValue(*columns[idx], *description.sample_block.getByPosition(idx).column);
+                insertDefaultValue(*columns[idx], *sample.column);
         }
         ++num_rows;
diff --git a/src/Formats/ParsedTemplateFormatString.h b/src/Formats/ParsedTemplateFormatString.h
index 2da8a074679..f2e801faeab 100644
--- a/src/Formats/ParsedTemplateFormatString.h
+++ b/src/Formats/ParsedTemplateFormatString.h
@@ -1,8 +1,9 @@
 #pragma once
-#include
+#include
 #include
 #include
+#include
 #include
 #include
@@ -10,6 +11,7 @@ namespace DB
 {
 class Block;
+using Strings = std::vector;
 struct ParsedTemplateFormatString
 {
diff --git a/src/Formats/ProtobufColumnMatcher.h b/src/Formats/ProtobufColumnMatcher.h
index 03c5ec40fc6..35521be7a9b 100644
--- a/src/Formats/ProtobufColumnMatcher.h
+++ b/src/Formats/ProtobufColumnMatcher.h
@@ -8,7 +8,7 @@
 # include
 # include
 # include
-# include
+# include
 # include
 # include
 # include
diff --git a/src/Formats/ProtobufSchemas.h b/src/Formats/ProtobufSchemas.h
index 590c479bcc8..05778a85343 100644
--- a/src/Formats/ProtobufSchemas.h
+++ b/src/Formats/ProtobufSchemas.h
@@ -5,7 +5,7 @@
 #include
 #include
-#include
+#include
 #include
diff --git a/src/Functions/CMakeLists.txt b/src/Functions/CMakeLists.txt
index 78caabb6941..0a99a034a33 100644
--- a/src/Functions/CMakeLists.txt
+++ b/src/Functions/CMakeLists.txt
@@ -53,8 +53,28 @@ endif()
 target_include_directories(clickhouse_functions SYSTEM PRIVATE ${SPARSEHASH_INCLUDE_DIR})
-# Won't generate debug info for files with heavy template instantiation to achieve faster linking and lower size.
-target_compile_options(clickhouse_functions PRIVATE "-g0")
+if (CMAKE_BUILD_TYPE_UC STREQUAL "RELEASE"
+    OR CMAKE_BUILD_TYPE_UC STREQUAL "RELWITHDEBINFO"
+    OR CMAKE_BUILD_TYPE_UC STREQUAL "MINSIZEREL")
+    set (STRIP_DSF_DEFAULT ON)
+else()
+    set (STRIP_DSF_DEFAULT OFF)
+endif()
+
+
+option(STRIP_DEBUG_SYMBOLS_FUNCTIONS
+    "Do not generate debugger info for ClickHouse functions.
+     Provides faster linking and lower binary size.
+     Tradeoff is the inability to debug some source files with e.g. gdb
+     (empty stack frames and no local variables)."
+    ${STRIP_DSF_DEFAULT})
+
+if (STRIP_DEBUG_SYMBOLS_FUNCTIONS)
+    message(WARNING "Not generating debugger info for ClickHouse functions")
+    target_compile_options(clickhouse_functions PRIVATE "-g0")
+else()
+    message(STATUS "Generating debugger info for ClickHouse functions")
+endif()
 if (USE_ICU)
     target_link_libraries (clickhouse_functions PRIVATE ${ICU_LIBRARIES})
diff --git a/src/Functions/CRC.cpp b/src/Functions/CRC.cpp
index 96edf9a0d8e..6083e5ef16f 100644
--- a/src/Functions/CRC.cpp
+++ b/src/Functions/CRC.cpp
@@ -72,6 +72,9 @@ namespace ErrorCodes
     extern const int ILLEGAL_TYPE_OF_ARGUMENT;
 }
+namespace
+{
+
 template
 struct CRCFunctionWrapper
 {
@@ -127,6 +130,8 @@ using FunctionCRC32IEEE = FunctionCRC;
 // Uses CRC-64-ECMA polynomial
 using FunctionCRC64ECMA = FunctionCRC;
+}
+
 template
 void registerFunctionCRCImpl(FunctionFactory & factory)
 {
diff --git a/src/Functions/CustomWeekTransforms.h b/src/Functions/CustomWeekTransforms.h
index 97752d51263..86e1c444a78 100644
--- a/src/Functions/CustomWeekTransforms.h
+++ b/src/Functions/CustomWeekTransforms.h
@@ -2,7 +2,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include
 #include
diff --git a/src/Functions/DateTimeTransforms.h b/src/Functions/DateTimeTransforms.h
index 6e2c3ea9ea6..6220d10a17d 100644
--- a/src/Functions/DateTimeTransforms.h
+++ b/src/Functions/DateTimeTransforms.h
@@ -1,5 +1,5 @@
 #pragma once
-#include
+#include
 #include
 #include
 #include
diff --git a/src/Functions/DummyJSONParser.h b/src/Functions/DummyJSONParser.h
index 4f4facba957..a71c90e4a19 100644
--- a/src/Functions/DummyJSONParser.h
+++ b/src/Functions/DummyJSONParser.h
@@ -1,7 +1,7 @@
 #pragma once
 #include
-#include
+#include
 namespace DB
 {
diff --git a/src/Functions/FunctionBinaryArithmetic.h b/src/Functions/FunctionBinaryArithmetic.h
index 2a467451684..bbb08c4068f 100644
--- a/src/Functions/FunctionBinaryArithmetic.h
+++ b/src/Functions/FunctionBinaryArithmetic.h
@@ -22,11 +22,15 @@
 #include
 #include "IFunctionImpl.h"
 #include "FunctionHelpers.h"
+#include "IsOperation.h"
 #include "DivisionUtils.h"
 #include "castTypeToEither.h"
 #include "FunctionFactory.h"
 #include
 #include
+#include
+#include
+#include
 #if !defined(ARCADIA_BUILD)
 # include
@@ -50,6 +54,7 @@ namespace ErrorCodes
     extern const int LOGICAL_ERROR;
     extern const int DECIMAL_OVERFLOW;
     extern const int CANNOT_ADD_DIFFERENT_AGGREGATE_STATES;
+    extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
 }
@@ -60,7 +65,7 @@
  */
 template
-struct BinaryOperationImplBase
+struct BinaryOperation
 {
     using ResultType = ResultType_;
     static const constexpr bool allow_fixed_string = false;
@@ -162,38 +167,35 @@ struct FixedStringOperationImpl
 template
-struct BinaryOperationImpl : BinaryOperationImplBase
+struct BinaryOperationImpl : BinaryOperation
 {
 };
-
-template struct PlusImpl;
-template struct MinusImpl;
-template struct MultiplyImpl;
-template struct DivideFloatingImpl;
-template struct DivideIntegralImpl;
-template struct DivideIntegralOrZeroImpl;
-template struct LeastBaseImpl;
-template struct GreatestBaseImpl;
-template struct ModuloImpl;
-
+template
+inline constexpr const auto & undec(const T & x)
+{
+    if constexpr (IsDecimalNumber)
+        return x.value;
+    else
+        return x;
+}
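
The `undec` helper added above lets the arithmetic templates treat `Decimal*` and plain arithmetic types uniformly: a decimal is unwrapped to its underlying representation, anything else passes through unchanged. A self-contained restatement of the pattern (the `Decimal` wrapper and `IsDecimalNumber` trait below are simplified stand-ins for ClickHouse's real ones):

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Simplified stand-ins for ClickHouse's Decimal and IsDecimalNumber.
template <typename T>
struct Decimal { T value; };

template <typename T>
struct IsDecimalNumber : std::false_type {};
template <typename T>
struct IsDecimalNumber<Decimal<T>> : std::true_type {};

// Same shape as the helper in the hunk above: unwrap decimals, pass
// everything else through by const reference.
template <typename T>
constexpr const auto & undec(const T & x)
{
    if constexpr (IsDecimalNumber<T>::value)
        return x.value;
    else
        return x;
}

// A Decimal<int64_t> unwraps to int64_t; a plain int stays an int.
static_assert(std::is_same_v<std::decay_t<decltype(undec(std::declval<Decimal<int64_t>>()))>, int64_t>);
static_assert(std::is_same_v<std::decay_t<decltype(undec(std::declval<int>()))>, int>);
```
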
 /// Binary operations for Decimals need scale args
 /// +|- scale one of args (which scale factor is not 1). ScaleR = oneof(Scale1, Scale2);
 /// *   no args scale. ScaleR = Scale1 + Scale2;
 /// /   first arg scale. ScaleR = Scale1 (scale_a = DecimalType::getScale()).
-template typename Operation, typename ResultType_, bool _check_overflow = true>
+template