Merge branch 'master' into amosbird-fixcrash

Alexey Milovidov 2020-11-12 13:47:50 +03:00
commit 1af28be77e
423 changed files with 6080 additions and 2297 deletions

3
.gitmodules vendored

@ -190,3 +190,6 @@
path = contrib/croaring
url = https://github.com/RoaringBitmap/CRoaring
branch = v0.2.66
[submodule "contrib/miniselect"]
path = contrib/miniselect
url = https://github.com/danlark1/miniselect


@ -1,3 +1,122 @@
### ClickHouse release v20.11.2.1, 2020-11-11
#### Backward Incompatible Change
* If some `profile` was specified in `distributed_ddl` config section, then this profile could overwrite settings of `default` profile on server startup. It's fixed, now settings of distributed DDL queries should not affect global server settings. [#16635](https://github.com/ClickHouse/ClickHouse/pull/16635) ([tavplubix](https://github.com/tavplubix)).
* Restrict the use of non-comparable data types (like `AggregateFunction`) in keys (Sorting key, Primary key, Partition key, and so on). [#16601](https://github.com/ClickHouse/ClickHouse/pull/16601) ([alesapin](https://github.com/alesapin)).
* Remove `ANALYZE` and `AST` queries, and make the setting `enable_debug_queries` obsolete since now it is the part of full featured `EXPLAIN` query. [#16536](https://github.com/ClickHouse/ClickHouse/pull/16536) ([Ivan](https://github.com/abyss7)).
* Aggregate functions `boundingRatio`, `rankCorr`, `retention`, `timeSeriesGroupSum`, `timeSeriesGroupRateSum`, `windowFunnel` were erroneously made case-insensitive. Now their names are made case sensitive as designed. Only functions that are specified in SQL standard or made for compatibility with other DBMS or functions similar to those should be case-insensitive. [#16407](https://github.com/ClickHouse/ClickHouse/pull/16407) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Make `rankCorr` function return nan on insufficient data https://github.com/ClickHouse/ClickHouse/issues/16124. [#16135](https://github.com/ClickHouse/ClickHouse/pull/16135) ([hexiaoting](https://github.com/hexiaoting)).
#### New Feature
* Added support of LDAP as a user directory for locally non-existent users. [#12736](https://github.com/ClickHouse/ClickHouse/pull/12736) ([Denis Glazachev](https://github.com/traceon)).
* Add `system.replicated_fetches` table which shows currently running background fetches. [#16428](https://github.com/ClickHouse/ClickHouse/pull/16428) ([alesapin](https://github.com/alesapin)).
* Added setting `date_time_output_format`. [#15845](https://github.com/ClickHouse/ClickHouse/pull/15845) ([Maksim Kita](https://github.com/kitaisreal)).
* Added minimal web UI to ClickHouse. [#16158](https://github.com/ClickHouse/ClickHouse/pull/16158) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow reading/writing a single protobuf message at once (without length delimiters). [#15199](https://github.com/ClickHouse/ClickHouse/pull/15199) ([filimonov](https://github.com/filimonov)).
* Added initial OpenTelemetry support. ClickHouse now accepts OpenTelemetry traceparent headers over Native and HTTP protocols, and passes them downstream in some cases. The trace spans for executed queries are saved into the `system.opentelemetry_span_log` table. [#14195](https://github.com/ClickHouse/ClickHouse/pull/14195) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Allow specifying the primary key in the column list of a `CREATE TABLE` query. This is needed for compatibility with other SQL dialects. [#15823](https://github.com/ClickHouse/ClickHouse/pull/15823) ([Maksim Kita](https://github.com/kitaisreal)).
* Implement `OFFSET offset_row_count {ROW | ROWS} FETCH {FIRST | NEXT} fetch_row_count {ROW | ROWS} {ONLY | WITH TIES}` in SELECT query with ORDER BY. This is the SQL-standard way to specify `LIMIT`. [#15855](https://github.com/ClickHouse/ClickHouse/pull/15855) ([hexiaoting](https://github.com/hexiaoting)).
* `errorCodeToName` function - returns the variable name of the error (useful for analyzing query_log and similar). `system.errors` table - shows how many times each error has happened (respects `system_events_show_zero_values`). [#16438](https://github.com/ClickHouse/ClickHouse/pull/16438) ([Azat Khuzhin](https://github.com/azat)).
* Added function `untuple` which is a special function which can introduce new columns to the SELECT list by expanding a named tuple. [#16242](https://github.com/ClickHouse/ClickHouse/pull/16242) ([Nikolai Kochetov](https://github.com/KochetovNicolai), [Amos Bird](https://github.com/amosbird)).
* Now we can provide identifiers via query parameters. And these parameters can be used as table objects or columns. [#16594](https://github.com/ClickHouse/ClickHouse/pull/16594) ([Amos Bird](https://github.com/amosbird)).
* Added big integers (UInt256, Int128, Int256) and UUID data types support for MergeTree BloomFilter index. Big integers is an experimental feature. [#16642](https://github.com/ClickHouse/ClickHouse/pull/16642) ([Maksim Kita](https://github.com/kitaisreal)).
* Add `farmFingerprint64` function (non-cryptographic string hashing). [#16570](https://github.com/ClickHouse/ClickHouse/pull/16570) ([Jacob Hayes](https://github.com/JacobHayes)).
* Add `log_queries_min_query_duration_ms`, only queries slower than the value of this setting will go to `query_log`/`query_thread_log` (i.e. something like `slow_query_log` in mysql). [#16529](https://github.com/ClickHouse/ClickHouse/pull/16529) ([Azat Khuzhin](https://github.com/azat)).
* Ability to create a docker image on the top of `Alpine`. Uses precompiled binary and glibc components from ubuntu 20.04. [#16479](https://github.com/ClickHouse/ClickHouse/pull/16479) ([filimonov](https://github.com/filimonov)).
* Added `toUUIDOrNull`, `toUUIDOrZero` cast functions. [#16337](https://github.com/ClickHouse/ClickHouse/pull/16337) ([Maksim Kita](https://github.com/kitaisreal)).
* Add `max_concurrent_queries_for_all_users` setting, see [#6636](https://github.com/ClickHouse/ClickHouse/issues/6636) for use cases. [#16154](https://github.com/ClickHouse/ClickHouse/pull/16154) ([nvartolomei](https://github.com/nvartolomei)).
* Add a new option `print_query_id` to clickhouse-client. It helps generate arbitrary strings with the current query id generated by the client. Also print query id in clickhouse-client by default. [#15809](https://github.com/ClickHouse/ClickHouse/pull/15809) ([Amos Bird](https://github.com/amosbird)).
* Add `tid` and `logTrace` functions. This closes [#9434](https://github.com/ClickHouse/ClickHouse/issues/9434). [#15803](https://github.com/ClickHouse/ClickHouse/pull/15803) ([flynn](https://github.com/ucasFL)).
* Add function `formatReadableTimeDelta` that format time delta to human readable string ... [#15497](https://github.com/ClickHouse/ClickHouse/pull/15497) ([Filipe Caixeta](https://github.com/filipecaixeta)).
* Added `disable_merges` option for volumes in multi-disk configuration. [#13956](https://github.com/ClickHouse/ClickHouse/pull/13956) ([Vladimir Chebotarev](https://github.com/excitoon)).
#### Experimental Feature
* New functions `encrypt`, `aes_encrypt_mysql`, `decrypt`, `aes_decrypt_mysql`. These functions are working slowly, so we consider it as an experimental feature. [#11844](https://github.com/ClickHouse/ClickHouse/pull/11844) ([Vasily Nemkov](https://github.com/Enmk)).
#### Bug Fix
* Mask password in data_path in the `system.distribution_queue`. [#16727](https://github.com/ClickHouse/ClickHouse/pull/16727) ([Azat Khuzhin](https://github.com/azat)).
* Fix `IN` operator over several columns and tuples with enabled `transform_null_in` setting. Fixes [#15310](https://github.com/ClickHouse/ClickHouse/issues/15310). [#16722](https://github.com/ClickHouse/ClickHouse/pull/16722) ([Anton Popov](https://github.com/CurtizJ)).
* The setting `max_parallel_replicas` worked incorrectly if the queried table has no sampling. This fixes [#5733](https://github.com/ClickHouse/ClickHouse/issues/5733). [#16675](https://github.com/ClickHouse/ClickHouse/pull/16675) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix optimize_read_in_order/optimize_aggregation_in_order with max_threads > 0 and expression in ORDER BY. [#16637](https://github.com/ClickHouse/ClickHouse/pull/16637) ([Azat Khuzhin](https://github.com/azat)).
* Calculation of `DEFAULT` expressions was involving possible name collisions (that was very unlikely to encounter). This fixes [#9359](https://github.com/ClickHouse/ClickHouse/issues/9359). [#16612](https://github.com/ClickHouse/ClickHouse/pull/16612) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix `query_thread_log.query_duration_ms` unit. [#16563](https://github.com/ClickHouse/ClickHouse/pull/16563) ([Azat Khuzhin](https://github.com/azat)).
* Fix a bug when using MySQL Master -> MySQL Slave -> ClickHouse MaterializeMySQL Engine. `MaterializeMySQL` is an experimental feature. [#16504](https://github.com/ClickHouse/ClickHouse/pull/16504) ([TCeason](https://github.com/TCeason)).
* Specifically crafted argument of `round` function with `Decimal` was leading to integer division by zero. This fixes [#13338](https://github.com/ClickHouse/ClickHouse/issues/13338). [#16451](https://github.com/ClickHouse/ClickHouse/pull/16451) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix DROP TABLE for Distributed (racy with INSERT). [#16409](https://github.com/ClickHouse/ClickHouse/pull/16409) ([Azat Khuzhin](https://github.com/azat)).
* Fix processing of very large entries in replication queue. Very large entries may appear in ALTER queries if table structure is extremely large (near 1 MB). This fixes [#16307](https://github.com/ClickHouse/ClickHouse/issues/16307). [#16332](https://github.com/ClickHouse/ClickHouse/pull/16332) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fixed the inconsistent behaviour when a part of return data could be dropped because the set for its filtration wasn't created. [#16308](https://github.com/ClickHouse/ClickHouse/pull/16308) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix dictGet in sharding_key (and similar places, i.e. when the function context is stored permanently). [#16205](https://github.com/ClickHouse/ClickHouse/pull/16205) ([Azat Khuzhin](https://github.com/azat)).
* Fix the exception thrown in `clickhouse-local` when trying to execute `OPTIMIZE` command. Fixes [#16076](https://github.com/ClickHouse/ClickHouse/issues/16076). [#16192](https://github.com/ClickHouse/ClickHouse/pull/16192) ([filimonov](https://github.com/filimonov)).
* Fixes [#15780](https://github.com/ClickHouse/ClickHouse/issues/15780) regression, e.g. `indexOf([1, 2, 3], toLowCardinality(1))` now is prohibited but it should not be. [#16038](https://github.com/ClickHouse/ClickHouse/pull/16038) ([Mike](https://github.com/myrrc)).
* Fix bug with MySQL database. When MySQL server used as database engine is down some queries raise Exception, because they try to get tables from disabled server, while it's unnecessary. For example, query `SELECT ... FROM system.parts` should work only with MergeTree tables and don't touch MySQL database at all. [#16032](https://github.com/ClickHouse/ClickHouse/pull/16032) ([Kruglov Pavel](https://github.com/Avogar)).
* Now exception will be thrown when `ALTER MODIFY COLUMN ... DEFAULT ...` has incompatible default with column type. Fixes [#15854](https://github.com/ClickHouse/ClickHouse/issues/15854). [#15858](https://github.com/ClickHouse/ClickHouse/pull/15858) ([alesapin](https://github.com/alesapin)).
* Fixed IPv4CIDRToRange/IPv6CIDRToRange functions to accept const IP-column values. [#15856](https://github.com/ClickHouse/ClickHouse/pull/15856) ([vladimir-golovchenko](https://github.com/vladimir-golovchenko)).
#### Improvement
* Treat `INTERVAL '1 hour'` as equivalent to `INTERVAL 1 HOUR`, to be compatible with Postgres and similar. This fixes [#15637](https://github.com/ClickHouse/ClickHouse/issues/15637). [#15978](https://github.com/ClickHouse/ClickHouse/pull/15978) ([flynn](https://github.com/ucasFL)).
* Enable parsing enum values by their numeric ids for CSV, TSV and JSON input formats. [#15685](https://github.com/ClickHouse/ClickHouse/pull/15685) ([vivarum](https://github.com/vivarum)).
* Better read task scheduling for JBOD architecture and `MergeTree` storage. New setting `read_backoff_min_concurrency` which serves as the lower limit to the number of reading threads. [#16423](https://github.com/ClickHouse/ClickHouse/pull/16423) ([Amos Bird](https://github.com/amosbird)).
* Add missing support for `LowCardinality` in `Avro` format. [#16521](https://github.com/ClickHouse/ClickHouse/pull/16521) ([Mike](https://github.com/myrrc)).
* Workaround for using `S3` with an nginx server as a proxy. Nginx currently does not accept urls with empty path like `http://domain.com?delete`, but vanilla aws-sdk-cpp produces this kind of urls. This commit uses patched aws-sdk-cpp version, which makes urls with "/" as path in these cases, like `http://domain.com/?delete`. [#16814](https://github.com/ClickHouse/ClickHouse/pull/16814) ([ianton-ru](https://github.com/ianton-ru)).
* Better diagnostics on parse errors in input data. Provide row number on `Cannot read all data` errors. [#16644](https://github.com/ClickHouse/ClickHouse/pull/16644) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Make the behaviour of `minMap` and `maxMap` more desirable. It will not skip zero values in the result. Fixes [#16087](https://github.com/ClickHouse/ClickHouse/issues/16087). [#16631](https://github.com/ClickHouse/ClickHouse/pull/16631) ([Ildus Kurbangaliev](https://github.com/ildus)).
* Better update of ZooKeeper configuration in runtime. [#16630](https://github.com/ClickHouse/ClickHouse/pull/16630) ([sundyli](https://github.com/sundy-li)).
* Apply SETTINGS clause as early as possible. It allows to modify more settings in the query. This closes [#3178](https://github.com/ClickHouse/ClickHouse/issues/3178). [#16619](https://github.com/ClickHouse/ClickHouse/pull/16619) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Now `event_time_microseconds` field stores in Decimal64, not UInt64. [#16617](https://github.com/ClickHouse/ClickHouse/pull/16617) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Now parameterized functions can be used in `APPLY` column transformer. [#16589](https://github.com/ClickHouse/ClickHouse/pull/16589) ([Amos Bird](https://github.com/amosbird)).
* Improve scheduling of background task which removes data of dropped tables in `Atomic` databases. `Atomic` databases do not create broken symlink to table data directory if table actually has no data directory. [#16584](https://github.com/ClickHouse/ClickHouse/pull/16584) ([tavplubix](https://github.com/tavplubix)).
* Subqueries in `WITH` section (CTE) can reference previous subqueries in `WITH` section by their name. [#16575](https://github.com/ClickHouse/ClickHouse/pull/16575) ([Amos Bird](https://github.com/amosbird)).
* Add current_database into `system.query_thread_log`. [#16558](https://github.com/ClickHouse/ClickHouse/pull/16558) ([Azat Khuzhin](https://github.com/azat)).
* Allow to fetch parts that are already committed or outdated in the current instance into the detached directory. It's useful when migrating tables from another cluster and having N to 1 shards mapping. It's also consistent with the current fetchPartition implementation. [#16538](https://github.com/ClickHouse/ClickHouse/pull/16538) ([Amos Bird](https://github.com/amosbird)).
* Multiple improvements for `RabbitMQ`: Fixed bug for [#16263](https://github.com/ClickHouse/ClickHouse/issues/16263). Also minimized event loop lifetime. Added more efficient queues setup. [#16426](https://github.com/ClickHouse/ClickHouse/pull/16426) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix debug assertion in `quantileDeterministic` function. In previous version it may also transfer up to two times more data over the network. Although no bug existed. This fixes [#15683](https://github.com/ClickHouse/ClickHouse/issues/15683). [#16410](https://github.com/ClickHouse/ClickHouse/pull/16410) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add `TablesToDropQueueSize` metric. It's equal to number of dropped tables, that are waiting for background data removal. [#16364](https://github.com/ClickHouse/ClickHouse/pull/16364) ([tavplubix](https://github.com/tavplubix)).
* Better diagnostics when client has dropped connection. In previous versions, `Attempt to read after EOF` and `Broken pipe` exceptions were logged in server. In new version, it's information message `Client has dropped the connection, cancel the query.`. [#16329](https://github.com/ClickHouse/ClickHouse/pull/16329) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add total_rows/total_bytes (from system.tables) support for Set/Join table engines. [#16306](https://github.com/ClickHouse/ClickHouse/pull/16306) ([Azat Khuzhin](https://github.com/azat)).
* Now it's possible to specify `PRIMARY KEY` without `ORDER BY` for MergeTree table engines family. Closes [#15591](https://github.com/ClickHouse/ClickHouse/issues/15591). [#16284](https://github.com/ClickHouse/ClickHouse/pull/16284) ([alesapin](https://github.com/alesapin)).
* If there is no tmp folder in the system (chroot, misconfiguration etc) `clickhouse-local` will create temporary subfolder in the current directory. [#16280](https://github.com/ClickHouse/ClickHouse/pull/16280) ([filimonov](https://github.com/filimonov)).
* Add support for nested data types (like named tuple) as sub-types. Fixes [#15587](https://github.com/ClickHouse/ClickHouse/issues/15587). [#16262](https://github.com/ClickHouse/ClickHouse/pull/16262) ([Ivan](https://github.com/abyss7)).
* Support for `database_atomic_wait_for_drop_and_detach_synchronously`/`NO DELAY`/`SYNC` for `DROP DATABASE`. [#16127](https://github.com/ClickHouse/ClickHouse/pull/16127) ([Azat Khuzhin](https://github.com/azat)).
* Add `allow_nondeterministic_optimize_skip_unused_shards` (to allow non deterministic like `rand()` or `dictGet()` in sharding key). [#16105](https://github.com/ClickHouse/ClickHouse/pull/16105) ([Azat Khuzhin](https://github.com/azat)).
* Fix `memory_profiler_step`/`max_untracked_memory` for queries via HTTP (test included). Fix the issue that adjusting this value globally in xml config does not help either, since those settings are not applied anyway, only default (4MB) value is [used](https://github.com/ClickHouse/ClickHouse/blob/17731245336d8c84f75e4c0894c5797ed7732190/src/Common/ThreadStatus.h#L104). Fix `query_id` for the most root ThreadStatus of the http query (by initializing QueryScope after reading query_id). [#16101](https://github.com/ClickHouse/ClickHouse/pull/16101) ([Azat Khuzhin](https://github.com/azat)).
* Now it's allowed to execute `ALTER ... ON CLUSTER` queries regardless of the `<internal_replication>` setting in cluster config. [#16075](https://github.com/ClickHouse/ClickHouse/pull/16075) ([alesapin](https://github.com/alesapin)).
* Fix rare issue when `clickhouse-client` may abort on exit due to loading of suggestions. This fixes [#16035](https://github.com/ClickHouse/ClickHouse/issues/16035). [#16047](https://github.com/ClickHouse/ClickHouse/pull/16047) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add support of `cache` layout for `Redis` dictionaries with complex key. [#15985](https://github.com/ClickHouse/ClickHouse/pull/15985) ([Anton Popov](https://github.com/CurtizJ)).
* Fix query hang (endless loop) in case of misconfiguration (`connections_with_failover_max_tries` set to 0). [#15876](https://github.com/ClickHouse/ClickHouse/pull/15876) ([Azat Khuzhin](https://github.com/azat)).
* Change level of some log messages from information to debug, so information messages will not appear for every query. This closes [#5293](https://github.com/ClickHouse/ClickHouse/issues/5293). [#15816](https://github.com/ClickHouse/ClickHouse/pull/15816) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Remove `MemoryTrackingInBackground*` metrics to avoid potentially misleading results. This fixes [#15684](https://github.com/ClickHouse/ClickHouse/issues/15684). [#15813](https://github.com/ClickHouse/ClickHouse/pull/15813) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add reconnects to `zookeeper-dump-tree` tool. [#15711](https://github.com/ClickHouse/ClickHouse/pull/15711) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow explicitly specifying the columns list in `CREATE TABLE table AS table_function(...)` query. Fixes [#9249](https://github.com/ClickHouse/ClickHouse/issues/9249). Fixes [#14214](https://github.com/ClickHouse/ClickHouse/issues/14214). [#14295](https://github.com/ClickHouse/ClickHouse/pull/14295) ([tavplubix](https://github.com/tavplubix)).
#### Performance Improvement
* Do not merge parts across partitions in SELECT FINAL. [#15938](https://github.com/ClickHouse/ClickHouse/pull/15938) ([Kruglov Pavel](https://github.com/Avogar)).
* Improve performance of `-OrNull` and `-OrDefault` aggregate functions. [#16661](https://github.com/ClickHouse/ClickHouse/pull/16661) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improve performance of `quantileMerge`. In previous versions it was obnoxiously slow. This closes [#1463](https://github.com/ClickHouse/ClickHouse/issues/1463). [#16643](https://github.com/ClickHouse/ClickHouse/pull/16643) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improve performance of logical functions a little. [#16347](https://github.com/ClickHouse/ClickHouse/pull/16347) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improved performance of merges assignment in MergeTree table engines. Shouldn't be visible for the user. [#16191](https://github.com/ClickHouse/ClickHouse/pull/16191) ([alesapin](https://github.com/alesapin)).
* Speedup hashed/sparse_hashed dictionary loading by preallocating the hash table. [#15454](https://github.com/ClickHouse/ClickHouse/pull/15454) ([Azat Khuzhin](https://github.com/azat)).
* Now trivial count optimization becomes slightly non-trivial. Predicates that contain exact partition expr can be optimized too. This also fixes [#11092](https://github.com/ClickHouse/ClickHouse/issues/11092) which returns wrong count when `max_parallel_replicas > 1`. [#15074](https://github.com/ClickHouse/ClickHouse/pull/15074) ([Amos Bird](https://github.com/amosbird)).
#### Build/Testing/Packaging Improvement
* Add flaky check for stateless tests. It will detect potentially flaky functional tests in advance, before they are merged. [#16238](https://github.com/ClickHouse/ClickHouse/pull/16238) ([alesapin](https://github.com/alesapin)).
* Use proper version for `croaring` instead of amalgamation. [#16285](https://github.com/ClickHouse/ClickHouse/pull/16285) ([sundyli](https://github.com/sundy-li)).
* Improve generation of build files for `ya.make` build system (Arcadia). [#16700](https://github.com/ClickHouse/ClickHouse/pull/16700) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add MySQL BinLog file check tool for `MaterializeMySQL` database engine. `MaterializeMySQL` is an experimental feature. [#16223](https://github.com/ClickHouse/ClickHouse/pull/16223) ([Winter Zhang](https://github.com/zhang2014)).
* Check for executable bit on non-executable files. People often accidentally commit executable files from Windows. [#15843](https://github.com/ClickHouse/ClickHouse/pull/15843) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Check for `#pragma once` in headers. [#15818](https://github.com/ClickHouse/ClickHouse/pull/15818) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix illegal code style `&vector[idx]` in libhdfs3. This fixes libcxx debug build. See also https://github.com/ClickHouse-Extras/libhdfs3/pull/8 . [#15815](https://github.com/ClickHouse/ClickHouse/pull/15815) ([Amos Bird](https://github.com/amosbird)).
* Fix build of one miscellaneous example tool on Mac OS. Note that we don't build examples on Mac OS in our CI (we build only ClickHouse binary), so there is zero chance it will not break again. This fixes [#15804](https://github.com/ClickHouse/ClickHouse/issues/15804). [#15808](https://github.com/ClickHouse/ClickHouse/pull/15808) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Simplify Sys/V init script. [#14135](https://github.com/ClickHouse/ClickHouse/pull/14135) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Added `boost::program_options` to `db_generator` in order to increase its usability. This closes [#15940](https://github.com/ClickHouse/ClickHouse/issues/15940). [#15973](https://github.com/ClickHouse/ClickHouse/pull/15973) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
## ClickHouse release 20.10
### ClickHouse release v20.10.3.30, 2020-10-28


@ -445,6 +445,7 @@ include (cmake/find/brotli.cmake)
include (cmake/find/protobuf.cmake)
include (cmake/find/grpc.cmake)
include (cmake/find/pdqsort.cmake)
include (cmake/find/miniselect.cmake)
include (cmake/find/hdfs3.cmake) # uses protobuf
include (cmake/find/poco.cmake)
include (cmake/find/curl.cmake)


@ -113,6 +113,12 @@
#include "pcg_extras.hpp"
namespace DB
{
struct PcgSerializer;
struct PcgDeserializer;
}
namespace pcg_detail {
using namespace pcg_extras;
@ -557,6 +563,9 @@ public:
engine<xtype1, itype1,
output_mixin1, output_previous1,
stream_mixin1, multiplier_mixin1>& rng);
friend ::DB::PcgSerializer;
friend ::DB::PcgDeserializer;
};
template <typename CharT, typename Traits,


@ -0,0 +1,2 @@
set(MINISELECT_INCLUDE_DIR ${ClickHouse_SOURCE_DIR}/contrib/miniselect/include)
message(STATUS "Using miniselect: ${MINISELECT_INCLUDE_DIR}")

2
contrib/libunwind vendored

@ -1 +1 @@
Subproject commit 27026ef4a9c6c8cc956d1d131c4d794e24096981
Subproject commit 198458b35f100da32bd3e74c2a3ce8d236db299b

1
contrib/miniselect vendored Submodule

@ -0,0 +1 @@
Subproject commit be0af6bd0b6eb044d1acc4f754b229972d99903a


@ -127,7 +127,7 @@ function clone_submodules
(
cd "$FASTTEST_SOURCE"
SUBMODULES_TO_UPDATE=(contrib/boost contrib/zlib-ng contrib/libxml2 contrib/poco contrib/libunwind contrib/ryu contrib/fmtlib contrib/base64 contrib/cctz contrib/libcpuid contrib/double-conversion contrib/libcxx contrib/libcxxabi contrib/libc-headers contrib/lz4 contrib/zstd contrib/fastops contrib/rapidjson contrib/re2 contrib/sparsehash-c11 contrib/croaring)
SUBMODULES_TO_UPDATE=(contrib/boost contrib/zlib-ng contrib/libxml2 contrib/poco contrib/libunwind contrib/ryu contrib/fmtlib contrib/base64 contrib/cctz contrib/libcpuid contrib/double-conversion contrib/libcxx contrib/libcxxabi contrib/libc-headers contrib/lz4 contrib/zstd contrib/fastops contrib/rapidjson contrib/re2 contrib/sparsehash-c11 contrib/croaring contrib/miniselect)
git submodule sync
git submodule update --init --recursive "${SUBMODULES_TO_UPDATE[@]}"


@ -17,13 +17,6 @@ def get_skip_list_cmd(path):
return ''
def run_perf_test(cmd, xmls_path, output_folder):
output_path = os.path.join(output_folder, "perf_stress_run.txt")
f = open(output_path, 'w')
p = Popen("{} --skip-tags=long --recursive --input-files {}".format(cmd, xmls_path), shell=True, stdout=f, stderr=f)
return p
def get_options(i):
options = ""
if 0 < i:
@ -75,8 +68,6 @@ if __name__ == "__main__":
args = parser.parse_args()
func_pipes = []
perf_process = None
perf_process = run_perf_test(args.perf_test_cmd, args.perf_test_xml_path, args.output_folder)
func_pipes = run_func_test(args.test_cmd, args.output_folder, args.num_parallel, args.skip_func_tests, args.global_time_limit)
logging.info("Will wait functests to finish")


@ -0,0 +1,141 @@
# How to add test queries to ClickHouse CI
ClickHouse has hundreds (or even thousands) of features. Every commit gets checked by a complex set of tests containing many thousands of test cases.
The core functionality is very well tested, but some corner cases and feature combinations may not be covered by ClickHouse CI.
Most of the bugs/regressions we see happen in that 'grey area' where test coverage is poor.
And we are very interested in covering most of the possible scenarios and feature combinations used in real life by tests.
## Why add tests
Why/when you should add a test case into ClickHouse code:
1) you use some complicated scenarios / feature combinations / you have some corner case which is probably not widely used
2) you see that certain behavior changes between versions without a note in the changelog
3) you just want to help improve ClickHouse quality and ensure the features you use will not be broken in future releases
4) once the test is added/accepted, you can be sure the corner case you check will never be accidentally broken.
5) you will be part of a great open-source community
6) your name will be visible in the `system.contributors` table!
7) you will make the world a bit better :)
### Steps to do
#### Prerequisite
I assume you run a Linux machine (you can use docker / virtual machines on other OS), have any modern browser and an internet connection, and have some basic Linux & SQL skills.
No highly specialized knowledge is needed (so you don't need to know C++ or how ClickHouse CI works).
#### Preparation
1) [create a GitHub account](https://github.com/join) (if you don't have one yet)
2) [setup git](https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/set-up-git)
```bash
# for Ubuntu
sudo apt-get update
sudo apt-get install git
git config --global user.name "John Doe" # fill with your name
git config --global user.email "email@example.com" # fill with your email
```
3) [fork ClickHouse project](https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/fork-a-repo) - just open [https://github.com/ClickHouse/ClickHouse](https://github.com/ClickHouse/ClickHouse) and press fork button in the top right corner:
![fork repo](https://github-images.s3.amazonaws.com/help/bootcamp/Bootcamp-Fork.png)
4) clone your fork to some folder on your PC, for example, `~/workspace/ClickHouse`
```
mkdir -p ~/workspace && cd ~/workspace
git clone https://github.com/< your GitHub username>/ClickHouse
cd ClickHouse
git remote add upstream https://github.com/ClickHouse/ClickHouse
```
#### New branch for the test
1) create a new branch from the latest clickhouse master
```
cd ~/workspace/ClickHouse
git fetch upstream
git checkout -b name_for_a_branch_with_my_test upstream/master
```
#### Install & run clickhouse
1) install `clickhouse-server` (follow [official docs](https://clickhouse.tech/docs/en/getting-started/install/))
2) install test configurations (this will use the ZooKeeper mock implementation and adjust some settings)
```
cd ~/workspace/ClickHouse/tests/config
sudo ./install.sh
```
3) run clickhouse-server
```
sudo systemctl restart clickhouse-server
```
#### Creating the test file
1) find the number for your test - find the file with the biggest number in `tests/queries/0_stateless/`
```sh
$ cd ~/workspace/ClickHouse
$ ls tests/queries/0_stateless/[0-9]*.reference | tail -n 1
tests/queries/0_stateless/01520_client_print_query_id.reference
```
Currently, the last number for the test is `01520`, so my test will have the number `01521`
2) create an SQL file with the next number and name of the feature you test
```sh
touch tests/queries/0_stateless/01521_dummy_test.sql
```
3) edit the SQL file with your favorite editor (see the hints on creating tests below)
```sh
vim tests/queries/0_stateless/01521_dummy_test.sql
```
4) run the test, and put the result of that into the reference file:
```
clickhouse-client -nmT < tests/queries/0_stateless/01521_dummy_test.sql | tee tests/queries/0_stateless/01521_dummy_test.reference
```
5) ensure everything is correct; if the test output is incorrect (due to some bug, for example), adjust the reference file using a text editor.
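
The number lookup from step 1 can also be scripted. Below is a minimal sketch, assuming test files follow the `NNNNN_name.reference` pattern shown above; the helper name `next_test_number` is made up for this example and is not part of the ClickHouse tooling.

```sh
# Hypothetical helper: print the next free test number for a directory
# of stateless tests named NNNNN_description.reference.
next_test_number() {
    dir=$1
    # The glob expands in lexicographic order, which matches numeric order
    # for zero-padded prefixes, so the last entry has the biggest number.
    last=$(ls "$dir"/[0-9]*.reference | tail -n 1)
    # Keep only the numeric prefix and strip leading zeros, otherwise
    # shell arithmetic would interpret the value as octal.
    num=$(basename "$last" | sed -e 's/_.*//' -e 's/^0*//')
    # Re-pad the incremented number to five digits.
    printf '%05d\n' $(( ${num:-0} + 1 ))
}
```

For example, running `next_test_number tests/queries/0_stateless` from the repository root prints the number to use for the new test.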
#### How to create a good test
- test should be
    - minimal - create only tables related to the tested functionality, remove unrelated columns and parts of the query
    - fast - should not take longer than a few seconds (preferably less than a second)
    - correct - fails when the feature is not working
    - deterministic
    - isolated / stateless
        - don't rely on some environment things
        - don't rely on timing when possible
- try to cover corner cases (zeros / Nulls / empty sets / throwing exceptions)
- to test that a query returns an error, you can put a special comment after the query: `-- { serverError 60 }` or `-- { clientError 20 }`
- don't switch databases (unless necessary)
- you can create several table replicas on the same node if needed
- you can use one of the test cluster definitions when needed (see system.clusters)
- use `numbers` / `numbers_mt` / `zeros` / `zeros_mt` and similar for queries / to initialize data when applicable
- clean up the created objects after the test and before the test (DROP IF EXISTS) - in case of some dirty state
- prefer sync mode of operations (mutations, merges, etc.)
- use other SQL files in the `0_stateless` folder as an example
- ensure the feature / feature combination you want to test is not yet covered by existing tests
#### Commit / push / create PR.
1) commit & push your changes
```sh
cd ~/workspace/ClickHouse
git add tests/queries/0_stateless/01521_dummy_test.sql
git add tests/queries/0_stateless/01521_dummy_test.reference
git commit # use some nice commit message when possible
git push origin HEAD
```
2) use a link which was shown during the push, to create a PR into the main repo
3) adjust the PR title and contents, in `Changelog category (leave one)` keep
`Build/Testing/Packaging Improvement`, fill the rest of the fields if you want.

View File

@ -384,7 +384,7 @@ Possible values:
- `'basic'` — Use basic parser.
ClickHouse can parse only the basic `YYYY-MM-DD HH:MM:SS` format. For example, `'2019-08-20 10:18:56'`.
ClickHouse can parse only the basic `YYYY-MM-DD HH:MM:SS` or `YYYY-MM-DD` format. For example, `'2019-08-20 10:18:56'` or `'2019-08-20'`.
Default value: `'basic'`.

View File

@ -3,10 +3,45 @@ toc_priority: 47
toc_title: Date
---
# Date {#date}
# Date {#data_type-date}
A date. Stored in two bytes as the number of days since 1970-01-01 (unsigned). Allows storing values from just after the beginning of the Unix Epoch to the upper threshold defined by a constant at the compilation stage (currently, this is until the year 2106, but the final fully-supported year is 2105).
The date value is stored without the time zone.
## Examples {#examples}
**1.** Creating a table with a `Date`-type column and inserting data into it:
``` sql
CREATE TABLE dt
(
`timestamp` Date,
`event_id` UInt8
)
ENGINE = TinyLog;
```
``` sql
INSERT INTO dt Values (1546300800, 1), ('2019-01-01', 2);
```
``` sql
SELECT * FROM dt;
```
``` text
┌──timestamp─┬─event_id─┐
│ 2019-01-01 │ 1 │
│ 2019-01-01 │ 2 │
└────────────┴──────────┘
```
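The storage rule described above (an unsigned day counter since 1970-01-01, held in two bytes) can be illustrated with a small C++ sketch. This is not ClickHouse code: `daysSinceEpoch` is a hypothetical helper, and it assumes a non-negative UTC timestamp and ignores time zones entirely.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only (not part of ClickHouse).
// Converts a non-negative Unix timestamp in seconds (assumed UTC) into the
// number of whole days since 1970-01-01, truncated to an unsigned 16-bit value.
constexpr std::uint16_t daysSinceEpoch(std::int64_t unix_seconds)
{
    constexpr std::int64_t seconds_per_day = 86400;
    return static_cast<std::uint16_t>(unix_seconds / seconds_per_day);
}

// The day counter occupies exactly two bytes.
static_assert(sizeof(std::uint16_t) == 2, "day counter must fit in two bytes");

// 1546300800 is 2019-01-01 00:00:00 UTC, which is day 17897 of the Unix epoch.
static_assert(daysSinceEpoch(1546300800) == 17897, "2019-01-01 is day 17897");
```

Under these assumptions, the timestamp `1546300800` from the example above and the string `'2019-01-01'` refer to the same day number, which is why both rows display the same date.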
## See Also {#see-also}
- [Functions for working with dates and times](../../sql-reference/functions/date-time-functions.md)
- [Operators for working with dates and times](../../sql-reference/operators/index.md#operators-datetime)
- [`DateTime` data type](../../sql-reference/data-types/datetime.md)
[Original article](https://clickhouse.tech/docs/en/data_types/date/) <!--hide-->

View File

@ -5,7 +5,7 @@ toc_title: null function
# null {#null-function}
Accepts an inserted data of the specified structure and immediately drops it away. The function is used for convenience writing tests and demonstrations.
Creates a temporary table of the specified structure with the [Null](../../engines/table-engines/special/null.md) table engine. According to the `Null`-engine properties, the table data is ignored and the table itself is immediately dropped right after the query execution. The function is used for the convenience of test writing and demonstrations.
**Syntax**
@ -19,7 +19,7 @@ null('structure')
**Returned value**
A table with the specified structure, which is dropped right after the query execution.
A temporary `Null`-engine table with the specified structure.
**Example**
@ -36,6 +36,8 @@ INSERT INTO t SELECT * FROM numbers_mt(1000000000);
DROP TABLE IF EXISTS t;
```
See also: format **Null**.
See also:
- [Null table engine](../../engines/table-engines/special/null.md)
[Original article](https://clickhouse.tech/docs/en/sql-reference/table-functions/null/) <!--hide-->

View File

@ -0,0 +1,43 @@
---
toc_priority: 53
toc_title: null function
---
# null {#null-function}
Creates a temporary table of the specified structure with the [Null](../../engines/table-engines/special/null.md) engine. According to the engine's properties, the table data is ignored and the table itself is dropped immediately after the query execution. The function is used for the convenience of writing tests and demonstration examples.
**Syntax**
``` sql
null('structure')
```
**Parameter**
- `structure`: a list of columns and their types. [String](../../sql-reference/data-types/string.md).
**Returned value**
A temporary table of the specified structure with the `Null` engine.
**Example**
A single query with the `null` function:
``` sql
INSERT INTO function null('x UInt64') SELECT * FROM numbers_mt(1000000000);
```
replaces three queries:
```sql
CREATE TABLE t (x UInt64) ENGINE = Null;
INSERT INTO t SELECT * FROM numbers_mt(1000000000);
DROP TABLE IF EXISTS t;
```
See also:
- [Null table engine](../../engines/table-engines/special/null.md)
[Original article](https://clickhouse.tech/docs/en/sql-reference/table-functions/null/) <!--hide-->

View File

@ -46,6 +46,6 @@ toc_priority: 29
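The `Log` engine uses a separate file for each column of the table. `StripeLog` stores all the data in one file. As a result, the `StripeLog` engine uses fewer descriptors in the operating system, but the `Log` engine provides higher read performance.
The `TingLog` engine is the simplest engine in the family and provides the least functionality and the lowest performance. The `TingLog` engine does not support parallel reads or concurrent data access, and stores each column in a separate file. It reads data more slowly than the other two engines, which support parallel reads, and uses as many descriptors as the `Log` engine. You can use it in simple low-load scenarios.
The `TinyLog` engine is the simplest engine in the family and provides the least functionality and the lowest performance. The `TinyLog` engine does not support parallel reads or concurrent data access, and stores each column in a separate file. It reads data more slowly than the other two engines, which support parallel reads, and uses as many descriptors as the `Log` engine. You can use it in simple low-load scenarios.
[Original article](https://clickhouse.tech/docs/en/operations/table_engines/log_family/) <!--hide-->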

View File

@ -1,5 +1,5 @@
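# Journal {#log}
# Log {#log}
Journal differs from TinyLog in that a small file of «marks» is stored alongside the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads. For concurrent data access, read operations can run simultaneously, while write operations block reads and other writes. The Log engine does not support indexes. Likewise, if writing to a table fails, the table is broken, and reading from it returns an error. The Log engine is suitable for temporary data, write-once tables, and testing or demonstration purposes.
`Log` differs from `TinyLog` in that a small file of «marks» is stored alongside the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads. For concurrent data access, read operations can run simultaneously, while write operations block reads and other writes. The `Log` engine does not support indexes. Likewise, if writing to a table fails, the table is broken, and reading from it returns an error. The `Log` engine is suitable for temporary data, write-once tables, and testing or demonstration purposes.
[Original article](https://clickhouse.tech/docs/zh/operations/table_engines/log/) <!--hide-->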

View File

@ -11,9 +11,9 @@
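Do not disable hyper-threading. It helps some queries, but not others.
## Turbocharging {#turbo-boost}
## Overclocking {#turbo-boost}
Turbocharging is strongly recommended. It significantly improves performance under typical load.
Overclocking (turbo-boost) is strongly recommended. It significantly improves performance under typical load.
You can use `turbostat` to view the CPU's actual clock rate under load.
## CPU Scaling Governor {#cpu-scaling-governor}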
@ -39,18 +39,18 @@ echo 'performance' | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_gover
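Always disable the swap file. The only reason not to do so is if you are using ClickHouse on your personal laptop.
## Gigantic Pages {#huge-pages}
## Huge Pages {#huge-pages}
Always disable transparent gigantic pages. They interfere with memory allocators, which leads to significant performance degradation.
Always disable transparent huge pages. They interfere with memory allocators, which leads to significant performance degradation.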
``` bash
echo 'never' | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```
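Use `perf top` to watch the time spent in the kernel on memory management.
Permanent gigantic pages also do not need to be allocated.
Use `perf top` to watch the time spent in the kernel on memory management.
Permanent huge pages also do not need to be allocated.
## Storage Subsystem {#storage-subsystem}
## Storage Subsystem {#storage-subsystem}
If your budget allows you to use SSD, use SSD.
If not, use HDD. SATA HDDs at 7200 RPM will do.
@ -100,27 +100,27 @@ XFS is also suitable, but it has not been as thoroughly tested with ClickHouse.
If possible, use at least a 10GB network. 1Gb will also work, but it will be much worse for patching replicas with tens of terabytes of data, or for processing distributed queries with a large amount of intermediate data.
## Zoo Administrator {#zookeeper}
## Zookeeper {#zookeeper}
You are probably already using ZooKeeper for other purposes. You can use the same ZooKeeper installation if it is not already overloaded.
Its best to use a fresh version of ZooKeeper 3.4.9 or later. The version in stable Linux distributions may be outdated.
It is best to use a recent version of Zookeeper, 3.4.9 or later. The Zookeeper version in stable Linux distributions may be outdated.
You should never use manually written scripts to transfer data between different ZooKeeper clusters, because the result will be incorrect for sequential nodes. Never use the «zkcopy» utility for the same reason: https://github.com/ksprojects/zkcopy/issues/15
You should never use hand-written scripts to transfer data between different Zookeeper clusters, because this may leave sequential nodes with incorrect data. For the same reason, never use the zkcopy utility: https://github.com/ksprojects/zkcopy/issues/15
If you want to split an existing ZooKeeper cluster into two, the correct way is to increase the number of its replicas and then reconfigure it as two independent clusters.
Do not run ZooKeeper on the same servers as ClickHouse. Since ZooKeeper is very sensitive to latency, ClickHouse may utilize all available system resources.
Do not run ZooKeeper on the same servers as ClickHouse, because ZooKeeper is very sensitive to latency and ClickHouse may use all available system resources.
With the default settings, ZooKeeper is a time bomb:
With the default settings, ZooKeeper is like a time bomb:
> With the default configuration, the ZooKeeper server does not remove files from old snapshots and logs (see autopurge); this is the responsibility of the operator.
With the default configuration, the ZooKeeper service does not remove files from old snapshots and logs (see autopurge); this is the responsibility of the operator.
The bomb must be defused.
The bomb must be defused.
The ZooKeeper (3.5.1) configuration below is used in Yandex. Medika production environment as of May 20, 2017:
The ZooKeeper (3.5.1) configuration below is used in the Yandex.Metrica production environment as of May 20, 2017:
Zoo cfg:
zoo.cfg: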
``` bash
# http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html
@ -222,7 +222,7 @@ JAVA_OPTS="-Xms{{ '{{' }} cluster.get('xms','128M') {{ '}}' }} \
-XX:+CMSParallelRemarkEnabled"
```
Salt initialization:
Salt init:
description "zookeeper-{{ '{{' }} cluster['name'] {{ '}}' }} centralized coordination service"

View File

@ -54,6 +54,7 @@
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>
#include <IO/UseSSL.h>
#include <IO/WriteBufferFromOStream.h>
#include <DataStreams/AsynchronousBlockInputStream.h>
#include <DataStreams/AddingDefaultsBlockInputStream.h>
#include <DataStreams/InternalTextLogsRowOutputStream.h>
@ -1158,13 +1159,13 @@ private:
ASTPtr ast_to_process;
try
{
std::stringstream dump_before_fuzz;
WriteBufferFromOwnString dump_before_fuzz;
fuzz_base->dumpTree(dump_before_fuzz);
auto base_before_fuzz = fuzz_base->formatForErrorMessage();
ast_to_process = fuzz_base->clone();
std::stringstream dump_of_cloned_ast;
WriteBufferFromOwnString dump_of_cloned_ast;
ast_to_process->dumpTree(dump_of_cloned_ast);
// Run the original query as well.
@ -1186,7 +1187,9 @@ private:
fprintf(stderr, "dump of cloned ast:\n%s\n",
dump_of_cloned_ast.str().c_str());
fprintf(stderr, "dump after fuzz:\n");
fuzz_base->dumpTree(std::cerr);
WriteBufferFromOStream cerr_buf(std::cerr, 4096);
fuzz_base->dumpTree(cerr_buf);
cerr_buf.next();
fmt::print(stderr, "IAST::clone() is broken for some AST node. This is a bug. The original AST ('dump before fuzz') and its cloned copy ('dump of cloned AST') refer to the same nodes, which must never happen. This means that their parent node doesn't implement clone() correctly.");
@ -1529,7 +1532,9 @@ private:
if (is_interactive)
{
std::cout << std::endl;
formatAST(*res, std::cout);
WriteBufferFromOStream res_buf(std::cout, 4096);
formatAST(*res, res_buf);
res_buf.next();
std::cout << std::endl << std::endl;
}

View File

@ -8,6 +8,7 @@
#include <Core/Types.h>
#include <IO/Operators.h>
#include <IO/UseSSL.h>
#include <IO/WriteBufferFromOStream.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTFunction.h>
#include <Parsers/ASTIdentifier.h>
@ -419,7 +420,9 @@ void QueryFuzzer::fuzzMain(ASTPtr & ast)
fuzz(ast);
std::cout << std::endl;
formatAST(*ast, std::cout, false /*highlight*/);
WriteBufferFromOStream ast_buf(std::cout, 4096);
formatAST(*ast, ast_buf, false /*highlight*/);
ast_buf.next();
std::cout << std::endl << std::endl;
}

View File

@ -86,7 +86,7 @@ Suggest::Suggest()
void Suggest::loadImpl(Connection & connection, const ConnectionTimeouts & timeouts, size_t suggestion_limit)
{
std::stringstream query;
std::stringstream query; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
query << "SELECT DISTINCT arrayJoin(extractAll(name, '[\\\\w_]{2,}')) AS res FROM ("
"SELECT name FROM system.functions"
" UNION ALL "

View File

@ -93,7 +93,7 @@ private:
void parse(const String & hint)
{
std::stringstream ss;
std::stringstream ss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
ss << hint;
String item;

View File

@ -162,7 +162,7 @@ void ClusterCopier::discoverShardPartitions(const ConnectionTimeouts & timeouts,
if (!missing_partitions.empty())
{
std::stringstream ss;
WriteBufferFromOwnString ss;
for (const String & missing_partition : missing_partitions)
ss << " " << missing_partition;

View File

@ -13,7 +13,7 @@ using ConfigurationPtr = Poco::AutoPtr<Poco::Util::AbstractConfiguration>;
ConfigurationPtr getConfigurationFromXMLString(const std::string & xml_data)
{
std::stringstream ss(xml_data);
std::stringstream ss(xml_data); // STYLE_CHECK_ALLOW_STD_STRING_STREAM
Poco::XML::InputSource input_source{ss};
return {new Poco::Util::XMLConfiguration{&input_source}};
}

View File

@ -394,12 +394,8 @@ inline ASTPtr TaskTable::rewriteReplicatedCreateQueryToPlain()
inline String DB::TaskShard::getDescription() const
{
std::stringstream ss;
ss << "N" << numberInCluster()
<< " (having a replica " << getHostNameExample()
<< ", pull table " + getQuotedTable(task_table.table_pull)
<< " of cluster " + task_table.cluster_pull_name << ")";
return ss.str();
return fmt::format("N{} (having a replica {}, pull table {} of cluster {})",
numberInCluster(), getHostNameExample(), getQuotedTable(task_table.table_pull), task_table.cluster_pull_name);
}
inline String DB::TaskShard::getHostNameExample() const

View File

@ -6,6 +6,7 @@
#include <IO/ReadBufferFromFileDescriptor.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteBufferFromFileDescriptor.h>
#include <IO/WriteBufferFromOStream.h>
#include <Parsers/ParserQuery.h>
#include <Parsers/parseQuery.h>
#include <Parsers/formatAST.h>
@ -129,7 +130,9 @@ int mainEntryClickHouseFormat(int argc, char ** argv)
ASTPtr res = parseQueryAndMovePosition(parser, pos, end, "query", multiple, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH);
if (!quiet)
{
formatAST(*res, std::cout, hilite, oneline);
WriteBufferFromOStream res_buf(std::cout, 4096);
formatAST(*res, res_buf, hilite, oneline);
res_buf.next();
if (multiple)
std::cout << "\n;\n";
std::cout << std::endl;

View File

@ -422,7 +422,7 @@ static const char * minimal_default_user_xml =
static ConfigurationPtr getConfigurationFromXMLString(const char * xml_data)
{
std::stringstream ss{std::string{xml_data}};
std::stringstream ss{std::string{xml_data}}; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
Poco::XML::InputSource input_source{ss};
return {new Poco::Util::XMLConfiguration{&input_source}};
}

View File

@ -113,16 +113,16 @@ void ODBCColumnsInfoHandler::handleRequest(Poco::Net::HTTPServerRequest & reques
/// TODO Why not do SQLColumns instead?
std::string name = schema_name.empty() ? backQuoteIfNeed(table_name) : backQuoteIfNeed(schema_name) + "." + backQuoteIfNeed(table_name);
std::stringstream ss;
WriteBufferFromOwnString buf;
std::string input = "SELECT * FROM " + name + " WHERE 1 = 0";
ParserQueryWithOutput parser;
ASTPtr select = parseQuery(parser, input.data(), input.data() + input.size(), "", context_settings.max_query_size, context_settings.max_parser_depth);
IAST::FormatSettings settings(ss, true);
IAST::FormatSettings settings(buf, true);
settings.always_quote_identifiers = true;
settings.identifier_quoting_style = getQuotingStyle(hdbc);
select->format(settings);
std::string query = ss.str();
std::string query = buf.str();
LOG_TRACE(log, "Inferring structure with query '{}'", query);

View File

@ -32,12 +32,12 @@ namespace
for (const auto & column : columns)
query.columns->children.emplace_back(std::make_shared<ASTIdentifier>(column.name));
std::stringstream ss;
IAST::FormatSettings settings(ss, true);
WriteBufferFromOwnString buf;
IAST::FormatSettings settings(buf, true);
settings.always_quote_identifiers = true;
settings.identifier_quoting_style = quoting;
query.IAST::format(settings);
return ss.str();
return buf.str();
}
std::string getQuestionMarks(size_t n)

View File

@ -191,10 +191,10 @@ int Server::run()
if (config().hasOption("help"))
{
Poco::Util::HelpFormatter help_formatter(Server::options());
std::stringstream header;
header << commandName() << " [OPTION] [-- [ARG]...]\n";
header << "positional arguments can be used to rewrite config.xml properties, for example, --http_port=8010";
help_formatter.setHeader(header.str());
auto header_str = fmt::format("{} [OPTION] [-- [ARG]...]\n"
"positional arguments can be used to rewrite config.xml properties, for example, --http_port=8010",
commandName());
help_formatter.setHeader(header_str);
help_formatter.format(std::cout);
return 0;
}
@ -568,6 +568,8 @@ int Server::main(const std::vector<std::string> & /*args*/)
if (config->has("zookeeper"))
global_context->reloadZooKeeperIfChanged(config);
global_context->reloadAuxiliaryZooKeepersConfigIfChanged(config);
global_context->updateStorageConfiguration(*config);
},
/* already_loaded = */ true);

View File

@ -197,11 +197,13 @@ namespace
boost::range::push_back(queries, InterpreterShowGrantsQuery::getAttachGrantQueries(entity));
/// Serialize the list of ATTACH queries to a string.
std::stringstream ss;
ss.exceptions(std::ios::failbit);
WriteBufferFromOwnString buf;
for (const ASTPtr & query : queries)
ss << *query << ";\n";
String file_contents = std::move(ss).str();
{
formatAST(*query, buf, false, true);
buf.write(";\n", 2);
}
String file_contents = buf.str();
/// First we save *.tmp file and then we rename if everything's ok.
auto tmp_file_path = std::filesystem::path{file_path}.replace_extension(".tmp");
@ -353,7 +355,7 @@ String DiskAccessStorage::getStorageParamsJSON() const
json.set("path", directory_path);
if (readonly)
json.set("readonly", readonly.load());
std::ostringstream oss;
std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
oss.exceptions(std::ios::failbit);
Poco::JSON::Stringifier::stringify(json, oss);
return oss.str();

View File

@ -150,7 +150,7 @@ String LDAPAccessStorage::getStorageParamsJSON() const
params_json.set("server", ldap_server);
params_json.set("roles", default_role_names);
std::ostringstream oss;
std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
oss.exceptions(std::ios::failbit);
Poco::JSON::Stringifier::stringify(params_json, oss);

View File

@ -460,7 +460,7 @@ String UsersConfigAccessStorage::getStorageParamsJSON() const
Poco::JSON::Object json;
if (!path.empty())
json.set("path", path);
std::ostringstream oss;
std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
oss.exceptions(std::ios::failbit);
Poco::JSON::Stringifier::stringify(json, oss);
return oss.str();

View File

@ -2,6 +2,9 @@
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromString.h>
#include <IO/Operators.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeString.h>
@ -244,10 +247,9 @@ public:
if constexpr (Trait::sampler == Sampler::RNG)
{
DB::writeIntBinary<size_t>(this->data(place).total_values, buf);
std::ostringstream rng_stream;
rng_stream.exceptions(std::ios::failbit);
rng_stream << this->data(place).rng;
DB::writeStringBinary(rng_stream.str(), buf);
WriteBufferFromOwnString rng_buf;
rng_buf << this->data(place).rng;
DB::writeStringBinary(rng_buf.str(), buf);
}
// TODO
@ -275,9 +277,8 @@ public:
DB::readIntBinary<size_t>(this->data(place).total_values, buf);
std::string rng_string;
DB::readStringBinary(rng_string, buf);
std::istringstream rng_stream(rng_string);
rng_stream.exceptions(std::ios::failbit);
rng_stream >> this->data(place).rng;
ReadBufferFromString rng_buf(rng_string);
rng_buf >> this->data(place).rng;
}
// TODO
@ -565,10 +566,9 @@ public:
if constexpr (Trait::sampler == Sampler::RNG)
{
DB::writeIntBinary<size_t>(data(place).total_values, buf);
std::ostringstream rng_stream;
rng_stream.exceptions(std::ios::failbit);
rng_stream << data(place).rng;
DB::writeStringBinary(rng_stream.str(), buf);
WriteBufferFromOwnString rng_buf;
rng_buf << data(place).rng;
DB::writeStringBinary(rng_buf.str(), buf);
}
// TODO
@ -600,9 +600,8 @@ public:
DB::readIntBinary<size_t>(data(place).total_values, buf);
std::string rng_string;
DB::readStringBinary(rng_string, buf);
std::istringstream rng_stream(rng_string);
rng_stream.exceptions(std::ios::failbit);
rng_stream >> data(place).rng;
ReadBufferFromString rng_buf(rng_string);
rng_buf >> data(place).rng;
}
// TODO

View File

@ -1,7 +1,5 @@
#pragma once
#include <iostream>
#include <sstream>
#include <unordered_set>
#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnArray.h>

View File

@ -1,10 +1,8 @@
#pragma once
#include <bitset>
#include <iostream>
#include <map>
#include <queue>
#include <sstream>
#include <unordered_set>
#include <utility>
#include <Columns/ColumnArray.h>

View File

@ -1,7 +1,5 @@
#pragma once
#include <iostream>
#include <sstream>
#include <unordered_set>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeDateTime.h>

View File

@ -8,6 +8,9 @@
#include <Common/NaNUtils.h>
#include <Common/PODArray.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
namespace DB
{
@ -87,7 +90,11 @@ struct QuantileExact : QuantileExactBase<Value, QuantileExact<Value>>
{
size_t n = level < 1 ? level * array.size() : (array.size() - 1);
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin(), array.begin() + n, array.end()); /// NOTE You can think of the radix-select algorithm.
#else
std::nth_element(array.begin(), array.begin() + n, array.end()); /// NOTE You can think of the radix-select algorithm.
#endif
return array[n];
}
@ -107,8 +114,11 @@ struct QuantileExact : QuantileExactBase<Value, QuantileExact<Value>>
size_t n = level < 1 ? level * array.size() : (array.size() - 1);
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin() + prev_n, array.begin() + n, array.end());
#else
std::nth_element(array.begin() + prev_n, array.begin() + n, array.end());
#endif
result[indices[i]] = array[n];
prev_n = n;
}
@ -144,7 +154,11 @@ struct QuantileExactExclusive : public QuantileExact<Value>
else if (n < 1)
return static_cast<Float64>(array[0]);
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin(), array.begin() + n - 1, array.end());
#else
std::nth_element(array.begin(), array.begin() + n - 1, array.end());
#endif
auto nth_element = std::min_element(array.begin() + n, array.end());
return static_cast<Float64>(array[n - 1]) + (h - n) * static_cast<Float64>(*nth_element - array[n - 1]);
@ -173,7 +187,11 @@ struct QuantileExactExclusive : public QuantileExact<Value>
result[indices[i]] = static_cast<Float64>(array[0]);
else
{
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin() + prev_n, array.begin() + n - 1, array.end());
#else
std::nth_element(array.begin() + prev_n, array.begin() + n - 1, array.end());
#endif
auto nth_element = std::min_element(array.begin() + n, array.end());
result[indices[i]] = static_cast<Float64>(array[n - 1]) + (h - n) * static_cast<Float64>(*nth_element - array[n - 1]);
@ -208,8 +226,11 @@ struct QuantileExactInclusive : public QuantileExact<Value>
return static_cast<Float64>(array[array.size() - 1]);
else if (n < 1)
return static_cast<Float64>(array[0]);
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin(), array.begin() + n - 1, array.end());
#else
std::nth_element(array.begin(), array.begin() + n - 1, array.end());
#endif
auto nth_element = std::min_element(array.begin() + n, array.end());
return static_cast<Float64>(array[n - 1]) + (h - n) * static_cast<Float64>(*nth_element - array[n - 1]);
@ -236,7 +257,11 @@ struct QuantileExactInclusive : public QuantileExact<Value>
result[indices[i]] = static_cast<Float64>(array[0]);
else
{
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin() + prev_n, array.begin() + n - 1, array.end());
#else
std::nth_element(array.begin() + prev_n, array.begin() + n - 1, array.end());
#endif
auto nth_element = std::min_element(array.begin() + n, array.end());
result[indices[i]] = static_cast<Float64>(array[n - 1]) + (h - n) * static_cast<Float64>(*nth_element - array[n - 1]);
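
The selection step that these hunks switch from `std::nth_element` to `miniselect::floyd_rivest_select` can be sketched in isolation. The sketch below uses only the standard-library call from the `#else` branch, since miniselect is a third-party dependency; `exactQuantile` is a hypothetical free function written for illustration and is not the `QuantileExact` struct itself. Returning NaN for empty input is a convention of this sketch only.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-alone helper for illustration; not ClickHouse code.
// The vector is taken by value because selection reorders its elements.
double exactQuantile(std::vector<double> values, double level)
{
    if (values.empty())
        return std::numeric_limits<double>::quiet_NaN(); // sketch-only convention

    // Same index rule as in the diff: level * size, clamped to the last element.
    const std::size_t n = level < 1
        ? static_cast<std::size_t>(level * values.size())
        : values.size() - 1;

    // Partially orders the range so that values[n] holds the element that
    // would be at position n if the whole range were sorted.
    std::nth_element(values.begin(), values.begin() + n, values.end());
    return values[n];
}
```

For example, `exactQuantile({5, 1, 4, 2, 3}, 0.5)` picks index `2` and returns `3`. Both selection routines only guarantee the position of the n-th element, so replacing one with the other should not change the returned value, only the running time.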

View File

@ -7,6 +7,9 @@
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
namespace DB
{
@ -179,7 +182,11 @@ namespace detail
/// Sorting an array will not be considered a violation of constancy.
auto & array = elems;
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin(), array.begin() + n, array.end());
#else
std::nth_element(array.begin(), array.begin() + n, array.end());
#endif
quantile = array[n];
}
@ -200,7 +207,11 @@ namespace detail
? level * elems.size()
: (elems.size() - 1);
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_select(array.begin() + prev_n, array.begin() + n, array.end());
#else
std::nth_element(array.begin() + prev_n, array.begin() + n, array.end());
#endif
result[level_index] = array[n];
prev_n = n;

View File

@ -3,11 +3,13 @@
#include <limits>
#include <algorithm>
#include <climits>
#include <sstream>
#include <common/types.h>
#include <IO/ReadBuffer.h>
#include <IO/ReadHelpers.h>
#include <IO/WriteHelpers.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromString.h>
#include <IO/Operators.h>
#include <Common/PODArray.h>
#include <Common/NaNUtils.h>
#include <Poco/Exception.h>
@ -190,9 +192,8 @@ public:
std::string rng_string;
DB::readStringBinary(rng_string, buf);
std::istringstream rng_stream(rng_string);
rng_stream.exceptions(std::ios::failbit);
rng_stream >> rng;
DB::ReadBufferFromString rng_buf(rng_string);
rng_buf >> rng;
for (size_t i = 0; i < samples.size(); ++i)
DB::readBinary(samples[i], buf);
@ -205,10 +206,9 @@ public:
DB::writeIntBinary<size_t>(sample_count, buf);
DB::writeIntBinary<size_t>(total_values, buf);
std::ostringstream rng_stream;
rng_stream.exceptions(std::ios::failbit);
rng_stream << rng;
DB::writeStringBinary(rng_stream.str(), buf);
DB::WriteBufferFromOwnString rng_buf;
rng_buf << rng;
DB::writeStringBinary(rng_buf.str(), buf);
for (size_t i = 0; i < std::min(sample_count, total_values); ++i)
DB::writeBinary(samples[i], buf);

View File

@ -3,7 +3,6 @@
#include <limits>
#include <algorithm>
#include <climits>
#include <sstream>
#include <AggregateFunctions/ReservoirSampler.h>
#include <common/types.h>
#include <Common/HashTable/Hash.h>

View File

@ -321,6 +321,7 @@ target_include_directories(clickhouse_common_io PUBLIC ${CMAKE_CURRENT_BINARY_DI
dbms_target_include_directories(PUBLIC ${CMAKE_CURRENT_BINARY_DIR}/Core/include)
dbms_target_include_directories(SYSTEM BEFORE PUBLIC ${PDQSORT_INCLUDE_DIR})
dbms_target_include_directories(SYSTEM BEFORE PUBLIC ${MINISELECT_INCLUDE_DIR})
if (ZSTD_LIBRARY)
dbms_target_link_libraries(PRIVATE ${ZSTD_LIBRARY})

View File

@ -1,5 +1,6 @@
#include <Client/MultiplexedConnections.h>
#include <IO/ConnectionTimeouts.h>
#include <IO/Operators.h>
#include <Common/thread_local_rng.h>
@ -222,19 +223,18 @@ std::string MultiplexedConnections::dumpAddresses() const
std::string MultiplexedConnections::dumpAddressesUnlocked() const
{
bool is_first = true;
std::ostringstream os;
os.exceptions(std::ios::failbit);
WriteBufferFromOwnString buf;
for (const ReplicaState & state : replica_states)
{
const Connection * connection = state.connection;
if (connection)
{
os << (is_first ? "" : "; ") << connection->getDescription();
buf << (is_first ? "" : "; ") << connection->getDescription();
is_first = false;
}
}
return os.str();
return buf.str();
}
Packet MultiplexedConnections::receivePacketUnlocked()

View File

@ -20,6 +20,10 @@
#include <Common/WeakHash.h>
#include <Common/HashTable/Hash.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
namespace DB
{
@ -782,7 +786,11 @@ void ColumnArray::getPermutationImpl(size_t limit, Permutation & res, Comparator
auto less = [&cmp](size_t lhs, size_t rhs){ return cmp(lhs, rhs) < 0; };
if (limit)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), res.begin() + limit, res.end(), less);
#else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less);
#endif
else
std::sort(res.begin(), res.end(), less);
}
@ -834,8 +842,11 @@ void ColumnArray::updatePermutationImpl(size_t limit, Permutation & res, EqualRa
return;
/// Since then we are working inside the interval.
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less);
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less);
#endif
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{

View File

@ -8,6 +8,10 @@
#include <common/unaligned.h>
#include <ext/scope_guard.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
#include <IO/WriteHelpers.h>
@ -162,10 +166,10 @@ void ColumnDecimal<T>::updatePermutation(bool reverse, size_t limit, int, IColum
{
const auto& [first, last] = equal_ranges[i];
if (reverse)
std::partial_sort(res.begin() + first, res.begin() + last, res.begin() + last,
std::sort(res.begin() + first, res.begin() + last,
[this](size_t a, size_t b) { return data[a] > data[b]; });
else
std::partial_sort(res.begin() + first, res.begin() + last, res.begin() + last,
std::sort(res.begin() + first, res.begin() + last,
[this](size_t a, size_t b) { return data[a] < data[b]; });
auto new_first = first;
@ -193,12 +197,21 @@ void ColumnDecimal<T>::updatePermutation(bool reverse, size_t limit, int, IColum
/// Since then we are working inside the interval.
if (reverse)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] > data[b]; });
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] > data[b]; });
#endif
else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] < data[b]; });
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last,
[this](size_t a, size_t b) { return data[a] < data[b]; });
#endif
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)
{

View File

@ -7,6 +7,9 @@
#include <Columns/IColumnImpl.h>
#include <Columns/ColumnVectorHelper.h>
#include <Core/Field.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
namespace DB
@ -253,9 +256,17 @@ protected:
sort_end = res.begin() + limit;
if (reverse)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), sort_end, res.end(), [this](size_t a, size_t b) { return data[a] > data[b]; });
#else
std::partial_sort(res.begin(), sort_end, res.end(), [this](size_t a, size_t b) { return data[a] > data[b]; });
#endif
else
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), sort_end, res.end(), [this](size_t a, size_t b) { return data[a] < data[b]; });
#else
std::partial_sort(res.begin(), sort_end, res.end(), [this](size_t a, size_t b) { return data[a] < data[b]; });
#endif
}
};
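
The `getPermutation` pattern in these column hunks (sort only the first `limit` positions of an index permutation) can be sketched with the standard-library fallback alone. `topPermutation` below is a hypothetical helper, not a ClickHouse function; it assumes a plain `std::vector<int>` in place of a column and truncates the result to `limit` entries for easier inspection, which the real code does not do.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration; not ClickHouse code.
// Returns the indices of the first `limit` elements of `data` in sorted order
// (descending if `reverse` is true). A limit of 0 means "sort everything".
std::vector<std::size_t> topPermutation(const std::vector<int> & data, std::size_t limit, bool reverse)
{
    std::vector<std::size_t> res(data.size());
    std::iota(res.begin(), res.end(), std::size_t{0}); // identity permutation

    if (limit == 0 || limit > res.size())
        limit = res.size();
    auto sort_end = res.begin() + limit;

    // Only [begin, sort_end) ends up sorted; the tail order is unspecified.
    if (reverse)
        std::partial_sort(res.begin(), sort_end, res.end(),
            [&data](std::size_t a, std::size_t b) { return data[a] > data[b]; });
    else
        std::partial_sort(res.begin(), sort_end, res.end(),
            [&data](std::size_t a, std::size_t b) { return data[a] < data[b]; });

    res.resize(limit);
    return res;
}
```

With `data = {30, 10, 20, 40}` and `limit = 2`, the ascending call yields indices `{1, 2}` and the descending call yields `{3, 0}`. The comparator direction must match the `reverse` flag in both branches of the `#if`, since the two partial-sort implementations are meant to be interchangeable.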

View File

@ -10,6 +10,9 @@
#include <Common/HashTable/Hash.h>
#include <ext/scope_guard.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
#include <DataStreams/ColumnGathererStream.h>
@ -157,9 +160,17 @@ void ColumnFixedString::getPermutation(bool reverse, size_t limit, int /*nan_dir
if (limit)
{
if (reverse)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), res.begin() + limit, res.end(), less<false>(*this));
#else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less<false>(*this));
#endif
else
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), res.begin() + limit, res.end(), less<true>(*this));
#else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less<true>(*this));
#endif
}
else
{
@ -217,9 +228,17 @@ void ColumnFixedString::updatePermutation(bool reverse, size_t limit, int, Permu
/// Since then we are working inside the interval.
if (reverse)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<false>(*this));
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<false>(*this));
#endif
else
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<true>(*this));
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less<true>(*this));
#endif
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)

View File

@ -9,6 +9,10 @@
#include <ext/scope_guard.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
namespace DB
{
namespace ErrorCodes
@ -393,7 +397,11 @@ void ColumnLowCardinality::updatePermutationImpl(size_t limit, Permutation & res
/// Since then we are working inside the interval.
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less);
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less);
#endif
auto new_first = first;
for (auto j = first + 1; j < limit; ++j)

View File

@ -10,6 +10,10 @@
#include <common/unaligned.h>
#include <ext/scope_guard.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
namespace DB
{
@ -313,7 +317,11 @@ void ColumnString::getPermutationImpl(size_t limit, Permutation & res, Comparato
auto less = [&cmp](size_t lhs, size_t rhs){ return cmp(lhs, rhs) < 0; };
if (limit)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), res.begin() + limit, res.end(), less);
#else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less);
#endif
else
std::sort(res.begin(), res.end(), less);
}
@ -364,8 +372,11 @@ void ColumnString::updatePermutationImpl(size_t limit, Permutation & res, EqualR
return;
/// Since then we are working inside the interval.
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less);
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less);
#endif
size_t new_first = first;
for (size_t j = first + 1; j < limit; ++j)

View File

@ -9,6 +9,9 @@
#include <Common/assert_cast.h>
#include <Common/WeakHash.h>
#include <Core/Field.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
namespace DB
@ -352,7 +355,11 @@ void ColumnTuple::getPermutationImpl(size_t limit, Permutation & res, LessOperat
if (limit)
{
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), res.begin() + limit, res.end(), less);
#else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less);
#endif
}
else
{

View File

@ -17,7 +17,9 @@
#include <ext/bit_cast.h>
#include <ext/scope_guard.h>
#include <pdqsort.h>
#if !defined(ARCADIA_BUILD)
#include <miniselect/floyd_rivest_select.h> // Y_IGNORE
#endif
#ifdef __SSE2__
#include <emmintrin.h>
@ -156,9 +158,17 @@ void ColumnVector<T>::getPermutation(bool reverse, size_t limit, int nan_directi
res[i] = i;
if (reverse)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), res.begin() + limit, res.end(), greater(*this, nan_direction_hint));
#else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), greater(*this, nan_direction_hint));
#endif
else
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin(), res.begin() + limit, res.end(), less(*this, nan_direction_hint));
#else
std::partial_sort(res.begin(), res.begin() + limit, res.end(), less(*this, nan_direction_hint));
#endif
}
else
{
@ -254,9 +264,17 @@ void ColumnVector<T>::updatePermutation(bool reverse, size_t limit, int nan_dire
/// Since then, we are working inside the interval.
if (reverse)
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, greater(*this, nan_direction_hint));
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, greater(*this, nan_direction_hint));
#endif
else
#if !defined(ARCADIA_BUILD)
miniselect::floyd_rivest_partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less(*this, nan_direction_hint));
#else
std::partial_sort(res.begin() + first, res.begin() + limit, res.begin() + last, less(*this, nan_direction_hint));
#endif
size_t new_first = first;
for (size_t j = first + 1; j < limit; ++j)

View File

@ -71,7 +71,7 @@ void checkColumn(
std::unordered_map<UInt32, T> map;
size_t num_collisions = 0;
std::stringstream collisions_str;
std::stringstream collisions_str; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
collisions_str.exceptions(std::ios::failbit);
for (size_t i = 0; i < eq_class.size(); ++i)

View File

@ -5,7 +5,6 @@
#include <cstdlib>
#include <cstring>
#include <algorithm>
#include <sstream>
#include <functional>
#include <filesystem>
#include <Poco/DOM/Text.h>
@ -17,6 +16,8 @@
#include <Common/StringUtils/StringUtils.h>
#include <Common/Exception.h>
#include <common/getResource.h>
#include <IO/WriteBufferFromString.h>
#include <IO/Operators.h>
#define PREPROCESSED_SUFFIX "-preprocessed"
@ -537,8 +538,7 @@ XMLDocumentPtr ConfigProcessor::processConfig(
if (has_zk_includes)
*has_zk_includes = !contributing_zk_paths.empty();
std::stringstream comment;
comment.exceptions(std::ios::failbit);
WriteBufferFromOwnString comment;
comment << " This file was generated automatically.\n";
comment << " Do not edit it: it is likely to be discarded and generated again before it's read next time.\n";
comment << " Files used to generate this file:";

View File

@ -245,8 +245,7 @@ static std::string getExtraExceptionInfo(const std::exception & e)
std::string getCurrentExceptionMessage(bool with_stacktrace, bool check_embedded_stacktrace /*= false*/, bool with_extra_info /*= true*/)
{
std::stringstream stream;
stream.exceptions(std::ios::failbit);
WriteBufferFromOwnString stream;
try
{
@ -365,8 +364,7 @@ void tryLogException(std::exception_ptr e, Poco::Logger * logger, const std::str
std::string getExceptionMessage(const Exception & e, bool with_stacktrace, bool check_embedded_stacktrace)
{
std::stringstream stream;
stream.exceptions(std::ios::failbit);
WriteBufferFromOwnString stream;
try
{

View File

@ -16,13 +16,13 @@ struct HTMLForm : public Poco::Net::HTMLForm
HTMLForm(const Poco::Net::HTTPRequest & request)
{
Poco::URI uri(request.getURI());
std::istringstream istr(uri.getRawQuery());
std::istringstream istr(uri.getRawQuery()); // STYLE_CHECK_ALLOW_STD_STRING_STREAM
readUrl(istr);
}
HTMLForm(const Poco::URI & uri)
{
std::istringstream istr(uri.getRawQuery());
std::istringstream istr(uri.getRawQuery()); // STYLE_CHECK_ALLOW_STD_STRING_STREAM
readUrl(istr);
}

View File

@ -133,17 +133,13 @@ void MemoryTracker::alloc(Int64 size)
BlockerInThread untrack_lock;
ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded);
std::stringstream message;
message.exceptions(std::ios::failbit);
message << "Memory tracker";
if (const auto * description = description_ptr.load(std::memory_order_relaxed))
message << " " << description;
message << ": fault injected. Would use " << formatReadableSizeWithBinarySuffix(will_be)
<< " (attempt to allocate chunk of " << size << " bytes)"
<< ", maximum: " << formatReadableSizeWithBinarySuffix(current_hard_limit);
const auto * description = description_ptr.load(std::memory_order_relaxed);
amount.fetch_sub(size, std::memory_order_relaxed);
throw DB::Exception(message.str(), DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED);
throw DB::Exception(DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED,
"Memory tracker{}{}: fault injected. Would use {} (attempt to allocate chunk of {} bytes), maximum: {}",
description ? " " : "", description ? description : "",
formatReadableSizeWithBinarySuffix(will_be),
size, formatReadableSizeWithBinarySuffix(current_hard_limit));
}
if (unlikely(current_profiler_limit && will_be > current_profiler_limit))
@ -166,17 +162,13 @@ void MemoryTracker::alloc(Int64 size)
BlockerInThread untrack_lock;
ProfileEvents::increment(ProfileEvents::QueryMemoryLimitExceeded);
std::stringstream message;
message.exceptions(std::ios::failbit);
message << "Memory limit";
if (const auto * description = description_ptr.load(std::memory_order_relaxed))
message << " " << description;
message << " exceeded: would use " << formatReadableSizeWithBinarySuffix(will_be)
<< " (attempt to allocate chunk of " << size << " bytes)"
<< ", maximum: " << formatReadableSizeWithBinarySuffix(current_hard_limit);
const auto * description = description_ptr.load(std::memory_order_relaxed);
amount.fetch_sub(size, std::memory_order_relaxed);
throw DB::Exception(message.str(), DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED);
throw DB::Exception(DB::ErrorCodes::MEMORY_LIMIT_EXCEEDED,
"Memory limit{}{} exceeded: would use {} (attempt to allocate chunk of {} bytes), maximum: {}",
description ? " " : "", description ? description : "",
formatReadableSizeWithBinarySuffix(will_be),
size, formatReadableSizeWithBinarySuffix(current_hard_limit));
}
updatePeak(will_be);
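
Note on the two hunks above: the new exception messages handle the optional description with two adjacent placeholders, `{}{}`, filled with `description ? " " : ""` and `description ? description : ""`. The sketch below reproduces that trick with plain string concatenation instead of a formatting library. The function name `memoryLimitMessage` and its parameters are assumptions for illustration only.

```cpp
#include <string>

// Hypothetical stand-in for the message built in the diff. The sizes are
// passed in already formatted, since formatReadableSizeWithBinarySuffix is
// not reproduced here.
std::string memoryLimitMessage(
    const char * description, const std::string & would_use, long long chunk_bytes, const std::string & maximum)
{
    std::string msg = "Memory limit";

    // Both pieces collapse to empty strings when there is no description,
    // so no stray space is left in the message.
    msg += description ? " " : "";
    msg += description ? description : "";

    msg += " exceeded: would use " + would_use;
    msg += " (attempt to allocate chunk of " + std::to_string(chunk_bytes) + " bytes)";
    msg += ", maximum: " + maximum;
    return msg;
}
```

Keeping the separator and the description as separate arguments lets one format string serve both cases without branching around the `throw`.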

View File

@ -8,6 +8,7 @@
#include <common/logger_useful.h>
#include <common/errnoToString.h>
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>
#include <unistd.h>
#include <csignal>
@ -73,8 +74,7 @@ ShellCommand::~ShellCommand()
void ShellCommand::logCommand(const char * filename, char * const argv[])
{
std::stringstream args;
args.exceptions(std::ios::failbit);
WriteBufferFromOwnString args;
for (int i = 0; argv != nullptr && argv[i] != nullptr; ++i)
{
if (i > 0)

View File

@ -23,7 +23,7 @@
std::string signalToErrorMessage(int sig, const siginfo_t & info, const ucontext_t & context)
{
std::stringstream error;
std::stringstream error; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
error.exceptions(std::ios::failbit);
switch (sig)
{
@ -319,7 +319,7 @@ static void toStringEveryLineImpl(
const DB::SymbolIndex & symbol_index = DB::SymbolIndex::instance();
std::unordered_map<std::string, DB::Dwarf> dwarfs;
std::stringstream out;
std::stringstream out; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
out.exceptions(std::ios::failbit);
for (size_t i = offset; i < size; ++i)
@ -359,7 +359,7 @@ static void toStringEveryLineImpl(
out.str({});
}
#else
std::stringstream out;
std::stringstream out; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
out.exceptions(std::ios::failbit);
for (size_t i = offset; i < size; ++i)
@ -375,7 +375,7 @@ static void toStringEveryLineImpl(
static std::string toStringImpl(const StackTrace::FramePointers & frame_pointers, size_t offset, size_t size)
{
std::stringstream out;
std::stringstream out; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
out.exceptions(std::ios::failbit);
toStringEveryLineImpl(frame_pointers, offset, size, [&](const std::string & str) { out << str << '\n'; });
return out.str();

View File

@ -153,7 +153,7 @@ std::pair<bool, std::string> StudentTTest::compareAndReport(size_t confidence_le
double mean_confidence_interval = table_value * t_statistic;
std::stringstream ss;
std::stringstream ss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
ss.exceptions(std::ios::failbit);
if (mean_difference > mean_confidence_interval && (mean_difference - mean_confidence_interval > 0.0001)) /// difference must be more than 0.0001, to take into account connection latency.

View File

@ -413,7 +413,8 @@ std::vector<size_t> PerfEventsCounters::eventIndicesFromString(const std::string
return result;
}
std::istringstream iss(events_list);
std::istringstream iss(events_list); // STYLE_CHECK_ALLOW_STD_STRING_STREAM
std::string event_name;
while (std::getline(iss, event_name, ','))
{

View File

@ -1,5 +1,3 @@
#include <sstream>
#include <Common/Exception.h>
#include <Common/ThreadProfileEvents.h>
#include <Common/QueryProfiler.h>
@ -79,12 +77,10 @@ void ThreadStatus::assertState(const std::initializer_list<int> & permitted_stat
return;
}
std::stringstream ss;
ss.exceptions(std::ios::failbit);
ss << "Unexpected thread state " << getCurrentState();
if (description)
ss << ": " << description;
throw Exception(ss.str(), ErrorCodes::LOGICAL_ERROR);
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected thread state {}: {}", getCurrentState(), description);
else
throw Exception(ErrorCodes::LOGICAL_ERROR, "Unexpected thread state {}", getCurrentState());
}
void ThreadStatus::attachInternalTextLogsQueue(const InternalTextLogsQueuePtr & logs_queue,

View File

@ -1,11 +1,11 @@
#pragma once
#include <tuple>
#include <sstream>
#include <iomanip>
#include <city.h>
#include <Core/Types.h>
#include <Common/hex.h>
#ifdef __SSE4_2__
#include <nmmintrin.h>
@ -48,10 +48,9 @@ struct UInt128
String toHexString() const
{
std::ostringstream os;
os.exceptions(std::ios::failbit);
os << std::setw(16) << std::setfill('0') << std::hex << high << low;
return String(os.str());
String res(2 * sizeof(UInt128), 0);
writeHexUIntLowercase(*this, res.data());
return res;
}
bool inline operator== (const UInt128 rhs) const { return tuple() == rhs.tuple(); }
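
Note on the `toHexString` hunk above: in the removed stream-based version, `std::setw(16)` applies only to the next insertion, so `low` was written without zero padding. The replacement preallocates `2 * sizeof(UInt128)` characters and fills them in place. The sketch below contrasts the two behaviours. `oldStyleHex` and `fixedWidthHex` are hypothetical names, and writing the high half first is an assumption about what `writeHexUIntLowercase` produces, not something the diff states.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Mirrors the removed code: the width set by std::setw is reset after `high`
// is written, so `low` is emitted with no padding.
std::string oldStyleHex(uint64_t high, uint64_t low)
{
    std::ostringstream os;
    os << std::setw(16) << std::setfill('0') << std::hex << high << low;
    return os.str();
}

// Sketch of a fixed-width writer: always 32 lowercase hex digits,
// high half first (assumed order), both halves zero-padded.
std::string fixedWidthHex(uint64_t high, uint64_t low)
{
    static const char digits[] = "0123456789abcdef";
    std::string res(32, '0');
    for (int i = 0; i < 16; ++i)
    {
        // Nibble i (counted from the least significant end) goes to the
        // i-th position from the right of its 16-character half.
        res[15 - i] = digits[(high >> (4 * i)) & 0xF];
        res[31 - i] = digits[(low >> (4 * i)) & 0xF];
    }
    return res;
}
```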

View File

@ -1,6 +1,5 @@
#pragma once
#include <sstream>
#include <IO/ReadHelpers.h>
#include <IO/ReadWriteBufferFromHTTP.h>
#include <Interpreters/Context.h>
@ -307,9 +306,6 @@ struct ODBCBridgeMixin
std::vector<std::string> cmd_args;
path.setFileName("clickhouse-odbc-bridge");
std::stringstream command;
command.exceptions(std::ios::failbit);
#if !CLICKHOUSE_SPLIT_BINARY
cmd_args.push_back("odbc-bridge");
#endif

View File

@ -218,7 +218,7 @@ std::pair<ResponsePtr, Undo> TestKeeperCreateRequest::process(TestKeeper::Contai
auto seq_num = it->second.seq_num;
++it->second.seq_num;
std::stringstream seq_num_str;
std::stringstream seq_num_str; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
seq_num_str.exceptions(std::ios::failbit);
seq_num_str << std::setw(10) << std::setfill('0') << seq_num;

View File

@ -3,7 +3,6 @@
#include <ext/scope_guard.h>
#include <pthread.h>
#include <cstdint>
#include <sstream>
#if defined(__FreeBSD__)
# include <pthread_np.h>
@ -80,13 +79,8 @@ __attribute__((__weak__)) void checkStackSize()
/// It's safe to assume that overflow in multiplying by two cannot occur.
if (stack_size * 2 > max_stack_size)
{
std::stringstream message;
message.exceptions(std::ios::failbit);
message << "Stack size too large"
<< ". Stack address: " << stack_address
<< ", frame address: " << frame_address
<< ", stack size: " << stack_size
<< ", maximum stack size: " << max_stack_size;
throw Exception(message.str(), ErrorCodes::TOO_DEEP_RECURSION);
throw Exception(ErrorCodes::TOO_DEEP_RECURSION,
"Stack size too large. Stack address: {}, frame address: {}, stack size: {}, maximum stack size: {}",
stack_address, frame_address, stack_size, max_stack_size);
}
}

View File

@ -1,6 +1,4 @@
#include <cmath>
#include <sstream>
#include <iomanip>
#include <Common/formatReadable.h>
#include <IO/DoubleConverter.h>

View File

@ -1,4 +1,7 @@
#include <Common/parseGlobs.h>
#include <IO/WriteBufferFromString.h>
#include <IO/ReadBufferFromString.h>
#include <IO/Operators.h>
#include <re2/re2.h>
#include <re2/stringpiece.h>
#include <algorithm>
@ -18,21 +21,21 @@ namespace DB
*/
std::string makeRegexpPatternFromGlobs(const std::string & initial_str_with_globs)
{
std::ostringstream oss_for_escaping;
oss_for_escaping.exceptions(std::ios::failbit);
/// FIXME make it better
WriteBufferFromOwnString buf_for_escaping;
/// Escaping only characters that not used in glob syntax
for (const auto & letter : initial_str_with_globs)
{
if ((letter == '[') || (letter == ']') || (letter == '|') || (letter == '+') || (letter == '-') || (letter == '(') || (letter == ')'))
oss_for_escaping << '\\';
oss_for_escaping << letter;
if ((letter == '[') || (letter == ']') || (letter == '|') || (letter == '+') || (letter == '-') || (letter == '(') || (letter == ')') || (letter == '\\'))
buf_for_escaping << '\\';
buf_for_escaping << letter;
}
std::string escaped_with_globs = oss_for_escaping.str();
std::string escaped_with_globs = buf_for_escaping.str();
static const re2::RE2 enum_or_range(R"({([\d]+\.\.[\d]+|[^{}*,]+,[^{}*]*[^{}*,])})"); /// regexp for {expr1,expr2,expr3} or {M..N}, where M and N - non-negative integers, expr's should be without {}*,
re2::StringPiece input(escaped_with_globs);
re2::StringPiece matched;
std::ostringstream oss_for_replacing;
std::ostringstream oss_for_replacing; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
oss_for_replacing.exceptions(std::ios::failbit);
size_t current_index = 0;
while (RE2::FindAndConsume(&input, enum_or_range, &matched))
@ -45,9 +48,8 @@ std::string makeRegexpPatternFromGlobs(const std::string & initial_str_with_glob
size_t range_begin = 0;
size_t range_end = 0;
char point;
std::istringstream iss_range(buffer);
iss_range.exceptions(std::ios::failbit);
iss_range >> range_begin >> point >> point >> range_end;
ReadBufferFromString buf_range(buffer);
buf_range >> range_begin >> point >> point >> range_end;
bool leading_zeros = buffer[0] == '0';
size_t num_len = std::to_string(range_end).size();
if (leading_zeros)
@ -71,20 +73,19 @@ std::string makeRegexpPatternFromGlobs(const std::string & initial_str_with_glob
}
oss_for_replacing << escaped_with_globs.substr(current_index);
std::string almost_res = oss_for_replacing.str();
std::ostringstream oss_final_processing;
oss_final_processing.exceptions(std::ios::failbit);
WriteBufferFromOwnString buf_final_processing;
for (const auto & letter : almost_res)
{
if ((letter == '?') || (letter == '*'))
{
oss_final_processing << "[^/]"; /// '?' is any symbol except '/'
buf_final_processing << "[^/]"; /// '?' is any symbol except '/'
if (letter == '?')
continue;
}
if ((letter == '.') || (letter == '{') || (letter == '}'))
oss_final_processing << '\\';
oss_final_processing << letter;
buf_final_processing << '\\';
buf_final_processing << letter;
}
return oss_final_processing.str();
return buf_final_processing.str();
}
}
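
Note on the `makeRegexpPatternFromGlobs` hunks above: the function works in passes. The first escapes characters that are special in a regexp but not in glob syntax (the diff adds the backslash to that set), and the last translates `?` and `*` and escapes `.`, `{` and `}`. The sketch below reproduces those two passes with `std::string` only and leaves out the middle `{a,b}` / `{M..N}` expansion. The function names `escapeNonGlobChars` and `translateWildcards` are assumptions for illustration.

```cpp
#include <string>

// First pass: put a backslash before characters that mean something in a
// regexp but are plain characters in a glob. The backslash itself is included,
// matching the condition added in the diff.
std::string escapeNonGlobChars(const std::string & s)
{
    std::string out;
    for (char c : s)
    {
        if (c == '[' || c == ']' || c == '|' || c == '+' || c == '-' || c == '(' || c == ')' || c == '\\')
            out += '\\';
        out += c;
    }
    return out;
}

// Final pass: '?' becomes "[^/]" and '*' becomes "[^/]*", so neither crosses
// a path separator; '.', '{' and '}' are escaped as literal characters.
std::string translateWildcards(const std::string & s)
{
    std::string out;
    for (char c : s)
    {
        if (c == '?' || c == '*')
        {
            out += "[^/]";
            if (c == '?')
                continue; // '?' is fully replaced; '*' falls through and is appended below
        }
        if (c == '.' || c == '{' || c == '}')
            out += '\\';
        out += c;
    }
    return out;
}
```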

View File

@ -9,7 +9,8 @@ using namespace DB;
TEST(Common, getMultipleValuesFromConfig)
{
std::istringstream xml_isteam(R"END(<?xml version="1.0"?>
std::istringstream // STYLE_CHECK_ALLOW_STD_STRING_STREAM
xml_isteam(R"END(<?xml version="1.0"?>
<yandex>
<first_level>
<second_level>0</second_level>

View File

@ -102,7 +102,8 @@ TEST(Common, SensitiveDataMasker)
EXPECT_EQ(maskerbad.wipeSensitiveData(x), 0);
{
std::istringstream xml_isteam(R"END(<?xml version="1.0"?>
std::istringstream // STYLE_CHECK_ALLOW_STD_STRING_STREAM
xml_isteam(R"END(<?xml version="1.0"?>
<clickhouse>
<query_masking_rules>
<rule>
@ -152,7 +153,8 @@ TEST(Common, SensitiveDataMasker)
try
{
std::istringstream xml_isteam_bad(R"END(<?xml version="1.0"?>
std::istringstream // STYLE_CHECK_ALLOW_STD_STRING_STREAM
xml_isteam_bad(R"END(<?xml version="1.0"?>
<clickhouse>
<query_masking_rules>
<rule>
@ -181,7 +183,8 @@ TEST(Common, SensitiveDataMasker)
try
{
std::istringstream xml_isteam_bad(R"END(<?xml version="1.0"?>
std::istringstream // STYLE_CHECK_ALLOW_STD_STRING_STREAM
xml_isteam_bad(R"END(<?xml version="1.0"?>
<clickhouse>
<query_masking_rules>
<rule><name>test</name></rule>
@ -203,7 +206,8 @@ TEST(Common, SensitiveDataMasker)
try
{
std::istringstream xml_isteam_bad(R"END(<?xml version="1.0"?>
std::istringstream // STYLE_CHECK_ALLOW_STD_STRING_STREAM
xml_isteam_bad(R"END(<?xml version="1.0"?>
<clickhouse>
<query_masking_rules>
<rule><name>test</name><regexp>())(</regexp></rule>

View File

@ -12,6 +12,7 @@
#include <IO/BufferWithOwnMemory.h>
#include <Compression/CompressionInfo.h>
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>
namespace ProfileEvents
@ -42,7 +43,7 @@ static void validateChecksum(char * data, size_t size, const Checksum expected_c
if (expected_checksum == calculated_checksum)
return;
std::stringstream message;
WriteBufferFromOwnString message;
/// TODO mess up of endianness in error message.
message << "Checksum doesn't match: corrupted data."
@ -50,7 +51,16 @@ static void validateChecksum(char * data, size_t size, const Checksum expected_c
+ ". Actual: " + getHexUIntLowercase(calculated_checksum.first) + getHexUIntLowercase(calculated_checksum.second)
+ ". Size of compressed block: " + toString(size);
const char * message_hardware_failure = "This is most likely due to hardware failure. If you receive broken data over network and the error does not repeat every time, this can be caused by bad RAM on network interface controller or bad controller itself or bad RAM on network switches or bad CPU on network switches (look at the logs on related network switches; note that TCP checksums don't help) or bad RAM on host (look at dmesg or kern.log for enormous amount of EDAC errors, ECC-related reports, Machine Check Exceptions, mcelog; note that ECC memory can fail if the number of errors is huge) or bad CPU on host. If you read data from disk, this can be caused by disk bit rott. This exception protects ClickHouse from data corruption due to hardware failures.";
const char * message_hardware_failure = "This is most likely due to hardware failure. "
"If you receive broken data over network and the error does not repeat every time, "
"this can be caused by bad RAM on network interface controller or bad controller itself "
"or bad RAM on network switches or bad CPU on network switches "
"(look at the logs on related network switches; note that TCP checksums don't help) "
"or bad RAM on host (look at dmesg or kern.log for enormous amount of EDAC errors, "
"ECC-related reports, Machine Check Exceptions, mcelog; note that ECC memory can fail "
"if the number of errors is huge) or bad CPU on host. If you read data from disk, "
"this can be caused by disk bit rott. This exception protects ClickHouse "
"from data corruption due to hardware failures.";
auto flip_bit = [](char * buf, size_t pos)
{

View File

@ -51,10 +51,7 @@ int main(int, char **)
if (x != i)
{
std::stringstream s;
s.exceptions(std::ios::failbit);
s << "Failed!, read: " << x << ", expected: " << i;
throw DB::Exception(s.str(), 0);
throw DB::Exception(0, "Failed!, read: {}, expected: {}", x, i);
}
}
stopwatch.stop();

View File

@ -1,5 +1,4 @@
#include <Core/MySQL/IMySQLReadPacket.h>
#include <sstream>
#include <IO/MySQLPacketPayloadReadBuffer.h>
#include <IO/LimitReadBuffer.h>
@ -21,10 +20,9 @@ void IMySQLReadPacket::readPayload(ReadBuffer & in, uint8_t & sequence_id)
readPayloadImpl(payload);
if (!payload.eof())
{
std::stringstream tmp;
tmp.exceptions(std::ios::failbit);
tmp << "Packet payload is not fully read. Stopped after " << payload.count() << " bytes, while " << payload.available() << " bytes are in buffer.";
throw Exception(tmp.str(), ErrorCodes::UNKNOWN_PACKET_FROM_CLIENT);
throw Exception(ErrorCodes::UNKNOWN_PACKET_FROM_CLIENT,
"Packet payload is not fully read. Stopped after {} bytes, while {} bytes are in buffer.",
payload.count(), payload.available());
}
}

View File

@ -1,10 +1,14 @@
#include <Core/MySQL/IMySQLWritePacket.h>
#include <IO/MySQLPacketPayloadWriteBuffer.h>
#include <sstream>
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
namespace MySQLProtocol
{
@ -15,10 +19,8 @@ void IMySQLWritePacket::writePayload(WriteBuffer & buffer, uint8_t & sequence_id
buf.next();
if (buf.remainingPayloadSize())
{
std::stringstream ss;
ss.exceptions(std::ios::failbit);
ss << "Incomplete payload. Written " << getPayloadSize() - buf.remainingPayloadSize() << " bytes, expected " << getPayloadSize() << " bytes.";
throw Exception(ss.str(), 0);
throw Exception(ErrorCodes::LOGICAL_ERROR, "Incomplete payload. Written {} bytes, expected {} bytes.",
getPayloadSize() - buf.remainingPayloadSize(), getPayloadSize());
}
}

View File

@ -4,6 +4,7 @@
#include <IO/ReadBufferFromString.h>
#include <IO/MySQLBinlogEventReadBuffer.h>
#include <IO/ReadHelpers.h>
#include <IO/Operators.h>
#include <common/DateLUT.h>
#include <Common/FieldVisitors.h>
#include <Core/MySQL/PacketsGeneric.h>
@ -35,15 +36,15 @@ namespace MySQLReplication
payload.readStrict(reinterpret_cast<char *>(&flags), 2);
}
void EventHeader::dump(std::ostream & out) const
void EventHeader::dump(WriteBuffer & out) const
{
out << "\n=== " << to_string(this->type) << " ===" << std::endl;
out << "Timestamp: " << this->timestamp << std::endl;
out << "Event Type: " << this->type << std::endl;
out << "Server ID: " << this->server_id << std::endl;
out << "Event Size: " << this->event_size << std::endl;
out << "Log Pos: " << this->log_pos << std::endl;
out << "Flags: " << this->flags << std::endl;
out << "\n=== " << to_string(this->type) << " ===" << '\n';
out << "Timestamp: " << this->timestamp << '\n';
out << "Event Type: " << to_string(this->type) << '\n';
out << "Server ID: " << this->server_id << '\n';
out << "Event Size: " << this->event_size << '\n';
out << "Log Pos: " << this->log_pos << '\n';
out << "Flags: " << this->flags << '\n';
}
/// https://dev.mysql.com/doc/internals/en/format-description-event.html
@ -60,13 +61,13 @@ namespace MySQLReplication
readStringUntilEOF(event_type_header_length, payload);
}
void FormatDescriptionEvent::dump(std::ostream & out) const
void FormatDescriptionEvent::dump(WriteBuffer & out) const
{
header.dump(out);
out << "Binlog Version: " << this->binlog_version << std::endl;
out << "Server Version: " << this->server_version << std::endl;
out << "Create Timestamp: " << this->create_timestamp << std::endl;
out << "Event Header Len: " << std::to_string(this->event_header_length) << std::endl;
out << "Binlog Version: " << this->binlog_version << '\n';
out << "Server Version: " << this->server_version << '\n';
out << "Create Timestamp: " << this->create_timestamp << '\n';
out << "Event Header Len: " << std::to_string(this->event_header_length) << '\n';
}
/// https://dev.mysql.com/doc/internals/en/rotate-event.html
@ -76,11 +77,11 @@ namespace MySQLReplication
readStringUntilEOF(next_binlog, payload);
}
void RotateEvent::dump(std::ostream & out) const
void RotateEvent::dump(WriteBuffer & out) const
{
header.dump(out);
out << "Position: " << this->position << std::endl;
out << "Next Binlog: " << this->next_binlog << std::endl;
out << "Position: " << this->position << '\n';
out << "Next Binlog: " << this->next_binlog << '\n';
}
/// https://dev.mysql.com/doc/internals/en/query-event.html
@ -116,24 +117,24 @@ namespace MySQLReplication
}
}
void QueryEvent::dump(std::ostream & out) const
void QueryEvent::dump(WriteBuffer & out) const
{
header.dump(out);
out << "Thread ID: " << this->thread_id << std::endl;
out << "Execution Time: " << this->exec_time << std::endl;
out << "Schema Len: " << std::to_string(this->schema_len) << std::endl;
out << "Error Code: " << this->error_code << std::endl;
out << "Status Len: " << this->status_len << std::endl;
out << "Schema: " << this->schema << std::endl;
out << "Query: " << this->query << std::endl;
out << "Thread ID: " << this->thread_id << '\n';
out << "Execution Time: " << this->exec_time << '\n';
out << "Schema Len: " << std::to_string(this->schema_len) << '\n';
out << "Error Code: " << this->error_code << '\n';
out << "Status Len: " << this->status_len << '\n';
out << "Schema: " << this->schema << '\n';
out << "Query: " << this->query << '\n';
}
void XIDEvent::parseImpl(ReadBuffer & payload) { payload.readStrict(reinterpret_cast<char *>(&xid), 8); }
void XIDEvent::dump(std::ostream & out) const
void XIDEvent::dump(WriteBuffer & out) const
{
header.dump(out);
out << "XID: " << this->xid << std::endl;
out << "XID: " << this->xid << '\n';
}
void TableMapEvent::parseImpl(ReadBuffer & payload)
@ -238,21 +239,23 @@ namespace MySQLReplication
}
}
void TableMapEvent::dump(std::ostream & out) const
void TableMapEvent::dump(WriteBuffer & out) const
{
header.dump(out);
out << "Table ID: " << this->table_id << std::endl;
out << "Flags: " << this->flags << std::endl;
out << "Schema Len: " << std::to_string(this->schema_len) << std::endl;
out << "Schema: " << this->schema << std::endl;
out << "Table Len: " << std::to_string(this->table_len) << std::endl;
out << "Table: " << this->table << std::endl;
out << "Column Count: " << this->column_count << std::endl;
out << "Table ID: " << this->table_id << '\n';
out << "Flags: " << this->flags << '\n';
out << "Schema Len: " << std::to_string(this->schema_len) << '\n';
out << "Schema: " << this->schema << '\n';
out << "Table Len: " << std::to_string(this->table_len) << '\n';
out << "Table: " << this->table << '\n';
out << "Column Count: " << this->column_count << '\n';
for (auto i = 0U; i < column_count; i++)
{
out << "Column Type [" << i << "]: " << std::to_string(column_type[i]) << ", Meta: " << column_meta[i] << std::endl;
out << "Column Type [" << i << "]: " << std::to_string(column_type[i]) << ", Meta: " << column_meta[i] << '\n';
}
out << "Null Bitmap: " << this->null_bitmap << std::endl;
String bitmap_str;
boost::to_string(this->null_bitmap, bitmap_str);
out << "Null Bitmap: " << bitmap_str << '\n';
}
void RowsEvent::parseImpl(ReadBuffer & payload)
@ -631,16 +634,16 @@ namespace MySQLReplication
rows.push_back(row);
}
void RowsEvent::dump(std::ostream & out) const
void RowsEvent::dump(WriteBuffer & out) const
{
FieldVisitorToString to_string;
header.dump(out);
out << "Schema: " << this->schema << std::endl;
out << "Table: " << this->table << std::endl;
out << "Schema: " << this->schema << '\n';
out << "Table: " << this->table << '\n';
for (auto i = 0U; i < rows.size(); i++)
{
out << "Row[" << i << "]: " << applyVisitor(to_string, rows[i]) << std::endl;
out << "Row[" << i << "]: " << applyVisitor(to_string, rows[i]) << '\n';
}
}
@ -663,22 +666,22 @@ namespace MySQLReplication
payload.ignoreAll();
}
void GTIDEvent::dump(std::ostream & out) const
void GTIDEvent::dump(WriteBuffer & out) const
{
WriteBufferFromOwnString ws;
writeUUIDText(gtid.uuid, ws);
auto gtid_next = ws.str() + ":" + std::to_string(gtid.seq_no);
header.dump(out);
out << "GTID Next: " << gtid_next << std::endl;
out << "GTID Next: " << gtid_next << '\n';
}
void DryRunEvent::parseImpl(ReadBuffer & payload) { payload.ignoreAll(); }
void DryRunEvent::dump(std::ostream & out) const
void DryRunEvent::dump(WriteBuffer & out) const
{
header.dump(out);
out << "[DryRun Event]" << std::endl;
out << "[DryRun Event]" << '\n';
}
/// Update binlog name/position/gtid based on the event type.
@ -716,12 +719,12 @@ namespace MySQLReplication
gtid_sets.parse(gtid_sets_);
}
void Position::dump(std::ostream & out) const
void Position::dump(WriteBuffer & out) const
{
out << "\n=== Binlog Position ===" << std::endl;
out << "Binlog: " << this->binlog_name << std::endl;
out << "Position: " << this->binlog_pos << std::endl;
out << "GTIDSets: " << this->gtid_sets.toString() << std::endl;
out << "\n=== Binlog Position ===" << '\n';
out << "Binlog: " << this->binlog_name << '\n';
out << "Position: " << this->binlog_pos << '\n';
out << "GTIDSets: " << this->gtid_sets.toString() << '\n';
}
void MySQLFlavor::readPayloadImpl(ReadBuffer & payload)
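
Note on the `dump` hunks above: every `dump(std::ostream &)` becomes `dump(WriteBuffer &)`, and `std::endl` becomes `'\n'`, which ends the line without asking the stream to flush. The sketch below shows the same shape with a minimal string-backed sink. `StringSink` and `PositionSketch` are hypothetical stand-ins and are not the real `WriteBuffer` or `Position` classes.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a write buffer: it only appends to a string.
struct StringSink
{
    std::string data;
};

StringSink & operator<<(StringSink & out, const std::string & s) { out.data += s; return out; }
StringSink & operator<<(StringSink & out, const char * s) { out.data += s; return out; }
StringSink & operator<<(StringSink & out, char c) { out.data += c; return out; }
StringSink & operator<<(StringSink & out, uint64_t v) { out.data += std::to_string(v); return out; }

// Hypothetical, reduced version of a structure with a dump method.
struct PositionSketch
{
    std::string binlog_name;
    uint64_t binlog_pos;

    void dump(StringSink & out) const
    {
        // Lines end with '\n'; the sink has no per-line flush to trigger.
        out << "\n=== Binlog Position ===" << '\n';
        out << "Binlog: " << binlog_name << '\n';
        out << "Position: " << binlog_pos << '\n';
    }
};
```

Types with no such overload need an explicit conversion first, which is presumably why the diff converts `null_bitmap` with `boost::to_string` before writing it.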

View File

@ -309,7 +309,7 @@ namespace MySQLReplication
UInt16 flags;
EventHeader() : timestamp(0), server_id(0), event_size(0), log_pos(0), flags(0) { }
void dump(std::ostream & out) const;
void dump(WriteBuffer & out) const;
void parse(ReadBuffer & payload);
};
@ -321,7 +321,7 @@ namespace MySQLReplication
EventBase(EventHeader && header_) : header(std::move(header_)) {}
virtual ~EventBase() = default;
virtual void dump(std::ostream & out) const = 0;
virtual void dump(WriteBuffer & out) const = 0;
virtual void parseEvent(ReadBuffer & payload) { parseImpl(payload); }
virtual MySQLEventType type() const { return MYSQL_UNHANDLED_EVENT; }
@ -344,7 +344,7 @@ namespace MySQLReplication
UInt8 event_header_length;
String event_type_header_length;
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
void parseImpl(ReadBuffer & payload) override;
private:
@ -358,7 +358,7 @@ namespace MySQLReplication
String next_binlog;
RotateEvent(EventHeader && header_) : EventBase(std::move(header_)), position(0) {}
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
protected:
void parseImpl(ReadBuffer & payload) override;
@ -389,7 +389,7 @@ namespace MySQLReplication
{
}
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
MySQLEventType type() const override { return MYSQL_QUERY_EVENT; }
protected:
@ -404,7 +404,7 @@ namespace MySQLReplication
protected:
UInt64 xid;
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
void parseImpl(ReadBuffer & payload) override;
};
@ -423,7 +423,7 @@ namespace MySQLReplication
Bitmap null_bitmap;
TableMapEvent(EventHeader && header_) : EventBase(std::move(header_)), table_id(0), flags(0), schema_len(0), table_len(0), column_count(0) {}
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
protected:
void parseImpl(ReadBuffer & payload) override;
@ -445,7 +445,7 @@ namespace MySQLReplication
table = table_map->table;
}
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
protected:
UInt64 table_id;
@ -489,7 +489,7 @@ namespace MySQLReplication
GTID gtid;
GTIDEvent(EventHeader && header_) : EventBase(std::move(header_)), commit_flag(0) {}
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
protected:
void parseImpl(ReadBuffer & payload) override;
@ -499,7 +499,7 @@ namespace MySQLReplication
{
public:
DryRunEvent(EventHeader && header_) : EventBase(std::move(header_)) {}
void dump(std::ostream & out) const override;
void dump(WriteBuffer & out) const override;
protected:
void parseImpl(ReadBuffer & payload) override;
@ -515,7 +515,7 @@ namespace MySQLReplication
Position() : binlog_pos(0) { }
void update(BinlogEventPtr event);
void update(UInt64 binlog_pos_, const String & binlog_name_, const String & gtid_sets_);
void dump(std::ostream & out) const;
void dump(WriteBuffer & out) const;
};
class IFlavor : public MySQLProtocol::IMySQLReadPacket

View File

@ -130,4 +130,6 @@ void Settings::checkNoSettingNamesAtTopLevel(const Poco::Util::AbstractConfigura
}
}
IMPLEMENT_SETTINGS_TRAITS(FormatFactorySettingsTraits, FORMAT_FACTORY_SETTINGS)
}

View File

@ -399,6 +399,7 @@ class IColumn;
\
/** Obsolete settings that do nothing but left for compatibility reasons. Remove each one after half a year of obsolescence. */ \
\
M(UInt64, max_memory_usage_for_all_queries, 0, "Obsolete. Will be removed after 2020-10-20", 0) \
M(UInt64, multiple_joins_rewriter_version, 0, "Obsolete setting, does nothing. Will be removed after 2021-03-31", 0) \
M(Bool, experimental_use_processors, true, "Obsolete setting, does nothing. Will be removed after 2020-11-29.", 0) \
M(Bool, force_optimize_skip_unused_shards_no_nested, false, "Obsolete setting, does nothing. Will be removed after 2020-12-01. Use force_optimize_skip_unused_shards_nesting instead.", 0) \
@ -514,4 +515,13 @@ struct Settings : public BaseSettings<SettingsTraits>
static void checkNoSettingNamesAtTopLevel(const Poco::Util::AbstractConfiguration & config, const String & config_path);
};
/*
* User-specified file format settings for File and URL engines.
*/
DECLARE_SETTINGS_TRAITS(FormatFactorySettingsTraits, FORMAT_FACTORY_SETTINGS)
struct FormatFactorySettings : public BaseSettings<FormatFactorySettingsTraits>
{
};
}

View File

@ -60,10 +60,7 @@ struct SortColumnDescription
std::string dump() const
{
std::stringstream ss;
ss.exceptions(std::ios::failbit);
ss << column_name << ":" << column_number << ":dir " << direction << "nulls " << nulls_direction;
return ss.str();
return fmt::format("{}:{}:dir {}nulls {}", column_name, column_number, direction, nulls_direction);
}
};
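The removed stream-based `dump()` wrote four values (name, number, direction, nulls direction), so any replacement has to emit all four. Below is a small standard-library sketch that compares the stream form with an equivalent built by concatenation; `SortColumnSketch` and its field types are assumptions for illustration, not the real `SortColumnDescription`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-in for SortColumnDescription.
struct SortColumnSketch
{
    std::string column_name;
    std::size_t column_number = 0;
    int direction = 1;
    int nulls_direction = 1;

    // Stream-based form, following the removed lines: four values are written.
    std::string dumpWithStream() const
    {
        std::ostringstream ss;
        ss << column_name << ":" << column_number << ":dir " << direction << "nulls " << nulls_direction;
        return ss.str();
    }

    // Concatenation-based form: each of the four values appears exactly once.
    std::string dumpWithConcat() const
    {
        return column_name + ":" + std::to_string(column_number)
            + ":dir " + std::to_string(direction)
            + "nulls " + std::to_string(nulls_direction);
    }
};
```

Comparing the two outputs in a unit test is a cheap way to catch a dropped value when a stream expression is rewritten as a format string.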

View File

@ -7,6 +7,7 @@
#include <Core/MySQL/PacketsProtocolText.h>
#include <IO/ReadBufferFromString.h>
#include <IO/WriteBufferFromString.h>
#include <IO/WriteBufferFromOStream.h>
#include <boost/program_options.hpp>
@ -329,6 +330,8 @@ int main(int argc, char ** argv)
slave.connect();
slave.startBinlogDumpGTID(slave_id, replicate_db, gtid_sets);
WriteBufferFromOStream cerr(std::cerr);
/// Read binlog events one by one.
while (true)
{
@ -337,40 +340,40 @@ int main(int argc, char ** argv)
{
case MYSQL_QUERY_EVENT: {
auto binlog_event = std::static_pointer_cast<QueryEvent>(event);
binlog_event->dump(std::cerr);
binlog_event->dump(cerr);
Position pos = slave.getPosition();
pos.dump(std::cerr);
pos.dump(cerr);
break;
}
case MYSQL_WRITE_ROWS_EVENT: {
auto binlog_event = std::static_pointer_cast<WriteRowsEvent>(event);
binlog_event->dump(std::cerr);
binlog_event->dump(cerr);
Position pos = slave.getPosition();
pos.dump(std::cerr);
pos.dump(cerr);
break;
}
case MYSQL_UPDATE_ROWS_EVENT: {
auto binlog_event = std::static_pointer_cast<UpdateRowsEvent>(event);
binlog_event->dump(std::cerr);
binlog_event->dump(cerr);
Position pos = slave.getPosition();
pos.dump(std::cerr);
pos.dump(cerr);
break;
}
case MYSQL_DELETE_ROWS_EVENT: {
auto binlog_event = std::static_pointer_cast<DeleteRowsEvent>(event);
binlog_event->dump(std::cerr);
binlog_event->dump(cerr);
Position pos = slave.getPosition();
pos.dump(std::cerr);
pos.dump(cerr);
break;
}
default:
if (event->header.type != MySQLReplication::EventType::HEARTBEAT_EVENT)
{
event->dump(std::cerr);
event->dump(cerr);
}
break;
}

View File

@ -59,15 +59,10 @@ void CheckConstraintsBlockOutputStream::write(const Block & block)
/// Is violated.
if (!value)
{
std::stringstream exception_message;
exception_message.exceptions(std::ios::failbit);
exception_message << "Constraint " << backQuote(constraint_ptr->name)
<< " for table " << table_id.getNameForLogs()
<< " is violated, because it is a constant expression returning 0."
<< " It is most likely an error in table definition.";
throw Exception{exception_message.str(), ErrorCodes::VIOLATED_CONSTRAINT};
throw Exception(ErrorCodes::VIOLATED_CONSTRAINT,
"Constraint {} for table {} is violated, because it is a constant expression returning 0. "
"It is most likely an error in table definition.",
backQuote(constraint_ptr->name), table_id.getNameForLogs());
}
}
else
@ -87,28 +82,27 @@ void CheckConstraintsBlockOutputStream::write(const Block & block)
Names related_columns = constraint_expr->getRequiredColumns();
std::stringstream exception_message;
exception_message.exceptions(std::ios::failbit);
exception_message << "Constraint " << backQuote(constraint_ptr->name)
<< " for table " << table_id.getNameForLogs()
<< " is violated at row " << (rows_written + row_idx + 1)
<< ". Expression: (" << serializeAST(*(constraint_ptr->expr), true) << ")"
<< ". Column values";
bool first = true;
String column_values_msg;
constexpr size_t approx_bytes_for_col = 32;
column_values_msg.reserve(approx_bytes_for_col * related_columns.size());
for (const auto & name : related_columns)
{
const IColumn & column = *block.getByName(name).column;
assert(row_idx < column.size());
exception_message << (first ? ": " : ", ")
<< backQuoteIfNeed(name) << " = " << applyVisitor(FieldVisitorToString(), column[row_idx]);
if (!first)
column_values_msg.append(", ");
column_values_msg.append(backQuoteIfNeed(name));
column_values_msg.append(" = ");
column_values_msg.append(applyVisitor(FieldVisitorToString(), column[row_idx]));
first = false;
}
throw Exception{exception_message.str(), ErrorCodes::VIOLATED_CONSTRAINT};
throw Exception(ErrorCodes::VIOLATED_CONSTRAINT,
"Constraint {} for table {} is violated at row {}. Expression: ({}). Column values: {}",
backQuote(constraint_ptr->name), table_id.getNameForLogs(), rows_written + row_idx + 1,
serializeAST(*(constraint_ptr->expr), true), column_values_msg);
}
}
}
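The loop above builds the `Column values` part of the exception text by appending `name = value` pairs separated by `, `. A minimal sketch of the same joining logic follows, assuming names and values are already plain strings; `joinColumnValues` is a hypothetical helper, not a ClickHouse function.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the loop that fills column_values_msg:
// joins "name = value" pairs with ", " and no trailing separator.
std::string joinColumnValues(const std::vector<std::pair<std::string, std::string>> & columns)
{
    std::string msg;
    // Rough per-column estimate, as in the diff; it only affects allocation, not the result.
    constexpr std::size_t approx_bytes_for_col = 32;
    msg.reserve(approx_bytes_for_col * columns.size());

    bool first = true;
    for (const auto & column : columns)
    {
        if (!first)
            msg.append(", ");
        msg.append(column.first);
        msg.append(" = ");
        msg.append(column.second);
        first = false;
    }
    return msg;
}
```

Building the variable-length part separately lets the final message stay a single format string with a fixed number of placeholders.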

View File

@ -4,7 +4,8 @@
#include <Interpreters/ProcessList.h>
#include <Access/EnabledQuota.h>
#include <Common/CurrentThread.h>
#include <common/sleep.h>
#include <IO/WriteBufferFromString.h>
#include <IO/Operators.h>
namespace ProfileEvents
{
@ -359,8 +360,7 @@ Block IBlockInputStream::getExtremes()
String IBlockInputStream::getTreeID() const
{
std::stringstream s;
s.exceptions(std::ios::failbit);
WriteBufferFromOwnString s;
s << getName();
if (!children.empty())
@ -399,13 +399,13 @@ size_t IBlockInputStream::checkDepthImpl(size_t max_depth, size_t level) const
}
void IBlockInputStream::dumpTree(std::ostream & ostr, size_t indent, size_t multiplier) const
void IBlockInputStream::dumpTree(WriteBuffer & ostr, size_t indent, size_t multiplier) const
{
ostr << String(indent, ' ') << getName();
if (multiplier > 1)
ostr << " × " << multiplier;
//ostr << ": " << getHeader().dumpStructure();
ostr << std::endl;
ostr << '\n';
++indent;
/// If the subtree is repeated several times, then we output it once with the multiplier.

View File

@ -95,7 +95,7 @@ public:
virtual void readSuffix();
/// Must be called before `read()` and `readPrefix()`.
void dumpTree(std::ostream & ostr, size_t indent = 0, size_t multiplier = 1) const;
void dumpTree(WriteBuffer & ostr, size_t indent = 0, size_t multiplier = 1) const;
/** Check the depth of the pipeline.
* If max_depth is specified and the `depth` is greater - throw an exception.

View File

@ -1,6 +1,4 @@
#include <queue>
#include <iomanip>
#include <sstream>
#include <common/logger_useful.h>

View File

@ -1,4 +1,3 @@
#include <sstream>
#include <string>
#include <vector>

View File

@ -32,8 +32,7 @@ static const std::vector<String> supported_functions{"any", "anyLast", "min",
String DataTypeCustomSimpleAggregateFunction::getName() const
{
std::stringstream stream;
stream.exceptions(std::ios::failbit);
WriteBufferFromOwnString stream;
stream << "SimpleAggregateFunction(" << function->getName();
if (!parameters.empty())

View File

@ -29,10 +29,7 @@ namespace ErrorCodes
template <typename T>
std::string DataTypeDecimal<T>::doGetName() const
{
std::stringstream ss;
ss.exceptions(std::ios::failbit);
ss << "Decimal(" << this->precision << ", " << this->scale << ")";
return ss.str();
return fmt::format("Decimal({}, {})", this->precision, this->scale);
}

View File

@ -26,7 +26,7 @@ static auto typeFromString(const std::string & str)
static auto typesFromString(const std::string & str)
{
std::istringstream data_types_stream(str);
std::istringstream data_types_stream(str); // STYLE_CHECK_ALLOW_STD_STRING_STREAM
DataTypes data_types;
std::string data_type;
while (data_types_stream >> data_type)

View File

@ -94,10 +94,9 @@ String getObjectDefinitionFromCreateQuery(const ASTPtr & query)
if (!create)
{
std::ostringstream query_stream;
query_stream.exceptions(std::ios::failbit);
formatAST(*query, query_stream, true);
throw Exception("Query '" + query_stream.str() + "' is not CREATE query", ErrorCodes::LOGICAL_ERROR);
WriteBufferFromOwnString query_buf;
formatAST(*query, query_buf, true);
throw Exception(ErrorCodes::LOGICAL_ERROR, "Query '{}' is not CREATE query", query_buf.str());
}
if (!create->is_dictionary)
@ -121,11 +120,10 @@ String getObjectDefinitionFromCreateQuery(const ASTPtr & query)
if (create->uuid != UUIDHelpers::Nil)
create->table = TABLE_WITH_UUID_NAME_PLACEHOLDER;
std::ostringstream statement_stream;
statement_stream.exceptions(std::ios::failbit);
formatAST(*create, statement_stream, false);
statement_stream << '\n';
return statement_stream.str();
WriteBufferFromOwnString statement_buf;
formatAST(*create, statement_buf, false);
writeChar('\n', statement_buf);
return statement_buf.str();
}
DatabaseOnDisk::DatabaseOnDisk(

View File

@ -127,8 +127,7 @@ static String checkVariableAndGetVersion(const mysqlxx::Pool::Entry & connection
}
bool first = true;
std::stringstream error_message;
error_message.exceptions(std::ios::failbit);
WriteBufferFromOwnString error_message;
error_message << "Illegal MySQL variables, the MaterializeMySQL engine requires ";
for (const auto & [variable_name, variable_error_message] : variables_error_message)
{
@ -239,8 +238,7 @@ static inline BlockOutputStreamPtr getTableOutput(const String & database_name,
{
const StoragePtr & storage = DatabaseCatalog::instance().getTable(StorageID(database_name, table_name), query_context);
std::stringstream insert_columns_str;
insert_columns_str.exceptions(std::ios::failbit);
WriteBufferFromOwnString insert_columns_str;
const StorageInMemoryMetadata & storage_metadata = storage->getInMemoryMetadata();
const ColumnsDescription & storage_columns = storage_metadata.getColumns();
const NamesAndTypesList & insert_columns_names = insert_materialized ? storage_columns.getAllPhysical() : storage_columns.getOrdinary();
@ -331,10 +329,9 @@ std::optional<MaterializeMetadata> MaterializeMySQLSyncThread::prepareSynchroniz
const auto & position_message = [&]()
{
std::stringstream ss;
ss.exceptions(std::ios::failbit);
position.dump(ss);
return ss.str();
WriteBufferFromOwnString buf;
position.dump(buf);
return buf.str();
};
LOG_INFO(log, "MySQL dump database position: \n {}", position_message());
}
@ -374,10 +371,9 @@ void MaterializeMySQLSyncThread::flushBuffersData(Buffers & buffers, Materialize
const auto & position_message = [&]()
{
std::stringstream ss;
ss.exceptions(std::ios::failbit);
client.getPosition().dump(ss);
return ss.str();
WriteBufferFromOwnString buf;
client.getPosition().dump(buf);
return buf.str();
};
LOG_INFO(log, "MySQL executed position: \n {}", position_message());
}
@ -623,8 +619,26 @@ void MaterializeMySQLSyncThread::onEvent(Buffers & buffers, const BinlogEventPtr
else if (receive_event->type() == MYSQL_QUERY_EVENT)
{
QueryEvent & query_event = static_cast<QueryEvent &>(*receive_event);
flushBuffersData(buffers, metadata);
Position position_before_ddl;
position_before_ddl.update(metadata.binlog_position, metadata.binlog_file, metadata.executed_gtid_set);
metadata.transaction(position_before_ddl, [&]() { buffers.commit(global_context); });
metadata.transaction(client.getPosition(),[&](){ executeDDLAtomic(query_event); });
}
else if (receive_event->header.type != HEARTBEAT_EVENT)
{
const auto & dump_event_message = [&]()
{
WriteBufferFromOwnString buf;
receive_event->dump(buf);
return buf.str();
};
LOG_DEBUG(log, "Skip MySQL event: \n {}", dump_event_message());
}
}
void MaterializeMySQLSyncThread::executeDDLAtomic(const QueryEvent & query_event)
{
try
{
Context query_context = createQueryContext(global_context);
@ -642,19 +656,6 @@ void MaterializeMySQLSyncThread::onEvent(Buffers & buffers, const BinlogEventPtr
throw;
}
}
else if (receive_event->header.type != HEARTBEAT_EVENT)
{
const auto & dump_event_message = [&]()
{
std::stringstream ss;
ss.exceptions(std::ios::failbit);
receive_event->dump(ss);
return ss.str();
};
LOG_DEBUG(log, "Skip MySQL event: \n {}", dump_event_message());
}
}
bool MaterializeMySQLSyncThread::isMySQLSyncThread()
{

View File

@ -100,6 +100,7 @@ private:
std::atomic<bool> sync_quit{false};
std::unique_ptr<ThreadFromGlobalPool> background_thread_pool;
void executeDDLAtomic(const QueryEvent & query_event);
};
}

View File

@ -4,6 +4,7 @@
#include <DataTypes/DataTypeNullable.h>
#include <Formats/FormatSettings.h>
#include <IO/WriteHelpers.h>
#include <IO/Operators.h>
#include <Common/StringUtils/StringUtils.h>
#include <numeric>
@ -230,8 +231,7 @@ std::string DictionaryStructure::getKeyDescription() const
if (id)
return "UInt64";
std::ostringstream out;
out.exceptions(std::ios::failbit);
WriteBufferFromOwnString out;
out << '(';

View File

@ -18,7 +18,7 @@ static bool registered = false;
static std::string configurationToString(const DictionaryConfigurationPtr & config)
{
const Poco::Util::XMLConfiguration * xml_config = dynamic_cast<const Poco::Util::XMLConfiguration *>(config.get());
std::ostringstream oss;
std::ostringstream oss; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
oss.exceptions(std::ios::failbit);
xml_config->save(oss);
return oss.str();

View File

@ -40,100 +40,93 @@ const FormatFactory::Creators & FormatFactory::getCreators(const String & name)
throw Exception("Unknown format " + name, ErrorCodes::UNKNOWN_FORMAT);
}
FormatSettings getFormatSettings(const Context & context)
{
const auto & settings = context.getSettingsRef();
static FormatSettings getInputFormatSetting(const Settings & settings, const Context & context)
return getFormatSettings(context, settings);
}
template <typename Settings>
FormatSettings getFormatSettings(const Context & context,
const Settings & settings)
{
FormatSettings format_settings;
format_settings.csv.delimiter = settings.format_csv_delimiter;
format_settings.csv.allow_single_quotes = settings.format_csv_allow_single_quotes;
format_settings.avro.allow_missing_fields = settings.input_format_avro_allow_missing_fields;
format_settings.avro.output_codec = settings.output_format_avro_codec;
format_settings.avro.output_sync_interval = settings.output_format_avro_sync_interval;
format_settings.avro.schema_registry_url = settings.format_avro_schema_registry_url.toString();
format_settings.csv.allow_double_quotes = settings.format_csv_allow_double_quotes;
format_settings.csv.unquoted_null_literal_as_null = settings.input_format_csv_unquoted_null_literal_as_null;
format_settings.csv.allow_single_quotes = settings.format_csv_allow_single_quotes;
format_settings.csv.crlf_end_of_line = settings.output_format_csv_crlf_end_of_line;
format_settings.csv.delimiter = settings.format_csv_delimiter;
format_settings.csv.empty_as_default = settings.input_format_defaults_for_omitted_fields;
format_settings.csv.input_format_enum_as_number = settings.input_format_csv_enum_as_number;
format_settings.null_as_default = settings.input_format_null_as_default;
format_settings.values.interpret_expressions = settings.input_format_values_interpret_expressions;
format_settings.values.deduce_templates_of_expressions = settings.input_format_values_deduce_templates_of_expressions;
format_settings.values.accurate_types_of_literals = settings.input_format_values_accurate_types_of_literals;
format_settings.with_names_use_header = settings.input_format_with_names_use_header;
format_settings.skip_unknown_fields = settings.input_format_skip_unknown_fields;
format_settings.import_nested_json = settings.input_format_import_nested_json;
format_settings.csv.unquoted_null_literal_as_null = settings.input_format_csv_unquoted_null_literal_as_null;
format_settings.custom.escaping_rule = settings.format_custom_escaping_rule;
format_settings.custom.field_delimiter = settings.format_custom_field_delimiter;
format_settings.custom.result_after_delimiter = settings.format_custom_result_after_delimiter;
format_settings.custom.result_before_delimiter = settings.format_custom_result_before_delimiter;
format_settings.custom.row_after_delimiter = settings.format_custom_row_after_delimiter;
format_settings.custom.row_before_delimiter = settings.format_custom_row_before_delimiter;
format_settings.custom.row_between_delimiter = settings.format_custom_row_between_delimiter;
format_settings.date_time_input_format = settings.date_time_input_format;
format_settings.date_time_output_format = settings.date_time_output_format;
format_settings.enable_streaming = settings.output_format_enable_streaming;
format_settings.import_nested_json = settings.input_format_import_nested_json;
format_settings.input_allow_errors_num = settings.input_format_allow_errors_num;
format_settings.input_allow_errors_ratio = settings.input_format_allow_errors_ratio;
format_settings.template_settings.resultset_format = settings.format_template_resultset;
format_settings.template_settings.row_format = settings.format_template_row;
format_settings.template_settings.row_between_delimiter = settings.format_template_rows_between_delimiter;
format_settings.tsv.empty_as_default = settings.input_format_tsv_empty_as_default;
format_settings.tsv.input_format_enum_as_number = settings.input_format_tsv_enum_as_number;
format_settings.json.escape_forward_slashes = settings.output_format_json_escape_forward_slashes;
format_settings.json.quote_64bit_integers = settings.output_format_json_quote_64bit_integers;
format_settings.json.quote_denormals = settings.output_format_json_quote_denormals;
format_settings.null_as_default = settings.input_format_null_as_default;
format_settings.parquet.row_group_size = settings.output_format_parquet_row_group_size;
format_settings.pretty.charset = settings.output_format_pretty_grid_charset.toString() == "ASCII" ? FormatSettings::Pretty::Charset::ASCII : FormatSettings::Pretty::Charset::UTF8;
format_settings.pretty.color = settings.output_format_pretty_color;
format_settings.pretty.max_column_pad_width = settings.output_format_pretty_max_column_pad_width;
format_settings.pretty.max_rows = settings.output_format_pretty_max_rows;
format_settings.pretty.max_value_width = settings.output_format_pretty_max_value_width;
format_settings.pretty.output_format_pretty_row_numbers = settings.output_format_pretty_row_numbers;
format_settings.regexp.escaping_rule = settings.format_regexp_escaping_rule;
format_settings.regexp.regexp = settings.format_regexp;
format_settings.regexp.skip_unmatched = settings.format_regexp_skip_unmatched;
format_settings.schema.format_schema = settings.format_schema;
format_settings.schema.format_schema_path = context.getFormatSchemaPath();
format_settings.schema.is_server = context.hasGlobalContext() && (context.getGlobalContext().getApplicationType() == Context::ApplicationType::SERVER);
format_settings.custom.result_before_delimiter = settings.format_custom_result_before_delimiter;
format_settings.custom.result_after_delimiter = settings.format_custom_result_after_delimiter;
format_settings.custom.escaping_rule = settings.format_custom_escaping_rule;
format_settings.custom.field_delimiter = settings.format_custom_field_delimiter;
format_settings.custom.row_before_delimiter = settings.format_custom_row_before_delimiter;
format_settings.custom.row_after_delimiter = settings.format_custom_row_after_delimiter;
format_settings.custom.row_between_delimiter = settings.format_custom_row_between_delimiter;
format_settings.regexp.regexp = settings.format_regexp;
format_settings.regexp.escaping_rule = settings.format_regexp_escaping_rule;
format_settings.regexp.skip_unmatched = settings.format_regexp_skip_unmatched;
format_settings.skip_unknown_fields = settings.input_format_skip_unknown_fields;
format_settings.template_settings.resultset_format = settings.format_template_resultset;
format_settings.template_settings.row_between_delimiter = settings.format_template_rows_between_delimiter;
format_settings.template_settings.row_format = settings.format_template_row;
format_settings.tsv.crlf_end_of_line = settings.output_format_tsv_crlf_end_of_line;
format_settings.tsv.empty_as_default = settings.input_format_tsv_empty_as_default;
format_settings.tsv.input_format_enum_as_number = settings.input_format_tsv_enum_as_number;
format_settings.tsv.null_representation = settings.output_format_tsv_null_representation;
format_settings.values.accurate_types_of_literals = settings.input_format_values_accurate_types_of_literals;
format_settings.values.deduce_templates_of_expressions = settings.input_format_values_deduce_templates_of_expressions;
format_settings.values.interpret_expressions = settings.input_format_values_interpret_expressions;
format_settings.with_names_use_header = settings.input_format_with_names_use_header;
format_settings.write_statistics = settings.output_format_write_statistics;
/// Validate avro_schema_registry_url with RemoteHostFilter when non-empty and in Server context
if (context.hasGlobalContext() && (context.getGlobalContext().getApplicationType() == Context::ApplicationType::SERVER))
if (format_settings.schema.is_server)
{
const Poco::URI & avro_schema_registry_url = settings.format_avro_schema_registry_url;
if (!avro_schema_registry_url.empty())
context.getRemoteHostFilter().checkURL(avro_schema_registry_url);
}
format_settings.avro.schema_registry_url = settings.format_avro_schema_registry_url.toString();
format_settings.avro.allow_missing_fields = settings.input_format_avro_allow_missing_fields;
return format_settings;
}
static FormatSettings getOutputFormatSetting(const Settings & settings, const Context & context)
{
FormatSettings format_settings;
format_settings.enable_streaming = settings.output_format_enable_streaming;
format_settings.json.quote_64bit_integers = settings.output_format_json_quote_64bit_integers;
format_settings.json.quote_denormals = settings.output_format_json_quote_denormals;
format_settings.json.escape_forward_slashes = settings.output_format_json_escape_forward_slashes;
format_settings.csv.delimiter = settings.format_csv_delimiter;
format_settings.csv.allow_single_quotes = settings.format_csv_allow_single_quotes;
format_settings.csv.allow_double_quotes = settings.format_csv_allow_double_quotes;
format_settings.csv.crlf_end_of_line = settings.output_format_csv_crlf_end_of_line;
format_settings.pretty.max_rows = settings.output_format_pretty_max_rows;
format_settings.pretty.max_column_pad_width = settings.output_format_pretty_max_column_pad_width;
format_settings.pretty.max_value_width = settings.output_format_pretty_max_value_width;
format_settings.pretty.color = settings.output_format_pretty_color;
format_settings.pretty.charset = settings.output_format_pretty_grid_charset.toString() == "ASCII" ?
FormatSettings::Pretty::Charset::ASCII :
FormatSettings::Pretty::Charset::UTF8;
format_settings.pretty.output_format_pretty_row_numbers = settings.output_format_pretty_row_numbers;
format_settings.template_settings.resultset_format = settings.format_template_resultset;
format_settings.template_settings.row_format = settings.format_template_row;
format_settings.template_settings.row_between_delimiter = settings.format_template_rows_between_delimiter;
format_settings.tsv.crlf_end_of_line = settings.output_format_tsv_crlf_end_of_line;
format_settings.tsv.null_representation = settings.output_format_tsv_null_representation;
format_settings.write_statistics = settings.output_format_write_statistics;
format_settings.parquet.row_group_size = settings.output_format_parquet_row_group_size;
format_settings.schema.format_schema = settings.format_schema;
format_settings.schema.format_schema_path = context.getFormatSchemaPath();
format_settings.schema.is_server = context.hasGlobalContext() && (context.getGlobalContext().getApplicationType() == Context::ApplicationType::SERVER);
format_settings.custom.result_before_delimiter = settings.format_custom_result_before_delimiter;
format_settings.custom.result_after_delimiter = settings.format_custom_result_after_delimiter;
format_settings.custom.escaping_rule = settings.format_custom_escaping_rule;
format_settings.custom.field_delimiter = settings.format_custom_field_delimiter;
format_settings.custom.row_before_delimiter = settings.format_custom_row_before_delimiter;
format_settings.custom.row_after_delimiter = settings.format_custom_row_after_delimiter;
format_settings.custom.row_between_delimiter = settings.format_custom_row_between_delimiter;
format_settings.avro.output_codec = settings.output_format_avro_codec;
format_settings.avro.output_sync_interval = settings.output_format_avro_sync_interval;
format_settings.date_time_output_format = settings.date_time_output_format;
template
FormatSettings getFormatSettings<FormatFactorySettings>(const Context & context,
const FormatFactorySettings & settings);
return format_settings;
}
template
FormatSettings getFormatSettings<Settings>(const Context & context,
const Settings & settings);
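`getFormatSettings` is now a function template over the settings type, defined in the `.cpp` file and explicitly instantiated for the two types that use it. A reduced sketch of that technique is shown here; every type and function name in it is hypothetical, and it copies a single field rather than the full list above.

```cpp
// Hypothetical result type with a single field.
struct FormatSettingsSketch
{
    char csv_delimiter = ',';
};

// Two unrelated, hypothetical settings types that expose the same member name.
struct GlobalSettingsSketch
{
    char format_csv_delimiter = ',';
    int max_threads = 8;
};

struct PerTableSettingsSketch
{
    char format_csv_delimiter = ';';
};

// Declaration (would normally live in a header).
template <typename SettingsT>
FormatSettingsSketch makeFormatSettings(const SettingsT & settings);

// Definition (would normally live in a .cpp file). It compiles for any type
// that has a `format_csv_delimiter` member.
template <typename SettingsT>
FormatSettingsSketch makeFormatSettings(const SettingsT & settings)
{
    FormatSettingsSketch result;
    result.csv_delimiter = settings.format_csv_delimiter;
    return result;
}

// Explicit instantiations: they make the compiler emit code for exactly these
// two types, so other translation units can link against the declaration alone.
template FormatSettingsSketch makeFormatSettings<GlobalSettingsSketch>(const GlobalSettingsSketch &);
template FormatSettingsSketch makeFormatSettings<PerTableSettingsSketch>(const PerTableSettingsSketch &);
```

The trade-off is that calling the template with a third settings type from another translation unit fails at link time until a matching instantiation is added.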
BlockInputStreamPtr FormatFactory::getInput(
@ -142,21 +135,22 @@ BlockInputStreamPtr FormatFactory::getInput(
const Block & sample,
const Context & context,
UInt64 max_block_size,
ReadCallback callback) const
const std::optional<FormatSettings> & _format_settings) const
{
if (name == "Native")
return std::make_shared<NativeBlockInputStream>(buf, sample, 0);
auto format_settings = _format_settings
? *_format_settings : getFormatSettings(context);
if (!getCreators(name).input_processor_creator)
{
const auto & input_getter = getCreators(name).input_creator;
if (!input_getter)
throw Exception("Format " + name + " is not suitable for input", ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_INPUT);
const Settings & settings = context.getSettingsRef();
FormatSettings format_settings = getInputFormatSetting(settings, context);
return input_getter(buf, sample, max_block_size, callback ? callback : ReadCallback(), format_settings);
return input_getter(buf, sample, max_block_size, {}, format_settings);
}
const Settings & settings = context.getSettingsRef();
@ -182,17 +176,16 @@ BlockInputStreamPtr FormatFactory::getInput(
if (!input_getter)
throw Exception("Format " + name + " is not suitable for input", ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_INPUT);
FormatSettings format_settings = getInputFormatSetting(settings, context);
RowInputFormatParams row_input_format_params;
row_input_format_params.max_block_size = max_block_size;
row_input_format_params.allow_errors_num = format_settings.input_allow_errors_num;
row_input_format_params.allow_errors_ratio = format_settings.input_allow_errors_ratio;
row_input_format_params.callback = std::move(callback);
row_input_format_params.max_execution_time = settings.max_execution_time;
row_input_format_params.timeout_overflow_mode = settings.timeout_overflow_mode;
auto input_creator_params = ParallelParsingBlockInputStream::InputCreatorParams{sample, row_input_format_params, format_settings};
auto input_creator_params =
ParallelParsingBlockInputStream::InputCreatorParams{sample,
row_input_format_params, format_settings};
ParallelParsingBlockInputStream::Params params{buf, input_getter,
input_creator_params, file_segmentation_engine,
static_cast<int>(settings.max_threads),
@ -200,32 +193,37 @@ BlockInputStreamPtr FormatFactory::getInput(
return std::make_shared<ParallelParsingBlockInputStream>(params);
}
auto format = getInputFormat(name, buf, sample, context, max_block_size, std::move(callback));
auto format = getInputFormat(name, buf, sample, context, max_block_size,
format_settings);
return std::make_shared<InputStreamFromInputFormat>(std::move(format));
}
BlockOutputStreamPtr FormatFactory::getOutput(
const String & name, WriteBuffer & buf, const Block & sample, const Context & context, WriteCallback callback, const bool ignore_no_row_delimiter) const
BlockOutputStreamPtr FormatFactory::getOutput(const String & name,
WriteBuffer & buf, const Block & sample, const Context & context,
WriteCallback callback, const std::optional<FormatSettings> & _format_settings) const
{
auto format_settings = _format_settings
? *_format_settings : getFormatSettings(context);
if (!getCreators(name).output_processor_creator)
{
const auto & output_getter = getCreators(name).output_creator;
if (!output_getter)
throw Exception("Format " + name + " is not suitable for output", ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT);
const Settings & settings = context.getSettingsRef();
FormatSettings format_settings = getOutputFormatSetting(settings, context);
/** Materialization is needed, because formats can use the functions `IDataType`,
* which only work with full columns.
*/
return std::make_shared<MaterializingBlockOutputStream>(
output_getter(buf, sample, std::move(callback), format_settings), sample);
output_getter(buf, sample, std::move(callback), format_settings),
sample);
}
auto format = getOutputFormat(name, buf, sample, context, std::move(callback), ignore_no_row_delimiter);
return std::make_shared<MaterializingBlockOutputStream>(std::make_shared<OutputStreamToOutputFormat>(format), sample);
auto format = getOutputFormat(name, buf, sample, context, std::move(callback),
format_settings);
return std::make_shared<MaterializingBlockOutputStream>(
std::make_shared<OutputStreamToOutputFormat>(format), sample);
}
@ -235,25 +233,27 @@ InputFormatPtr FormatFactory::getInputFormat(
const Block & sample,
const Context & context,
UInt64 max_block_size,
ReadCallback callback) const
const std::optional<FormatSettings> & _format_settings) const
{
const auto & input_getter = getCreators(name).input_processor_creator;
if (!input_getter)
throw Exception("Format " + name + " is not suitable for input", ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_INPUT);
const Settings & settings = context.getSettingsRef();
FormatSettings format_settings = getInputFormatSetting(settings, context);
auto format_settings = _format_settings
? *_format_settings : getFormatSettings(context);
RowInputFormatParams params;
params.max_block_size = max_block_size;
params.allow_errors_num = format_settings.input_allow_errors_num;
params.allow_errors_ratio = format_settings.input_allow_errors_ratio;
params.callback = std::move(callback);
params.max_execution_time = settings.max_execution_time;
params.timeout_overflow_mode = settings.timeout_overflow_mode;
auto format = input_getter(buf, sample, params, format_settings);
/// It's a kludge. Because I cannot remove context from values format.
if (auto * values = typeid_cast<ValuesBlockInputFormat *>(format.get()))
values->setContext(context);
@ -263,19 +263,20 @@ InputFormatPtr FormatFactory::getInputFormat(
OutputFormatPtr FormatFactory::getOutputFormat(
const String & name, WriteBuffer & buf, const Block & sample, const Context & context, WriteCallback callback, const bool ignore_no_row_delimiter) const
const String & name, WriteBuffer & buf, const Block & sample,
const Context & context, WriteCallback callback,
const std::optional<FormatSettings> & _format_settings) const
{
const auto & output_getter = getCreators(name).output_processor_creator;
if (!output_getter)
throw Exception("Format " + name + " is not suitable for output", ErrorCodes::FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT);
const Settings & settings = context.getSettingsRef();
FormatSettings format_settings = getOutputFormatSetting(settings, context);
RowOutputFormatParams params;
params.ignore_no_row_delimiter = ignore_no_row_delimiter;
params.callback = std::move(callback);
auto format_settings = _format_settings
? *_format_settings : getFormatSettings(context);
/** TODO: Materialization is needed, because formats can use the functions `IDataType`,
* which only work with full columns.
*/
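The hunks above replace per-call settings lookups with an optional override: if the caller passes `_format_settings` it is used as-is, otherwise the settings are derived from the context. A minimal sketch of that fallback, assuming hypothetical stand-in types (`SketchContext`, `SketchFormatSettings`, `deriveFromContext`, and `resolveFormatSettings` are illustrative names, not part of ClickHouse):

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for DB::Context and DB::FormatSettings.
struct SketchContext
{
    char csv_delimiter = ',';
};

struct SketchFormatSettings
{
    char csv_delimiter = ',';
};

// Plays the role of getFormatSettings(context) in the diff:
// builds settings from whatever the context currently holds.
inline SketchFormatSettings deriveFromContext(const SketchContext & context)
{
    SketchFormatSettings settings;
    settings.csv_delimiter = context.csv_delimiter;
    return settings;
}

// An explicit override wins; otherwise fall back to the context.
inline SketchFormatSettings resolveFormatSettings(
    const SketchContext & context,
    const std::optional<SketchFormatSettings> & override_settings = std::nullopt)
{
    return override_settings ? *override_settings : deriveFromContext(context);
}
```

Defaulting the parameter to `std::nullopt` keeps existing call sites unchanged, which appears to be why the new declarations in the header use the same default.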

@ -3,6 +3,7 @@
#include <common/types.h>
#include <Columns/IColumn.h>
#include <DataStreams/IBlockStream_fwd.h>
#include <Formats/FormatSettings.h>
#include <IO/BufferWithOwnMemory.h>
#include <functional>
@ -16,6 +17,8 @@ namespace DB
class Block;
class Context;
struct FormatSettings;
struct Settings;
struct FormatFactorySettings;
class ReadBuffer;
class WriteBuffer;
@ -32,6 +35,11 @@ struct RowOutputFormatParams;
using InputFormatPtr = std::shared_ptr<IInputFormat>;
using OutputFormatPtr = std::shared_ptr<IOutputFormat>;
FormatSettings getFormatSettings(const Context & context);
template <typename T>
FormatSettings getFormatSettings(const Context & context,
const T & settings);
/** Allows to create an IBlockInputStream or IBlockOutputStream by the name of the format.
* Note: format and compression are independent things.
@ -104,10 +112,11 @@ public:
const Block & sample,
const Context & context,
UInt64 max_block_size,
ReadCallback callback = {}) const;
const std::optional<FormatSettings> & format_settings = std::nullopt) const;
BlockOutputStreamPtr getOutput(const String & name, WriteBuffer & buf,
const Block & sample, const Context & context, WriteCallback callback = {}, const bool ignore_no_row_delimiter = false) const;
const Block & sample, const Context & context, WriteCallback callback = {},
const std::optional<FormatSettings> & format_settings = std::nullopt) const;
InputFormatPtr getInputFormat(
const String & name,
@ -115,10 +124,12 @@ public:
const Block & sample,
const Context & context,
UInt64 max_block_size,
ReadCallback callback = {}) const;
const std::optional<FormatSettings> & format_settings = std::nullopt) const;
OutputFormatPtr getOutputFormat(
const String & name, WriteBuffer & buf, const Block & sample, const Context & context, WriteCallback callback = {}, const bool ignore_no_row_delimiter = false) const;
const String & name, WriteBuffer & buf, const Block & sample,
const Context & context, WriteCallback callback = {},
const std::optional<FormatSettings> & format_settings = std::nullopt) const;
/// Register format by its name.
void registerInputFormat(const String & name, InputCreator input_creator);

@ -6,10 +6,16 @@
namespace DB
{
/** Various tweaks for input/output formats.
* Text serialization/deserialization of data types also depend on some of these settings.
* NOTE Parameters for unrelated formats and unrelated data types
* are collected in this struct - it prevents modularity, but they are difficult to separate.
/**
* Various tweaks for input/output formats. Text serialization/deserialization
* of data types also depends on some of these settings. It is different from
* FormatFactorySettings in that it has all necessary user-provided settings
* combined with information from context etc, that we can use directly during
* serialization. In contrast, FormatFactorySettings' job is to reflect the
* changes made to user-visible format settings, such as when tweaking the
* format for File engine.
* NOTE Parameters for unrelated formats and unrelated data types are collected
* in this struct - it prevents modularity, but they are difficult to separate.
*/
struct FormatSettings
{
@ -17,76 +23,6 @@ struct FormatSettings
/// Option means that each chunk of data need to be formatted independently. Also each chunk will be flushed at the end of processing.
bool enable_streaming = false;
struct JSON
{
bool quote_64bit_integers = true;
bool quote_denormals = true;
bool escape_forward_slashes = true;
};
JSON json;
struct CSV
{
char delimiter = ',';
bool allow_single_quotes = true;
bool allow_double_quotes = true;
bool unquoted_null_literal_as_null = false;
bool empty_as_default = false;
bool crlf_end_of_line = false;
bool input_format_enum_as_number = false;
};
CSV csv;
struct Pretty
{
UInt64 max_rows = 10000;
UInt64 max_column_pad_width = 250;
UInt64 max_value_width = 10000;
bool color = true;
bool output_format_pretty_row_numbers = false;
enum class Charset
{
UTF8,
ASCII,
};
Charset charset = Charset::UTF8;
};
Pretty pretty;
struct Values
{
bool interpret_expressions = true;
bool deduce_templates_of_expressions = true;
bool accurate_types_of_literals = true;
};
Values values;
struct Template
{
String resultset_format;
String row_format;
String row_between_delimiter;
};
Template template_settings;
struct TSV
{
bool empty_as_default = false;
bool crlf_end_of_line = false;
String null_representation = "\\N";
bool input_format_enum_as_number = false;
};
TSV tsv;
bool skip_unknown_fields = false;
bool with_names_use_header = false;
bool write_statistics = true;
@ -113,24 +49,29 @@ struct FormatSettings
UInt64 input_allow_errors_num = 0;
Float32 input_allow_errors_ratio = 0;
struct Arrow
struct
{
UInt64 row_group_size = 1000000;
} arrow;
struct Parquet
struct
{
UInt64 row_group_size = 1000000;
} parquet;
String schema_registry_url;
String output_codec;
UInt64 output_sync_interval = 16 * 1024;
bool allow_missing_fields = false;
} avro;
struct Schema
struct CSV
{
std::string format_schema;
std::string format_schema_path;
bool is_server = false;
};
Schema schema;
char delimiter = ',';
bool allow_single_quotes = true;
bool allow_double_quotes = true;
bool unquoted_null_literal_as_null = false;
bool empty_as_default = false;
bool crlf_end_of_line = false;
bool input_format_enum_as_number = false;
} csv;
struct Custom
{
@ -141,29 +82,87 @@ struct FormatSettings
std::string row_between_delimiter;
std::string field_delimiter;
std::string escaping_rule;
};
} custom;
Custom custom;
struct Avro
struct
{
String schema_registry_url;
String output_codec;
UInt64 output_sync_interval = 16 * 1024;
bool allow_missing_fields = false;
bool quote_64bit_integers = true;
bool quote_denormals = true;
bool escape_forward_slashes = true;
bool serialize_as_strings = false;
} json;
struct
{
UInt64 row_group_size = 1000000;
} parquet;
struct Pretty
{
UInt64 max_rows = 10000;
UInt64 max_column_pad_width = 250;
UInt64 max_value_width = 10000;
bool color = true;
bool output_format_pretty_row_numbers = false;
enum class Charset
{
UTF8,
ASCII,
};
Avro avro;
Charset charset = Charset::UTF8;
} pretty;
struct Regexp
struct
{
bool write_row_delimiters = true;
/**
* Some buffers (kafka / rabbit) split the rows internally using callback,
* and always send one row per message, so we can push there formats
* without framing / delimiters (like ProtobufSingle). In other cases,
* we have to enforce exporting at most one row in the format output,
* because Protobuf without delimiters is not generally useful.
*/
bool allow_many_rows_no_delimiters = false;
} protobuf;
struct
{
std::string regexp;
std::string escaping_rule;
bool skip_unmatched = false;
};
} regexp;
Regexp regexp;
struct
{
std::string format_schema;
std::string format_schema_path;
bool is_server = false;
} schema;
struct
{
String resultset_format;
String row_format;
String row_between_delimiter;
} template_settings;
struct
{
bool empty_as_default = false;
bool crlf_end_of_line = false;
String null_representation = "\\N";
bool input_format_enum_as_number = false;
} tsv;
struct
{
bool interpret_expressions = true;
bool deduce_templates_of_expressions = true;
bool accurate_types_of_literals = true;
} values;
};
}

@ -38,8 +38,8 @@ try
FormatSettings format_settings;
RowInputFormatParams in_params{DEFAULT_INSERT_BLOCK_SIZE, 0, 0, []{}};
RowOutputFormatParams out_params{[](const Columns & /* columns */, size_t /* row */){},false};
RowInputFormatParams in_params{DEFAULT_INSERT_BLOCK_SIZE, 0, 0};
RowOutputFormatParams out_params{[](const Columns & /* columns */, size_t /* row */){}};
InputFormatPtr input_format = std::make_shared<TabSeparatedRowInputFormat>(sample, in_buf, in_params, false, false, format_settings);
BlockInputStreamPtr block_input = std::make_shared<InputStreamFromInputFormat>(std::move(input_format));

@ -3,7 +3,6 @@
#if !defined(ARCADIA_BUILD) && USE_STATS
#include <math.h>
#include <sstream>
#include <DataTypes/DataTypeString.h>
#include <Columns/ColumnString.h>

@ -239,7 +239,7 @@ void assertResponseIsOk(const Poco::Net::HTTPRequest & request, Poco::Net::HTTPR
if (!(status == Poco::Net::HTTPResponse::HTTP_OK || (isRedirect(status) && allow_redirects)))
{
std::stringstream error_message;
std::stringstream error_message; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
error_message.exceptions(std::ios::failbit);
error_message << "Received error from remote server " << request.getURI() << ". HTTP status code: " << status << " "
<< response.getReason() << ", body: " << istr.rdbuf();

@ -1,5 +1,4 @@
#include <IO/MySQLPacketPayloadReadBuffer.h>
#include <sstream>
namespace DB
{

@ -45,7 +45,9 @@ struct BinaryManipReadBuffer : std::reference_wrapper<ReadBuffer> { usin
template <typename T> WriteBuffer & operator<< (WriteBuffer & buf, const T & x) { writeText(x, buf); return buf; }
/// If you do not use the manipulators, the string is displayed without an escape, as is.
template <> inline WriteBuffer & operator<< (WriteBuffer & buf, const String & x) { writeString(x, buf); return buf; }
template <> inline WriteBuffer & operator<< (WriteBuffer & buf, const std::string_view & x) { writeString(StringRef(x), buf); return buf; }
template <> inline WriteBuffer & operator<< (WriteBuffer & buf, const char & x) { writeChar(x, buf); return buf; }
template <> inline WriteBuffer & operator<< (WriteBuffer & buf, const pcg32_fast & x) { PcgSerializer::serializePcg32(x, buf); return buf; }
inline WriteBuffer & operator<< (WriteBuffer & buf, const char * x) { writeCString(x, buf); return buf; }
@ -73,6 +75,7 @@ inline WriteBuffer & operator<< (WriteBuffer & buf, FlushManip) { buf.next(); re
template <typename T> ReadBuffer & operator>> (ReadBuffer & buf, T & x) { readText(x, buf); return buf; }
template <> inline ReadBuffer & operator>> (ReadBuffer & buf, String & x) { readString(x, buf); return buf; }
template <> inline ReadBuffer & operator>> (ReadBuffer & buf, char & x) { readChar(x, buf); return buf; }
template <> inline ReadBuffer & operator>> (ReadBuffer & buf, pcg32_fast & x) { PcgDeserializer::deserializePcg32(x, buf); return buf; }
/// If you specify a string literal for reading, this will mean - make sure there is a sequence of bytes and skip it.
inline ReadBuffer & operator>> (ReadBuffer & buf, const char * x) { assertString(x, buf); return buf; }
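The operators above combine a generic template with explicit specializations (here a new one for `std::string_view`) and a non-template overload for `const char *`. A minimal sketch of how those three pieces interact, assuming a hypothetical `SketchWriteBuffer` that appends to a `std::string` instead of ClickHouse's `WriteBuffer`:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for DB::WriteBuffer: collects output in a string.
struct SketchWriteBuffer
{
    std::string data;
};

// Generic case: format the value as text (numeric types only in this sketch).
template <typename T>
SketchWriteBuffer & operator<<(SketchWriteBuffer & buf, const T & x)
{
    buf.data += std::to_string(x);
    return buf;
}

// Explicit specializations: written as is, without escaping.
template <>
inline SketchWriteBuffer & operator<<(SketchWriteBuffer & buf, const std::string_view & x)
{
    buf.data.append(x.data(), x.size());
    return buf;
}

template <>
inline SketchWriteBuffer & operator<<(SketchWriteBuffer & buf, const char & x)
{
    buf.data.push_back(x);
    return buf;
}

// Non-template overload: preferred over the template for string literals.
inline SketchWriteBuffer & operator<<(SketchWriteBuffer & buf, const char * x)
{
    buf.data += x;
    return buf;
}
```

With these definitions, `buf << std::string_view("ab") << 'c' << 42 << "d"` appends each piece through a different overload.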

@ -817,7 +817,11 @@ ReturnType readDateTimeTextFallback(time_t & datetime, ReadBuffer & buf, const D
{
static constexpr bool throw_exception = std::is_same_v<ReturnType, void>;
/// YYYY-MM-DD hh:mm:ss
static constexpr auto date_time_broken_down_length = 19;
/// YYYY-MM-DD
static constexpr auto date_broken_down_length = 10;
/// unix timestamp max length
static constexpr auto unix_timestamp_max_length = 10;
char s[date_time_broken_down_length];
@ -831,12 +835,15 @@ ReturnType readDateTimeTextFallback(time_t & datetime, ReadBuffer & buf, const D
++buf.position();
}
/// 2015-01-01 01:02:03
/// 2015-01-01 01:02:03 or 2015-01-01
if (s_pos == s + 4 && !buf.eof() && (*buf.position() < '0' || *buf.position() > '9'))
{
const size_t remaining_size = date_time_broken_down_length - (s_pos - s);
size_t size = buf.read(s_pos, remaining_size);
if (remaining_size != size)
const auto already_read_length = s_pos - s;
const size_t remaining_date_time_size = date_time_broken_down_length - already_read_length;
const size_t remaining_date_size = date_broken_down_length - already_read_length;
size_t size = buf.read(s_pos, remaining_date_time_size);
if (size != remaining_date_time_size && size != remaining_date_size)
{
s_pos[size] = 0;
@ -850,9 +857,16 @@ ReturnType readDateTimeTextFallback(time_t & datetime, ReadBuffer & buf, const D
UInt8 month = (s[5] - '0') * 10 + (s[6] - '0');
UInt8 day = (s[8] - '0') * 10 + (s[9] - '0');
UInt8 hour = (s[11] - '0') * 10 + (s[12] - '0');
UInt8 minute = (s[14] - '0') * 10 + (s[15] - '0');
UInt8 second = (s[17] - '0') * 10 + (s[18] - '0');
UInt8 hour = 0;
UInt8 minute = 0;
UInt8 second = 0;
if (size == remaining_date_time_size)
{
hour = (s[11] - '0') * 10 + (s[12] - '0');
minute = (s[14] - '0') * 10 + (s[15] - '0');
second = (s[17] - '0') * 10 + (s[18] - '0');
}
if (unlikely(year == 0))
datetime = 0;
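The change above lets the fallback parser accept either `YYYY-MM-DD hh:mm:ss` (19 characters) or a bare `YYYY-MM-DD` (10 characters), defaulting the time of day to zero in the second case. A minimal sketch of that length-based branching, assuming the input is already a complete string rather than a `ReadBuffer` (`SketchBrokenDownTime`, `sketchTwoDigits`, and `sketchParseDateOrDateTime` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type; the real function produces a time_t.
struct SketchBrokenDownTime
{
    int year = 0;
    int month = 0;
    int day = 0;
    int hour = 0;
    int minute = 0;
    int second = 0;
};

// Reads two decimal digits starting at pos (no validation, as in the diff).
inline int sketchTwoDigits(std::string_view s, std::size_t pos)
{
    return (s[pos] - '0') * 10 + (s[pos + 1] - '0');
}

// Accepts "YYYY-MM-DD hh:mm:ss" or "YYYY-MM-DD"; any other length is rejected.
inline bool sketchParseDateOrDateTime(std::string_view text, SketchBrokenDownTime & out)
{
    constexpr std::size_t date_time_length = 19;
    constexpr std::size_t date_length = 10;

    if (text.size() != date_time_length && text.size() != date_length)
        return false;

    out = SketchBrokenDownTime{};
    out.year = sketchTwoDigits(text, 0) * 100 + sketchTwoDigits(text, 2);
    out.month = sketchTwoDigits(text, 5);
    out.day = sketchTwoDigits(text, 8);

    // The time part is filled only when the full date-time was supplied.
    if (text.size() == date_time_length)
    {
        out.hour = sketchTwoDigits(text, 11);
        out.minute = sketchTwoDigits(text, 14);
        out.second = sketchTwoDigits(text, 17);
    }
    return true;
}
```

Like the code in the diff, the sketch does not check separators or digit ranges; it only illustrates how the two accepted lengths select which fields are read.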

Some files were not shown because too many files have changed in this diff.