Merge branch 'master' of https://github.com/ClickHouse/ClickHouse into library-bridge
This commit is contained in: commit 8688ae9346
156 CHANGELOG.md
@@ -1,3 +1,159 @@
## ClickHouse release 21.3

### ClickHouse release v21.3, 2021-03-12

#### Backward Incompatible Change

* Now it's not allowed to create MergeTree tables in old syntax with table TTL because it's just ignored. Attaching old tables is still possible. [#20282](https://github.com/ClickHouse/ClickHouse/pull/20282) ([alesapin](https://github.com/alesapin)).
* Now all case-insensitive function names will be rewritten to their canonical representations. This is needed for projection query routing (the upcoming feature). [#20174](https://github.com/ClickHouse/ClickHouse/pull/20174) ([Amos Bird](https://github.com/amosbird)).
* Fix creation of `TTL` in cases when its expression is a function that is the same as the `ORDER BY` key. It is now allowed to set custom aggregation for primary key columns in `TTL` with `GROUP BY`. Backward incompatible: for primary key columns that are not in `GROUP BY` and are not set explicitly, the function `any` is now applied instead of `max` when the TTL expires. Also, if you use TTL with `WHERE` or `GROUP BY`, you may see exceptions at merges while performing a rolling update. [#15450](https://github.com/ClickHouse/ClickHouse/pull/15450) ([Anton Popov](https://github.com/CurtizJ)).
#### New Feature

* Add file engine settings: `engine_file_empty_if_not_exists` and `engine_file_truncate_on_insert`. [#20620](https://github.com/ClickHouse/ClickHouse/pull/20620) ([M0r64n](https://github.com/M0r64n)).
* Add aggregate function `deltaSum` for summing the differences between consecutive rows. [#20057](https://github.com/ClickHouse/ClickHouse/pull/20057) ([Russ Frank](https://github.com/rf)).
* New `event_time_microseconds` column in `system.part_log` table. [#20027](https://github.com/ClickHouse/ClickHouse/pull/20027) ([Bharat Nallan](https://github.com/bharatnc)).
* Added `timezoneOffset(datetime)` function which returns the offset from UTC in seconds. This closes [#19850](https://github.com/ClickHouse/ClickHouse/issues/19850). [#19962](https://github.com/ClickHouse/ClickHouse/pull/19962) ([keenwolf](https://github.com/keen-wolf)).
* Add setting `insert_shard_id` to support inserting data into a specific shard from a distributed table. [#19961](https://github.com/ClickHouse/ClickHouse/pull/19961) ([flynn](https://github.com/ucasFL)).
* Function `reinterpretAs` updated to support big integers. Fixes [#19691](https://github.com/ClickHouse/ClickHouse/issues/19691). [#19858](https://github.com/ClickHouse/ClickHouse/pull/19858) ([Maksim Kita](https://github.com/kitaisreal)).
* Added Server Side Encryption Customer Keys (the `x-amz-server-side-encryption-customer-(key/md5)` header) support in S3 client. See [the link](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html). Closes [#19428](https://github.com/ClickHouse/ClickHouse/issues/19428). [#19748](https://github.com/ClickHouse/ClickHouse/pull/19748) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Added `implicit_key` option for `executable` dictionary source. It allows avoiding printing the key for every record if records come in the same order as the input keys. Implements [#14527](https://github.com/ClickHouse/ClickHouse/issues/14527). [#19677](https://github.com/ClickHouse/ClickHouse/pull/19677) ([Maksim Kita](https://github.com/kitaisreal)).
* Add quota types `query_selects` and `query_inserts`. [#19603](https://github.com/ClickHouse/ClickHouse/pull/19603) ([JackyWoo](https://github.com/JackyWoo)).
* Add function `extractTextFromHTML`. [#19600](https://github.com/ClickHouse/ClickHouse/pull/19600) ([zlx19950903](https://github.com/zlx19950903)), ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Tables with `MergeTree*` engine now have two new table-level settings for query concurrency control. Setting `max_concurrent_queries` limits the number of concurrently executed queries which are related to this table. Setting `min_marks_to_honor_max_concurrent_queries` tells ClickHouse to apply the previous setting only if the query reads at least this number of marks. [#19544](https://github.com/ClickHouse/ClickHouse/pull/19544) ([Amos Bird](https://github.com/amosbird)).
* Added `file` function to read a file from the user_files directory as a String. This is different from the `file` table function. This implements [#18851](https://github.com/ClickHouse/ClickHouse/issues/18851). [#19204](https://github.com/ClickHouse/ClickHouse/pull/19204) ([keenwolf](https://github.com/keen-wolf)).

#### Experimental feature

* Add experimental `Replicated` database engine. It replicates DDL queries across multiple hosts. [#16193](https://github.com/ClickHouse/ClickHouse/pull/16193) ([tavplubix](https://github.com/tavplubix)).
* Introduce experimental support for window functions, enabled with `allow_experimental_window_functions = 1`. This is a preliminary, alpha-quality implementation that is not suitable for production use and will change in backward-incompatible ways in future releases. Please see [the documentation](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/sql-reference/window-functions/index.md#experimental-window-functions) for the list of supported features. [#20337](https://github.com/ClickHouse/ClickHouse/pull/20337) ([Alexander Kuzmenkov](https://github.com/akuzm)).
* Add the ability to backup/restore metadata files for DiskS3. [#18377](https://github.com/ClickHouse/ClickHouse/pull/18377) ([Pavel Kovalenko](https://github.com/Jokser)).
#### Performance Improvement

* Hedged requests for remote queries. When the setting `use_hedged_requests` is enabled (off by default), allow establishing many connections with different replicas for a query. A new connection is opened if the existing connection(s) with the replica(s) were not established within `hedged_connection_timeout` or no data was received within `receive_data_timeout`. The query uses the first connection which sends a non-empty progress packet (or a data packet, if `allow_changing_replica_until_first_data_packet`); other connections are cancelled. Queries with `max_parallel_replicas > 1` are supported. [#19291](https://github.com/ClickHouse/ClickHouse/pull/19291) ([Kruglov Pavel](https://github.com/Avogar)). This allows significantly reducing tail latencies on very large clusters.
* Added support for `PREWHERE` (and enable the corresponding optimization) when tables have row-level security expressions specified. [#19576](https://github.com/ClickHouse/ClickHouse/pull/19576) ([Denis Glazachev](https://github.com/traceon)).
* The setting `distributed_aggregation_memory_efficient` is enabled by default. It will lower memory usage and improve performance of distributed queries. [#20599](https://github.com/ClickHouse/ClickHouse/pull/20599) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improve performance of GROUP BY with multiple fixed-size keys. [#20472](https://github.com/ClickHouse/ClickHouse/pull/20472) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Improve performance of aggregate functions by more strict aliasing. [#19946](https://github.com/ClickHouse/ClickHouse/pull/19946) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Speed up reading from `Memory` tables in extreme cases (when reading speed is on the order of 50 GB/sec) by simplifying the pipeline and (consequently) reducing lock contention in pipeline scheduling. [#20468](https://github.com/ClickHouse/ClickHouse/pull/20468) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Partially reimplement HTTP server to make fewer copies of incoming and outgoing data. It gives up to 1.5x performance improvement on inserting long records over HTTP. [#19516](https://github.com/ClickHouse/ClickHouse/pull/19516) ([Ivan](https://github.com/abyss7)).
* Add `compress` setting for `Memory` tables. If it's enabled, the table will use less RAM. On some machines and datasets it can also work faster on SELECT, but it is not always the case. This closes [#20093](https://github.com/ClickHouse/ClickHouse/issues/20093). Note: there are reasons why Memory tables can work slower than MergeTree: (1) lack of compression (2) static size of blocks (3) lack of indices and prewhere... [#20168](https://github.com/ClickHouse/ClickHouse/pull/20168) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Slightly better code in aggregation. [#20978](https://github.com/ClickHouse/ClickHouse/pull/20978) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add back `intDiv`/`modulo` specializations for better performance. This fixes [#21293](https://github.com/ClickHouse/ClickHouse/issues/21293). The regression was introduced in https://github.com/ClickHouse/ClickHouse/pull/18145. [#21307](https://github.com/ClickHouse/ClickHouse/pull/21307) ([Amos Bird](https://github.com/amosbird)).
* Do not squash blocks too much on INSERT SELECT if inserting into a Memory table. In previous versions an inefficient data representation was created in the Memory table after INSERT SELECT. This closes [#13052](https://github.com/ClickHouse/ClickHouse/issues/13052). [#20169](https://github.com/ClickHouse/ClickHouse/pull/20169) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix at least one case when the DataType parser may have exponential complexity (found by fuzzer). This closes [#20096](https://github.com/ClickHouse/ClickHouse/issues/20096). [#20132](https://github.com/ClickHouse/ClickHouse/pull/20132) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Parallelize SELECT with FINAL for a single part with level > 0 when the `do_not_merge_across_partitions_select_final` setting is 1. [#19375](https://github.com/ClickHouse/ClickHouse/pull/19375) ([Kruglov Pavel](https://github.com/Avogar)).
* Fill only requested columns when querying `system.parts` and `system.parts_columns`. Closes [#19570](https://github.com/ClickHouse/ClickHouse/issues/19570). [#21035](https://github.com/ClickHouse/ClickHouse/pull/21035) ([Anmol Arora](https://github.com/anmolarora)).
* Perform algebraic optimizations of arithmetic expressions inside the `avg` aggregate function. Closes [#20092](https://github.com/ClickHouse/ClickHouse/issues/20092). [#20183](https://github.com/ClickHouse/ClickHouse/pull/20183) ([flynn](https://github.com/ucasFL)).
#### Improvement

* Case-insensitive compression methods for table functions. Also fixed the LZMA compression method, which was checked in upper case. [#21416](https://github.com/ClickHouse/ClickHouse/pull/21416) ([Vladimir Chebotarev](https://github.com/excitoon)).
* Add two settings to delay or throw an error during insertion when there are too many inactive parts. This is useful when the server fails to clean up parts quickly enough. [#20178](https://github.com/ClickHouse/ClickHouse/pull/20178) ([Amos Bird](https://github.com/amosbird)).
* Provide better compatibility for MySQL clients: 1. MySQL JDBC, 2. mycli. [#21367](https://github.com/ClickHouse/ClickHouse/pull/21367) ([Amos Bird](https://github.com/amosbird)).
* Forbid dropping a column if it's referenced by a materialized view. Closes [#21164](https://github.com/ClickHouse/ClickHouse/issues/21164). [#21303](https://github.com/ClickHouse/ClickHouse/pull/21303) ([flynn](https://github.com/ucasFL)).
* MySQL dictionary source will now retry unexpected connection failures (Lost connection to MySQL server during query) which sometimes happen on SSL/TLS connections. [#21237](https://github.com/ClickHouse/ClickHouse/pull/21237) ([Alexander Kazakov](https://github.com/Akazz)).
* Usability improvement: more consistent `DateTime64` parsing: recognize the case when a unix timestamp with subsecond resolution is specified as a scaled integer (like `1111111111222` instead of `1111111111.222`). This closes [#13194](https://github.com/ClickHouse/ClickHouse/issues/13194). [#21053](https://github.com/ClickHouse/ClickHouse/pull/21053) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Do only merging of sorted blocks on the initiator with `distributed_group_by_no_merge`. [#20882](https://github.com/ClickHouse/ClickHouse/pull/20882) ([Azat Khuzhin](https://github.com/azat)).
* When loading the config for a MySQL source, ClickHouse will now randomize the list of replicas with the same priority to ensure round-robin logic of picking the MySQL endpoint. This closes [#20629](https://github.com/ClickHouse/ClickHouse/issues/20629). [#20632](https://github.com/ClickHouse/ClickHouse/pull/20632) ([Alexander Kazakov](https://github.com/Akazz)).
* Function `reinterpretAs(x, Type)` renamed to `reinterpret(x, Type)`. [#20611](https://github.com/ClickHouse/ClickHouse/pull/20611) ([Maksim Kita](https://github.com/kitaisreal)).
* Support vhost for RabbitMQ engine [#20576](https://github.com/ClickHouse/ClickHouse/issues/20576). [#20596](https://github.com/ClickHouse/ClickHouse/pull/20596) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Improved serialization for data types combined of Arrays and Tuples. Improved matching of enum data types to the protobuf enum type. Fixed serialization of the `Map` data type. Omitted values are now set by default. [#20506](https://github.com/ClickHouse/ClickHouse/pull/20506) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fixed a race between execution of distributed DDL tasks and cleanup of the DDL queue. Now a DDL task cannot be removed from ZooKeeper if there are active workers. Fixes [#20016](https://github.com/ClickHouse/ClickHouse/issues/20016). [#20448](https://github.com/ClickHouse/ClickHouse/pull/20448) ([tavplubix](https://github.com/tavplubix)).
* Make FQDN and other DNS related functions work correctly in alpine images. [#20336](https://github.com/ClickHouse/ClickHouse/pull/20336) ([filimonov](https://github.com/filimonov)).
* Do not allow early constant folding of explicitly forbidden functions. [#20303](https://github.com/ClickHouse/ClickHouse/pull/20303) ([Azat Khuzhin](https://github.com/azat)).
* Implicit conversion from integer to Decimal type could succeed even if the integer value did not fit into the Decimal type. Now it throws `ARGUMENT_OUT_OF_BOUND`. [#20232](https://github.com/ClickHouse/ClickHouse/pull/20232) ([tavplubix](https://github.com/tavplubix)).
* Lockless `SYSTEM FLUSH DISTRIBUTED`. [#20215](https://github.com/ClickHouse/ClickHouse/pull/20215) ([Azat Khuzhin](https://github.com/azat)).
* Normalize count(constant), sum(1) to count(). This is needed for projection query routing. [#20175](https://github.com/ClickHouse/ClickHouse/pull/20175) ([Amos Bird](https://github.com/amosbird)).
* Support all native integer types in bitmap functions. [#20171](https://github.com/ClickHouse/ClickHouse/pull/20171) ([Amos Bird](https://github.com/amosbird)).
* Updated `CacheDictionary`, `ComplexCacheDictionary`, `SSDCacheDictionary`, `SSDComplexKeyDictionary` to use LRUHashMap as the underlying index. [#20164](https://github.com/ClickHouse/ClickHouse/pull/20164) ([Maksim Kita](https://github.com/kitaisreal)).
* The setting `access_management` is now configurable on startup by providing `CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT`; it defaults to disabled (`0`), which was the prior value. [#20139](https://github.com/ClickHouse/ClickHouse/pull/20139) ([Marquitos](https://github.com/sonirico)).
* Fix toDateTime64(toDate()/toDateTime()) for DateTime64 - implement DateTime64 clamping to match DateTime behaviour. [#20131](https://github.com/ClickHouse/ClickHouse/pull/20131) ([Azat Khuzhin](https://github.com/azat)).
* Quota improvements: SHOW TABLES is now considered as one query in the quota calculations, not two queries. SYSTEM queries now consume quota. Fix calculation of the interval's end in quota consumption. [#20106](https://github.com/ClickHouse/ClickHouse/pull/20106) ([Vitaly Baranov](https://github.com/vitlibar)).
* Support `path IN (set)` expressions for the `system.zookeeper` table. [#20105](https://github.com/ClickHouse/ClickHouse/pull/20105) ([小路](https://github.com/nicelulu)).
* Show full details of `MaterializeMySQL` tables in `system.tables`. [#20051](https://github.com/ClickHouse/ClickHouse/pull/20051) ([Stig Bakken](https://github.com/stigsb)).
* Fix data race in executable dictionary that was possible only on misuse (when the script returns data ignoring its input). [#20045](https://github.com/ClickHouse/ClickHouse/pull/20045) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* The value of the MYSQL_OPT_RECONNECT option can now be controlled by the `opt_reconnect` parameter in the config section of the mysql replica. [#19998](https://github.com/ClickHouse/ClickHouse/pull/19998) ([Alexander Kazakov](https://github.com/Akazz)).
* If the user calls the `JSONExtract` function with `Float32` type requested, allow inaccurate conversion to the result type. For example, the number `0.1` in JSON is double precision and is not representable in Float32, but the user still wants to get it. Previous versions returned 0 for a non-Nullable type and NULL for a Nullable type to indicate that conversion is imprecise. The logic was 100% correct but it was surprising to users and leading to questions. This closes [#13962](https://github.com/ClickHouse/ClickHouse/issues/13962). [#19960](https://github.com/ClickHouse/ClickHouse/pull/19960) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add conversion of block structure for INSERT into Distributed tables if it does not match. [#19947](https://github.com/ClickHouse/ClickHouse/pull/19947) ([Azat Khuzhin](https://github.com/azat)).
* Improvement for the `system.distributed_ddl_queue` table. Initialize MaxDDLEntryID to the last value after restarting. Before this PR, MaxDDLEntryID would remain zero until a new DDLTask is processed. [#19924](https://github.com/ClickHouse/ClickHouse/pull/19924) ([Amos Bird](https://github.com/amosbird)).
* Show `MaterializeMySQL` tables in `system.parts`. [#19770](https://github.com/ClickHouse/ClickHouse/pull/19770) ([Stig Bakken](https://github.com/stigsb)).
* Add a separate config directive for the `Buffer` profile. [#19721](https://github.com/ClickHouse/ClickHouse/pull/19721) ([Azat Khuzhin](https://github.com/azat)).
* Move conditions that are not related to JOIN to the WHERE clause. [#18720](https://github.com/ClickHouse/ClickHouse/issues/18720). [#19685](https://github.com/ClickHouse/ClickHouse/pull/19685) ([hexiaoting](https://github.com/hexiaoting)).
* Add the ability to throttle INSERT into Distributed based on the amount of pending bytes for async send (`bytes_to_delay_insert`/`max_delay_to_insert` and `bytes_to_throw_insert` settings for the `Distributed` engine have been added). [#19673](https://github.com/ClickHouse/ClickHouse/pull/19673) ([Azat Khuzhin](https://github.com/azat)).
* Fix some rare cases when write errors could be ignored in destructors. [#19451](https://github.com/ClickHouse/ClickHouse/pull/19451) ([Azat Khuzhin](https://github.com/azat)).
* Print inline frames in stack traces for fatal errors. [#19317](https://github.com/ClickHouse/ClickHouse/pull/19317) ([Ivan](https://github.com/abyss7)).
#### Bug Fix

* Fix redundant reconnects to ZooKeeper and the possibility of two active sessions for a single clickhouse server. Both problems were introduced in #14678. [#21264](https://github.com/ClickHouse/ClickHouse/pull/21264) ([alesapin](https://github.com/alesapin)).
* Fix error `Bad cast from type ... to DB::ColumnLowCardinality` while inserting into a table with a `LowCardinality` column from `Values` format. Fixes #21140. [#21357](https://github.com/ClickHouse/ClickHouse/pull/21357) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix a deadlock in `ALTER DELETE` mutations for non-replicated MergeTree table engines when the predicate contains the table itself. Fixes [#20558](https://github.com/ClickHouse/ClickHouse/issues/20558). [#21477](https://github.com/ClickHouse/ClickHouse/pull/21477) ([alesapin](https://github.com/alesapin)).
* Fix SIGSEGV for distributed queries on failures. [#21434](https://github.com/ClickHouse/ClickHouse/pull/21434) ([Azat Khuzhin](https://github.com/azat)).
* Now `ALTER MODIFY COLUMN` queries will correctly affect changes in the partition key, skip indices, TTLs, and so on. Fixes [#13675](https://github.com/ClickHouse/ClickHouse/issues/13675). [#21334](https://github.com/ClickHouse/ClickHouse/pull/21334) ([alesapin](https://github.com/alesapin)).
* Fix bug with `join_use_nulls` and joining `TOTALS` from subqueries. This closes [#19362](https://github.com/ClickHouse/ClickHouse/issues/19362) and [#21137](https://github.com/ClickHouse/ClickHouse/issues/21137). [#21248](https://github.com/ClickHouse/ClickHouse/pull/21248) ([vdimir](https://github.com/vdimir)).
* Fix crash in `EXPLAIN` for query with `UNION`. Fixes [#20876](https://github.com/ClickHouse/ClickHouse/issues/20876), [#21170](https://github.com/ClickHouse/ClickHouse/issues/21170). [#21246](https://github.com/ClickHouse/ClickHouse/pull/21246) ([flynn](https://github.com/ucasFL)).
* Now mutations are allowed only for table engines that support them (MergeTree family, Memory, MaterializedView). Other engines will report a clearer error. Fixes [#21168](https://github.com/ClickHouse/ClickHouse/issues/21168). [#21183](https://github.com/ClickHouse/ClickHouse/pull/21183) ([alesapin](https://github.com/alesapin)).
* Fixes [#21112](https://github.com/ClickHouse/ClickHouse/issues/21112). Fixed a bug that could cause duplicates with insert query (if one of the callbacks came a little too late). [#21138](https://github.com/ClickHouse/ClickHouse/pull/21138) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix `input_format_null_as_default` to take effect when types are nullable. This fixes [#21116](https://github.com/ClickHouse/ClickHouse/issues/21116). [#21121](https://github.com/ClickHouse/ClickHouse/pull/21121) ([Amos Bird](https://github.com/amosbird)).
* Fix bug related to casting Tuple to Map. Closes [#21029](https://github.com/ClickHouse/ClickHouse/issues/21029). [#21120](https://github.com/ClickHouse/ClickHouse/pull/21120) ([hexiaoting](https://github.com/hexiaoting)).
* Fix the metadata leak when a Replicated*MergeTree table with a custom (non-default) ZooKeeper cluster is dropped. [#21119](https://github.com/ClickHouse/ClickHouse/pull/21119) ([fastio](https://github.com/fastio)).
* Fix type mismatch issue when using LowCardinality keys in joinGet. This fixes [#21114](https://github.com/ClickHouse/ClickHouse/issues/21114). [#21117](https://github.com/ClickHouse/ClickHouse/pull/21117) ([Amos Bird](https://github.com/amosbird)).
* Fix `default_replica_path` and `default_replica_name` values being ignored for the `Replicated(*)MergeTree` engine when the engine needs to specify other parameters. [#21060](https://github.com/ClickHouse/ClickHouse/pull/21060) ([mxzlxy](https://github.com/mxzlxy)).
* Out-of-bound memory access was possible when formatting a specifically crafted out-of-range value of type `DateTime64`. This closes [#20494](https://github.com/ClickHouse/ClickHouse/issues/20494). This closes [#20543](https://github.com/ClickHouse/ClickHouse/issues/20543). [#21023](https://github.com/ClickHouse/ClickHouse/pull/21023) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Block parallel insertions into storage join. [#21009](https://github.com/ClickHouse/ClickHouse/pull/21009) ([vdimir](https://github.com/vdimir)).
* Fixed behaviour when `ALTER MODIFY COLUMN` created a mutation that would knowingly fail. [#21007](https://github.com/ClickHouse/ClickHouse/pull/21007) ([Anton Popov](https://github.com/CurtizJ)).
* Closes [#9969](https://github.com/ClickHouse/ClickHouse/issues/9969). Fixed a Brotli http compression error, which reproduced for large data sizes, a slightly complicated structure, and JSON output format. Update Brotli to the latest version to include the "fix rare access to uninitialized data in ring-buffer". [#20991](https://github.com/ClickHouse/ClickHouse/pull/20991) ([Kseniia Sumarokova](https://github.com/kssenii)).
* Fix 'Empty task was returned from async task queue' on query cancellation. [#20881](https://github.com/ClickHouse/ClickHouse/pull/20881) ([Azat Khuzhin](https://github.com/azat)).
* `USE database;` query did not work when using MySQL 5.7 client to connect to ClickHouse server; it's fixed. Fixes [#18926](https://github.com/ClickHouse/ClickHouse/issues/18926). [#20878](https://github.com/ClickHouse/ClickHouse/pull/20878) ([tavplubix](https://github.com/tavplubix)).
* Fix usage of `-Distinct` combinator with `-State` combinator in aggregate functions. [#20866](https://github.com/ClickHouse/ClickHouse/pull/20866) ([Anton Popov](https://github.com/CurtizJ)).
* Fix subquery with union distinct and limit clause. Closes [#20597](https://github.com/ClickHouse/ClickHouse/issues/20597). [#20610](https://github.com/ClickHouse/ClickHouse/pull/20610) ([flynn](https://github.com/ucasFL)).
* Fixed inconsistent behavior of dictionaries in case of queries where we look for absent keys in the dictionary. [#20578](https://github.com/ClickHouse/ClickHouse/pull/20578) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix the number of threads for scalar subqueries and subqueries for index (after [#19007](https://github.com/ClickHouse/ClickHouse/issues/19007) a single thread was always used). Fixes [#20457](https://github.com/ClickHouse/ClickHouse/issues/20457), [#20512](https://github.com/ClickHouse/ClickHouse/issues/20512). [#20550](https://github.com/ClickHouse/ClickHouse/pull/20550) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
* Fix crash which could happen if an unknown packet was received from a remote query (was introduced in [#17868](https://github.com/ClickHouse/ClickHouse/issues/17868)). [#20547](https://github.com/ClickHouse/ClickHouse/pull/20547) ([Azat Khuzhin](https://github.com/azat)).
* Add proper checks while parsing directory names for async INSERT (fixes SIGSEGV). [#20498](https://github.com/ClickHouse/ClickHouse/pull/20498) ([Azat Khuzhin](https://github.com/azat)).
* Fix function `transform` not working properly for floating point keys. Closes [#20460](https://github.com/ClickHouse/ClickHouse/issues/20460). [#20479](https://github.com/ClickHouse/ClickHouse/pull/20479) ([flynn](https://github.com/ucasFL)).
* Fix infinite loop when propagating WITH aliases to subqueries. This fixes [#20388](https://github.com/ClickHouse/ClickHouse/issues/20388). [#20476](https://github.com/ClickHouse/ClickHouse/pull/20476) ([Amos Bird](https://github.com/amosbird)).
* Fix abnormal server termination when the http client goes away. [#20464](https://github.com/ClickHouse/ClickHouse/pull/20464) ([Azat Khuzhin](https://github.com/azat)).
* Fix `LOGICAL_ERROR` for `join_use_nulls=1` when JOIN contains a const from SELECT. [#20461](https://github.com/ClickHouse/ClickHouse/pull/20461) ([Azat Khuzhin](https://github.com/azat)).
* Check if table function `view` is used in the expression list and throw an error. This fixes [#20342](https://github.com/ClickHouse/ClickHouse/issues/20342). [#20350](https://github.com/ClickHouse/ClickHouse/pull/20350) ([Amos Bird](https://github.com/amosbird)).
* Avoid invalid dereference in RANGE_HASHED() dictionary. [#20345](https://github.com/ClickHouse/ClickHouse/pull/20345) ([Azat Khuzhin](https://github.com/azat)).
* Fix null dereference with `join_use_nulls=1`. [#20344](https://github.com/ClickHouse/ClickHouse/pull/20344) ([Azat Khuzhin](https://github.com/azat)).
* Fix incorrect result of binary operations between two constant decimals of different scale. Fixes [#20283](https://github.com/ClickHouse/ClickHouse/issues/20283). [#20339](https://github.com/ClickHouse/ClickHouse/pull/20339) ([Maksim Kita](https://github.com/kitaisreal)).
* Fix too frequent retries of failed background tasks for the `ReplicatedMergeTree` table engines family. This could lead to too verbose logging and increased CPU load. Fixes [#20203](https://github.com/ClickHouse/ClickHouse/issues/20203). [#20335](https://github.com/ClickHouse/ClickHouse/pull/20335) ([alesapin](https://github.com/alesapin)).
* Restrict `DROP` or `RENAME` of the version column of `*CollapsingMergeTree` and `ReplacingMergeTree` table engines. [#20300](https://github.com/ClickHouse/ClickHouse/pull/20300) ([alesapin](https://github.com/alesapin)).
* Fixed the behavior where, in case of broken JSON, we tried to read the whole file into memory, which leads to an exception from the allocator. Fixes [#19719](https://github.com/ClickHouse/ClickHouse/issues/19719). [#20286](https://github.com/ClickHouse/ClickHouse/pull/20286) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
* Fix exception during vertical merge for `MergeTree` table engines family which don't allow performing vertical merges. Fixes [#20259](https://github.com/ClickHouse/ClickHouse/issues/20259). [#20279](https://github.com/ClickHouse/ClickHouse/pull/20279) ([alesapin](https://github.com/alesapin)).
* Fix rare server crash on config reload during shutdown. Fixes [#19689](https://github.com/ClickHouse/ClickHouse/issues/19689). [#20224](https://github.com/ClickHouse/ClickHouse/pull/20224) ([alesapin](https://github.com/alesapin)).
* Fix CTE when used in INSERT SELECT. This fixes [#20187](https://github.com/ClickHouse/ClickHouse/issues/20187), fixes [#20195](https://github.com/ClickHouse/ClickHouse/issues/20195). [#20211](https://github.com/ClickHouse/ClickHouse/pull/20211) ([Amos Bird](https://github.com/amosbird)).
* Fixes [#19314](https://github.com/ClickHouse/ClickHouse/issues/19314). [#20156](https://github.com/ClickHouse/ClickHouse/pull/20156) ([Ivan](https://github.com/abyss7)).
* Fix the `toMinute` function to handle special timezones correctly. [#20149](https://github.com/ClickHouse/ClickHouse/pull/20149) ([keenwolf](https://github.com/keen-wolf)).
* Fix server crash after a query with the `if` function with a `Tuple` type of then/else branches result. `Tuple` type must contain an `Array` or another complex type. Fixes [#18356](https://github.com/ClickHouse/ClickHouse/issues/18356). [#20133](https://github.com/ClickHouse/ClickHouse/pull/20133) ([alesapin](https://github.com/alesapin)).
* The `MongoDB` table engine now establishes a connection only when it's going to read data. `ATTACH TABLE` won't try to connect anymore. [#20110](https://github.com/ClickHouse/ClickHouse/pull/20110) ([Vitaly Baranov](https://github.com/vitlibar)).
* Bugfix in StorageJoin. [#20079](https://github.com/ClickHouse/ClickHouse/pull/20079) ([vdimir](https://github.com/vdimir)).
* Fix the case when, while calculating the modulo of a negative number divided by a small divisor, the resulting data type was not large enough to accommodate the negative result. This closes [#20052](https://github.com/ClickHouse/ClickHouse/issues/20052). [#20067](https://github.com/ClickHouse/ClickHouse/pull/20067) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* MaterializeMySQL: Fix replication for statements that update several tables. [#20066](https://github.com/ClickHouse/ClickHouse/pull/20066) ([Håvard Kvålen](https://github.com/havardk)).
* Prevent "Connection refused" in docker during initialization script execution. [#20012](https://github.com/ClickHouse/ClickHouse/pull/20012) ([filimonov](https://github.com/filimonov)).
* `EmbeddedRocksDB` is an experimental storage. Fix the issue with lack of proper type checking. Simplified code. This closes [#19967](https://github.com/ClickHouse/ClickHouse/issues/19967). [#19972](https://github.com/ClickHouse/ClickHouse/pull/19972) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix a segfault in function `fromModifiedJulianDay` when the argument type is `Nullable(T)` for any integral type other than Int32. [#19959](https://github.com/ClickHouse/ClickHouse/pull/19959) ([PHO](https://github.com/depressed-pho)).
* BloomFilter index crash fix. Fixes [#19757](https://github.com/ClickHouse/ClickHouse/issues/19757). [#19884](https://github.com/ClickHouse/ClickHouse/pull/19884) ([Maksim Kita](https://github.com/kitaisreal)).
* Deadlock was possible if system.text_log is enabled. This fixes [#19874](https://github.com/ClickHouse/ClickHouse/issues/19874). [#19875](https://github.com/ClickHouse/ClickHouse/pull/19875) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix starting the server with tables having default expressions containing dictGet(). Allow getting the return type of dictGet() without loading the dictionary. [#19805](https://github.com/ClickHouse/ClickHouse/pull/19805) ([Vitaly Baranov](https://github.com/vitlibar)).
* Fix clickhouse-client abort exception while executing only `select`. [#19790](https://github.com/ClickHouse/ClickHouse/pull/19790) ([taiyang-li](https://github.com/taiyang-li)).
* Fix a bug where moving pieces to the destination table may fail when launching multiple clickhouse-copiers. [#19743](https://github.com/ClickHouse/ClickHouse/pull/19743) ([madianjun](https://github.com/mdianjun)).
* A background thread which executes `ON CLUSTER` queries might hang waiting for a dropped replicated table to do something. It's fixed. [#19684](https://github.com/ClickHouse/ClickHouse/pull/19684) ([yiguolei](https://github.com/yiguolei)).
#### Build/Testing/Packaging Improvement

* Allow building ClickHouse with AVX-2 enabled globally. It gives slight performance benefits on modern CPUs. Not recommended for production and will not be supported as an official build for now. [#20180](https://github.com/ClickHouse/ClickHouse/pull/20180) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Fix some of the issues found by Coverity. See [#19964](https://github.com/ClickHouse/ClickHouse/issues/19964). [#20010](https://github.com/ClickHouse/ClickHouse/pull/20010) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Allow starting up with a modified binary under gdb. In previous versions, if you set up a breakpoint in gdb before start, the server would refuse to start up due to a failed integrity check. [#21258](https://github.com/ClickHouse/ClickHouse/pull/21258) ([alexey-milovidov](https://github.com/alexey-milovidov)).
* Add a test for different compression methods in Kafka. [#21111](https://github.com/ClickHouse/ClickHouse/pull/21111) ([filimonov](https://github.com/filimonov)).
* Fixed port clash from the test_storage_kerberized_hdfs test. [#19974](https://github.com/ClickHouse/ClickHouse/pull/19974) ([Ilya Yatsishin](https://github.com/qoega)).
* Print `stdout` and `stderr` to log when failing to start docker in integration tests. Before this PR there was a very short error message in this case which didn't help to investigate the problems. [#20631](https://github.com/ClickHouse/ClickHouse/pull/20631) ([Vitaly Baranov](https://github.com/vitlibar)).


## ClickHouse release 21.2

### ClickHouse release v21.2.2.8-stable, 2021-02-07
@@ -155,7 +155,6 @@ option(ENABLE_TESTS "Provide unit_test_dbms target with Google.Test unit tests"

if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES AND NOT SPLIT_SHARED_LIBRARIES AND CMAKE_VERSION VERSION_GREATER "3.9.0")
    # Only for Linux, x86_64.
-   # Implies ${ENABLE_FASTMEMCPY}
    option(GLIBC_COMPATIBILITY "Enable compatibility with older glibc libraries." ON)
elseif(GLIBC_COMPATIBILITY)
    message (${RECONFIGURE_MESSAGE_LEVEL} "Glibc compatibility cannot be enabled in current configuration")
@@ -241,9 +240,7 @@ else()
    message(STATUS "Disabling compiler -pipe option (have only ${AVAILABLE_PHYSICAL_MEMORY} mb of memory)")
endif()

-if(NOT DISABLE_CPU_OPTIMIZE)
-   include(cmake/cpu_features.cmake)
-endif()
+include(cmake/cpu_features.cmake)

option(ARCH_NATIVE "Add -march=native compiler flag")
@@ -536,7 +533,7 @@ macro (add_executable target)
    # explicitly acquire and interpose malloc symbols by clickhouse_malloc
    # if GLIBC_COMPATIBILITY is ON and ENABLE_THINLTO is on then provide memcpy symbol explicitly to neutralize thinlto's libcall generation.
    if (GLIBC_COMPATIBILITY AND ENABLE_THINLTO)
-       _add_executable (${ARGV} $<TARGET_OBJECTS:clickhouse_malloc> $<TARGET_OBJECTS:clickhouse_memcpy>)
+       _add_executable (${ARGV} $<TARGET_OBJECTS:clickhouse_malloc> $<TARGET_OBJECTS:memcpy>)
    else ()
        _add_executable (${ARGV} $<TARGET_OBJECTS:clickhouse_malloc>)
    endif ()
@@ -74,7 +74,6 @@ target_link_libraries (common
    ${CITYHASH_LIBRARIES}
    boost::headers_only
    boost::system
-   FastMemcpy
    Poco::Net
    Poco::Net::SSL
    Poco::Util
@@ -76,6 +76,16 @@
#    endif
#endif

+#if !defined(UNDEFINED_BEHAVIOR_SANITIZER)
+#    if defined(__has_feature)
+#        if __has_feature(undefined_behavior_sanitizer)
+#            define UNDEFINED_BEHAVIOR_SANITIZER 1
+#        endif
+#    elif defined(__UNDEFINED_BEHAVIOR_SANITIZER__)
+#        define UNDEFINED_BEHAVIOR_SANITIZER 1
+#    endif
+#endif
+
#if defined(ADDRESS_SANITIZER)
#    define BOOST_USE_ASAN 1
#    define BOOST_USE_UCONTEXT 1
@@ -11,7 +11,7 @@ set(PLATFORM_LIBS ${CMAKE_DL_LIBS})
target_link_libraries (date_lut2 PRIVATE common ${PLATFORM_LIBS})
target_link_libraries (date_lut3 PRIVATE common ${PLATFORM_LIBS})
target_link_libraries (date_lut_default_timezone PRIVATE common ${PLATFORM_LIBS})
-target_link_libraries (local_date_time_comparison PRIVATE common)
+target_link_libraries (local_date_time_comparison PRIVATE common ${PLATFORM_LIBS})
target_link_libraries (realloc-perf PRIVATE common)
add_check(local_date_time_comparison)
@@ -1,5 +1,8 @@
if (GLIBC_COMPATIBILITY)
-   set (ENABLE_FASTMEMCPY ON)
+   add_subdirectory(memcpy)
+   if(TARGET memcpy)
+       set(MEMCPY_LIBRARY memcpy)
+   endif()

    enable_language(ASM)
    include(CheckIncludeFile)
@@ -27,13 +30,6 @@ if (GLIBC_COMPATIBILITY)
        list(APPEND glibc_compatibility_sources musl/getentropy.c)
    endif()

-   if (NOT ARCH_ARM)
-       # clickhouse_memcpy don't support ARCH_ARM, see https://github.com/ClickHouse/ClickHouse/issues/18951
-       add_library (clickhouse_memcpy OBJECT
-           ${ClickHouse_SOURCE_DIR}/contrib/FastMemcpy/memcpy_wrapper.c
-       )
-   endif()
-
    # Need to omit frame pointers to match the performance of glibc
    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fomit-frame-pointer")
||||
@ -51,15 +47,16 @@ if (GLIBC_COMPATIBILITY)
|
||||
target_compile_options(glibc-compatibility PRIVATE -fPIC)
|
||||
endif ()
|
||||
|
||||
target_link_libraries(global-libs INTERFACE glibc-compatibility)
|
||||
target_link_libraries(global-libs INTERFACE glibc-compatibility ${MEMCPY_LIBRARY})
|
||||
|
||||
install(
|
||||
TARGETS glibc-compatibility
|
||||
TARGETS glibc-compatibility ${MEMCPY_LIBRARY}
|
||||
EXPORT global
|
||||
ARCHIVE DESTINATION lib
|
||||
)
|
||||
|
||||
message (STATUS "Some symbols from glibc will be replaced for compatibility")
|
||||
|
||||
elseif (YANDEX_OFFICIAL_BUILD)
|
||||
message (WARNING "Option GLIBC_COMPATIBILITY must be turned on for production builds.")
|
||||
endif ()
|
||||
|
8 base/glibc-compatibility/memcpy/CMakeLists.txt (new file)
@@ -0,0 +1,8 @@
if (ARCH_AMD64)
    add_library(memcpy STATIC memcpy.cpp)

    # We allow including memcpy.h from user code for better inlining.
    target_include_directories(memcpy PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>)

    target_compile_options(memcpy PRIVATE -fno-builtin-memcpy)
endif ()
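The `-fno-builtin-memcpy` option above is load-bearing: without it, the compiler may recognize copy loops inside the implementation and lower them back to calls to `memcpy` itself. A minimal sketch of that hazard (the function `naive_byte_copy` is hypothetical and not part of this commit; see also the note on `-ftree-loop-distribute-patterns` in memcpy.h below):

```cpp
#include <cstddef>

/// Hypothetical illustration, not from this commit: a plain byte-copy loop
/// inside a memcpy implementation can be pattern-matched by the compiler and
/// replaced with a call to memcpy itself, which would recurse forever.
/// Building the translation unit with -fno-builtin-memcpy suppresses this.
extern "C" void * naive_byte_copy(void * dst_, const void * src_, size_t size)
{
    char * dst = static_cast<char *>(dst_);
    const char * src = static_cast<const char *>(src_);
    for (size_t i = 0; i < size; ++i)
        dst[i] = src[i]; /// may be lowered to a memcpy() call without the flag
    return dst_;
}
```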
6 base/glibc-compatibility/memcpy/memcpy.cpp (new file)
@@ -0,0 +1,6 @@
#include "memcpy.h"

extern "C" void * memcpy(void * __restrict dst, const void * __restrict src, size_t size)
{
    return inline_memcpy(dst, src, size);
}
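As the header comment below notes, user code can also include memcpy.h and call `inline_memcpy` directly for better inlining. A minimal hypothetical caller (the function `copy_fields` and its signature are illustrative, not part of this commit):

```cpp
#include <cstddef>
#include "memcpy.h" /// base/glibc-compatibility/memcpy/memcpy.h

/// Hypothetical hot loop copying many small, non-constant-size fields.
/// Calling inline_memcpy directly lets the compiler inline each copy and
/// drop the "return original dst" instruction that the exported memcpy keeps.
static void copy_fields(char * dst, const char * src, const size_t * sizes, size_t count)
{
    for (size_t i = 0; i < count; ++i)
    {
        inline_memcpy(dst, src, sizes[i]);
        dst += sizes[i];
        src += sizes[i];
    }
}
```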
217 base/glibc-compatibility/memcpy/memcpy.h (new file)
@@ -0,0 +1,217 @@
#include <cstddef>

#include <emmintrin.h>


/** Custom memcpy implementation for ClickHouse.
  * It has the following benefits over using glibc's implementation:
  * 1. Avoiding dependency on a specific version of glibc's symbol, like memcpy@@GLIBC_2.14 for portability.
  * 2. Avoiding indirect call via PLT due to shared linking, which can be less efficient.
  * 3. It's possible to include this header and call inline_memcpy directly for better inlining or interprocedural analysis.
  * 4. Better results on our performance tests on current CPUs: up to 25% on some queries and up to 0.7%..1% on average across all queries.
  *
  * Writing our own memcpy is extremely difficult for the following reasons:
  * 1. The optimal variant depends on the specific CPU model.
  * 2. The optimal variant depends on the distribution of size arguments.
  * 3. It depends on the number of threads copying data concurrently.
  * 4. It also depends on how the calling code is using the copied data and how the different memcpy calls are related to each other.
  * The vast range of scenarios makes proper testing especially difficult.
  * When writing our own memcpy there is a risk of overoptimizing it
  * on non-representative microbenchmarks while making real-world use cases actually worse.
  *
  * Most of the benchmarks for memcpy on the internet are wrong.
  *
  * Let's look at the details:
  *
  * For small sizes, the order of branches in code is important.
  * There are variants with a specific order of branches (like here or in glibc)
  * or with a jump table (in asm code see an example from Cosmopolitan libc:
  * https://github.com/jart/cosmopolitan/blob/de09bec215675e9b0beb722df89c6f794da74f3f/libc/nexgen32e/memcpy.S#L61)
  * or with a Duff device in C (see https://github.com/skywind3000/FastMemcpy/)
  *
  * It's also important how to copy uneven sizes.
  * Almost every implementation, including this one, is using two overlapping movs.
  *
  * It is important to disable -ftree-loop-distribute-patterns when compiling the memcpy implementation,
  * otherwise the compiler can replace internal loops with a call to memcpy that will lead to infinite recursion.
  *
  * For larger sizes it's important to choose the instructions used:
  * - SSE or AVX or AVX-512;
  * - rep movsb;
  * Performance will depend on the size threshold, on the CPU model, on the "erms" flag
  * ("Enhanced Rep MovS" - it indicates that performance of "rep movsb" is decent for large sizes)
  * https://stackoverflow.com/questions/43343231/enhanced-rep-movsb-for-memcpy
  *
  * Using AVX-512 can be bad due to throttling.
  * Using AVX can be bad if most code is using SSE due to switching penalty
  * (it also depends on the usage of the "vzeroupper" instruction).
  * But in some cases AVX gives a win.
  *
  * It also depends on how many times the loop will be unrolled.
  * We are unrolling the loop 8 times (by the number of available registers), but it is not always the best.
  *
  * It also depends on the usage of aligned or unaligned loads/stores.
  * We are using unaligned loads and aligned stores.
  *
  * It also depends on the usage of prefetch instructions. It makes sense on some Intel CPUs but can slow down performance on AMD.
  * Setting up a correct offset for prefetching is non-obvious.
  *
  * Non-temporal (cache bypassing) stores can be used for very large sizes (more than half of L3 cache).
  * But the exact threshold is unclear - when doing memcpy from multiple threads the optimal threshold can be lower,
  * because L3 cache is shared (and L2 cache is partially shared).
  *
  * A very large size of memcpy typically indicates suboptimal (not cache friendly) algorithms in code or unrealistic scenarios,
  * so we don't pay attention to using non-temporal stores.
  *
  * On recent Intel CPUs, the presence of "erms" makes "rep movsb" the most beneficial,
  * even compared to non-temporal aligned unrolled stores with the widest registers.
  *
  * memcpy can be written in asm, C or C++. The latter can also use inline asm.
  * The asm implementation can be better to make sure that the compiler won't make the code worse,
  * to ensure the order of branches, the code layout, the usage of all required registers.
  * But if it is located in a separate translation unit, inlining will not be possible
  * (inline asm can be used to overcome this limitation).
  * Sometimes C or C++ code can be further optimized by the compiler.
  * For example, clang is capable of replacing SSE intrinsics with AVX code if -mavx is used.
  *
  * Please note that the compiler can replace plain code with memcpy and vice versa:
  * - memcpy with a compile-time known small size is replaced with simple instructions without a call to memcpy;
  *   it is controlled by -fbuiltin-memcpy and can be manually ensured by calling __builtin_memcpy.
  *   This is often used to implement unaligned load/store without undefined behaviour in C++.
  * - a loop with copying bytes can be recognized and replaced by a call to memcpy;
  *   it is controlled by -ftree-loop-distribute-patterns.
  * - also note that a loop with copying bytes can be unrolled, peeled and vectorized, which will give you
  *   inline code somewhat similar to a decent implementation of memcpy.
  *
  * This description is up to date as of Mar 2021.
  *
  * How to test the memcpy implementation for performance:
  * 1. Test on real production workload.
  * 2. For synthetic tests, see utils/memcpy-bench, but make sure you do your best to exhaust the wide range of scenarios.
  *
  * TODO: Add self-tuning memcpy with bayesian bandits algorithm for large sizes.
  * See https://habr.com/en/company/yandex/blog/457612/
  */

static inline void * inline_memcpy(void * __restrict dst_, const void * __restrict src_, size_t size)
{
    /// We will use pointer arithmetic, so char pointer will be used.
    /// Note that __restrict makes sense (otherwise compiler will reload data from memory
    /// instead of using the value of registers due to possible aliasing).
    char * __restrict dst = reinterpret_cast<char * __restrict>(dst_);
    const char * __restrict src = reinterpret_cast<const char * __restrict>(src_);

    /// Standard memcpy returns the original value of dst. It is rarely used but we have to do it.
    /// If you use memcpy with small but non-constant sizes, you can call inline_memcpy directly
    /// for inlining and removing this single instruction.
    void * ret = dst;

tail:
    /// Small sizes and tails after the loop for large sizes.
    /// The order of branches is important but in fact the optimal order depends on the distribution of sizes in your application.
    /// This order of branches is from the disassembly of glibc's code.
    /// We copy chunks of possibly uneven size with two overlapping movs.
    /// Example: to copy 5 bytes [0, 1, 2, 3, 4] we will copy tail [1, 2, 3, 4] first and then head [0, 1, 2, 3].
    if (size <= 16)
    {
        if (size >= 8)
        {
            /// Chunks of 8..16 bytes.
            __builtin_memcpy(dst + size - 8, src + size - 8, 8);
            __builtin_memcpy(dst, src, 8);
        }
        else if (size >= 4)
        {
            /// Chunks of 4..7 bytes.
            __builtin_memcpy(dst + size - 4, src + size - 4, 4);
            __builtin_memcpy(dst, src, 4);
        }
        else if (size >= 2)
        {
            /// Chunks of 2..3 bytes.
            __builtin_memcpy(dst + size - 2, src + size - 2, 2);
            __builtin_memcpy(dst, src, 2);
        }
        else if (size >= 1)
        {
            /// A single byte.
            *dst = *src;
        }
        /// No bytes remaining.
    }
    else
    {
        /// Medium and large sizes.
        if (size <= 128)
        {
            /// Medium size, not enough for full loop unrolling.

            /// We will copy the last 16 bytes.
            _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + size - 16), _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + size - 16)));

            /// Then we will copy every 16 bytes from the beginning in a loop.
            /// The last loop iteration will possibly overwrite some part of already copied last 16 bytes.
            /// This is Ok, similar to the code for small sizes above.
            while (size > 16)
            {
                _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
                dst += 16;
                src += 16;
                size -= 16;
            }
        }
        else
        {
            /// Large size with fully unrolled loop.

            /// Align destination to 16 bytes boundary.
            size_t padding = (16 - (reinterpret_cast<size_t>(dst) & 15)) & 15;

            /// If not aligned - we will copy the first 16 bytes with unaligned stores.
            if (padding > 0)
            {
                __m128i head = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
                _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), head);
                dst += padding;
                src += padding;
                size -= padding;
            }

            /// Aligned unrolled copy. We will use all available SSE registers.
            /// It's not possible to have both src and dst aligned.
            /// So, we will use aligned stores and unaligned loads.
            __m128i c0, c1, c2, c3, c4, c5, c6, c7;

            while (size >= 128)
            {
                c0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 0);
                c1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 1);
                c2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 2);
                c3 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 3);
                c4 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 4);
                c5 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 5);
                c6 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 6);
                c7 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 7);
                src += 128;
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 0), c0);
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 1), c1);
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 2), c2);
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 3), c3);
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 4), c4);
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 5), c5);
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 6), c6);
                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 7), c7);
                dst += 128;

                size -= 128;
            }

            /// The remaining 0..127 bytes will be processed as usual.
            goto tail;
        }
    }

    return ret;
}
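For comparison with the strategy above, the "rep movsb" approach discussed in the header comment fits in a few lines of inline asm. A hedged sketch (`memcpy_erms` is hypothetical and not part of this commit; x86-64 with GCC/Clang inline asm assumed):

```cpp
#include <cstddef>

/// Sketch of the "rep movsb" alternative mentioned in the comment above,
/// not part of this commit. On CPUs with the "erms" flag this single string
/// instruction is competitive with unrolled SIMD stores for large sizes.
static inline void * memcpy_erms(void * dst, const void * src, size_t size)
{
    void * ret = dst;
    __asm__ __volatile__(
        "rep movsb"
        : "+D"(dst), "+S"(src), "+c"(size) /// RDI = dst, RSI = src, RCX = count
        :
        : "memory");
    return ret;
}
```

Whether this beats the unrolled SSE loop depends on the CPU model and the size threshold, which is exactly the tuning difficulty the header comment describes.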
1 contrib/CMakeLists.txt (vendored)
@@ -38,7 +38,6 @@ add_subdirectory (boost-cmake)
add_subdirectory (cctz-cmake)
add_subdirectory (consistent-hashing)
add_subdirectory (dragonbox-cmake)
-add_subdirectory (FastMemcpy)
add_subdirectory (hyperscan-cmake)
add_subdirectory (jemalloc-cmake)
add_subdirectory (libcpuid-cmake)
28 contrib/FastMemcpy/CMakeLists.txt (deleted)
@@ -1,28 +0,0 @@
option (ENABLE_FASTMEMCPY "Enable FastMemcpy library (only internal)" ${ENABLE_LIBRARIES})

if (NOT OS_LINUX OR ARCH_AARCH64)
    set (ENABLE_FASTMEMCPY OFF)
endif ()

if (ENABLE_FASTMEMCPY)
    set (LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/FastMemcpy)

    set (SRCS
        ${LIBRARY_DIR}/FastMemcpy.c
        memcpy_wrapper.c
    )

    add_library (FastMemcpy ${SRCS})
    target_include_directories (FastMemcpy PUBLIC ${LIBRARY_DIR})

    target_compile_definitions(FastMemcpy PUBLIC USE_FASTMEMCPY=1)

    message (STATUS "Using FastMemcpy")
else ()
    add_library (FastMemcpy INTERFACE)

    target_compile_definitions(FastMemcpy INTERFACE USE_FASTMEMCPY=0)

    message (STATUS "Not using FastMemcpy")
endif ()
220 contrib/FastMemcpy/FastMemcpy.c (deleted)
@@ -1,220 +0,0 @@
//=====================================================================
//
// FastMemcpy.c - skywind3000@163.com, 2015
//
// feature:
// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc4.9)
//
//=====================================================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#if (defined(_WIN32) || defined(WIN32))
#include <windows.h>
#include <mmsystem.h>
#ifdef _MSC_VER
#pragma comment(lib, "winmm.lib")
#endif
#elif defined(__unix)
#include <sys/time.h>
#include <unistd.h>
#else
#error it can only be compiled under windows or unix
#endif

#include "FastMemcpy.h"

unsigned int gettime()
{
#if (defined(_WIN32) || defined(WIN32))
    return timeGetTime();
#else
    static struct timezone tz = { 0, 0 };
    struct timeval time;
    gettimeofday(&time, &tz);
    return (time.tv_sec * 1000 + time.tv_usec / 1000);
#endif
}

void sleepms(unsigned int millisec)
{
#if defined(_WIN32) || defined(WIN32)
    Sleep(millisec);
#else
    usleep(millisec * 1000);
#endif
}


void benchmark(int dstalign, int srcalign, size_t size, int times)
{
    char *DATA1 = (char*)malloc(size + 64);
    char *DATA2 = (char*)malloc(size + 64);
    size_t LINEAR1 = ((size_t)DATA1);
    size_t LINEAR2 = ((size_t)DATA2);
    char *ALIGN1 = (char*)(((64 - (LINEAR1 & 63)) & 63) + LINEAR1);
    char *ALIGN2 = (char*)(((64 - (LINEAR2 & 63)) & 63) + LINEAR2);
    char *dst = (dstalign)? ALIGN1 : (ALIGN1 + 1);
    char *src = (srcalign)? ALIGN2 : (ALIGN2 + 3);
    unsigned int t1, t2;
    int k;

    sleepms(100);
    t1 = gettime();
    for (k = times; k > 0; k--) {
        memcpy(dst, src, size);
    }
    t1 = gettime() - t1;
    sleepms(100);
    t2 = gettime();
    for (k = times; k > 0; k--) {
        memcpy_fast(dst, src, size);
    }
    t2 = gettime() - t2;

    free(DATA1);
    free(DATA2);

    printf("result(dst %s, src %s): memcpy_fast=%dms memcpy=%d ms\n",
        dstalign? "aligned" : "unalign",
        srcalign? "aligned" : "unalign", (int)t2, (int)t1);
}


void bench(int copysize, int times)
{
    printf("benchmark(size=%d bytes, times=%d):\n", copysize, times);
    benchmark(1, 1, copysize, times);
    benchmark(1, 0, copysize, times);
    benchmark(0, 1, copysize, times);
    benchmark(0, 0, copysize, times);
    printf("\n");
}


void random_bench(int maxsize, int times)
{
    static char A[11 * 1024 * 1024 + 2];
    static char B[11 * 1024 * 1024 + 2];
    static int random_offsets[0x10000];
    static int random_sizes[0x8000];
    unsigned int i, p1, p2;
    unsigned int t1, t2;
    for (i = 0; i < 0x10000; i++) { // generate random offsets
        random_offsets[i] = rand() % (10 * 1024 * 1024 + 1);
    }
    for (i = 0; i < 0x8000; i++) { // generate random sizes
        random_sizes[i] = 1 + rand() % maxsize;
    }
    sleepms(100);
    t1 = gettime();
    for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
        int offset1 = random_offsets[(p1++) & 0xffff];
        int offset2 = random_offsets[(p1++) & 0xffff];
        int size = random_sizes[(p2++) & 0x7fff];
        memcpy(A + offset1, B + offset2, size);
    }
    t1 = gettime() - t1;
    sleepms(100);
    t2 = gettime();
    for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
        int offset1 = random_offsets[(p1++) & 0xffff];
        int offset2 = random_offsets[(p1++) & 0xffff];
        int size = random_sizes[(p2++) & 0x7fff];
        memcpy_fast(A + offset1, B + offset2, size);
    }
    t2 = gettime() - t2;
    printf("benchmark random access:\n");
    printf("memcpy_fast=%dms memcpy=%dms\n\n", (int)t2, (int)t1);
}


#ifdef _MSC_VER
#pragma comment(lib, "winmm.lib")
#endif

int main(void)
{
    bench(32, 0x1000000);
    bench(64, 0x1000000);
    bench(512, 0x800000);
    bench(1024, 0x400000);
    bench(4096, 0x80000);
    bench(8192, 0x40000);
    bench(1024 * 1024 * 1, 0x800);
    bench(1024 * 1024 * 4, 0x200);
    bench(1024 * 1024 * 8, 0x100);

    random_bench(2048, 8000000);

    return 0;
}


/*
benchmark(size=32 bytes, times=16777216):
result(dst aligned, src aligned): memcpy_fast=78ms memcpy=260 ms
result(dst aligned, src unalign): memcpy_fast=78ms memcpy=250 ms
result(dst unalign, src aligned): memcpy_fast=78ms memcpy=266 ms
result(dst unalign, src unalign): memcpy_fast=78ms memcpy=234 ms

benchmark(size=64 bytes, times=16777216):
result(dst aligned, src aligned): memcpy_fast=109ms memcpy=281 ms
result(dst aligned, src unalign): memcpy_fast=109ms memcpy=328 ms
result(dst unalign, src aligned): memcpy_fast=109ms memcpy=343 ms
result(dst unalign, src unalign): memcpy_fast=93ms memcpy=344 ms

benchmark(size=512 bytes, times=8388608):
result(dst aligned, src aligned): memcpy_fast=125ms memcpy=218 ms
result(dst aligned, src unalign): memcpy_fast=156ms memcpy=484 ms
result(dst unalign, src aligned): memcpy_fast=172ms memcpy=546 ms
result(dst unalign, src unalign): memcpy_fast=172ms memcpy=515 ms

benchmark(size=1024 bytes, times=4194304):
result(dst aligned, src aligned): memcpy_fast=109ms memcpy=172 ms
result(dst aligned, src unalign): memcpy_fast=187ms memcpy=453 ms
result(dst unalign, src aligned): memcpy_fast=172ms memcpy=437 ms
result(dst unalign, src unalign): memcpy_fast=156ms memcpy=452 ms

benchmark(size=4096 bytes, times=524288):
result(dst aligned, src aligned): memcpy_fast=62ms memcpy=78 ms
result(dst aligned, src unalign): memcpy_fast=109ms memcpy=202 ms
result(dst unalign, src aligned): memcpy_fast=94ms memcpy=203 ms
result(dst unalign, src unalign): memcpy_fast=110ms memcpy=218 ms

benchmark(size=8192 bytes, times=262144):
result(dst aligned, src aligned): memcpy_fast=62ms memcpy=78 ms
result(dst aligned, src unalign): memcpy_fast=78ms memcpy=202 ms
result(dst unalign, src aligned): memcpy_fast=78ms memcpy=203 ms
result(dst unalign, src unalign): memcpy_fast=94ms memcpy=203 ms

benchmark(size=1048576 bytes, times=2048):
result(dst aligned, src aligned): memcpy_fast=203ms memcpy=191 ms
|
||||
result(dst aligned, src unalign): memcpy_fast=219ms memcpy=281 ms
|
||||
result(dst unalign, src aligned): memcpy_fast=218ms memcpy=328 ms
|
||||
result(dst unalign, src unalign): memcpy_fast=218ms memcpy=312 ms
|
||||
|
||||
benchmark(size=4194304 bytes, times=512):
|
||||
result(dst aligned, src aligned): memcpy_fast=312ms memcpy=406 ms
|
||||
result(dst aligned, src unalign): memcpy_fast=296ms memcpy=421 ms
|
||||
result(dst unalign, src aligned): memcpy_fast=312ms memcpy=468 ms
|
||||
result(dst unalign, src unalign): memcpy_fast=297ms memcpy=452 ms
|
||||
|
||||
benchmark(size=8388608 bytes, times=256):
|
||||
result(dst aligned, src aligned): memcpy_fast=281ms memcpy=452 ms
|
||||
result(dst aligned, src unalign): memcpy_fast=280ms memcpy=468 ms
|
||||
result(dst unalign, src aligned): memcpy_fast=298ms memcpy=514 ms
|
||||
result(dst unalign, src unalign): memcpy_fast=344ms memcpy=472 ms
|
||||
|
||||
benchmark random access:
|
||||
memcpy_fast=515ms memcpy=1014ms
|
||||
|
||||
*/
|
||||
|
||||
|
||||
|
||||
|
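The timing table in the trailing comment above was produced by this harness itself. A hedged build sketch (the exact compiler and flags are not recorded in the commit; optimization matters because the copy helpers depend on forced inlining):

    gcc -O3 FastMemcpy.c -o fastmemcpy_bench
    ./fastmemcpy_bench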
@@ -1,694 +0,0 @@
//=====================================================================
//
// FastMemcpy.c - skywind3000@163.com, 2015
//
// feature:
// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc5.1)
//
//=====================================================================
#ifndef __FAST_MEMCPY_H__
#define __FAST_MEMCPY_H__

#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>


//---------------------------------------------------------------------
// force inline for compilers
//---------------------------------------------------------------------
#ifndef INLINE
#ifdef __GNUC__
#if (__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1))
#define INLINE __inline__ __attribute__((always_inline))
#else
#define INLINE __inline__
#endif
#elif defined(_MSC_VER)
#define INLINE __forceinline
#elif (defined(__BORLANDC__) || defined(__WATCOMC__))
#define INLINE __inline
#else
#define INLINE
#endif
#endif

typedef __attribute__((__aligned__(1))) uint16_t uint16_unaligned_t;
typedef __attribute__((__aligned__(1))) uint32_t uint32_unaligned_t;
typedef __attribute__((__aligned__(1))) uint64_t uint64_unaligned_t;

//---------------------------------------------------------------------
// fast copy for different sizes
//---------------------------------------------------------------------
static INLINE void memcpy_sse2_16(void *dst, const void *src) {
    __m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
    _mm_storeu_si128(((__m128i*)dst) + 0, m0);
}

static INLINE void memcpy_sse2_32(void *dst, const void *src) {
    __m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
    __m128i m1 = _mm_loadu_si128(((const __m128i*)src) + 1);
    _mm_storeu_si128(((__m128i*)dst) + 0, m0);
    _mm_storeu_si128(((__m128i*)dst) + 1, m1);
}

static INLINE void memcpy_sse2_64(void *dst, const void *src) {
    __m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
    __m128i m1 = _mm_loadu_si128(((const __m128i*)src) + 1);
    __m128i m2 = _mm_loadu_si128(((const __m128i*)src) + 2);
    __m128i m3 = _mm_loadu_si128(((const __m128i*)src) + 3);
    _mm_storeu_si128(((__m128i*)dst) + 0, m0);
    _mm_storeu_si128(((__m128i*)dst) + 1, m1);
    _mm_storeu_si128(((__m128i*)dst) + 2, m2);
    _mm_storeu_si128(((__m128i*)dst) + 3, m3);
}

static INLINE void memcpy_sse2_128(void *dst, const void *src) {
    __m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
    __m128i m1 = _mm_loadu_si128(((const __m128i*)src) + 1);
    __m128i m2 = _mm_loadu_si128(((const __m128i*)src) + 2);
    __m128i m3 = _mm_loadu_si128(((const __m128i*)src) + 3);
    __m128i m4 = _mm_loadu_si128(((const __m128i*)src) + 4);
    __m128i m5 = _mm_loadu_si128(((const __m128i*)src) + 5);
    __m128i m6 = _mm_loadu_si128(((const __m128i*)src) + 6);
    __m128i m7 = _mm_loadu_si128(((const __m128i*)src) + 7);
    _mm_storeu_si128(((__m128i*)dst) + 0, m0);
    _mm_storeu_si128(((__m128i*)dst) + 1, m1);
    _mm_storeu_si128(((__m128i*)dst) + 2, m2);
    _mm_storeu_si128(((__m128i*)dst) + 3, m3);
    _mm_storeu_si128(((__m128i*)dst) + 4, m4);
    _mm_storeu_si128(((__m128i*)dst) + 5, m5);
    _mm_storeu_si128(((__m128i*)dst) + 6, m6);
    _mm_storeu_si128(((__m128i*)dst) + 7, m7);
}


//---------------------------------------------------------------------
// tiny memory copy with jump table optimized
//---------------------------------------------------------------------
/// Attribute is used to avoid an error with undefined behaviour sanitizer
/// ../contrib/FastMemcpy/FastMemcpy.h:91:56: runtime error: applying zero offset to null pointer
/// Found by 01307_orc_output_format.sh, cause - ORCBlockInputFormat and external ORC library.
__attribute__((__no_sanitize__("undefined"))) static INLINE void *memcpy_tiny(void *dst, const void *src, size_t size) {
    unsigned char *dd = ((unsigned char*)dst) + size;
    const unsigned char *ss = ((const unsigned char*)src) + size;

    switch (size) {
    case 64: memcpy_sse2_64(dd - 64, ss - 64);
    case 0: break;
    case 65: memcpy_sse2_64(dd - 65, ss - 65);
    case 1: dd[-1] = ss[-1]; break;
    case 66: memcpy_sse2_64(dd - 66, ss - 66);
    case 2: *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 67: memcpy_sse2_64(dd - 67, ss - 67);
    case 3: *((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3)); dd[-1] = ss[-1]; break;
    case 68: memcpy_sse2_64(dd - 68, ss - 68);
    case 4: *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 69: memcpy_sse2_64(dd - 69, ss - 69);
    case 5: *((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5)); dd[-1] = ss[-1]; break;
    case 70: memcpy_sse2_64(dd - 70, ss - 70);
    case 6: *((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6)); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 71: memcpy_sse2_64(dd - 71, ss - 71);
    case 7: *((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 72: memcpy_sse2_64(dd - 72, ss - 72);
    case 8: *((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8)); break;
    case 73: memcpy_sse2_64(dd - 73, ss - 73);
    case 9: *((uint64_unaligned_t*)(dd - 9)) = *((uint64_unaligned_t*)(ss - 9)); dd[-1] = ss[-1]; break;
    case 74: memcpy_sse2_64(dd - 74, ss - 74);
    case 10: *((uint64_unaligned_t*)(dd - 10)) = *((uint64_unaligned_t*)(ss - 10)); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 75: memcpy_sse2_64(dd - 75, ss - 75);
    case 11: *((uint64_unaligned_t*)(dd - 11)) = *((uint64_unaligned_t*)(ss - 11)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 76: memcpy_sse2_64(dd - 76, ss - 76);
    case 12: *((uint64_unaligned_t*)(dd - 12)) = *((uint64_unaligned_t*)(ss - 12)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 77: memcpy_sse2_64(dd - 77, ss - 77);
    case 13: *((uint64_unaligned_t*)(dd - 13)) = *((uint64_unaligned_t*)(ss - 13)); *((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5)); dd[-1] = ss[-1]; break;
    case 78: memcpy_sse2_64(dd - 78, ss - 78);
    case 14: *((uint64_unaligned_t*)(dd - 14)) = *((uint64_unaligned_t*)(ss - 14)); *((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8)); break;
    case 79: memcpy_sse2_64(dd - 79, ss - 79);
    case 15: *((uint64_unaligned_t*)(dd - 15)) = *((uint64_unaligned_t*)(ss - 15)); *((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8)); break;
    case 80: memcpy_sse2_64(dd - 80, ss - 80);
    case 16: memcpy_sse2_16(dd - 16, ss - 16); break;
    case 81: memcpy_sse2_64(dd - 81, ss - 81);
    case 17: memcpy_sse2_16(dd - 17, ss - 17); dd[-1] = ss[-1]; break;
    case 82: memcpy_sse2_64(dd - 82, ss - 82);
    case 18: memcpy_sse2_16(dd - 18, ss - 18); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 83: memcpy_sse2_64(dd - 83, ss - 83);
    case 19: memcpy_sse2_16(dd - 19, ss - 19); *((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3)); dd[-1] = ss[-1]; break;
    case 84: memcpy_sse2_64(dd - 84, ss - 84);
    case 20: memcpy_sse2_16(dd - 20, ss - 20); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 85: memcpy_sse2_64(dd - 85, ss - 85);
    case 21: memcpy_sse2_16(dd - 21, ss - 21); *((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5)); dd[-1] = ss[-1]; break;
    case 86: memcpy_sse2_64(dd - 86, ss - 86);
    case 22: memcpy_sse2_16(dd - 22, ss - 22); *((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6)); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 87: memcpy_sse2_64(dd - 87, ss - 87);
    case 23: memcpy_sse2_16(dd - 23, ss - 23); *((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 88: memcpy_sse2_64(dd - 88, ss - 88);
    case 24: memcpy_sse2_16(dd - 24, ss - 24); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 89: memcpy_sse2_64(dd - 89, ss - 89);
    case 25: memcpy_sse2_16(dd - 25, ss - 25); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 90: memcpy_sse2_64(dd - 90, ss - 90);
    case 26: memcpy_sse2_16(dd - 26, ss - 26); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 91: memcpy_sse2_64(dd - 91, ss - 91);
    case 27: memcpy_sse2_16(dd - 27, ss - 27); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 92: memcpy_sse2_64(dd - 92, ss - 92);
    case 28: memcpy_sse2_16(dd - 28, ss - 28); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 93: memcpy_sse2_64(dd - 93, ss - 93);
    case 29: memcpy_sse2_16(dd - 29, ss - 29); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 94: memcpy_sse2_64(dd - 94, ss - 94);
    case 30: memcpy_sse2_16(dd - 30, ss - 30); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 95: memcpy_sse2_64(dd - 95, ss - 95);
    case 31: memcpy_sse2_16(dd - 31, ss - 31); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 96: memcpy_sse2_64(dd - 96, ss - 96);
    case 32: memcpy_sse2_32(dd - 32, ss - 32); break;
    case 97: memcpy_sse2_64(dd - 97, ss - 97);
    case 33: memcpy_sse2_32(dd - 33, ss - 33); dd[-1] = ss[-1]; break;
    case 98: memcpy_sse2_64(dd - 98, ss - 98);
    case 34: memcpy_sse2_32(dd - 34, ss - 34); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 99: memcpy_sse2_64(dd - 99, ss - 99);
    case 35: memcpy_sse2_32(dd - 35, ss - 35); *((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3)); dd[-1] = ss[-1]; break;
    case 100: memcpy_sse2_64(dd - 100, ss - 100);
    case 36: memcpy_sse2_32(dd - 36, ss - 36); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 101: memcpy_sse2_64(dd - 101, ss - 101);
    case 37: memcpy_sse2_32(dd - 37, ss - 37); *((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5)); dd[-1] = ss[-1]; break;
    case 102: memcpy_sse2_64(dd - 102, ss - 102);
    case 38: memcpy_sse2_32(dd - 38, ss - 38); *((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6)); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 103: memcpy_sse2_64(dd - 103, ss - 103);
    case 39: memcpy_sse2_32(dd - 39, ss - 39); *((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 104: memcpy_sse2_64(dd - 104, ss - 104);
    case 40: memcpy_sse2_32(dd - 40, ss - 40); *((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8)); break;
    case 105: memcpy_sse2_64(dd - 105, ss - 105);
    case 41: memcpy_sse2_32(dd - 41, ss - 41); *((uint64_unaligned_t*)(dd - 9)) = *((uint64_unaligned_t*)(ss - 9)); dd[-1] = ss[-1]; break;
    case 106: memcpy_sse2_64(dd - 106, ss - 106);
    case 42: memcpy_sse2_32(dd - 42, ss - 42); *((uint64_unaligned_t*)(dd - 10)) = *((uint64_unaligned_t*)(ss - 10)); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 107: memcpy_sse2_64(dd - 107, ss - 107);
    case 43: memcpy_sse2_32(dd - 43, ss - 43); *((uint64_unaligned_t*)(dd - 11)) = *((uint64_unaligned_t*)(ss - 11)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 108: memcpy_sse2_64(dd - 108, ss - 108);
    case 44: memcpy_sse2_32(dd - 44, ss - 44); *((uint64_unaligned_t*)(dd - 12)) = *((uint64_unaligned_t*)(ss - 12)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 109: memcpy_sse2_64(dd - 109, ss - 109);
    case 45: memcpy_sse2_32(dd - 45, ss - 45); *((uint64_unaligned_t*)(dd - 13)) = *((uint64_unaligned_t*)(ss - 13)); *((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5)); dd[-1] = ss[-1]; break;
    case 110: memcpy_sse2_64(dd - 110, ss - 110);
    case 46: memcpy_sse2_32(dd - 46, ss - 46); *((uint64_unaligned_t*)(dd - 14)) = *((uint64_unaligned_t*)(ss - 14)); *((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8)); break;
    case 111: memcpy_sse2_64(dd - 111, ss - 111);
    case 47: memcpy_sse2_32(dd - 47, ss - 47); *((uint64_unaligned_t*)(dd - 15)) = *((uint64_unaligned_t*)(ss - 15)); *((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8)); break;
    case 112: memcpy_sse2_64(dd - 112, ss - 112);
    case 48: memcpy_sse2_32(dd - 48, ss - 48); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 113: memcpy_sse2_64(dd - 113, ss - 113);
    case 49: memcpy_sse2_32(dd - 49, ss - 49); memcpy_sse2_16(dd - 17, ss - 17); dd[-1] = ss[-1]; break;
    case 114: memcpy_sse2_64(dd - 114, ss - 114);
    case 50: memcpy_sse2_32(dd - 50, ss - 50); memcpy_sse2_16(dd - 18, ss - 18); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 115: memcpy_sse2_64(dd - 115, ss - 115);
    case 51: memcpy_sse2_32(dd - 51, ss - 51); memcpy_sse2_16(dd - 19, ss - 19); *((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3)); dd[-1] = ss[-1]; break;
    case 116: memcpy_sse2_64(dd - 116, ss - 116);
    case 52: memcpy_sse2_32(dd - 52, ss - 52); memcpy_sse2_16(dd - 20, ss - 20); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 117: memcpy_sse2_64(dd - 117, ss - 117);
    case 53: memcpy_sse2_32(dd - 53, ss - 53); memcpy_sse2_16(dd - 21, ss - 21); *((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5)); dd[-1] = ss[-1]; break;
    case 118: memcpy_sse2_64(dd - 118, ss - 118);
    case 54: memcpy_sse2_32(dd - 54, ss - 54); memcpy_sse2_16(dd - 22, ss - 22); *((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6)); *((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2)); break;
    case 119: memcpy_sse2_64(dd - 119, ss - 119);
    case 55: memcpy_sse2_32(dd - 55, ss - 55); memcpy_sse2_16(dd - 23, ss - 23); *((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7)); *((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4)); break;
    case 120: memcpy_sse2_64(dd - 120, ss - 120);
    case 56: memcpy_sse2_32(dd - 56, ss - 56); memcpy_sse2_16(dd - 24, ss - 24); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 121: memcpy_sse2_64(dd - 121, ss - 121);
    case 57: memcpy_sse2_32(dd - 57, ss - 57); memcpy_sse2_16(dd - 25, ss - 25); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 122: memcpy_sse2_64(dd - 122, ss - 122);
    case 58: memcpy_sse2_32(dd - 58, ss - 58); memcpy_sse2_16(dd - 26, ss - 26); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 123: memcpy_sse2_64(dd - 123, ss - 123);
    case 59: memcpy_sse2_32(dd - 59, ss - 59); memcpy_sse2_16(dd - 27, ss - 27); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 124: memcpy_sse2_64(dd - 124, ss - 124);
    case 60: memcpy_sse2_32(dd - 60, ss - 60); memcpy_sse2_16(dd - 28, ss - 28); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 125: memcpy_sse2_64(dd - 125, ss - 125);
    case 61: memcpy_sse2_32(dd - 61, ss - 61); memcpy_sse2_16(dd - 29, ss - 29); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 126: memcpy_sse2_64(dd - 126, ss - 126);
    case 62: memcpy_sse2_32(dd - 62, ss - 62); memcpy_sse2_16(dd - 30, ss - 30); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 127: memcpy_sse2_64(dd - 127, ss - 127);
    case 63: memcpy_sse2_32(dd - 63, ss - 63); memcpy_sse2_16(dd - 31, ss - 31); memcpy_sse2_16(dd - 16, ss - 16); break;
    case 128: memcpy_sse2_128(dd - 128, ss - 128); break;
    }

    return dst;
}


//---------------------------------------------------------------------
// main routine
//---------------------------------------------------------------------
static void* memcpy_fast(void *destination, const void *source, size_t size)
{
    unsigned char *dst = (unsigned char*)destination;
    const unsigned char *src = (const unsigned char*)source;
    static size_t cachesize = 0x200000; // L2-cache size
    size_t padding;

    // small memory copy
    if (size <= 128) {
        return memcpy_tiny(dst, src, size);
    }

    // align destination to 16 bytes boundary
    padding = (16 - (((size_t)dst) & 15)) & 15;

    if (padding > 0) {
        __m128i head = _mm_loadu_si128((const __m128i*)src);
        _mm_storeu_si128((__m128i*)dst, head);
        dst += padding;
        src += padding;
        size -= padding;
    }

    // medium size copy
    if (size <= cachesize) {
        __m128i c0, c1, c2, c3, c4, c5, c6, c7;

        for (; size >= 128; size -= 128) {
            c0 = _mm_loadu_si128(((const __m128i*)src) + 0);
            c1 = _mm_loadu_si128(((const __m128i*)src) + 1);
            c2 = _mm_loadu_si128(((const __m128i*)src) + 2);
            c3 = _mm_loadu_si128(((const __m128i*)src) + 3);
            c4 = _mm_loadu_si128(((const __m128i*)src) + 4);
            c5 = _mm_loadu_si128(((const __m128i*)src) + 5);
            c6 = _mm_loadu_si128(((const __m128i*)src) + 6);
            c7 = _mm_loadu_si128(((const __m128i*)src) + 7);
            _mm_prefetch((const char*)(src + 256), _MM_HINT_NTA);
            src += 128;
            _mm_store_si128((((__m128i*)dst) + 0), c0);
            _mm_store_si128((((__m128i*)dst) + 1), c1);
            _mm_store_si128((((__m128i*)dst) + 2), c2);
            _mm_store_si128((((__m128i*)dst) + 3), c3);
            _mm_store_si128((((__m128i*)dst) + 4), c4);
            _mm_store_si128((((__m128i*)dst) + 5), c5);
            _mm_store_si128((((__m128i*)dst) + 6), c6);
            _mm_store_si128((((__m128i*)dst) + 7), c7);
            dst += 128;
        }
    }
    else { // big memory copy
        __m128i c0, c1, c2, c3, c4, c5, c6, c7;

        _mm_prefetch((const char*)(src), _MM_HINT_NTA);

        if ((((size_t)src) & 15) == 0) { // source aligned
            for (; size >= 128; size -= 128) {
                c0 = _mm_load_si128(((const __m128i*)src) + 0);
                c1 = _mm_load_si128(((const __m128i*)src) + 1);
                c2 = _mm_load_si128(((const __m128i*)src) + 2);
                c3 = _mm_load_si128(((const __m128i*)src) + 3);
                c4 = _mm_load_si128(((const __m128i*)src) + 4);
                c5 = _mm_load_si128(((const __m128i*)src) + 5);
                c6 = _mm_load_si128(((const __m128i*)src) + 6);
                c7 = _mm_load_si128(((const __m128i*)src) + 7);
                _mm_prefetch((const char*)(src + 256), _MM_HINT_NTA);
                src += 128;
                _mm_stream_si128((((__m128i*)dst) + 0), c0);
                _mm_stream_si128((((__m128i*)dst) + 1), c1);
                _mm_stream_si128((((__m128i*)dst) + 2), c2);
                _mm_stream_si128((((__m128i*)dst) + 3), c3);
                _mm_stream_si128((((__m128i*)dst) + 4), c4);
                _mm_stream_si128((((__m128i*)dst) + 5), c5);
                _mm_stream_si128((((__m128i*)dst) + 6), c6);
                _mm_stream_si128((((__m128i*)dst) + 7), c7);
                dst += 128;
            }
        }
        else { // source unaligned
            for (; size >= 128; size -= 128) {
                c0 = _mm_loadu_si128(((const __m128i*)src) + 0);
                c1 = _mm_loadu_si128(((const __m128i*)src) + 1);
                c2 = _mm_loadu_si128(((const __m128i*)src) + 2);
                c3 = _mm_loadu_si128(((const __m128i*)src) + 3);
                c4 = _mm_loadu_si128(((const __m128i*)src) + 4);
                c5 = _mm_loadu_si128(((const __m128i*)src) + 5);
                c6 = _mm_loadu_si128(((const __m128i*)src) + 6);
                c7 = _mm_loadu_si128(((const __m128i*)src) + 7);
                _mm_prefetch((const char*)(src + 256), _MM_HINT_NTA);
                src += 128;
                _mm_stream_si128((((__m128i*)dst) + 0), c0);
                _mm_stream_si128((((__m128i*)dst) + 1), c1);
                _mm_stream_si128((((__m128i*)dst) + 2), c2);
                _mm_stream_si128((((__m128i*)dst) + 3), c3);
                _mm_stream_si128((((__m128i*)dst) + 4), c4);
                _mm_stream_si128((((__m128i*)dst) + 5), c5);
                _mm_stream_si128((((__m128i*)dst) + 6), c6);
                _mm_stream_si128((((__m128i*)dst) + 7), c7);
                dst += 128;
            }
        }
        _mm_sfence();
    }

    memcpy_tiny(dst, src, size);

    return destination;
}


#endif
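A minimal usage sketch for the header above, assuming an SSE2-capable x86 target. Nothing here goes beyond what the header declares; note that memcpy_fast is static, so every translation unit including the header gets its own copy:

    #include <string.h>
    #include "FastMemcpy.h"

    int main(void)
    {
        char src[200], dst[200];
        memset(src, 'x', sizeof(src));
        /* Same contract as memcpy: regions must not overlap, dst is returned. */
        char *ret = memcpy_fast(dst, src, sizeof(dst));
        return (ret == dst && dst[199] == 'x') ? 0 : 1;
    }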
@@ -1,171 +0,0 @@
//=====================================================================
//
// FastMemcpy.c - skywind3000@163.com, 2015
//
// feature:
// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc4.9)
//
//=====================================================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <assert.h>

#if (defined(_WIN32) || defined(WIN32))
#include <windows.h>
#include <mmsystem.h>
#ifdef _MSC_VER
#pragma comment(lib, "winmm.lib")
#endif
#elif defined(__unix)
#include <sys/time.h>
#include <unistd.h>
#else
#error it can only be compiled under windows or unix
#endif

#include "FastMemcpy_Avx.h"


unsigned int gettime()
{
#if (defined(_WIN32) || defined(WIN32))
    return timeGetTime();
#else
    static struct timezone tz = { 0, 0 };
    struct timeval time;
    gettimeofday(&time, &tz);
    return (time.tv_sec * 1000 + time.tv_usec / 1000);
#endif
}

void sleepms(unsigned int millisec)
{
#if defined(_WIN32) || defined(WIN32)
    Sleep(millisec);
#else
    usleep(millisec * 1000);
#endif
}


void benchmark(int dstalign, int srcalign, size_t size, int times)
{
    char *DATA1 = (char*)malloc(size + 64);
    char *DATA2 = (char*)malloc(size + 64);
    size_t LINEAR1 = ((size_t)DATA1);
    size_t LINEAR2 = ((size_t)DATA2);
    char *ALIGN1 = (char*)(((64 - (LINEAR1 & 63)) & 63) + LINEAR1);
    char *ALIGN2 = (char*)(((64 - (LINEAR2 & 63)) & 63) + LINEAR2);
    char *dst = (dstalign)? ALIGN1 : (ALIGN1 + 1);
    char *src = (srcalign)? ALIGN2 : (ALIGN2 + 3);
    unsigned int t1, t2;
    int k;

    sleepms(100);
    t1 = gettime();
    for (k = times; k > 0; k--) {
        memcpy(dst, src, size);
    }
    t1 = gettime() - t1;
    sleepms(100);
    t2 = gettime();
    for (k = times; k > 0; k--) {
        memcpy_fast(dst, src, size);
    }
    t2 = gettime() - t2;

    free(DATA1);
    free(DATA2);

    printf("result(dst %s, src %s): memcpy_fast=%dms memcpy=%d ms\n",
        dstalign? "aligned" : "unalign",
        srcalign? "aligned" : "unalign", (int)t2, (int)t1);
}


void bench(int copysize, int times)
{
    printf("benchmark(size=%d bytes, times=%d):\n", copysize, times);
    benchmark(1, 1, copysize, times);
    benchmark(1, 0, copysize, times);
    benchmark(0, 1, copysize, times);
    benchmark(0, 0, copysize, times);
    printf("\n");
}


void random_bench(int maxsize, int times)
{
    static char A[11 * 1024 * 1024 + 2];
    static char B[11 * 1024 * 1024 + 2];
    static int random_offsets[0x10000];
    static int random_sizes[0x8000];
    unsigned int i, p1, p2;
    unsigned int t1, t2;
    for (i = 0; i < 0x10000; i++) {   // generate random offsets
        random_offsets[i] = rand() % (10 * 1024 * 1024 + 1);
    }
    for (i = 0; i < 0x8000; i++) {    // generate random sizes
        random_sizes[i] = 1 + rand() % maxsize;
    }
    sleepms(100);
    t1 = gettime();
    for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
        int offset1 = random_offsets[(p1++) & 0xffff];
        int offset2 = random_offsets[(p1++) & 0xffff];
        int size = random_sizes[(p2++) & 0x7fff];
        memcpy(A + offset1, B + offset2, size);
    }
    t1 = gettime() - t1;
    sleepms(100);
    t2 = gettime();
    for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
        int offset1 = random_offsets[(p1++) & 0xffff];
        int offset2 = random_offsets[(p1++) & 0xffff];
        int size = random_sizes[(p2++) & 0x7fff];
        memcpy_fast(A + offset1, B + offset2, size);
    }
    t2 = gettime() - t2;
    printf("benchmark random access:\n");
    printf("memcpy_fast=%dms memcpy=%dms\n\n", (int)t2, (int)t1);
}


#ifdef _MSC_VER
#pragma comment(lib, "winmm.lib")
#endif

int main(void)
{
#if 1
    bench(32, 0x1000000);
    bench(64, 0x1000000);
    bench(512, 0x800000);
    bench(1024, 0x400000);
#endif
    bench(4096, 0x80000);
    bench(8192, 0x40000);
#if 1
    bench(1024 * 1024 * 1, 0x800);
    bench(1024 * 1024 * 4, 0x200);
#endif
    bench(1024 * 1024 * 8, 0x100);

    random_bench(2048, 8000000);

    return 0;
}


/*

*/
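One caveat before the AVX header below: FastMemcpy.h and FastMemcpy_Avx.h define the same memcpy_fast/memcpy_tiny names and share the include guard __FAST_MEMCPY_H__, so a translation unit can use only one of them, and the AVX variant requires the compiler to target AVX. A compile-time selection sketch (illustrative only; not how the build actually wires this up):

    #if defined(__AVX__)
    #include "FastMemcpy_Avx.h"   /* 256-bit loads/stores and streaming stores */
    #else
    #include "FastMemcpy.h"       /* SSE2 baseline */
    #endif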
@ -1,492 +0,0 @@
|
||||
//=====================================================================
|
||||
//
|
||||
// FastMemcpy.c - skywind3000@163.com, 2015
|
||||
//
|
||||
// feature:
|
||||
// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc5.1)
|
||||
//
|
||||
//=====================================================================
|
||||
#ifndef __FAST_MEMCPY_H__
|
||||
#define __FAST_MEMCPY_H__
|
||||
|
||||
#include <stddef.h>
|
||||
#include <stdint.h>
|
||||
#include <immintrin.h>
|
||||
|
||||
|
||||
//---------------------------------------------------------------------
|
||||
// force inline for compilers
|
||||
//---------------------------------------------------------------------
|
||||
#ifndef INLINE
|
||||
#ifdef __GNUC__
|
||||
#if (__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1))
|
||||
#define INLINE __inline__ __attribute__((always_inline))
|
||||
#else
|
||||
#define INLINE __inline__
|
||||
#endif
|
||||
#elif defined(_MSC_VER)
|
||||
#define INLINE __forceinline
|
||||
#elif (defined(__BORLANDC__) || defined(__WATCOMC__))
|
||||
#define INLINE __inline
|
||||
#else
|
||||
#define INLINE
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
|
||||
//---------------------------------------------------------------------
|
||||
// fast copy for different sizes
|
||||
//---------------------------------------------------------------------
|
||||
static INLINE void memcpy_avx_16(void *dst, const void *src) {
|
||||
#if 1
|
||||
__m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
|
||||
_mm_storeu_si128(((__m128i*)dst) + 0, m0);
|
||||
#else
|
||||
*((uint64_t*)((char*)dst + 0)) = *((uint64_t*)((const char*)src + 0));
|
||||
*((uint64_t*)((char*)dst + 8)) = *((uint64_t*)((const char*)src + 8));
|
||||
#endif
|
||||
}
|
||||
|
||||
static INLINE void memcpy_avx_32(void *dst, const void *src) {
|
||||
__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
|
||||
}
|
||||
|
||||
static INLINE void memcpy_avx_64(void *dst, const void *src) {
|
||||
__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
|
||||
__m256i m1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 1, m1);
|
||||
}
|
||||
|
||||
static INLINE void memcpy_avx_128(void *dst, const void *src) {
|
||||
__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
|
||||
__m256i m1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
|
||||
__m256i m2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
|
||||
__m256i m3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 1, m1);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 2, m2);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 3, m3);
|
||||
}
|
||||
|
||||
static INLINE void memcpy_avx_256(void *dst, const void *src) {
|
||||
__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
|
||||
__m256i m1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
|
||||
__m256i m2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
|
||||
__m256i m3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
|
||||
__m256i m4 = _mm256_loadu_si256(((const __m256i*)src) + 4);
|
||||
__m256i m5 = _mm256_loadu_si256(((const __m256i*)src) + 5);
|
||||
__m256i m6 = _mm256_loadu_si256(((const __m256i*)src) + 6);
|
||||
__m256i m7 = _mm256_loadu_si256(((const __m256i*)src) + 7);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 1, m1);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 2, m2);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 3, m3);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 4, m4);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 5, m5);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 6, m6);
|
||||
_mm256_storeu_si256(((__m256i*)dst) + 7, m7);
|
||||
}
|
||||
|
||||
|
||||
//---------------------------------------------------------------------
|
||||
// tiny memory copy with jump table optimized
|
||||
//---------------------------------------------------------------------
|
||||
static INLINE void *memcpy_tiny(void *dst, const void *src, size_t size) {
|
||||
unsigned char *dd = ((unsigned char*)dst) + size;
|
||||
const unsigned char *ss = ((const unsigned char*)src) + size;
|
||||
|
||||
switch (size) {
|
||||
case 128: memcpy_avx_128(dd - 128, ss - 128);
|
||||
case 0: break;
|
||||
case 129: memcpy_avx_128(dd - 129, ss - 129);
|
||||
case 1: dd[-1] = ss[-1]; break;
|
||||
case 130: memcpy_avx_128(dd - 130, ss - 130);
|
||||
case 2: *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
|
||||
case 131: memcpy_avx_128(dd - 131, ss - 131);
|
||||
case 3: *((uint16_t*)(dd - 3)) = *((uint16_t*)(ss - 3)); dd[-1] = ss[-1]; break;
|
||||
case 132: memcpy_avx_128(dd - 132, ss - 132);
|
||||
case 4: *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 133: memcpy_avx_128(dd - 133, ss - 133);
|
||||
case 5: *((uint32_t*)(dd - 5)) = *((uint32_t*)(ss - 5)); dd[-1] = ss[-1]; break;
|
||||
case 134: memcpy_avx_128(dd - 134, ss - 134);
|
||||
case 6: *((uint32_t*)(dd - 6)) = *((uint32_t*)(ss - 6)); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
|
||||
case 135: memcpy_avx_128(dd - 135, ss - 135);
|
||||
case 7: *((uint32_t*)(dd - 7)) = *((uint32_t*)(ss - 7)); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 136: memcpy_avx_128(dd - 136, ss - 136);
|
||||
case 8: *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 137: memcpy_avx_128(dd - 137, ss - 137);
|
||||
case 9: *((uint64_t*)(dd - 9)) = *((uint64_t*)(ss - 9)); dd[-1] = ss[-1]; break;
|
||||
case 138: memcpy_avx_128(dd - 138, ss - 138);
|
||||
case 10: *((uint64_t*)(dd - 10)) = *((uint64_t*)(ss - 10)); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
|
||||
case 139: memcpy_avx_128(dd - 139, ss - 139);
|
||||
case 11: *((uint64_t*)(dd - 11)) = *((uint64_t*)(ss - 11)); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 140: memcpy_avx_128(dd - 140, ss - 140);
|
||||
case 12: *((uint64_t*)(dd - 12)) = *((uint64_t*)(ss - 12)); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 141: memcpy_avx_128(dd - 141, ss - 141);
|
||||
case 13: *((uint64_t*)(dd - 13)) = *((uint64_t*)(ss - 13)); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 142: memcpy_avx_128(dd - 142, ss - 142);
|
||||
case 14: *((uint64_t*)(dd - 14)) = *((uint64_t*)(ss - 14)); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 143: memcpy_avx_128(dd - 143, ss - 143);
|
||||
case 15: *((uint64_t*)(dd - 15)) = *((uint64_t*)(ss - 15)); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 144: memcpy_avx_128(dd - 144, ss - 144);
|
||||
case 16: memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 145: memcpy_avx_128(dd - 145, ss - 145);
|
||||
case 17: memcpy_avx_16(dd - 17, ss - 17); dd[-1] = ss[-1]; break;
|
||||
case 146: memcpy_avx_128(dd - 146, ss - 146);
|
||||
case 18: memcpy_avx_16(dd - 18, ss - 18); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
|
||||
case 147: memcpy_avx_128(dd - 147, ss - 147);
|
||||
case 19: memcpy_avx_16(dd - 19, ss - 19); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 148: memcpy_avx_128(dd - 148, ss - 148);
|
||||
case 20: memcpy_avx_16(dd - 20, ss - 20); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 149: memcpy_avx_128(dd - 149, ss - 149);
|
||||
case 21: memcpy_avx_16(dd - 21, ss - 21); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 150: memcpy_avx_128(dd - 150, ss - 150);
|
||||
case 22: memcpy_avx_16(dd - 22, ss - 22); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 151: memcpy_avx_128(dd - 151, ss - 151);
|
||||
case 23: memcpy_avx_16(dd - 23, ss - 23); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 152: memcpy_avx_128(dd - 152, ss - 152);
|
||||
case 24: memcpy_avx_16(dd - 24, ss - 24); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 153: memcpy_avx_128(dd - 153, ss - 153);
|
||||
case 25: memcpy_avx_16(dd - 25, ss - 25); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 154: memcpy_avx_128(dd - 154, ss - 154);
|
||||
case 26: memcpy_avx_16(dd - 26, ss - 26); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 155: memcpy_avx_128(dd - 155, ss - 155);
|
||||
case 27: memcpy_avx_16(dd - 27, ss - 27); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 156: memcpy_avx_128(dd - 156, ss - 156);
|
||||
case 28: memcpy_avx_16(dd - 28, ss - 28); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 157: memcpy_avx_128(dd - 157, ss - 157);
|
||||
case 29: memcpy_avx_16(dd - 29, ss - 29); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 158: memcpy_avx_128(dd - 158, ss - 158);
|
||||
case 30: memcpy_avx_16(dd - 30, ss - 30); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 159: memcpy_avx_128(dd - 159, ss - 159);
|
||||
case 31: memcpy_avx_16(dd - 31, ss - 31); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 160: memcpy_avx_128(dd - 160, ss - 160);
|
||||
case 32: memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 161: memcpy_avx_128(dd - 161, ss - 161);
|
||||
case 33: memcpy_avx_32(dd - 33, ss - 33); dd[-1] = ss[-1]; break;
|
||||
case 162: memcpy_avx_128(dd - 162, ss - 162);
|
||||
case 34: memcpy_avx_32(dd - 34, ss - 34); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
|
||||
case 163: memcpy_avx_128(dd - 163, ss - 163);
|
||||
case 35: memcpy_avx_32(dd - 35, ss - 35); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 164: memcpy_avx_128(dd - 164, ss - 164);
|
||||
case 36: memcpy_avx_32(dd - 36, ss - 36); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 165: memcpy_avx_128(dd - 165, ss - 165);
|
||||
case 37: memcpy_avx_32(dd - 37, ss - 37); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 166: memcpy_avx_128(dd - 166, ss - 166);
|
||||
case 38: memcpy_avx_32(dd - 38, ss - 38); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 167: memcpy_avx_128(dd - 167, ss - 167);
|
||||
case 39: memcpy_avx_32(dd - 39, ss - 39); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 168: memcpy_avx_128(dd - 168, ss - 168);
|
||||
case 40: memcpy_avx_32(dd - 40, ss - 40); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 169: memcpy_avx_128(dd - 169, ss - 169);
|
||||
case 41: memcpy_avx_32(dd - 41, ss - 41); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 170: memcpy_avx_128(dd - 170, ss - 170);
|
||||
case 42: memcpy_avx_32(dd - 42, ss - 42); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 171: memcpy_avx_128(dd - 171, ss - 171);
|
||||
case 43: memcpy_avx_32(dd - 43, ss - 43); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 172: memcpy_avx_128(dd - 172, ss - 172);
|
||||
case 44: memcpy_avx_32(dd - 44, ss - 44); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 173: memcpy_avx_128(dd - 173, ss - 173);
|
||||
case 45: memcpy_avx_32(dd - 45, ss - 45); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 174: memcpy_avx_128(dd - 174, ss - 174);
|
||||
case 46: memcpy_avx_32(dd - 46, ss - 46); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 175: memcpy_avx_128(dd - 175, ss - 175);
|
||||
case 47: memcpy_avx_32(dd - 47, ss - 47); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 176: memcpy_avx_128(dd - 176, ss - 176);
|
||||
case 48: memcpy_avx_32(dd - 48, ss - 48); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 177: memcpy_avx_128(dd - 177, ss - 177);
|
||||
case 49: memcpy_avx_32(dd - 49, ss - 49); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 178: memcpy_avx_128(dd - 178, ss - 178);
|
||||
case 50: memcpy_avx_32(dd - 50, ss - 50); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 179: memcpy_avx_128(dd - 179, ss - 179);
|
||||
case 51: memcpy_avx_32(dd - 51, ss - 51); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 180: memcpy_avx_128(dd - 180, ss - 180);
|
||||
case 52: memcpy_avx_32(dd - 52, ss - 52); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 181: memcpy_avx_128(dd - 181, ss - 181);
|
||||
case 53: memcpy_avx_32(dd - 53, ss - 53); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 182: memcpy_avx_128(dd - 182, ss - 182);
|
||||
case 54: memcpy_avx_32(dd - 54, ss - 54); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 183: memcpy_avx_128(dd - 183, ss - 183);
|
||||
case 55: memcpy_avx_32(dd - 55, ss - 55); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 184: memcpy_avx_128(dd - 184, ss - 184);
|
||||
case 56: memcpy_avx_32(dd - 56, ss - 56); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 185: memcpy_avx_128(dd - 185, ss - 185);
|
||||
case 57: memcpy_avx_32(dd - 57, ss - 57); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 186: memcpy_avx_128(dd - 186, ss - 186);
|
||||
case 58: memcpy_avx_32(dd - 58, ss - 58); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 187: memcpy_avx_128(dd - 187, ss - 187);
|
||||
case 59: memcpy_avx_32(dd - 59, ss - 59); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 188: memcpy_avx_128(dd - 188, ss - 188);
|
||||
case 60: memcpy_avx_32(dd - 60, ss - 60); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 189: memcpy_avx_128(dd - 189, ss - 189);
|
||||
case 61: memcpy_avx_32(dd - 61, ss - 61); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 190: memcpy_avx_128(dd - 190, ss - 190);
|
||||
case 62: memcpy_avx_32(dd - 62, ss - 62); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 191: memcpy_avx_128(dd - 191, ss - 191);
|
||||
case 63: memcpy_avx_32(dd - 63, ss - 63); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 192: memcpy_avx_128(dd - 192, ss - 192);
|
||||
case 64: memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 193: memcpy_avx_128(dd - 193, ss - 193);
|
||||
case 65: memcpy_avx_64(dd - 65, ss - 65); dd[-1] = ss[-1]; break;
|
||||
case 194: memcpy_avx_128(dd - 194, ss - 194);
|
||||
case 66: memcpy_avx_64(dd - 66, ss - 66); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
|
||||
case 195: memcpy_avx_128(dd - 195, ss - 195);
|
||||
case 67: memcpy_avx_64(dd - 67, ss - 67); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 196: memcpy_avx_128(dd - 196, ss - 196);
|
||||
case 68: memcpy_avx_64(dd - 68, ss - 68); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
|
||||
case 197: memcpy_avx_128(dd - 197, ss - 197);
|
||||
case 69: memcpy_avx_64(dd - 69, ss - 69); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 198: memcpy_avx_128(dd - 198, ss - 198);
|
||||
case 70: memcpy_avx_64(dd - 70, ss - 70); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 199: memcpy_avx_128(dd - 199, ss - 199);
|
||||
case 71: memcpy_avx_64(dd - 71, ss - 71); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 200: memcpy_avx_128(dd - 200, ss - 200);
|
||||
case 72: memcpy_avx_64(dd - 72, ss - 72); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
|
||||
case 201: memcpy_avx_128(dd - 201, ss - 201);
|
||||
case 73: memcpy_avx_64(dd - 73, ss - 73); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 202: memcpy_avx_128(dd - 202, ss - 202);
|
||||
case 74: memcpy_avx_64(dd - 74, ss - 74); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 203: memcpy_avx_128(dd - 203, ss - 203);
|
||||
case 75: memcpy_avx_64(dd - 75, ss - 75); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 204: memcpy_avx_128(dd - 204, ss - 204);
|
||||
case 76: memcpy_avx_64(dd - 76, ss - 76); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 205: memcpy_avx_128(dd - 205, ss - 205);
|
||||
case 77: memcpy_avx_64(dd - 77, ss - 77); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 206: memcpy_avx_128(dd - 206, ss - 206);
|
||||
case 78: memcpy_avx_64(dd - 78, ss - 78); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 207: memcpy_avx_128(dd - 207, ss - 207);
|
||||
case 79: memcpy_avx_64(dd - 79, ss - 79); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 208: memcpy_avx_128(dd - 208, ss - 208);
|
||||
case 80: memcpy_avx_64(dd - 80, ss - 80); memcpy_avx_16(dd - 16, ss - 16); break;
|
||||
case 209: memcpy_avx_128(dd - 209, ss - 209);
|
||||
case 81: memcpy_avx_64(dd - 81, ss - 81); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 210: memcpy_avx_128(dd - 210, ss - 210);
|
||||
case 82: memcpy_avx_64(dd - 82, ss - 82); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 211: memcpy_avx_128(dd - 211, ss - 211);
|
||||
case 83: memcpy_avx_64(dd - 83, ss - 83); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 212: memcpy_avx_128(dd - 212, ss - 212);
|
||||
case 84: memcpy_avx_64(dd - 84, ss - 84); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 213: memcpy_avx_128(dd - 213, ss - 213);
|
||||
case 85: memcpy_avx_64(dd - 85, ss - 85); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 214: memcpy_avx_128(dd - 214, ss - 214);
|
||||
case 86: memcpy_avx_64(dd - 86, ss - 86); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 215: memcpy_avx_128(dd - 215, ss - 215);
|
||||
case 87: memcpy_avx_64(dd - 87, ss - 87); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 216: memcpy_avx_128(dd - 216, ss - 216);
|
||||
case 88: memcpy_avx_64(dd - 88, ss - 88); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 217: memcpy_avx_128(dd - 217, ss - 217);
|
||||
case 89: memcpy_avx_64(dd - 89, ss - 89); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 218: memcpy_avx_128(dd - 218, ss - 218);
|
||||
case 90: memcpy_avx_64(dd - 90, ss - 90); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 219: memcpy_avx_128(dd - 219, ss - 219);
|
||||
case 91: memcpy_avx_64(dd - 91, ss - 91); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 220: memcpy_avx_128(dd - 220, ss - 220);
|
||||
case 92: memcpy_avx_64(dd - 92, ss - 92); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 221: memcpy_avx_128(dd - 221, ss - 221);
|
||||
case 93: memcpy_avx_64(dd - 93, ss - 93); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 222: memcpy_avx_128(dd - 222, ss - 222);
|
||||
case 94: memcpy_avx_64(dd - 94, ss - 94); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 223: memcpy_avx_128(dd - 223, ss - 223);
|
||||
case 95: memcpy_avx_64(dd - 95, ss - 95); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 224: memcpy_avx_128(dd - 224, ss - 224);
|
||||
case 96: memcpy_avx_64(dd - 96, ss - 96); memcpy_avx_32(dd - 32, ss - 32); break;
|
||||
case 225: memcpy_avx_128(dd - 225, ss - 225);
|
||||
case 97: memcpy_avx_64(dd - 97, ss - 97); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 226: memcpy_avx_128(dd - 226, ss - 226);
|
||||
case 98: memcpy_avx_64(dd - 98, ss - 98); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 227: memcpy_avx_128(dd - 227, ss - 227);
|
||||
case 99: memcpy_avx_64(dd - 99, ss - 99); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 228: memcpy_avx_128(dd - 228, ss - 228);
|
||||
case 100: memcpy_avx_64(dd - 100, ss - 100); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 229: memcpy_avx_128(dd - 229, ss - 229);
|
||||
case 101: memcpy_avx_64(dd - 101, ss - 101); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 230: memcpy_avx_128(dd - 230, ss - 230);
|
||||
case 102: memcpy_avx_64(dd - 102, ss - 102); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 231: memcpy_avx_128(dd - 231, ss - 231);
|
||||
case 103: memcpy_avx_64(dd - 103, ss - 103); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 232: memcpy_avx_128(dd - 232, ss - 232);
|
||||
case 104: memcpy_avx_64(dd - 104, ss - 104); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 233: memcpy_avx_128(dd - 233, ss - 233);
|
||||
case 105: memcpy_avx_64(dd - 105, ss - 105); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 234: memcpy_avx_128(dd - 234, ss - 234);
|
||||
case 106: memcpy_avx_64(dd - 106, ss - 106); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 235: memcpy_avx_128(dd - 235, ss - 235);
|
||||
case 107: memcpy_avx_64(dd - 107, ss - 107); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 236: memcpy_avx_128(dd - 236, ss - 236);
|
||||
case 108: memcpy_avx_64(dd - 108, ss - 108); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 237: memcpy_avx_128(dd - 237, ss - 237);
|
||||
case 109: memcpy_avx_64(dd - 109, ss - 109); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 238: memcpy_avx_128(dd - 238, ss - 238);
|
||||
case 110: memcpy_avx_64(dd - 110, ss - 110); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 239: memcpy_avx_128(dd - 239, ss - 239);
|
||||
case 111: memcpy_avx_64(dd - 111, ss - 111); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 240: memcpy_avx_128(dd - 240, ss - 240);
|
||||
case 112: memcpy_avx_64(dd - 112, ss - 112); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 241: memcpy_avx_128(dd - 241, ss - 241);
|
||||
case 113: memcpy_avx_64(dd - 113, ss - 113); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 242: memcpy_avx_128(dd - 242, ss - 242);
|
||||
case 114: memcpy_avx_64(dd - 114, ss - 114); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 243: memcpy_avx_128(dd - 243, ss - 243);
|
||||
case 115: memcpy_avx_64(dd - 115, ss - 115); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 244: memcpy_avx_128(dd - 244, ss - 244);
|
||||
case 116: memcpy_avx_64(dd - 116, ss - 116); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 245: memcpy_avx_128(dd - 245, ss - 245);
|
||||
case 117: memcpy_avx_64(dd - 117, ss - 117); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 246: memcpy_avx_128(dd - 246, ss - 246);
|
||||
case 118: memcpy_avx_64(dd - 118, ss - 118); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 247: memcpy_avx_128(dd - 247, ss - 247);
|
||||
case 119: memcpy_avx_64(dd - 119, ss - 119); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 248: memcpy_avx_128(dd - 248, ss - 248);
|
||||
case 120: memcpy_avx_64(dd - 120, ss - 120); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 249: memcpy_avx_128(dd - 249, ss - 249);
|
||||
case 121: memcpy_avx_64(dd - 121, ss - 121); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 250: memcpy_avx_128(dd - 250, ss - 250);
|
||||
case 122: memcpy_avx_64(dd - 122, ss - 122); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 251: memcpy_avx_128(dd - 251, ss - 251);
|
||||
case 123: memcpy_avx_64(dd - 123, ss - 123); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 252: memcpy_avx_128(dd - 252, ss - 252);
|
||||
case 124: memcpy_avx_64(dd - 124, ss - 124); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 253: memcpy_avx_128(dd - 253, ss - 253);
|
||||
case 125: memcpy_avx_64(dd - 125, ss - 125); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 254: memcpy_avx_128(dd - 254, ss - 254);
|
||||
case 126: memcpy_avx_64(dd - 126, ss - 126); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 255: memcpy_avx_128(dd - 255, ss - 255);
|
||||
case 127: memcpy_avx_64(dd - 127, ss - 127); memcpy_avx_64(dd - 64, ss - 64); break;
|
||||
case 256: memcpy_avx_256(dd - 256, ss - 256); break;
|
||||
}
|
||||
|
||||
return dst;
|
||||
}

//---------------------------------------------------------------------
// main routine
//---------------------------------------------------------------------
static void* memcpy_fast(void *destination, const void *source, size_t size)
{
    unsigned char *dst = (unsigned char*)destination;
    const unsigned char *src = (const unsigned char*)source;
    static size_t cachesize = 0x200000; // L3-cache size
    size_t padding;

    // small memory copy
    if (size <= 256) {
        memcpy_tiny(dst, src, size);
        _mm256_zeroupper();
        return destination;
    }

    // align destination to 32 bytes boundary
    padding = (32 - (((size_t)dst) & 31)) & 31;

#if 0
    if (padding > 0) {
        __m256i head = _mm256_loadu_si256((const __m256i*)src);
        _mm256_storeu_si256((__m256i*)dst, head);
        dst += padding;
        src += padding;
        size -= padding;
    }
#else
    __m256i head = _mm256_loadu_si256((const __m256i*)src);
    _mm256_storeu_si256((__m256i*)dst, head);
    dst += padding;
    src += padding;
    size -= padding;
#endif

    // medium size copy
    if (size <= cachesize) {
        __m256i c0, c1, c2, c3, c4, c5, c6, c7;

        for (; size >= 256; size -= 256) {
            c0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
            c1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
            c2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
            c3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
            c4 = _mm256_loadu_si256(((const __m256i*)src) + 4);
            c5 = _mm256_loadu_si256(((const __m256i*)src) + 5);
            c6 = _mm256_loadu_si256(((const __m256i*)src) + 6);
            c7 = _mm256_loadu_si256(((const __m256i*)src) + 7);
            _mm_prefetch((const char*)(src + 512), _MM_HINT_NTA);
            src += 256;
            _mm256_storeu_si256((((__m256i*)dst) + 0), c0);
            _mm256_storeu_si256((((__m256i*)dst) + 1), c1);
            _mm256_storeu_si256((((__m256i*)dst) + 2), c2);
            _mm256_storeu_si256((((__m256i*)dst) + 3), c3);
            _mm256_storeu_si256((((__m256i*)dst) + 4), c4);
            _mm256_storeu_si256((((__m256i*)dst) + 5), c5);
            _mm256_storeu_si256((((__m256i*)dst) + 6), c6);
            _mm256_storeu_si256((((__m256i*)dst) + 7), c7);
            dst += 256;
        }
    }
    else { // big memory copy
        __m256i c0, c1, c2, c3, c4, c5, c6, c7;

        _mm_prefetch((const char*)(src), _MM_HINT_NTA);

        if ((((size_t)src) & 31) == 0) { // source aligned
            for (; size >= 256; size -= 256) {
                c0 = _mm256_load_si256(((const __m256i*)src) + 0);
                c1 = _mm256_load_si256(((const __m256i*)src) + 1);
                c2 = _mm256_load_si256(((const __m256i*)src) + 2);
                c3 = _mm256_load_si256(((const __m256i*)src) + 3);
                c4 = _mm256_load_si256(((const __m256i*)src) + 4);
                c5 = _mm256_load_si256(((const __m256i*)src) + 5);
                c6 = _mm256_load_si256(((const __m256i*)src) + 6);
                c7 = _mm256_load_si256(((const __m256i*)src) + 7);
                _mm_prefetch((const char*)(src + 512), _MM_HINT_NTA);
                src += 256;
                _mm256_stream_si256((((__m256i*)dst) + 0), c0);
                _mm256_stream_si256((((__m256i*)dst) + 1), c1);
                _mm256_stream_si256((((__m256i*)dst) + 2), c2);
                _mm256_stream_si256((((__m256i*)dst) + 3), c3);
                _mm256_stream_si256((((__m256i*)dst) + 4), c4);
                _mm256_stream_si256((((__m256i*)dst) + 5), c5);
                _mm256_stream_si256((((__m256i*)dst) + 6), c6);
                _mm256_stream_si256((((__m256i*)dst) + 7), c7);
                dst += 256;
            }
        }
        else { // source unaligned
            for (; size >= 256; size -= 256) {
                c0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
                c1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
                c2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
                c3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
                c4 = _mm256_loadu_si256(((const __m256i*)src) + 4);
                c5 = _mm256_loadu_si256(((const __m256i*)src) + 5);
                c6 = _mm256_loadu_si256(((const __m256i*)src) + 6);
                c7 = _mm256_loadu_si256(((const __m256i*)src) + 7);
                _mm_prefetch((const char*)(src + 512), _MM_HINT_NTA);
                src += 256;
                _mm256_stream_si256((((__m256i*)dst) + 0), c0);
                _mm256_stream_si256((((__m256i*)dst) + 1), c1);
                _mm256_stream_si256((((__m256i*)dst) + 2), c2);
                _mm256_stream_si256((((__m256i*)dst) + 3), c3);
                _mm256_stream_si256((((__m256i*)dst) + 4), c4);
                _mm256_stream_si256((((__m256i*)dst) + 5), c5);
                _mm256_stream_si256((((__m256i*)dst) + 6), c6);
                _mm256_stream_si256((((__m256i*)dst) + 7), c7);
                dst += 256;
            }
        }
        _mm_sfence();
    }

    memcpy_tiny(dst, src, size);
    _mm256_zeroupper();

    return destination;
}

#endif
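
A rough usage sketch for the routine above, exercising its three paths: the tiny path (<= 256 bytes), the medium path with regular stores (up to the 2 MiB `cachesize` threshold), and the non-temporal streaming path beyond it. This is illustrative only; the buffer sizes and the `main` wrapper are not part of the original source:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* one size per path: tiny (<= 256), medium (<= cachesize), big (> cachesize) */
    static const size_t sizes[3] = {200, 64 * 1024, 4 * 1024 * 1024};
    for (int i = 0; i < 3; ++i) {
        unsigned char *src = malloc(sizes[i]);
        unsigned char *dst = malloc(sizes[i]);
        if (!src || !dst)
            return 1;
        memset(src, 0x5a, sizes[i]);
        memcpy_fast(dst, src, sizes[i]); /* defined above in this file */
        free(src);
        free(dst);
    }
    return 0;
}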
@ -1,22 +0,0 @@
The MIT License (MIT)

Copyright (c) 2015 Linwei

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@ -1,20 +0,0 @@
Internal implementation of the `memcpy` function.

It has the following advantages over the `libc`-supplied implementation:
- it is linked statically, so the function is called directly, not through a `PLT` (procedure lookup table of a shared library);
- it is linked statically, so the function can have position-dependent code;
- your binaries will not depend on `glibc`'s `memcpy`, which forces a dependency on a specific symbol version like `memcpy@@GLIBC_2.14` and consequently on a specific version of the `glibc` library;
- you can include `memcpy.h` directly, giving the function a chance to be inlined, which is beneficial for small memory regions whose sizes are unknown at compile time;
- this version of `memcpy` claims to be faster (in our benchmarks, the difference is within a few percent).

Currently it uses the implementation from **Linwei** (skywind3000@163.com).
See https://www.zhihu.com/question/35172305 for discussion.

Drawbacks:
- it only uses SSE 2 and doesn't use wider (AVX, AVX 512) vector registers when available;
- no CPU dispatching; it doesn't take the actual cache size into account.

Also worth looking at:
- the simple implementation from Facebook: https://github.com/facebook/folly/blob/master/folly/memcpy.S
- the implementation from Agner Fog: http://www.agner.org/optimize/
- glibc source code.
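
A minimal sketch of the inlining point above, assuming the `memcpy.h` header described in this README; the caller file and function are hypothetical:

/* caller.c (hypothetical): the definition is visible in this translation
   unit, so the compiler may inline the copy instead of emitting a call
   through the PLT of a shared libc. */
#include <stddef.h>
#include "memcpy.h"

void copy_small(char * dst, const char * src, size_t n)
{
    memcpy(dst, src, n); /* n is small but unknown at compile time */
}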
@ -1,6 +0,0 @@
#include "FastMemcpy.h"

void * memcpy(void * __restrict destination, const void * __restrict source, size_t size)
{
    return memcpy_fast(destination, source, size);
}
@ -2,5 +2,6 @@
FROM yandex/clickhouse-binary-builder

COPY run.sh /run.sh
COPY process_split_build_smoke_test_result.py /

CMD /run.sh
61 docker/test/split_build_smoke_test/process_split_build_smoke_test_result.py (Executable file)
@ -0,0 +1,61 @@
#!/usr/bin/env python3

import os
import logging
import argparse
import csv

RESULT_LOG_NAME = "run.log"

def process_result(result_folder):
    status = "success"
    description = 'Server started and responded'
    summary = [("Smoke test", "OK")]
    with open(os.path.join(result_folder, RESULT_LOG_NAME), 'r') as run_log:
        lines = run_log.read().split('\n')
        if not lines or lines[0].strip() != 'OK':
            status = "failure"
            logging.info("Lines are not ok: %s", str('\n'.join(lines)))
            summary = [("Smoke test", "FAIL")]
            description = 'Server failed to respond, see result in logs'

    result_logs = []
    server_log_path = os.path.join(result_folder, "clickhouse-server.log")
    stderr_log_path = os.path.join(result_folder, "stderr.log")
    client_stderr_log_path = os.path.join(result_folder, "clientstderr.log")

    if os.path.exists(server_log_path):
        result_logs.append(server_log_path)

    if os.path.exists(stderr_log_path):
        result_logs.append(stderr_log_path)

    if os.path.exists(client_stderr_log_path):
        result_logs.append(client_stderr_log_path)

    return status, description, summary, result_logs


def write_results(results_file, status_file, results, status):
    with open(results_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerows(results)
    with open(status_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(status)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of split build smoke test")
    parser.add_argument("--in-results-dir", default='/test_output/')
    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
    args = parser.parse_args()

    state, description, test_results, logs = process_result(args.in_results_dir)
    logging.info("Result parsed")
    status = (state, description)
    write_results(args.out_results_file, args.out_status_file, test_results, status)
    logging.info("Result written")
@ -5,16 +5,18 @@ set -x
install_and_run_server() {
    mkdir /unpacked
    tar -xzf /package_folder/shared_build.tgz -C /unpacked --strip 1
    LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-server --config /unpacked/config/config.xml >/var/log/clickhouse-server/stderr.log 2>&1 &
    LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-server --config /unpacked/config/config.xml >/test_output/stderr.log 2>&1 &
}

run_client() {
    for i in {1..100}; do
        sleep 1
        LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-client --query "select 'OK'" 2>/var/log/clickhouse-server/clientstderr.log && break
        LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-client --query "select 'OK'" > /test_output/run.log 2> /test_output/clientstderr.log && break
        [[ $i == 100 ]] && echo 'FAIL'
    done
}

install_and_run_server
run_client
mv /var/log/clickhouse-server/clickhouse-server.log /test_output/clickhouse-server.log
/process_split_build_smoke_test_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
@ -1,7 +1,7 @@
# docker build -t yandex/clickhouse-sqlancer-test .
FROM ubuntu:20.04

RUN apt-get update --yes && env DEBIAN_FRONTEND=noninteractive apt-get install wget unzip git openjdk-14-jdk maven --yes --no-install-recommends
RUN apt-get update --yes && env DEBIAN_FRONTEND=noninteractive apt-get install wget unzip git openjdk-14-jdk maven python3 --yes --no-install-recommends

RUN wget https://github.com/sqlancer/sqlancer/archive/master.zip -O /sqlancer.zip
RUN mkdir /sqlancer && \
@ -10,4 +10,5 @@ RUN mkdir /sqlancer && \
RUN cd /sqlancer/sqlancer-master && mvn package -DskipTests

COPY run.sh /
COPY process_sqlancer_result.py /
CMD ["/bin/bash", "/run.sh"]
74 docker/test/sqlancer/process_sqlancer_result.py (Executable file)
@ -0,0 +1,74 @@
#!/usr/bin/env python3

import os
import logging
import argparse
import csv


def process_result(result_folder):
    status = "success"
    summary = []
    paths = []
    tests = ["TLPWhere", "TLPGroupBy", "TLPHaving", "TLPWhereGroupBy", "TLPDistinct", "TLPAggregate"]

    for test in tests:
        err_path = '{}/{}.err'.format(result_folder, test)
        out_path = '{}/{}.out'.format(result_folder, test)
        if not os.path.exists(err_path):
            logging.info("No output err on path %s", err_path)
            summary.append((test, "SKIPPED"))
        elif not os.path.exists(out_path):
            logging.info("No output log on path %s", out_path)
        else:
            paths.append(err_path)
            paths.append(out_path)
            with open(err_path, 'r') as f:
                if 'AssertionError' in f.read():
                    summary.append((test, "FAIL"))
                else:
                    summary.append((test, "OK"))

    logs_path = '{}/logs.tar.gz'.format(result_folder)
    if not os.path.exists(logs_path):
        logging.info("No logs tar on path %s", logs_path)
    else:
        paths.append(logs_path)
    stdout_path = '{}/stdout.log'.format(result_folder)
    if not os.path.exists(stdout_path):
        logging.info("No stdout log on path %s", stdout_path)
    else:
        paths.append(stdout_path)
    stderr_path = '{}/stderr.log'.format(result_folder)
    if not os.path.exists(stderr_path):
        logging.info("No stderr log on path %s", stderr_path)
    else:
        paths.append(stderr_path)

    description = "SQLancer test run. See report"

    return status, description, summary, paths


def write_results(results_file, status_file, results, status):
    with open(results_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerows(results)
    with open(status_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(status)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of sqlancer test")
    parser.add_argument("--in-results-dir", default='/test_output/')
    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
    args = parser.parse_args()

    state, description, test_results, logs = process_result(args.in_results_dir)
    logging.info("Result parsed")
    status = (state, description)
    write_results(args.out_results_file, args.out_status_file, test_results, status)
    logging.info("Result written")
@ -29,4 +29,5 @@ tail -n 1000 /var/log/clickhouse-server/stderr.log > /test_output/stderr.log
tail -n 1000 /var/log/clickhouse-server/stdout.log > /test_output/stdout.log
tail -n 1000 /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log

/process_sqlancer_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
ls /test_output
@ -65,3 +65,11 @@ if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]
fi

clickhouse-test --testname --shard --zookeeper --no-stateless --hung-check --print-time "$SKIP_LIST_OPT" "${ADDITIONAL_OPTIONS[@]}" "$SKIP_TESTS_OPTION" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt

./process_functional_tests_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv

pigz < /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log.gz ||:
mv /var/log/clickhouse-server/stderr.log /test_output/ ||:
if [[ -n "$WITH_COVERAGE" ]] && [[ "$WITH_COVERAGE" -eq 1 ]]; then
    tar -chf /test_output/clickhouse_coverage.tar.gz /profraw ||:
fi
@ -46,4 +46,5 @@ ENV NUM_TRIES=1
ENV MAX_RUN_TIME=0

COPY run.sh /
COPY process_functional_tests_result.py /
CMD ["/bin/bash", "/run.sh"]
118 docker/test/stateless/process_functional_tests_result.py (Executable file)
@ -0,0 +1,118 @@
#!/usr/bin/env python3

import os
import logging
import argparse
import csv

OK_SIGN = "[ OK "
FAIL_SIGN = "[ FAIL "
TIMEOUT_SIGN = "[ Timeout! "
UNKNOWN_SIGN = "[ UNKNOWN "
SKIPPED_SIGN = "[ SKIPPED "
HUNG_SIGN = "Found hung queries in processlist"

def process_test_log(log_path):
    total = 0
    skipped = 0
    unknown = 0
    failed = 0
    success = 0
    hung = False
    test_results = []
    with open(log_path, 'r') as test_file:
        for line in test_file:
            line = line.strip()
            if HUNG_SIGN in line:
                hung = True
            if any(sign in line for sign in (OK_SIGN, FAIL_SIGN, UNKNOWN_SIGN, SKIPPED_SIGN)):
                test_name = line.split(' ')[2].split(':')[0]

                test_time = ''
                try:
                    time_token = line.split(']')[1].strip().split()[0]
                    float(time_token)
                    test_time = time_token
                except (ValueError, IndexError):
                    pass

                total += 1
                if TIMEOUT_SIGN in line:
                    failed += 1
                    test_results.append((test_name, "Timeout", test_time))
                elif FAIL_SIGN in line:
                    failed += 1
                    test_results.append((test_name, "FAIL", test_time))
                elif UNKNOWN_SIGN in line:
                    unknown += 1
                    test_results.append((test_name, "FAIL", test_time))
                elif SKIPPED_SIGN in line:
                    skipped += 1
                    test_results.append((test_name, "SKIPPED", test_time))
                else:
                    success += int(OK_SIGN in line)
                    test_results.append((test_name, "OK", test_time))
    return total, skipped, unknown, failed, success, hung, test_results

def process_result(result_path):
    test_results = []
    state = "success"
    description = ""
    files = os.listdir(result_path)
    if files:
        logging.info("Found files in result folder %s", ','.join(files))
        result_path = os.path.join(result_path, 'test_result.txt')
    else:
        result_path = None
        description = "No output log"
        state = "error"

    if result_path and os.path.exists(result_path):
        total, skipped, unknown, failed, success, hung, test_results = process_test_log(result_path)
        is_flaky_check = 1 < int(os.environ.get('NUM_TRIES', 1))
        # If no tests were run (success == 0) it indicates an error (e.g. server did not start or crashed immediately)
        # But it's OK for "flaky checks" - they can contain just one test which is marked as skipped.
        if failed != 0 or unknown != 0 or (success == 0 and (not is_flaky_check)):
            state = "failure"

        if hung:
            description = "Some queries hung, "
            state = "failure"
        else:
            description = ""

        description += "fail: {}, passed: {}".format(failed, success)
        if skipped != 0:
            description += ", skipped: {}".format(skipped)
        if unknown != 0:
            description += ", unknown: {}".format(unknown)
    else:
        state = "failure"
        description = "Output log doesn't exist"
        test_results = []

    return state, description, test_results


def write_results(results_file, status_file, results, status):
    with open(results_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerows(results)
    with open(status_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(status)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of functional tests")
    parser.add_argument("--in-results-dir", default='/test_output/')
    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
    args = parser.parse_args()

    state, description, test_results = process_result(args.in_results_dir)
    logging.info("Result parsed")
    status = (state, description)
    write_results(args.out_results_file, args.out_status_file, test_results, status)
    logging.info("Result written")
@ -72,5 +72,12 @@ export -f run_tests

timeout "$MAX_RUN_TIME" bash -c run_tests ||:

./process_functional_tests_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv

pigz < /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log.gz ||:
mv /var/log/clickhouse-server/stderr.log /test_output/ ||:
if [[ -n "$WITH_COVERAGE" ]] && [[ "$WITH_COVERAGE" -eq 1 ]]; then
    tar -chf /test_output/clickhouse_coverage.tar.gz /profraw ||:
fi
tar -chf /test_output/text_log_dump.tar /var/lib/clickhouse/data/system/text_log ||:
tar -chf /test_output/query_log_dump.tar /var/lib/clickhouse/data/system/query_log ||:
@ -53,10 +53,14 @@ handle SIGBUS stop print
handle SIGABRT stop print
continue
thread apply all backtrace
continue
detach
quit
" > script.gdb

gdb -batch -command script.gdb -p "$(cat /var/run/clickhouse-server/clickhouse-server.pid)" &
# FIXME Hung check may work incorrectly because of attached gdb
# 1. False positives are possible
# 2. We cannot attach another gdb to get stacktraces if some queries hung
gdb -batch -command script.gdb -p "$(cat /var/run/clickhouse-server/clickhouse-server.pid)" >> /test_output/gdb.log &
}

configure
@ -78,11 +82,55 @@ clickhouse-client --query "RENAME TABLE datasets.hits_v1 TO test.hits"
clickhouse-client --query "RENAME TABLE datasets.visits_v1 TO test.visits"
clickhouse-client --query "SHOW TABLES FROM test"

./stress --hung-check --output-folder test_output --skip-func-tests "$SKIP_TESTS_OPTION" && echo "OK" > /test_output/script_exit_code.txt || echo "FAIL" > /test_output/script_exit_code.txt
./stress --hung-check --output-folder test_output --skip-func-tests "$SKIP_TESTS_OPTION" \
    && echo -e 'Test script exit code\tOK' >> /test_output/test_results.tsv \
    || echo -e 'Test script failed\tFAIL' >> /test_output/test_results.tsv

stop
start

clickhouse-client --query "SELECT 'Server successfuly started'" > /test_output/alive_check.txt || echo 'Server failed to start' > /test_output/alive_check.txt
clickhouse-client --query "SELECT 'Server successfully started', 'OK'" >> /test_output/test_results.tsv \
    || echo -e 'Server failed to start\tFAIL' >> /test_output/test_results.tsv

[ -f /var/log/clickhouse-server/clickhouse-server.log ] || echo -e "Server log does not exist\tFAIL"
[ -f /var/log/clickhouse-server/stderr.log ] || echo -e "Stderr log does not exist\tFAIL"

# Print Fatal log messages to stdout
zgrep -Fa " <Fatal> " /var/log/clickhouse-server/clickhouse-server.log

# Grep logs for sanitizer asserts, crashes and other critical errors

# Sanitizer asserts
zgrep -Fa "==================" /var/log/clickhouse-server/stderr.log >> /test_output/tmp
zgrep -Fa "WARNING" /var/log/clickhouse-server/stderr.log >> /test_output/tmp
zgrep -Fav "ASan doesn't fully support makecontext/swapcontext functions" /test_output/tmp > /dev/null \
    && echo -e 'Sanitizer assert (in stderr.log)\tFAIL' >> /test_output/test_results.tsv \
    || echo -e 'No sanitizer asserts\tOK' >> /test_output/test_results.tsv
rm -f /test_output/tmp

# Logical errors
zgrep -Fa "Code: 49, e.displayText() = DB::Exception:" /var/log/clickhouse-server/clickhouse-server.log > /dev/null \
    && echo -e 'Logical error thrown (see clickhouse-server.log)\tFAIL' >> /test_output/test_results.tsv \
    || echo -e 'No logical errors\tOK' >> /test_output/test_results.tsv

# Crash
zgrep -Fa "########################################" /var/log/clickhouse-server/clickhouse-server.log > /dev/null \
    && echo -e 'Killed by signal (in clickhouse-server.log)\tFAIL' >> /test_output/test_results.tsv \
    || echo -e 'Not crashed\tOK' >> /test_output/test_results.tsv

# It also checks for OOM or crash without stacktrace (printed by watchdog)
zgrep -Fa " <Fatal> " /var/log/clickhouse-server/clickhouse-server.log > /dev/null \
    && echo -e 'Fatal message in clickhouse-server.log\tFAIL' >> /test_output/test_results.tsv \
    || echo -e 'No fatal messages in clickhouse-server.log\tOK' >> /test_output/test_results.tsv

zgrep -Fa "########################################" /test_output/* > /dev/null \
    && echo -e 'Killed by signal (output files)\tFAIL' >> /test_output/test_results.tsv

# Put logs into /test_output/
pigz < /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log.gz
tar -chf /test_output/coordination.tar /var/lib/clickhouse/coordination ||:
mv /var/log/clickhouse-server/stderr.log /test_output/

# Write check result into check_status.tsv
clickhouse-local --structure "test String, res String" -q "SELECT 'failure', test FROM table WHERE res != 'OK' order by (lower(test) like '%hung%') LIMIT 1" < /test_output/test_results.tsv > /test_output/check_status.tsv
[ -s /test_output/check_status.tsv ] || echo -e "success\tNo errors found" > /test_output/check_status.tsv
@ -58,6 +58,27 @@ def run_func_test(cmd, output_prefix, num_processes, skip_tests_option, global_t
        time.sleep(0.5)
    return pipes

def prepare_for_hung_check():
    # FIXME this function should not exist, but...

    # We attach gdb to clickhouse-server before running tests
    # to print stacktraces of all crashes even if clickhouse cannot print it for some reason.
    # However, it obstructs checking for hung queries.
    logging.info("Will terminate gdb (if any)")
    call("kill -TERM $(pidof gdb)", shell=True, stderr=STDOUT)

    # Some tests execute SYSTEM STOP MERGES or similar queries.
    # It may cause some ALTERs to hang.
    # Possibly we should fix tests and forbid using such queries without specifying a table.
    call("clickhouse client -q 'SYSTEM START MERGES'", shell=True, stderr=STDOUT)
    call("clickhouse client -q 'SYSTEM START DISTRIBUTED SENDS'", shell=True, stderr=STDOUT)
    call("clickhouse client -q 'SYSTEM START TTL MERGES'", shell=True, stderr=STDOUT)
    call("clickhouse client -q 'SYSTEM START MOVES'", shell=True, stderr=STDOUT)
    call("clickhouse client -q 'SYSTEM START FETCHES'", shell=True, stderr=STDOUT)
    call("clickhouse client -q 'SYSTEM START REPLICATED SENDS'", shell=True, stderr=STDOUT)
    call("clickhouse client -q 'SYSTEM START REPLICATION QUEUES'", shell=True, stderr=STDOUT)

    time.sleep(30)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
@ -88,11 +109,14 @@ if __name__ == "__main__":

    logging.info("All processes finished")
    if args.hung_check:
        prepare_for_hung_check()
        logging.info("Checking if some queries hung")
        cmd = "{} {} {}".format(args.test_cmd, "--hung-check", "00001_select_1")
        res = call(cmd, shell=True, stderr=STDOUT)
        hung_check_status = "No queries hung\tOK\n"
        if res != 0:
            logging.info("Hung check failed with exit code {}".format(res))
            sys.exit(1)
            hung_check_status = "Hung check failed\tFAIL\n"
        open(os.path.join(args.output_folder, "test_results.tsv"), 'w+').write(hung_check_status)

    logging.info("Stress test finished")
@ -10,14 +10,6 @@ RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes \
    yamllint \
    && pip3 install codespell

# For |& syntax
SHELL ["bash", "-c"]

CMD cd /ClickHouse/utils/check-style && \
    ./check-style -n |& tee /test_output/style_output.txt && \
    ./check-typos |& tee /test_output/typos_output.txt && \
    ./check-whitespaces -n |& tee /test_output/whitespaces_output.txt && \
    ./check-duplicate-includes.sh |& tee /test_output/duplicate_output.txt && \
    ./shellcheck-run.sh |& tee /test_output/shellcheck_output.txt && \
    true
COPY run.sh /
COPY process_style_check_result.py /
CMD ["/bin/bash", "/run.sh"]
96 docker/test/style/process_style_check_result.py (Executable file)
@ -0,0 +1,96 @@
#!/usr/bin/env python3

import os
import logging
import argparse
import csv


def process_result(result_folder):
    status = "success"
    description = ""
    test_results = []

    style_log_path = '{}/style_output.txt'.format(result_folder)
    if not os.path.exists(style_log_path):
        logging.info("No style check log on path %s", style_log_path)
        return "exception", "No style check log", []
    elif os.stat(style_log_path).st_size != 0:
        description += "Style check failed. "
        test_results.append(("Style check", "FAIL"))
        status = "failure"
    else:
        test_results.append(("Style check", "OK"))

    typos_log_path = '{}/typos_output.txt'.format(result_folder)
    if not os.path.exists(typos_log_path):
        logging.info("No typos check log on path %s", typos_log_path)
        return "exception", "No typos check log", []
    elif os.stat(typos_log_path).st_size != 0:
        description += "Typos check failed. "
        test_results.append(("Typos check", "FAIL"))
        status = "failure"
    else:
        test_results.append(("Typos check", "OK"))

    whitespaces_log_path = '{}/whitespaces_output.txt'.format(result_folder)
    if not os.path.exists(whitespaces_log_path):
        logging.info("No whitespaces check log on path %s", whitespaces_log_path)
        return "exception", "No whitespaces check log", []
    elif os.stat(whitespaces_log_path).st_size != 0:
        description += "Whitespaces check failed. "
        test_results.append(("Whitespaces check", "FAIL"))
        status = "failure"
    else:
        test_results.append(("Whitespaces check", "OK"))

    duplicate_log_path = '{}/duplicate_output.txt'.format(result_folder)
    if not os.path.exists(duplicate_log_path):
        logging.info("No header duplicates check log on path %s", duplicate_log_path)
        return "exception", "No header duplicates check log", []
    elif os.stat(duplicate_log_path).st_size != 0:
        description += "Header duplicates check failed. "
        test_results.append(("Header duplicates check", "FAIL"))
        status = "failure"
    else:
        test_results.append(("Header duplicates check", "OK"))

    shellcheck_log_path = '{}/shellcheck_output.txt'.format(result_folder)
    if not os.path.exists(shellcheck_log_path):
        logging.info("No shellcheck log on path %s", shellcheck_log_path)
        return "exception", "No shellcheck log", []
    elif os.stat(shellcheck_log_path).st_size != 0:
        description += "Shellcheck check failed. "
        test_results.append(("Shellcheck", "FAIL"))
        status = "failure"
    else:
        test_results.append(("Shellcheck", "OK"))

    if not description:
        description += "Style check success"

    return status, description, test_results


def write_results(results_file, status_file, results, status):
    with open(results_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerows(results)
    with open(status_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(status)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of style check")
    parser.add_argument("--in-results-dir", default='/test_output/')
    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
    args = parser.parse_args()

    state, description, test_results = process_result(args.in_results_dir)
    logging.info("Result parsed")
    status = (state, description)
    write_results(args.out_results_file, args.out_status_file, test_results, status)
    logging.info("Result written")
9 docker/test/style/run.sh (Executable file)
@ -0,0 +1,9 @@
#!/bin/bash

cd /ClickHouse/utils/check-style || echo -e "failure\tRepo not found" > /test_output/check_status.tsv
./check-style -n |& tee /test_output/style_output.txt
./check-typos |& tee /test_output/typos_output.txt
./check-whitespaces -n |& tee /test_output/whitespaces_output.txt
./check-duplicate-includes.sh |& tee /test_output/duplicate_output.txt
./shellcheck-run.sh |& tee /test_output/shellcheck_output.txt
/process_style_check_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
@ -61,6 +61,7 @@ RUN set -eux; \

COPY modprobe.sh /usr/local/bin/modprobe
COPY dockerd-entrypoint.sh /usr/local/bin/
COPY process_testflows_result.py /usr/local/bin/

RUN set -x \
    && addgroup --system dockremap \
@ -72,5 +73,5 @@ RUN set -x \
VOLUME /var/lib/docker
EXPOSE 2375
ENTRYPOINT ["dockerd-entrypoint.sh"]
CMD ["sh", "-c", "python3 regression.py --no-color -o classic --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json"]
CMD ["sh", "-c", "python3 regression.py --no-color -o classic --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json; /usr/local/bin/process_testflows_result.py || echo -e 'failure\tCannot parse results' > check_status.tsv"]
67 docker/test/testflows/runner/process_testflows_result.py (Executable file)
@ -0,0 +1,67 @@
#!/usr/bin/env python3

import os
import logging
import argparse
import csv
import json


def process_result(result_folder):
    json_path = os.path.join(result_folder, "results.json")
    if not os.path.exists(json_path):
        return "success", "No testflows in branch", None, []

    test_binary_log = os.path.join(result_folder, "test.log")
    with open(json_path) as source:
        results = json.loads(source.read())

    total_tests = 0
    total_ok = 0
    total_fail = 0
    total_other = 0
    test_results = []
    for test in results["tests"]:
        test_name = test['test']['test_name']
        test_result = test['result']['result_type'].upper()
        test_time = str(test['result']['message_rtime'])
        total_tests += 1
        if test_result == "OK":
            total_ok += 1
        elif test_result == "FAIL" or test_result == "ERROR":
            total_fail += 1
        else:
            total_other += 1

        test_results.append((test_name, test_result, test_time))
    if total_fail != 0:
        status = "failure"
    else:
        status = "success"

    description = "failed: {}, passed: {}, other: {}".format(total_fail, total_ok, total_other)
    return status, description, test_results, [json_path, test_binary_log]


def write_results(results_file, status_file, results, status):
    with open(results_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerows(results)
    with open(status_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(status)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of Testflows tests")
    parser.add_argument("--in-results-dir", default='./')
    parser.add_argument("--out-results-file", default='./test_results.tsv')
    parser.add_argument("--out-status-file", default='./check_status.tsv')
    args = parser.parse_args()

    state, description, test_results, logs = process_result(args.in_results_dir)
    logging.info("Result parsed")
    status = (state, description)
    write_results(args.out_results_file, args.out_status_file, test_results, status)
    logging.info("Result written")
@ -5,6 +5,6 @@ ENV TZ=Europe/Moscow
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
RUN apt-get install gdb

CMD service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test ''; \
    gdb -q -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms | tee test_output/test_result.txt

COPY run.sh /
COPY process_unit_tests_result.py /
CMD ["/bin/bash", "/run.sh"]
96 docker/test/unit/process_unit_tests_result.py (Executable file)
@ -0,0 +1,96 @@
#!/usr/bin/env python3

import os
import logging
import argparse
import csv

OK_SIGN = 'OK ]'
FAILED_SIGN = 'FAILED ]'
SEGFAULT = 'Segmentation fault'
SIGNAL = 'received signal SIG'
PASSED = 'PASSED'

def get_test_name(line):
    elements = reversed(line.split(' '))
    for element in elements:
        if '(' not in element and ')' not in element:
            return element
    raise Exception("No test name in line '{}'".format(line))

def process_result(result_folder):
    summary = []
    total_counter = 0
    failed_counter = 0
    result_log_path = '{}/test_result.txt'.format(result_folder)
    if not os.path.exists(result_log_path):
        logging.info("No output log on path %s", result_log_path)
        return "exception", "No output log", []

    status = "success"
    description = ""
    passed = False
    with open(result_log_path, 'r') as test_result:
        for line in test_result:
            if OK_SIGN in line:
                logging.info("Found ok line: '%s'", line)
                test_name = get_test_name(line.strip())
                logging.info("Test name: '%s'", test_name)
                summary.append((test_name, "OK"))
                total_counter += 1
            elif FAILED_SIGN in line and 'listed below' not in line and 'ms)' in line:
                logging.info("Found fail line: '%s'", line)
                test_name = get_test_name(line.strip())
                logging.info("Test name: '%s'", test_name)
                summary.append((test_name, "FAIL"))
                total_counter += 1
                failed_counter += 1
            elif SEGFAULT in line:
                logging.info("Found segfault line: '%s'", line)
                status = "failure"
                description += "Segmentation fault. "
                break
            elif SIGNAL in line:
                logging.info("Received signal line: '%s'", line)
                status = "failure"
                description += "Exit on signal. "
                break
            elif PASSED in line:
                logging.info("PASSED record found: '%s'", line)
                passed = True

    if not passed:
        status = "failure"
        description += "PASSED record not found. "

    if failed_counter != 0:
        status = "failure"

    if not description:
        description += "fail: {}, passed: {}".format(failed_counter, total_counter - failed_counter)

    return status, description, summary


def write_results(results_file, status_file, results, status):
    with open(results_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerows(results)
    with open(status_file, 'w') as f:
        out = csv.writer(f, delimiter='\t')
        out.writerow(status)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of unit tests")
    parser.add_argument("--in-results-dir", default='/test_output/')
    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
    args = parser.parse_args()

    state, description, test_results = process_result(args.in_results_dir)
    logging.info("Result parsed")
    status = (state, description)
    write_results(args.out_results_file, args.out_status_file, test_results, status)
    logging.info("Result written")
7 docker/test/unit/run.sh (Normal file)
@ -0,0 +1,7 @@
#!/bin/bash

set -x

service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test '';
gdb -q -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms | tee test_output/test_result.txt
./process_unit_tests_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
@ -701,6 +701,32 @@ The `default` storage policy implies using only one volume, which consists of on

The number of threads performing background moves of data parts can be changed by the [background_move_pool_size](../../../operations/settings/settings.md#background_move_pool_size) setting.

### Details {#details}

In the case of `MergeTree` tables, data reaches the disk in several ways:

- As a result of an insert (`INSERT` query).
- During background merges and [mutations](../../../sql-reference/statements/alter/index.md#alter-mutations).
- When downloading from another replica.
- As a result of partition freezing [ALTER TABLE … FREEZE PARTITION](../../../sql-reference/statements/alter/partition.md#alter_freeze-partition).

In all these cases except for mutations and partition freezing, a part is stored on a volume and a disk according to the given storage policy (see the sketch after this list):

1. The first volume (in the order of definition) that has enough disk space for storing the part (`unreserved_space > current_part_size`) and allows storing parts of the given size (`max_data_part_size_bytes > current_part_size`) is chosen.
2. Within this volume, the disk that follows the one used for storing the previous chunk of data and that has more free space than the part size (`unreserved_space - keep_free_space_bytes > current_part_size`) is chosen.
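
A minimal sketch of that selection logic (not the actual ClickHouse implementation; the field names mirror the settings above, everything else is illustrative):

``` c
#include <stddef.h>

struct Disk   { size_t unreserved_space; size_t keep_free_space_bytes; };
struct Volume { struct Disk * disks; size_t disk_count; size_t last_used;
                size_t max_data_part_size_bytes; };

/* Returns a disk for the new part, or NULL if no volume can hold it. */
struct Disk * choose_disk(struct Volume * volumes, size_t volume_count, size_t part_size)
{
    for (size_t v = 0; v < volume_count; ++v)           /* rule 1: volumes in definition order */
    {
        struct Volume * vol = &volumes[v];
        if (vol->max_data_part_size_bytes <= part_size)
            continue;
        for (size_t i = 1; i <= vol->disk_count; ++i)   /* rule 2: round-robin after the last used disk */
        {
            size_t d = (vol->last_used + i) % vol->disk_count;
            struct Disk * disk = &vol->disks[d];
            if (disk->unreserved_space > disk->keep_free_space_bytes
                && disk->unreserved_space - disk->keep_free_space_bytes > part_size)
            {
                vol->last_used = d;
                return disk;
            }
        }
    }
    return NULL;
}
```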

Under the hood, mutations and partition freezing make use of [hard links](https://en.wikipedia.org/wiki/Hard_link). Hard links between different disks are not supported, so in such cases the resulting parts are stored on the same disks as the initial ones.

In the background, parts are moved between volumes on the basis of the amount of free space (the `move_factor` parameter) according to the order in which the volumes are declared in the configuration file.
Data is never transferred from the last volume into the first one. One may use the system tables [system.part_log](../../../operations/system-tables/part_log.md#system_tables-part-log) (field `type = MOVE_PART`) and [system.parts](../../../operations/system-tables/parts.md#system_tables-parts) (fields `path` and `disk`) to monitor background moves. Detailed information can also be found in server logs.

The user can force moving a part or a partition from one volume to another using the query [ALTER TABLE … MOVE PART\|PARTITION … TO VOLUME\|DISK …](../../../sql-reference/statements/alter/partition.md#alter_move-partition); all the restrictions for background operations are taken into account. The query initiates a move on its own and does not wait for background operations to be completed. The user will get an error message if not enough free space is available or if any of the required conditions are not met.

Moving data does not interfere with data replication. Therefore, different storage policies can be specified for the same table on different replicas.

After the completion of background merges and mutations, old parts are removed only after a certain amount of time (`old_parts_lifetime`).
During this time, they are not moved to other volumes or disks. Therefore, until the parts are finally removed, they are still taken into account for evaluation of the occupied disk space.

## Using S3 for Data Storage {#table_engine-mergetree-s3}

`MergeTree` family table engines can store data in [S3](https://aws.amazon.com/s3/) using a disk with type `s3`.
@ -793,30 +819,4 @@ S3 disk can be configured as `main` or `cold` storage:

In the case of the `cold` option, data can be moved to S3 if the free space on the local disk becomes smaller than `move_factor * disk_size`, or by a TTL move rule.
### Details {#details}

In the case of `MergeTree` tables, data is getting to disk in different ways:

- As a result of an insert (`INSERT` query).
- During background merges and [mutations](../../../sql-reference/statements/alter/index.md#alter-mutations).
- When downloading from another replica.
- As a result of partition freezing [ALTER TABLE … FREEZE PARTITION](../../../sql-reference/statements/alter/partition.md#alter_freeze-partition).

In all these cases except for mutations and partition freezing, a part is stored on a volume and a disk according to the given storage policy:

1. The first volume (in the order of definition) that has enough disk space for storing a part (`unreserved_space > current_part_size`) and allows for storing parts of a given size (`max_data_part_size_bytes > current_part_size`) is chosen.
2. Within this volume, that disk is chosen that follows the one, which was used for storing the previous chunk of data, and that has free space more than the part size (`unreserved_space - keep_free_space_bytes > current_part_size`).

Under the hood, mutations and partition freezing make use of [hard links](https://en.wikipedia.org/wiki/Hard_link). Hard links between different disks are not supported, therefore in such cases the resulting parts are stored on the same disks as the initial ones.

In the background, parts are moved between volumes on the basis of the amount of free space (`move_factor` parameter) according to the order the volumes are declared in the configuration file.
Data is never transferred from the last one and into the first one. One may use system tables [system.part_log](../../../operations/system-tables/part_log.md#system_tables-part-log) (field `type = MOVE_PART`) and [system.parts](../../../operations/system-tables/parts.md#system_tables-parts) (fields `path` and `disk`) to monitor background moves. Also, the detailed information can be found in server logs.

User can force moving a part or a partition from one volume to another using the query [ALTER TABLE … MOVE PART\|PARTITION … TO VOLUME\|DISK …](../../../sql-reference/statements/alter/partition.md#alter_move-partition), all the restrictions for background operations are taken into account. The query initiates a move on its own and does not wait for background operations to be completed. User will get an error message if not enough free space is available or if any of the required conditions are not met.

Moving data does not interfere with data replication. Therefore, different storage policies can be specified for the same table on different replicas.

After the completion of background merges and mutations, old parts are removed only after a certain amount of time (`old_parts_lifetime`).
During this time, they are not moved to other volumes or disks. Therefore, until the parts are finally removed, they are still taken into account for evaluation of the occupied disk space.

[Original article](https://clickhouse.tech/docs/ru/operations/table_engines/mergetree/) <!--hide-->
@ -14,26 +14,19 @@ avg(x)

**Arguments**

- `x` — Values.

`x` must be
[Integer](../../../sql-reference/data-types/int-uint.md),
[floating-point](../../../sql-reference/data-types/float.md), or
[Decimal](../../../sql-reference/data-types/decimal.md).
- `x` — input values, must be [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md).

**Returned value**

- `NaN` if the supplied parameter is empty.
- Mean otherwise.

**Return type** is always [Float64](../../../sql-reference/data-types/float.md).
- The arithmetic mean, always as [Float64](../../../sql-reference/data-types/float.md).
- `NaN` if the input parameter `x` is empty.

**Example**

Query:

``` sql
SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5)
SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5);
```

Result:
@ -46,11 +39,20 @@ Result:

**Example**

Create a temp table:

Query:

``` sql
CREATE table test (t UInt8) ENGINE = Memory;
SELECT avg(t) FROM test
```

Get the arithmetic mean:

Query:

``` sql
SELECT avg(t) FROM test;
```

Result:
@ -60,3 +62,5 @@ Result:
│ nan │
└────────┘
```

[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/avg/) <!--hide-->
@ -415,7 +415,7 @@ Result:

## sign(x) {#signx}

The `sign` function can extract the sign of a real number.
Returns the sign of a real number.

**Syntax**

@ -433,9 +433,9 @@ sign(x)
- 0 for `x = 0`
- 1 for `x > 0`

**Example**
**Examples**

Query:
Sign for the zero value:

``` sql
SELECT sign(0);
@ -449,7 +449,7 @@ Result:
└─────────┘
```

Query:
Sign for the positive value:

``` sql
SELECT sign(1);
@ -463,7 +463,7 @@ Result:
└─────────┘
```

Query:
Sign for the negative value:

``` sql
SELECT sign(-1);
@ -56,13 +56,13 @@ ORDER BY expr
|
||||
|
||||
ClickHouse использует ключ сортировки в качестве первичного ключа, если первичный ключ не задан в секции `PRIMARY KEY`.
|
||||
|
||||
Чтобы отключить сортировку, используйте синтаксис `ORDER BY tuple()`. Смотрите [выбор первичного ключа](#vybor-pervichnogo-kliucha).
|
||||
Чтобы отключить сортировку, используйте синтаксис `ORDER BY tuple()`. Смотрите [выбор первичного ключа](#primary-keys-and-indexes-in-queries).
|
||||
|
||||
- `PARTITION BY` — [ключ партиционирования](custom-partitioning-key.md). Необязательный параметр.
|
||||
|
||||
Для партиционирования по месяцам используйте выражение `toYYYYMM(date_column)`, где `date_column` — столбец с датой типа [Date](../../../engines/table-engines/mergetree-family/mergetree.md). В этом случае имена партиций имеют формат `"YYYYMM"`.
|
||||
|
||||
- `PRIMARY KEY` — первичный ключ, если он [отличается от ключа сортировки](#pervichnyi-kliuch-otlichnyi-ot-kliucha-sortirovki). Необязательный параметр.
|
||||
- `PRIMARY KEY` — первичный ключ, если он [отличается от ключа сортировки](#choosing-a-primary-key-that-differs-from-the-sorting-key). Необязательный параметр.
|
||||
|
||||
По умолчанию первичный ключ совпадает с ключом сортировки (который задаётся секцией `ORDER BY`.) Поэтому в большинстве случаев секцию `PRIMARY KEY` отдельно указывать не нужно.
|
||||
|
||||
@ -188,7 +188,7 @@ ClickHouse не требует уникального первичного кл
|
||||
|
||||
При сортировке с использованием выражения `ORDER BY` для значений `NULL` всегда работает принцип [NULLS_LAST](../../../sql-reference/statements/select/order-by.md#sorting-of-special-values).
|
||||
|
||||
### Выбор первичного ключа {#vybor-pervichnogo-kliucha}
|
||||
### Выбор первичного ключа {#selecting-the-primary-key}
|
||||
|
||||
Количество столбцов в первичном ключе не ограничено явным образом. В зависимости от структуры данных в первичный ключ можно включать больше или меньше столбцов. Это может:
|
||||
|
||||
@ -217,7 +217,7 @@ ClickHouse не требует уникального первичного кл
|
||||
|
||||
|
||||
|
||||
### Первичный ключ, отличный от ключа сортировки {#pervichnyi-kliuch-otlichnyi-ot-kliucha-sortirovki}
|
||||
### Первичный ключ, отличный от ключа сортировки {#choosing-a-primary-key-that-differs-from-the-sorting-key}
|
||||
|
||||
Существует возможность задать первичный ключ (выражение, значения которого будут записаны в индексный файл для
|
||||
каждой засечки), отличный от ключа сортировки (выражение, по которому будут упорядочены строки в кусках
|
||||
@ -236,7 +236,7 @@ ClickHouse не требует уникального первичного кл
|
||||
|
||||
[ALTER ключа сортировки](../../../engines/table-engines/mergetree-family/mergetree.md) — лёгкая операция, так как при одновременном добавлении нового столбца в таблицу и ключ сортировки не нужно изменять данные кусков (они остаются упорядоченными и по новому выражению ключа).
|
||||
|
||||
### Использование индексов и партиций в запросах {#ispolzovanie-indeksov-i-partitsii-v-zaprosakh}
|
||||
### Использование индексов и партиций в запросах {#use-of-indexes-and-partitions-in-queries}
|
||||
|
||||
Для запросов `SELECT` ClickHouse анализирует возможность использования индекса. Индекс может использоваться, если в секции `WHERE/PREWHERE`, в качестве одного из элементов конъюнкции, или целиком, есть выражение, представляющее операции сравнения на равенства, неравенства, а также `IN` или `LIKE` с фиксированным префиксом, над столбцами или выражениями, входящими в первичный ключ или ключ партиционирования, либо над некоторыми частично монотонными функциями от этих столбцов, а также логические связки над такими выражениями.
|
||||
|
||||
@ -270,7 +270,7 @@ SELECT count() FROM table WHERE CounterID = 34 OR URL LIKE '%upyachka%'
|
||||
|
||||
Ключ партиционирования по месяцам обеспечивает чтение только тех блоков данных, которые содержат даты из нужного диапазона. При этом блок данных может содержать данные за многие даты (до целого месяца). В пределах одного блока данные упорядочены по первичному ключу, который может не содержать дату в качестве первого столбца. В связи с этим, при использовании запроса с указанием условия только на дату, но не на префикс первичного ключа, будет читаться данных больше, чем за одну дату.
|
||||
|
||||
### Use of index for partially monotonic primary keys {#ispolzovanie-indeksa-dlia-chastichno-monotonnykh-pervichnykh-kliuchei}

### Use of index for partially monotonic primary keys {#use-of-index-for-partially-monotonic-primary-keys}

Consider, for example, the days of the month. They form a [monotonic](https://ru.wikipedia.org/wiki/Монотонная_последовательность) sequence within one month, but are not monotonic over longer periods. This is a partially monotonic sequence. If a user creates a table with a partially monotonic primary key, ClickHouse creates a sparse index as usual. When the user selects data from such a table, ClickHouse analyzes the query conditions. If the user asks for data between two index marks and both marks fall within the same month, ClickHouse can use the index in this particular case, because it can compute the distance between the query parameters and the index marks.
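To make this concrete, a hedged sketch (the table is hypothetical) of a partially monotonic key:

``` sql
-- toDayOfMonth(EventTime) grows within one month and resets at month boundaries,
-- so it is monotonic only on ranges that lie inside a single month
CREATE TABLE events
(
    EventTime DateTime,
    Value UInt32
)
ENGINE = MergeTree
ORDER BY toDayOfMonth(EventTime);
```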
@ -312,7 +312,7 @@ SELECT count() FROM table WHERE s < 'z'

SELECT count() FROM table WHERE u64 * i32 == 10 AND u64 * length(s) >= 1234
```

#### Available types of indexes {#dostupnye-indeksy}

#### Available types of indexes {#available-types-of-indices}

- `minmax` — stores the minimum and maximum of the expression (for each element of the `tuple` if the expression is a `tuple`) and uses them to skip data blocks, similarly to the primary key.

@ -375,7 +375,7 @@ INDEX b (u64 * length(str), i32 + f64 * 100, date, str) TYPE set(100) GRANULARIT

- `s != 1`
- `NOT startsWith(s, 'test')`
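As a hedged sketch (the table is hypothetical) of declaring one of these data-skipping indexes:

``` sql
CREATE TABLE skip_demo
(
    u64 UInt64,
    s String,
    -- minmax skipping index over u64, one index granule per 4 primary-key granules
    INDEX idx_u64 u64 TYPE minmax GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY u64;
```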
## Concurrent data access {#konkurentnyi-dostup-k-dannym}

## Concurrent data access {#concurrent-data-access}

Concurrent access to a table uses multiversioning: when a table is read and updated at the same time, data is read from the set of parts that is current at the moment of the query. There are no long-lived locks, and inserts do not get in the way of reads.
@ -531,13 +531,13 @@ TTL d + INTERVAL 1 MONTH GROUP BY k1, k2 SET x = max(x), y = min(y);

## Storing table data on multiple block devices {#table_engine-mergetree-multiple-volumes}

### Introduction {#vvedenie}

### Introduction {#introduction}

Table engines of the `MergeTree` family can store data on multiple block devices. This is useful, for example, for an implicit split of a single table's data into "hot" and "cold" parts: the most recent data is small and queried regularly, while a long tail of historical data is queried rarely. When several disks are available, the "hot" data can be placed on fast disks (for example, NVMe SSDs or in-memory disks) and the "cold" data on slower ones (for example, HDDs).

The minimal movable unit for `MergeTree` is a data part. The data of a single part can reside only on one disk. Parts can be moved between disks in the background, according to user settings, as well as with [ALTER](../../../engines/table-engines/mergetree-family/mergetree.md#alter_move-partition) queries, as sketched below.
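For example, a hedged sketch of moving a partition by hand (the table, partition value, and volume name are hypothetical):

``` sql
-- move one partition from the fast volume to the slow one
ALTER TABLE hits MOVE PARTITION 202103 TO VOLUME 'cold';
```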
### Terms {#terminy}

### Terms {#terms}

- Disk — a block device mounted in the filesystem.
- Default disk — the disk that holds the path specified in the [path](../../../operations/server-configuration-parameters/settings.md#server_configuration_parameters-path) server configuration setting.
@ -689,7 +689,7 @@ SETTINGS storage_policy = 'moving_from_ssd_to_hdd'

The number of threads performing background moves of parts between disks can be changed with the [background_move_pool_size](../../../operations/settings/settings.md#background_move_pool_size) setting.

### Details {#osobennosti-raboty}

### Details {#details}

In `MergeTree` tables, data reaches a disk in several ways:
@ -712,4 +712,99 @@ SETTINGS storage_policy = 'moving_from_ssd_to_hdd'

After background merges or mutations, old parts are not removed immediately but only after a while (controlled by the `old_parts_lifetime` table setting). They are not moved to other volumes or disks either, so until they are removed they are still counted when calculating the occupied disk space.
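As a hedged illustration (the value is arbitrary), this setting can be tuned per table:

``` sql
CREATE TABLE t
(
    id UInt64
)
ENGINE = MergeTree
ORDER BY id
SETTINGS old_parts_lifetime = 300; -- keep old parts for 5 minutes after a merge
```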
## Using the S3 service for data storage {#table_engine-mergetree-s3}

Tables of the `MergeTree` family can store data in the [S3](https://aws.amazon.com/s3/) service by using a disk of type `s3`.

Configuration:
``` xml
<storage_configuration>
    ...
    <disks>
        <s3>
            <type>s3</type>
            <endpoint>https://storage.yandexcloud.net/my-bucket/root-path/</endpoint>
            <access_key_id>your_access_key_id</access_key_id>
            <secret_access_key>your_secret_access_key</secret_access_key>
            <proxy>
                <uri>http://proxy1</uri>
                <uri>http://proxy2</uri>
            </proxy>
            <connect_timeout_ms>10000</connect_timeout_ms>
            <request_timeout_ms>5000</request_timeout_ms>
            <max_connections>100</max_connections>
            <retry_attempts>10</retry_attempts>
            <min_bytes_for_seek>1000</min_bytes_for_seek>
            <metadata_path>/var/lib/clickhouse/disks/s3/</metadata_path>
            <cache_enabled>true</cache_enabled>
            <cache_path>/var/lib/clickhouse/disks/s3/cache/</cache_path>
            <skip_access_check>false</skip_access_check>
        </s3>
    </disks>
    ...
</storage_configuration>
```
Required parameters:

- `endpoint` — the S3 endpoint URL in `path` or `virtual hosted` [format](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html). The endpoint URL must contain the bucket and the root path under which the data is stored.
- `access_key_id` — the S3 access key ID.
- `secret_access_key` — the S3 secret access key.

Optional parameters:

- `use_environment_credentials` — whether to read AWS credentials from the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` if they are set. Default: `false`.
- `proxy` — proxy configuration for the S3 endpoint. Each `uri` element inside the `proxy` block must contain a proxy URL.
- `connect_timeout_ms` — socket connect timeout in milliseconds. Default: 10 seconds.
- `request_timeout_ms` — request execution timeout in milliseconds. Default: 5 seconds.
- `max_connections` — size of the S3 connection pool. Default: `100`.
- `retry_attempts` — number of retries for a failed request. Default: `10`.
- `min_bytes_for_seek` — minimal number of bytes to use a seek operation instead of sequential reading. Default: 1 MB.
- `metadata_path` — path in the local filesystem where metadata files for S3 are stored. Default: `/var/lib/clickhouse/disks/<disk_name>/`.
- `cache_enabled` — whether caching of marks and index files in the local filesystem is allowed. Default: `true`.
- `cache_path` — path in the local filesystem where the cached marks and index files are stored. Default: `/var/lib/clickhouse/disks/<disk_name>/cache/`.
- `skip_access_check` — whether to skip the access check at disk startup. If set to `true`, the check is not performed. Default: `false`.

An S3 disk can be configured as `main` or `cold` storage:
``` xml
<storage_configuration>
    ...
    <disks>
        <s3>
            <type>s3</type>
            <endpoint>https://storage.yandexcloud.net/my-bucket/root-path/</endpoint>
            <access_key_id>your_access_key_id</access_key_id>
            <secret_access_key>your_secret_access_key</secret_access_key>
        </s3>
    </disks>
    <policies>
        <s3_main>
            <volumes>
                <main>
                    <disk>s3</disk>
                </main>
            </volumes>
        </s3_main>
        <s3_cold>
            <volumes>
                <main>
                    <disk>default</disk>
                </main>
                <external>
                    <disk>s3</disk>
                </external>
            </volumes>
            <move_factor>0.2</move_factor>
        </s3_cold>
    </policies>
    ...
</storage_configuration>
```

If a disk is configured as `cold`, data is moved to S3 when TTL rules fire or when the free space on the local disk drops below the threshold defined as `move_factor * disk_size`.
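A hedged sketch of attaching one of the policies above to a table (the table itself is hypothetical):

``` sql
CREATE TABLE s3_table
(
    id UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 's3_cold';
```

With `s3_cold`, new parts land on the `default` disk first and migrate to S3 as TTL rules fire or as local free space shrinks.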
[Original article](https://clickhouse.tech/docs/ru/engines/table-engines/mergetree-family/mergetree/) <!--hide-->

@ -14,9 +14,9 @@ toc_title: clickhouse-local

!!! warning "Warning"
    We do not recommend attaching a server configuration to `clickhouse-local`, because data can easily be corrupted by careless actions.

By default, a dedicated directory is created for temporary data. To override this, the data directory can be specified with the `-- --path` option.
By default, a dedicated directory is created for temporary data.
## Usage {#vyzov-programmy}

## Usage {#usage}

Basic usage:

@ -31,15 +31,23 @@ $ clickhouse-local --structure "table_structure" --input-format "format_of_incom
- `-if`, `--input-format` — input data format. Default: `TSV`.
- `-f`, `--file` — path to the data file. Default: `stdin`.
- `-q`, `--query` — queries to execute. Queries are separated by `;`.
- `-qf`, `--queries-file` — path to a file with queries to execute. Either `query` or `queries-file` must be specified.
- `-N`, `--table` — name of the table into which the input data is placed. Default: `table`.
- `-of`, `--format`, `--output-format` — output data format. Default: `TSV`.
- `-d`, `--database` — default database. If not specified, `_local` is used.
- `--stacktrace` — print debug information on exceptions.
- `--echo` — print the query to the console before executing it.
- `--verbose` — verbose output during query execution.
- `-s` — disable printing system logs to `stderr`.
- `--config-file` — path to the configuration file. By default `clickhouse-local` starts with an empty configuration. The configuration file has the same format as for the ClickHouse server, and all server configuration parameters can be used in it. Usually a configuration is not needed, and a single parameter can be set with a key named after that parameter.
- `--logger.console` — log to the console.
- `--logger.log` — log to a file with the given name.
- `--logger.level` — logging level.
- `--ignore-error` — do not stop processing if a query fails.
- `-c`, `--config-file` — path to the configuration file. By default `clickhouse-local` starts with an empty configuration. The configuration file has the same format as for the ClickHouse server, and all server configuration parameters can be used in it. Usually a configuration is not needed; if a single parameter needs to be set, this can be done with a key named after that parameter.
- `--no-system-tables` — run without using system tables.
- `--help` — print reference information about `clickhouse-local`.
- `-V`, `--version` — print the current version and exit.
## Examples {#primery-vyzova}

## Examples {#examples}

``` bash
$ echo -e "1,2\n3,4" | clickhouse-local --structure "a Int64, b Int64" \
@ -27,13 +27,13 @@ argMax(tuple(arg, val))

**Returned value**

- Value of `arg` that corresponds to the maximum value of `val`.
- value of `arg` that corresponds to the maximum value of `val`.

Type: matches the type of `arg`.

If a tuple is passed:

- Tuple `(arg, val)` with the maximum value of `val` and the corresponding `arg`.
- tuple `(arg, val)` with the maximum value of `val` and the corresponding `arg`.

Type: [Tuple](../../../sql-reference/data-types/tuple.md).
@ -52,15 +52,15 @@ argMax(tuple(arg, val))

Query:

``` sql
SELECT argMax(user, salary), argMax(tuple(user, salary)) FROM salary;
SELECT argMax(user, salary), argMax(tuple(user, salary), salary), argMax(tuple(user, salary)) FROM salary;
```

Result:

``` text
┌─argMax(user, salary)─┬─argMax(tuple(user, salary))─┐
│ director             │ ('director',5000)           │
└──────────────────────┴─────────────────────────────┘
┌─argMax(user, salary)─┬─argMax(tuple(user, salary), salary)─┬─argMax(tuple(user, salary))─┐
│ director             │ ('director',5000)                   │ ('director',5000)           │
└──────────────────────┴─────────────────────────────────────┴─────────────────────────────┘
```

[Original article](https://clickhouse.tech/docs/ru/sql-reference/aggregate-functions/reference/argmax/) <!--hide-->
@ -4,8 +4,61 @@ toc_priority: 5

# avg {#agg_function-avg}

Calculates the average.
Works only for numbers.
The result is always Float64.
Calculates the arithmetic mean.

[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/avg/) <!--hide-->
**Syntax**

``` sql
avg(x)
```

**Arguments**

- `x` — input value of type [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md).

**Returned value**

- the arithmetic mean, always of type [Float64](../../../sql-reference/data-types/float.md).
- `NaN` if the input `x` is empty.

**Example**

Query:

``` sql
SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5);
```

Result:

``` text
┌─avg(x)─┐
│    2.5 │
└────────┘
```
**Example**

Create a temporary table:

Query:

``` sql
CREATE table test (t UInt8) ENGINE = Memory;
```

Run the query:

``` sql
SELECT avg(t) FROM test;
```

Result:

``` text
┌─avg(t)─┐
│    nan │
└────────┘
```

[Original article](https://clickhouse.tech/docs/ru/sql-reference/aggregate-functions/reference/avg/) <!--hide-->
@ -405,4 +405,67 @@ SELECT log1p(0);

- [log(x)](../../sql-reference/functions/math-functions.md#logx)

## sign(x) {#signx}

Returns the sign of a real number.

**Syntax**

``` sql
sign(x)
```

**Argument**

- `x` — a value from `-∞` to `+∞`. Any numeric type supported by ClickHouse.

**Returned value**

- -1 if `x < 0`
- 0 if `x = 0`
- 1 if `x > 0`
**Examples**

Result of sign() for zero:

``` sql
SELECT sign(0);
```

Result:

``` text
┌─sign(0)─┐
│       0 │
└─────────┘
```

Result of sign() for a positive argument:

``` sql
SELECT sign(1);
```

Result:

``` text
┌─sign(1)─┐
│       1 │
└─────────┘
```

Result of sign() for a negative argument:

``` sql
SELECT sign(-1);
```

Result:

``` text
┌─sign(-1)─┐
│       -1 │
└──────────┘
```

[Original article](https://clickhouse.tech/docs/ru/query_language/functions/math_functions/) <!--hide-->
@ -1,6 +1,6 @@
#include <functional>
#include <iostream>
#include <string_view>
#include <functional>
#include <boost/program_options.hpp>

#include <IO/ReadBufferFromFileDescriptor.h>
@ -50,6 +50,7 @@ int mainEntryClickHouseFormat(int argc, char ** argv)
        ("quiet,q", "just check syntax, no output on success")
        ("multiquery,n", "allow multiple queries in the same file")
        ("obfuscate", "obfuscate instead of formatting")
        ("backslash", "add a backslash at the end of each line of the formatted query")
        ("seed", po::value<std::string>(), "seed (arbitrary string) that determines the result of obfuscation")
    ;

@ -70,6 +71,7 @@ int mainEntryClickHouseFormat(int argc, char ** argv)
    bool quiet = options.count("quiet");
    bool multiple = options.count("multiquery");
    bool obfuscate = options.count("obfuscate");
    bool backslash = options.count("backslash");

    if (quiet && (hilite || oneline || obfuscate))
    {
@ -148,12 +150,39 @@ int mainEntryClickHouseFormat(int argc, char ** argv)
            }
            if (!quiet)
            {
                WriteBufferFromOStream res_buf(std::cout, 4096);
                formatAST(*res, res_buf, hilite, oneline);
                res_buf.next();
                if (multiple)
                    std::cout << "\n;\n";
                std::cout << std::endl;
                if (!backslash)
                {
                    WriteBufferFromOStream res_buf(std::cout, 4096);
                    formatAST(*res, res_buf, hilite, oneline);
                    res_buf.next();
                    if (multiple)
                        std::cout << "\n;\n";
                    std::cout << std::endl;
                }
                /// add additional '\' at the end of each line;
                else
                {
                    WriteBufferFromOwnString str_buf;
                    formatAST(*res, str_buf, hilite, oneline);

                    auto res_string = str_buf.str();
                    WriteBufferFromOStream res_cout(std::cout, 4096);

                    const char * s_pos = res_string.data();
                    const char * s_end = s_pos + res_string.size();

                    while (s_pos != s_end)
                    {
                        if (*s_pos == '\n')
                            res_cout.write(" \\", 2);
                        res_cout.write(*s_pos++);
                    }

                    res_cout.next();
                    if (multiple)
                        std::cout << " \\\n;\n";
                    std::cout << std::endl;
                }
            }

            do
@ -599,7 +599,7 @@
    <!-- The list of hosts allowed to use in URL-related storage engines and table functions.
        If this section is not present in configuration, all hosts are allowed.
    -->
    <remote_url_allow_hosts>
    <!--<remote_url_allow_hosts>-->
    <!-- Host should be specified exactly as in URL. The name is checked before DNS resolution.
        Example: "yandex.ru", "yandex.ru." and "www.yandex.ru" are different hosts.
        If port is explicitly specified in URL, the host:port is checked as a whole.
@ -613,7 +613,7 @@
        Regexps are not aligned: don't forget to add ^ and $. Also don't forget to escape dot (.) metacharacter
        (forgetting to do so is a common source of error).
    -->
    </remote_url_allow_hosts>
    <!--</remote_url_allow_hosts>-->

    <!-- If element has 'incl' attribute, then for it's value will be used corresponding substitution from another file.
         By default, path to file with substitutions is /etc/metrika.xml. It could be changed in config in 'include_from' element.
@ -543,13 +543,14 @@
    M(574, DISTRIBUTED_TOO_MANY_PENDING_BYTES) \
    M(575, UNKNOWN_SNAPSHOT) \
    M(576, KERBEROS_ERROR) \
    M(577, INVALID_SHARD_ID) \
    M(578, INVALID_FORMAT_INSERT_QUERY_WITH_DATA) \
    M(579, INCORRECT_PART_TYPE) \
    \
    M(999, KEEPER_EXCEPTION) \
    M(1000, POCO_EXCEPTION) \
    M(1001, STD_EXCEPTION) \
    M(1002, UNKNOWN_EXCEPTION) \
    M(1003, INVALID_SHARD_ID) \
    M(1004, INVALID_FORMAT_INSERT_QUERY_WITH_DATA)

/* See END */
@ -568,7 +568,7 @@ public:

        /// arr1 takes ownership of the heap memory of arr2.
        arr1.c_start = arr2.c_start;
        arr1.c_end_of_storage = arr1.c_start + heap_allocated - arr1.pad_right;
        arr1.c_end_of_storage = arr1.c_start + heap_allocated - arr2.pad_right - arr2.pad_left;
        arr1.c_end = arr1.c_start + this->byte_size(heap_size);

        /// Allocate stack space for arr2.
@ -585,7 +585,7 @@ public:
        dest.dealloc();
        dest.alloc(src.allocated_bytes(), std::forward<TAllocatorParams>(allocator_params)...);
        memcpy(dest.c_start, src.c_start, this->byte_size(src.size()));
        dest.c_end = dest.c_start + (src.c_end - src.c_start);
        dest.c_end = dest.c_start + this->byte_size(src.size());

        src.c_start = Base::null;
        src.c_end = Base::null;
@ -639,8 +639,8 @@ public:
        size_t rhs_size = rhs.size();
        size_t rhs_allocated = rhs.allocated_bytes();

        this->c_end_of_storage = this->c_start + rhs_allocated - Base::pad_right;
        rhs.c_end_of_storage = rhs.c_start + lhs_allocated - Base::pad_right;
        this->c_end_of_storage = this->c_start + rhs_allocated - Base::pad_right - Base::pad_left;
        rhs.c_end_of_storage = rhs.c_start + lhs_allocated - Base::pad_right - Base::pad_left;

        this->c_end = this->c_start + this->byte_size(rhs_size);
        rhs.c_end = rhs.c_start + this->byte_size(lhs_size);
@ -693,34 +693,34 @@ public:
    }


    bool operator== (const PODArray & other) const
    bool operator== (const PODArray & rhs) const
    {
        if (this->size() != other.size())
        if (this->size() != rhs.size())
            return false;

        const_iterator this_it = begin();
        const_iterator that_it = other.begin();
        const_iterator lhs_it = begin();
        const_iterator rhs_it = rhs.begin();

        while (this_it != end())
        while (lhs_it != end())
        {
            if (*this_it != *that_it)
            if (*lhs_it != *rhs_it)
                return false;

            ++this_it;
            ++that_it;
            ++lhs_it;
            ++rhs_it;
        }

        return true;
    }

    bool operator!= (const PODArray & other) const
    bool operator!= (const PODArray & rhs) const
    {
        return !operator==(other);
        return !operator==(rhs);
    }
};

template <typename T, size_t initial_bytes, typename TAllocator, size_t pad_right_>
void swap(PODArray<T, initial_bytes, TAllocator, pad_right_> & lhs, PODArray<T, initial_bytes, TAllocator, pad_right_> & rhs)
template <typename T, size_t initial_bytes, typename TAllocator, size_t pad_right_, size_t pad_left_>
void swap(PODArray<T, initial_bytes, TAllocator, pad_right_, pad_left_> & lhs, PODArray<T, initial_bytes, TAllocator, pad_right_, pad_left_> & rhs)
{
    lhs.swap(rhs);
}
@ -42,6 +42,7 @@ void RemoteHostFilter::setValuesFromConfig(const Poco::Util::AbstractConfigurati
        else if (startsWith(key, "host"))
            primary_hosts.insert(config.getString("remote_url_allow_hosts." + key));
    }
    is_allow_by_default = false;
}
}

@ -58,6 +59,6 @@ bool RemoteHostFilter::checkForDirectEntry(const std::string & str) const
        }
        return true;
    }
    return true;
    return is_allow_by_default;
}
}
@ -24,6 +24,7 @@ public:
    void checkHostAndPort(const std::string & host, const std::string & port) const; /// Does the same as checkURL, but for host and port.

private:
    bool is_allow_by_default = true;
    std::unordered_set<std::string> primary_hosts; /// Allowed primary (<host>) URL from config.xml
    std::vector<std::string> regexp_hosts; /// Allowed regexp (<host_regexp>) URL from config.xml
@ -63,7 +63,7 @@ using NamesAndTypes = std::vector<NameAndTypePair>;
class NamesAndTypesList : public std::list<NameAndTypePair>
{
public:
    NamesAndTypesList() {}
    NamesAndTypesList() = default;

    NamesAndTypesList(std::initializer_list<NameAndTypePair> init) : std::list<NameAndTypePair>(init) {}
@ -1,3 +1,6 @@
#include <DataTypes/DataTypeCustomGeo.h>

#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnTuple.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeCustom.h>
@ -11,88 +14,84 @@ namespace DB

namespace
{
    const auto point_data_type = std::make_shared<const DataTypeTuple>(
        DataTypes{std::make_shared<const DataTypeFloat64>(), std::make_shared<const DataTypeFloat64>()}
    );

class DataTypeCustomPointSerialization : public DataTypeCustomSimpleTextSerialization
    const auto ring_data_type = std::make_shared<const DataTypeArray>(DataTypeCustomPointSerialization::nestedDataType());

    const auto polygon_data_type = std::make_shared<const DataTypeArray>(DataTypeCustomRingSerialization::nestedDataType());

    const auto multipolygon_data_type = std::make_shared<const DataTypeArray>(DataTypeCustomPolygonSerialization::nestedDataType());
}

void DataTypeCustomPointSerialization::serializeText(
    const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override
    {
        nestedDataType()->serializeAsText(column, row_num, ostr, settings);
    }
    nestedDataType()->serializeAsText(column, row_num, ostr, settings);
}

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override
    {
        nestedDataType()->deserializeAsWholeText(column, istr, settings);
    }

    static DataTypePtr nestedDataType()
    {
        static auto data_type = DataTypePtr(std::make_unique<DataTypeTuple>(
            DataTypes({std::make_unique<DataTypeFloat64>(), std::make_unique<DataTypeFloat64>()})));
        return data_type;
    }
};

class DataTypeCustomRingSerialization : public DataTypeCustomSimpleTextSerialization
void DataTypeCustomPointSerialization::deserializeText(
    IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override
    {
        nestedDataType()->serializeAsText(column, row_num, ostr, settings);
    }
    nestedDataType()->deserializeAsWholeText(column, istr, settings);
}

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override
    {
        nestedDataType()->deserializeAsWholeText(column, istr, settings);
    }

    static DataTypePtr nestedDataType()
    {
        static auto data_type = DataTypePtr(std::make_unique<DataTypeArray>(DataTypeCustomPointSerialization::nestedDataType()));
        return data_type;
    }
};

class DataTypeCustomPolygonSerialization : public DataTypeCustomSimpleTextSerialization
DataTypePtr DataTypeCustomPointSerialization::nestedDataType()
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override
    {
        nestedDataType()->serializeAsText(column, row_num, ostr, settings);
    }
    return point_data_type;
}

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override
    {
        nestedDataType()->deserializeAsWholeText(column, istr, settings);
    }

    static DataTypePtr nestedDataType()
    {
        static auto data_type = DataTypePtr(std::make_unique<DataTypeArray>(DataTypeCustomRingSerialization::nestedDataType()));
        return data_type;
    }
};

class DataTypeCustomMultiPolygonSerialization : public DataTypeCustomSimpleTextSerialization
void DataTypeCustomRingSerialization::serializeText(
    const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override
    {
        nestedDataType()->serializeAsText(column, row_num, ostr, settings);
    }
    nestedDataType()->serializeAsText(column, row_num, ostr, settings);
}

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override
    {
        nestedDataType()->deserializeAsWholeText(column, istr, settings);
    }
void DataTypeCustomRingSerialization::deserializeText(
    IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
    nestedDataType()->deserializeAsWholeText(column, istr, settings);
}

    static DataTypePtr nestedDataType()
    {
        static auto data_type = DataTypePtr(std::make_unique<DataTypeArray>(DataTypeCustomPolygonSerialization::nestedDataType()));
        return data_type;
    }
};
DataTypePtr DataTypeCustomRingSerialization::nestedDataType()
{
    return ring_data_type;
}

void DataTypeCustomPolygonSerialization::serializeText(
    const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
    nestedDataType()->serializeAsText(column, row_num, ostr, settings);
}

void DataTypeCustomPolygonSerialization::deserializeText(
    IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
    nestedDataType()->deserializeAsWholeText(column, istr, settings);
}

DataTypePtr DataTypeCustomPolygonSerialization::nestedDataType()
{
    return polygon_data_type;
}

void DataTypeCustomMultiPolygonSerialization::serializeText(
    const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const
{
    nestedDataType()->serializeAsText(column, row_num, ostr, settings);
}

void DataTypeCustomMultiPolygonSerialization::deserializeText(
    IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const
{
    nestedDataType()->deserializeAsWholeText(column, istr, settings);
}

DataTypePtr DataTypeCustomMultiPolygonSerialization::nestedDataType()
{
    return multipolygon_data_type;
}

void registerDataTypeDomainGeo(DataTypeFactory & factory)
56 src/DataTypes/DataTypeCustomGeo.h Normal file
@ -0,0 +1,56 @@
#pragma once

#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnTuple.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeCustom.h>
#include <DataTypes/DataTypeCustomSimpleTextSerialization.h>
#include <DataTypes/DataTypeFactory.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypesNumber.h>

namespace DB
{

class DataTypeCustomPointSerialization : public DataTypeCustomSimpleTextSerialization
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;

    static DataTypePtr nestedDataType();
};


class DataTypeCustomRingSerialization : public DataTypeCustomSimpleTextSerialization
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;

    static DataTypePtr nestedDataType();
};

class DataTypeCustomPolygonSerialization : public DataTypeCustomSimpleTextSerialization
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;

    static DataTypePtr nestedDataType();
};

class DataTypeCustomMultiPolygonSerialization : public DataTypeCustomSimpleTextSerialization
{
public:
    void serializeText(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override;

    void deserializeText(IColumn & column, ReadBuffer & istr, const FormatSettings & settings) const override;

    static DataTypePtr nestedDataType();
};

}
@ -9,8 +9,7 @@ namespace DB
class DataTypeWithSimpleSerialization : public IDataType
{
protected:
    DataTypeWithSimpleSerialization()
    {}
    DataTypeWithSimpleSerialization() = default;

    void serializeTextEscaped(const IColumn & column, size_t row_num, WriteBuffer & ostr, const FormatSettings & settings) const override
    {
@ -241,6 +241,8 @@ const DictionaryAttribute & DictionaryStructure::getAttribute(const std::string
        for (const auto & key_attribute : *key)
            if (key_attribute.name == attribute_name)
                return key_attribute;

        throw Exception{"No such attribute '" + attribute_name + "' in keys", ErrorCodes::BAD_ARGUMENTS};
    }

    size_t attribute_index = it->second;
@ -354,6 +356,7 @@ std::vector<DictionaryAttribute> DictionaryStructure::getAttributes(
    config.keys(config_prefix, config_elems);
    auto has_hierarchy = false;

    std::unordered_set<String> attribute_names;
    std::vector<DictionaryAttribute> res_attributes;

    const FormatSettings format_settings;
@ -376,6 +379,15 @@ std::vector<DictionaryAttribute> DictionaryStructure::getAttributes(
        if ((range_min && name == range_min->name) || (range_max && name == range_max->name))
            continue;

        auto insert_result = attribute_names.insert(name);
        bool inserted = insert_result.second;

        if (!inserted)
            throw Exception(
                ErrorCodes::BAD_ARGUMENTS,
                "Dictionary attributes names must be unique. Attribute name ({}) is not unique",
                name);

        const auto type_string = config.getString(prefix + "type");
        const auto initial_type = DataTypeFactory::instance().get(type_string);
        auto type = initial_type;
@ -221,7 +221,9 @@ BlockInputStreamPtr ExecutablePoolDictionarySource::getStreamForBlock(const Bloc
    }, configuration.max_command_execution_time * 10000);

    if (!result)
        throw Exception(ErrorCodes::TIMEOUT_EXCEEDED, "Could not get process from pool, max command execution timeout exceeded");
        throw Exception(ErrorCodes::TIMEOUT_EXCEEDED,
            "Could not get process from pool, max command execution timeout exceeded ({}) seconds",
            configuration.max_command_execution_time);

    size_t rows_to_read = block.rows();
    auto read_stream = context.getInputFormat(configuration.format, process->out, sample_block, rows_to_read);
@ -298,7 +300,7 @@ void registerDictionarySourceExecutablePool(DictionarySourceFactory & factory)
        size_t max_command_execution_time = config.getUInt64(configuration_config_prefix + ".max_command_execution_time", 10);

        size_t max_execution_time_seconds = static_cast<size_t>(context.getSettings().max_execution_time.totalSeconds());
        if (max_command_execution_time > max_execution_time_seconds)
        if (max_execution_time_seconds != 0 && max_command_execution_time > max_execution_time_seconds)
            max_command_execution_time = max_execution_time_seconds;

        ExecutablePoolDictionarySource::Configuration configuration
@ -17,14 +17,14 @@ namespace DB
 * BlockInputStream implementation for external dictionaries
 * read() returns single block consisting of the in-memory contents of the dictionaries
 */
template <typename DictionaryType, typename RangeType, typename Key>
template <typename RangeType>
class RangeDictionaryBlockInputStream : public DictionaryBlockInputStreamBase
{
public:
    using DictionaryPtr = std::shared_ptr<DictionaryType const>;
    using Key = UInt64;

    RangeDictionaryBlockInputStream(
        DictionaryPtr dictionary,
        std::shared_ptr<const IDictionaryBase> dictionary,
        size_t max_block_size,
        const Names & column_names,
        PaddedPODArray<Key> && ids_to_fill,
@ -40,35 +40,26 @@ private:
    template <typename T>
    ColumnPtr getColumnFromPODArray(const PaddedPODArray<T> & array) const;

    template <typename DictionarySpecialAttributeType, typename T>
    void addSpecialColumn(
        const std::optional<DictionarySpecialAttributeType> & attribute,
        DataTypePtr type,
        const std::string & default_name,
        const std::unordered_set<std::string> & column_names_set,
        const PaddedPODArray<T> & values,
        ColumnsWithTypeAndName & columns,
        bool force = false) const;

    Block fillBlock(
        const PaddedPODArray<Key> & ids_to_fill,
        const PaddedPODArray<RangeType> & block_start_dates,
        const PaddedPODArray<RangeType> & block_end_dates) const;

    PaddedPODArray<Int64>
    makeDateKey(const PaddedPODArray<RangeType> & block_start_dates, const PaddedPODArray<RangeType> & block_end_dates) const;
    PaddedPODArray<Int64> makeDateKey(
        const PaddedPODArray<RangeType> & block_start_dates,
        const PaddedPODArray<RangeType> & block_end_dates) const;

    DictionaryPtr dictionary;
    Names column_names;
    std::shared_ptr<const IDictionaryBase> dictionary;
    NameSet column_names;
    PaddedPODArray<Key> ids;
    PaddedPODArray<RangeType> start_dates;
    PaddedPODArray<RangeType> end_dates;
};


template <typename DictionaryType, typename RangeType, typename Key>
RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::RangeDictionaryBlockInputStream(
    DictionaryPtr dictionary_,
template <typename RangeType>
RangeDictionaryBlockInputStream<RangeType>::RangeDictionaryBlockInputStream(
    std::shared_ptr<const IDictionaryBase> dictionary_,
    size_t max_block_size_,
    const Names & column_names_,
    PaddedPODArray<Key> && ids_,
@ -76,15 +67,15 @@ RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::RangeDictionary
    PaddedPODArray<RangeType> && block_end_dates)
    : DictionaryBlockInputStreamBase(ids_.size(), max_block_size_)
    , dictionary(dictionary_)
    , column_names(column_names_)
    , column_names(column_names_.begin(), column_names_.end())
    , ids(std::move(ids_))
    , start_dates(std::move(block_start_dates))
    , end_dates(std::move(block_end_dates))
{
}

template <typename DictionaryType, typename RangeType, typename Key>
Block RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::getBlock(size_t start, size_t length) const
template <typename RangeType>
Block RangeDictionaryBlockInputStream<RangeType>::getBlock(size_t start, size_t length) const
{
    PaddedPODArray<Key> block_ids;
    PaddedPODArray<RangeType> block_start_dates;
@ -103,38 +94,19 @@ Block RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::getBlock(
    return fillBlock(block_ids, block_start_dates, block_end_dates);
}

template <typename DictionaryType, typename RangeType, typename Key>
template <typename RangeType>
template <typename T>
ColumnPtr RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::getColumnFromPODArray(const PaddedPODArray<T> & array) const
ColumnPtr RangeDictionaryBlockInputStream<RangeType>::getColumnFromPODArray(const PaddedPODArray<T> & array) const
{
    auto column_vector = ColumnVector<T>::create();
    column_vector->getData().reserve(array.size());
    for (T value : array)
        column_vector->insertValue(value);
    column_vector->getData().insert(array.begin(), array.end());

    return column_vector;
}

template <typename DictionaryType, typename RangeType, typename Key>
template <typename DictionarySpecialAttributeType, typename T>
void RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::addSpecialColumn(
    const std::optional<DictionarySpecialAttributeType> & attribute,
    DataTypePtr type,
    const std::string & default_name,
    const std::unordered_set<std::string> & column_names_set,
    const PaddedPODArray<T> & values,
    ColumnsWithTypeAndName & columns,
    bool force) const
{
    std::string name = default_name;
    if (attribute)
        name = attribute->name;

    if (force || column_names_set.find(name) != column_names_set.end())
        columns.emplace_back(getColumnFromPODArray(values), type, name);
}

template <typename DictionaryType, typename RangeType, typename Key>
PaddedPODArray<Int64> RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::makeDateKey(
template <typename RangeType>
PaddedPODArray<Int64> RangeDictionaryBlockInputStream<RangeType>::makeDateKey(
    const PaddedPODArray<RangeType> & block_start_dates, const PaddedPODArray<RangeType> & block_end_dates) const
{
    PaddedPODArray<Int64> key(block_start_dates.size());
@ -150,8 +122,8 @@ PaddedPODArray<Int64> RangeDictionaryBlockInputStream<DictionaryType, RangeType,
}


template <typename DictionaryType, typename RangeType, typename Key>
Block RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::fillBlock(
template <typename RangeType>
Block RangeDictionaryBlockInputStream<RangeType>::fillBlock(
    const PaddedPODArray<Key> & ids_to_fill,
    const PaddedPODArray<RangeType> & block_start_dates,
    const PaddedPODArray<RangeType> & block_end_dates) const
@ -159,20 +131,32 @@ Block RangeDictionaryBlockInputStream<DictionaryType, RangeType, Key>::fillBlock
    ColumnsWithTypeAndName columns;
    const DictionaryStructure & structure = dictionary->getStructure();

    std::unordered_set<std::string> names(column_names.begin(), column_names.end());

    addSpecialColumn(structure.id, std::make_shared<DataTypeUInt64>(), "ID", names, ids_to_fill, columns, true);
    auto ids_column = columns.back().column;
    addSpecialColumn(structure.range_min, structure.range_max->type, "Range Start", names, block_start_dates, columns);
    addSpecialColumn(structure.range_max, structure.range_max->type, "Range End", names, block_end_dates, columns);
    auto ids_column = getColumnFromPODArray(ids_to_fill);
    const std::string & id_column_name = structure.id->name;
    if (column_names.find(id_column_name) != column_names.end())
        columns.emplace_back(ids_column, std::make_shared<DataTypeUInt64>(), id_column_name);

    auto date_key = makeDateKey(block_start_dates, block_end_dates);
    auto date_column = getColumnFromPODArray(date_key);

    const std::string & range_min_column_name = structure.range_min->name;
    if (column_names.find(range_min_column_name) != column_names.end())
    {
        auto range_min_column = getColumnFromPODArray(block_start_dates);
        columns.emplace_back(range_min_column, structure.range_max->type, range_min_column_name);
    }

    const std::string & range_max_column_name = structure.range_max->name;
    if (column_names.find(range_max_column_name) != column_names.end())
    {
        auto range_max_column = getColumnFromPODArray(block_end_dates);
        columns.emplace_back(range_max_column, structure.range_max->type, range_max_column_name);
    }

    for (const auto idx : ext::range(0, structure.attributes.size()))
    {
        const DictionaryAttribute & attribute = structure.attributes[idx];
        if (names.find(attribute.name) != names.end())
        if (column_names.find(attribute.name) != column_names.end())
        {
            ColumnPtr column = dictionary->getColumn(
                attribute.name,
@ -52,7 +52,6 @@ namespace ErrorCodes
    extern const int DICTIONARY_IS_EMPTY;
    extern const int TYPE_MISMATCH;
    extern const int UNSUPPORTED_METHOD;
    extern const int NOT_IMPLEMENTED;
}

bool RangeHashedDictionary::Range::isCorrectDate(const RangeStorageType & date)
@ -178,10 +177,76 @@ ColumnPtr RangeHashedDictionary::getColumn(
    return result;
}

ColumnUInt8::Ptr RangeHashedDictionary::hasKeys(const Columns &, const DataTypes &) const
ColumnUInt8::Ptr RangeHashedDictionary::hasKeys(const Columns & key_columns, const DataTypes & key_types) const
{
    throw Exception(ErrorCodes::NOT_IMPLEMENTED,
        "Has not supported", getDictionaryID().getNameForLogs());
    auto range_storage_column = key_columns[1];
    ColumnWithTypeAndName column_to_cast = {range_storage_column->convertToFullColumnIfConst(), key_types[1], ""};

    auto range_column_storage_type = std::make_shared<DataTypeInt64>();
    auto range_column_updated = castColumnAccurate(column_to_cast, range_column_storage_type);

    PaddedPODArray<Key> key_backup_storage;
    PaddedPODArray<RangeStorageType> range_backup_storage;

    const PaddedPODArray<Key> & ids = getColumnVectorData(this, key_columns[0], key_backup_storage);
    const PaddedPODArray<RangeStorageType> & dates = getColumnVectorData(this, range_column_updated, range_backup_storage);

    const auto & attribute = attributes.front();

    ColumnUInt8::Ptr result;

    auto type_call = [&](const auto & dictionary_attribute_type)
    {
        using Type = std::decay_t<decltype(dictionary_attribute_type)>;
        using AttributeType = typename Type::AttributeType;
        using ValueType = DictionaryValueType<AttributeType>;
        result = hasKeysImpl<ValueType>(attribute, ids, dates);
    };

    callOnDictionaryAttributeType(attribute.type, type_call);

    query_count.fetch_add(ids.size(), std::memory_order_relaxed);

    return result;
}

template <typename AttributeType>
ColumnUInt8::Ptr RangeHashedDictionary::hasKeysImpl(
    const Attribute & attribute,
    const PaddedPODArray<Key> & ids,
    const PaddedPODArray<RangeStorageType> & dates) const
{
    auto result = ColumnUInt8::create(ids.size());
    auto & out = result->getData();

    const auto & attr = *std::get<Ptr<AttributeType>>(attribute.maps);

    for (const auto row : ext::range(0, ids.size()))
    {
        const auto it = attr.find(ids[row]);

        if (it)
        {
            const auto date = dates[row];
            const auto & ranges_and_values = it->getMapped();
            const auto val_it = std::find_if(
                std::begin(ranges_and_values),
                std::end(ranges_and_values),
                [date](const Value<AttributeType> & v)
                {
                    return v.range.contains(date);
                });

            if (val_it != std::end(ranges_and_values))
                out[row] = true;
            else
                out[row] = false;
        }
        else
            out[row] = false;
    }

    return result;
}

void RangeHashedDictionary::createAttributes()
@ -450,7 +515,9 @@ RangeHashedDictionary::getAttributeWithType(const std::string & attribute_name,

template <typename RangeType>
void RangeHashedDictionary::getIdsAndDates(
    PaddedPODArray<Key> & ids, PaddedPODArray<RangeType> & start_dates, PaddedPODArray<RangeType> & end_dates) const
    PaddedPODArray<Key> & ids,
    PaddedPODArray<RangeType> & start_dates,
    PaddedPODArray<RangeType> & end_dates) const
{
    const auto & attribute = attributes.front();

@ -458,11 +525,9 @@ void RangeHashedDictionary::getIdsAndDates(
    {
        using Type = std::decay_t<decltype(dictionary_attribute_type)>;
        using AttributeType = typename Type::AttributeType;
        using ValueType = DictionaryValueType<AttributeType>;

        if constexpr (std::is_same_v<AttributeType, String>)
            getIdsAndDates<StringRef>(attribute, ids, start_dates, end_dates);
        else
            getIdsAndDates<AttributeType>(attribute, ids, start_dates, end_dates);
        getIdsAndDates<ValueType>(attribute, ids, start_dates, end_dates);
    };

    callOnDictionaryAttributeType(attribute.type, type_call);
@ -506,13 +571,20 @@ BlockInputStreamPtr RangeHashedDictionary::getBlockInputStreamImpl(const Names &
    PaddedPODArray<RangeType> end_dates;
    getIdsAndDates(ids, start_dates, end_dates);

    using BlockInputStreamType = RangeDictionaryBlockInputStream<RangeHashedDictionary, RangeType, Key>;
    auto dict_ptr = std::static_pointer_cast<const RangeHashedDictionary>(shared_from_this());
    return std::make_shared<BlockInputStreamType>(
        dict_ptr, max_block_size, column_names, std::move(ids), std::move(start_dates), std::move(end_dates));
    using BlockInputStreamType = RangeDictionaryBlockInputStream<RangeType>;

    auto stream = std::make_shared<BlockInputStreamType>(
        shared_from_this(),
        max_block_size,
        column_names,
        std::move(ids),
        std::move(start_dates),
        std::move(end_dates));

    return stream;
}

struct RangeHashedDIctionaryCallGetBlockInputStreamImpl
struct RangeHashedDictionaryCallGetBlockInputStreamImpl
{
    BlockInputStreamPtr stream;
    const RangeHashedDictionary * dict;
@ -532,7 +604,7 @@ BlockInputStreamPtr RangeHashedDictionary::getBlockInputStream(const Names & col
{
    using ListType = TypeList<UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Int128, Float32, Float64>;

    RangeHashedDIctionaryCallGetBlockInputStreamImpl callable;
    RangeHashedDictionaryCallGetBlockInputStreamImpl callable;
    callable.dict = this;
    callable.column_names = &column_names;
    callable.max_block_size = max_block_size;
@ -93,8 +93,6 @@ private:
    template <typename T>
    using Ptr = std::unique_ptr<Collection<T>>;

    using NullableSet = HashSet<Key, DefaultHash<Key>>;

    struct Attribute final
    {
    public:
@ -159,6 +157,12 @@ private:
        ValueSetter && set_value,
        DefaultValueExtractor & default_value_extractor) const;

    template <typename AttributeType>
    ColumnUInt8::Ptr hasKeysImpl(
        const Attribute & attribute,
        const PaddedPODArray<Key> & ids,
        const PaddedPODArray<RangeStorageType> & dates) const;

    template <typename T>
    static void setAttributeValueImpl(Attribute & attribute, const Key id, const Range & range, const Field & value);

@ -181,7 +185,7 @@ private:
    template <typename RangeType>
    BlockInputStreamPtr getBlockInputStreamImpl(const Names & column_names, size_t max_block_size) const;

    friend struct RangeHashedDIctionaryCallGetBlockInputStreamImpl;
    friend struct RangeHashedDictionaryCallGetBlockInputStreamImpl;

    const DictionaryStructure dict_struct;
    const DictionarySourcePtr source_ptr;
@ -265,6 +265,20 @@ void DiskCacheWrapper::removeRecursive(const String & path)
    DiskDecorator::removeRecursive(path);
}

void DiskCacheWrapper::removeSharedFile(const String & path, bool keep_s3)
{
    if (cache_disk->exists(path))
        cache_disk->removeSharedFile(path, keep_s3);
    DiskDecorator::removeSharedFile(path, keep_s3);
}

void DiskCacheWrapper::removeSharedRecursive(const String & path, bool keep_s3)
{
    if (cache_disk->exists(path))
        cache_disk->removeSharedRecursive(path, keep_s3);
    DiskDecorator::removeSharedRecursive(path, keep_s3);
}

void DiskCacheWrapper::createHardLink(const String & src_path, const String & dst_path)
{
    /// Don't create hardlinks for cache files to shadow directory as it just wastes cache disk space.
@ -40,6 +40,8 @@ public:
    void removeFileIfExists(const String & path) override;
    void removeDirectory(const String & path) override;
    void removeRecursive(const String & path) override;
    void removeSharedFile(const String & path, bool keep_s3) override;
    void removeSharedRecursive(const String & path, bool keep_s3) override;
    void createHardLink(const String & src_path, const String & dst_path) override;
    ReservationPtr reserve(UInt64 bytes) override;
@ -145,6 +145,16 @@ void DiskDecorator::removeRecursive(const String & path)
    delegate->removeRecursive(path);
}

void DiskDecorator::removeSharedFile(const String & path, bool keep_s3)
{
    delegate->removeSharedFile(path, keep_s3);
}

void DiskDecorator::removeSharedRecursive(const String & path, bool keep_s3)
{
    delegate->removeSharedRecursive(path, keep_s3);
}

void DiskDecorator::setLastModified(const String & path, const Poco::Timestamp & timestamp)
{
    delegate->setLastModified(path, timestamp);
@ -42,11 +42,18 @@ public:
    void removeFileIfExists(const String & path) override;
    void removeDirectory(const String & path) override;
    void removeRecursive(const String & path) override;
    void removeSharedFile(const String & path, bool keep_s3) override;
    void removeSharedRecursive(const String & path, bool keep_s3) override;
    void setLastModified(const String & path, const Poco::Timestamp & timestamp) override;
    Poco::Timestamp getLastModified(const String & path) override;
    void setReadOnly(const String & path) override;
    void createHardLink(const String & src_path, const String & dst_path) override;
    void truncateFile(const String & path, size_t size) override;
    int open(const String & path, mode_t mode) const;
    void close(int fd) const;
    void sync(int fd) const;
    String getUniqueId(const String & path) const override { return delegate->getUniqueId(path); }
    bool checkUniqueId(const String & id) const override { return delegate->checkUniqueId(id); }
    DiskType::Type getType() const override { return delegate->getType(); }
    Executor & getExecutor() override;
    void onFreeze(const String & path) override;
32 src/Disks/DiskType.h Normal file
@ -0,0 +1,32 @@
#pragma once

#include <common/types.h>

namespace DB
{

struct DiskType
{
    enum class Type
    {
        Local,
        RAM,
        S3
    };
    static String toString(Type disk_type)
    {
        switch (disk_type)
        {
            case Type::Local:
                return "local";
            case Type::RAM:
                return "memory";
            case Type::S3:
                return "s3";
        }
        __builtin_unreachable();
    }
};

}
@@ -5,6 +5,7 @@
#include <Common/CurrentMetrics.h>
#include <Common/Exception.h>
#include <Disks/Executor.h>
#include <Disks/DiskType.h>

#include <memory>
#include <mutex>
@@ -57,29 +58,6 @@ public:

using SpacePtr = std::shared_ptr<Space>;

-struct DiskType
-{
-    enum class Type
-    {
-        Local,
-        RAM,
-        S3
-    };
-    static String toString(Type disk_type)
-    {
-        switch (disk_type)
-        {
-            case Type::Local:
-                return "local";
-            case Type::RAM:
-                return "memory";
-            case Type::S3:
-                return "s3";
-        }
-        __builtin_unreachable();
-    }
-};

/**
 * A guard, that should synchronize file's or directory's state
 * with storage device (e.g. fsync in POSIX) in its destructor.
@@ -195,6 +173,21 @@ public:
    /// Remove file or directory with all children. Use with extra caution. Throws exception if file doesn't exist.
    virtual void removeRecursive(const String & path) = 0;

    /// Remove file. Throws exception if file doesn't exist or if directory is not empty.
    /// Differs from removeFile for S3 disks.
    /// Second bool param is a flag to remove (true) or keep (false) shared data on S3.
    virtual void removeSharedFile(const String & path, bool) { removeFile(path); }

    /// Remove file or directory with all children. Use with extra caution. Throws exception if file doesn't exist.
    /// Differs from removeRecursive for S3 disks.
    /// Second bool param is a flag to remove (true) or keep (false) shared data on S3.
    virtual void removeSharedRecursive(const String & path, bool) { removeRecursive(path); }

    /// Remove file or directory if it exists.
    /// Differs from removeFileIfExists for S3 disks.
    /// Second bool param is a flag to remove (true) or keep (false) shared data on S3.
    virtual void removeSharedFileIfExists(const String & path, bool) { removeFileIfExists(path); }

    /// Set last modified time to file or directory at `path`.
    virtual void setLastModified(const String & path, const Poco::Timestamp & timestamp) = 0;

@@ -216,6 +209,15 @@ public:
    /// Invoked when Global Context is shutdown.
    virtual void shutdown() { }

    /// Return a unique string for the file; overridden for S3.
    /// Required to distinguish different copies of the same part on S3.
    virtual String getUniqueId(const String & path) const { return path; }

    /// Check that the file exists and ClickHouse has access to it.
    /// Overridden in DiskS3.
    /// Required for S3 to ensure that a replica has access to data written by another node.
    virtual bool checkUniqueId(const String & id) const { return exists(id); }

    /// Returns executor to perform asynchronous operations.
    virtual Executor & getExecutor() { return *executor; }
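The removeShared* defaults above encode the key behavior: disks with no shared remote data ignore the keep-flag and fall back to their plain remove methods, while DiskS3 (further down) overrides them to drop only the local metadata when asked to keep the S3 objects. A minimal standalone sketch of that pattern, with toy types and hypothetical names rather than the real IDisk:

#include <iostream>
#include <string>

/// Toy stand-in for IDisk: only the removal-related surface is modeled.
struct Disk
{
    virtual ~Disk() = default;
    virtual void removeFile(const std::string & path) = 0;
    /// Default: the keep-shared flag is meaningless for local disks, so it is ignored.
    virtual void removeSharedFile(const std::string & path, bool /*keep_shared*/) { removeFile(path); }
};

struct LocalDisk : Disk
{
    void removeFile(const std::string & path) override { std::cout << "unlink " << path << "\n"; }
};

struct S3LikeDisk : Disk
{
    void removeFile(const std::string & path) override { removeSharedFile(path, false); }
    void removeSharedFile(const std::string & path, bool keep_shared) override
    {
        std::cout << "drop metadata of " << path << "\n";
        if (!keep_shared)                       /// delete remote objects only when asked to
            std::cout << "delete S3 objects of " << path << "\n";
    }
};

int main()
{
    LocalDisk local;
    S3LikeDisk s3;
    local.removeSharedFile("a.bin", true);  /// flag ignored: file is removed anyway
    s3.removeSharedFile("a.bin", true);     /// metadata dropped, S3 objects kept
}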
@@ -1,4 +1,7 @@
#pragma once

#include <Disks/DiskType.h>

#include <memory>
#include <vector>
#include <common/types.h>
@@ -36,6 +39,7 @@ public:
    /// mutations files
    virtual DiskPtr getAnyDisk() const = 0;
    virtual DiskPtr getDiskByName(const String & disk_name) const = 0;
    virtual Disks getDisksByType(DiskType::Type type) const = 0;
    /// Get free space from most free disk
    virtual UInt64 getMaxUnreservedFreeSpace() const = 0;
    /// Reserves space on any volume with index > min_volume_index or returns nullptr
@@ -57,6 +61,7 @@ public:
    /// Check if we have any volume with stopped merges
    virtual bool hasAnyVolumeWithDisabledMerges() const = 0;
    virtual bool containsVolume(const String & volume_name) const = 0;
    /// Returns disks by type ordered by volumes priority
};

}
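StoragePolicy::getDisksByType (its implementation appears further down) is just a stable filter: volumes are visited in priority order and disks within each volume in order, so the result is ordered by volume priority. A standalone sketch of the same idea, with plain containers standing in for the storage-policy types:

#include <iostream>
#include <string>
#include <vector>

enum class Type { Local, RAM, S3 };
struct Disk { std::string name; Type type; };
using Volume = std::vector<Disk>;

/// Stable filter: volumes are visited in priority order, so the result
/// is ordered by volume priority, like StoragePolicy::getDisksByType.
std::vector<Disk> getDisksByType(const std::vector<Volume> & volumes, Type type)
{
    std::vector<Disk> res;
    for (const auto & volume : volumes)
        for (const auto & disk : volume)
            if (disk.type == type)
                res.push_back(disk);
    return res;
}

int main()
{
    std::vector<Volume> volumes = {{{"hot_s3", Type::S3}, {"hot_local", Type::Local}},
                                   {{"cold_s3", Type::S3}}};
    for (const auto & disk : getDisksByType(volumes, Type::S3))
        std::cout << disk.name << "\n"; /// prints hot_s3 then cold_s3
}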
@@ -90,6 +90,16 @@ void throwIfError(Aws::Utils::Outcome<Result, Error> & response)
    }
}

template <typename Result, typename Error>
void throwIfError(const Aws::Utils::Outcome<Result, Error> & response)
{
    if (!response.IsSuccess())
    {
        const auto & err = response.GetError();
        throw Exception(err.GetMessage(), static_cast<int>(err.GetErrorType()));
    }
}

/**
 * S3 metadata file layout:
 * Number of S3 objects, Total size of all S3 objects.
@@ -609,6 +619,15 @@ void DiskS3::createDirectories(const String & path)
    Poco::File(metadata_path + path).createDirectories();
}

String DiskS3::getUniqueId(const String & path) const
{
    Metadata metadata(s3_root_path, metadata_path, path);
    String id;
    if (!metadata.s3_objects.empty())
        id = metadata.s3_root_path + metadata.s3_objects[0].first;
    return id;
}

DiskDirectoryIteratorPtr DiskS3::iterateDirectory(const String & path)
{
    return std::make_unique<DiskS3DirectoryIterator>(metadata_path + path, path);
@@ -791,13 +810,6 @@ void DiskS3::removeAws(const AwsS3KeyKeeper & keys)
    }
}

-void DiskS3::removeFile(const String & path)
-{
-    AwsS3KeyKeeper keys;
-    removeMeta(path, keys);
-    removeAws(keys);
-}
-
void DiskS3::removeFileIfExists(const String & path)
{
    AwsS3KeyKeeper keys;
@@ -813,11 +825,20 @@ void DiskS3::removeDirectory(const String & path)
    Poco::File(metadata_path + path).remove();
}

-void DiskS3::removeRecursive(const String & path)
+void DiskS3::removeSharedFile(const String & path, bool keep_s3)
{
    AwsS3KeyKeeper keys;
    removeMeta(path, keys);
+    if (!keep_s3)
+        removeAws(keys);
}

+void DiskS3::removeSharedRecursive(const String & path, bool keep_s3)
{
    AwsS3KeyKeeper keys;
    removeMetaRecursive(path, keys);
-    removeAws(keys);
+    if (!keep_s3)
+        removeAws(keys);
}

bool DiskS3::tryReserve(UInt64 bytes)
@@ -956,6 +977,23 @@ bool DiskS3::checkObjectExists(const String & prefix)
    return !outcome.GetResult().GetContents().empty();
}

bool DiskS3::checkUniqueId(const String & id) const
{
    /// Check that we are talking to the right S3 endpoint and have access rights.
    /// Actually interprets `id` as an S3 object name and checks that it exists.
    Aws::S3::Model::ListObjectsV2Request request;
    request.SetBucket(bucket);
    request.SetPrefix(id);
    auto resp = client->ListObjectsV2(request);
    throwIfError(resp);
    Aws::Vector<Aws::S3::Model::Object> object_list = resp.GetResult().GetContents();

    for (const auto & object : object_list)
        if (object.GetKey() == id)
            return true;
    return false;
}

Aws::S3::Model::HeadObjectResult DiskS3::headObject(const String & source_bucket, const String & key)
{
    Aws::S3::Model::HeadObjectRequest request;
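checkUniqueId lists objects with the candidate id as the key prefix and then insists on an exact key match; prefix matching alone could accept unrelated siblings such as id + "_copy". A standalone sketch of that exact-match check against the AWS SDK, using the same calls as the code above (the client and bucket are assumed to be configured elsewhere):

#include <aws/s3/S3Client.h>
#include <aws/s3/model/ListObjectsV2Request.h>
#include <stdexcept>

/// Returns true only if an object with exactly this key exists.
/// ListObjectsV2 with Prefix=key may also return "key_copy", "key/child", ...
/// hence the exact comparison on GetKey().
bool objectExistsExactly(const Aws::S3::S3Client & client, const Aws::String & bucket, const Aws::String & key)
{
    Aws::S3::Model::ListObjectsV2Request request;
    request.SetBucket(bucket);
    request.SetPrefix(key);

    auto outcome = client.ListObjectsV2(request);
    if (!outcome.IsSuccess())
        throw std::runtime_error(outcome.GetError().GetMessage().c_str());

    for (const auto & object : outcome.GetResult().GetContents())
        if (object.GetKey() == key)
            return true;
    return false;
}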
@@ -96,10 +96,13 @@ public:
        size_t buf_size,
        WriteMode mode) override;

-    void removeFile(const String & path) override;
+    void removeFile(const String & path) override { removeSharedFile(path, false); }
    void removeFileIfExists(const String & path) override;
    void removeDirectory(const String & path) override;
-    void removeRecursive(const String & path) override;
+    void removeRecursive(const String & path) override { removeSharedRecursive(path, false); }

+    void removeSharedFile(const String & path, bool keep_s3) override;
+    void removeSharedRecursive(const String & path, bool keep_s3) override;

    void createHardLink(const String & src_path, const String & dst_path) override;

@@ -115,6 +118,14 @@ public:

    void shutdown() override;

    /// Return a unique string for the file.
    /// Required to distinguish different copies of the same part on S3.
    String getUniqueId(const String & path) const override;

    /// Check that the file exists and ClickHouse has access to it.
    /// Required for S3 to ensure that a replica has access to data written by another node.
    bool checkUniqueId(const String & id) const override;

    /// Actions performed after disk creation.
    void startup();
@@ -159,6 +159,17 @@ Disks StoragePolicy::getDisks() const
}


Disks StoragePolicy::getDisksByType(DiskType::Type type) const
{
    Disks res;
    for (const auto & volume : volumes)
        for (const auto & disk : volume->getDisks())
            if (disk->getType() == type)
                res.push_back(disk);
    return res;
}


DiskPtr StoragePolicy::getAnyDisk() const
{
    /// StoragePolicy must contain at least one Volume

@@ -47,6 +47,9 @@ public:
    /// Returns disks ordered by volumes priority
    Disks getDisks() const override;

    /// Returns disks by type ordered by volumes priority
    Disks getDisksByType(DiskType::Type type) const override;

    /// Returns any disk
    /// Used when it's not important, for example for
    /// mutations files
@@ -54,7 +54,6 @@ namespace ErrorCodes
    extern const int ILLEGAL_COLUMN;
    extern const int BAD_ARGUMENTS;
    extern const int TYPE_MISMATCH;
-    extern const int NOT_IMPLEMENTED;
}

@@ -154,13 +153,20 @@ public:
    String getName() const override { return name; }

private:
-    size_t getNumberOfArguments() const override { return 2; }
+    size_t getNumberOfArguments() const override { return 0; }
+    bool isVariadic() const override { return true; }

+    bool isDeterministic() const override { return false; }

    bool useDefaultImplementationForConstants() const final { return true; }

    ColumnNumbers getArgumentsThatAreAlwaysConstant() const final { return {0}; }

    DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
    {
        if (arguments.size() < 2)
            throw Exception{"Wrong argument count for function " + getName(), ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};

        if (!isString(arguments[0]))
            throw Exception{"Illegal type " + arguments[0]->getName() + " of first argument of function " + getName()
                + ", expected a string.", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT};
@@ -173,8 +179,6 @@ private:
        return std::make_shared<DataTypeUInt8>();
    }

-    bool isDeterministic() const override { return false; }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override
    {
        /** Do not require existence of the dictionary if the function is called for empty columns.
@@ -194,6 +198,24 @@ private:
        const auto key_column = key_column_with_type.column;
        const auto key_column_type = WhichDataType(key_column_with_type.type);

        ColumnPtr range_col = nullptr;
        DataTypePtr range_col_type = nullptr;

        if (dictionary_key_type == DictionaryKeyType::range)
        {
            if (arguments.size() != 3)
                throw Exception{"Wrong argument count for function " + getName()
                    + " when dictionary has key type range", ErrorCodes::NUMBER_OF_ARGUMENTS_DOESNT_MATCH};

            range_col = arguments[2].column;
            range_col_type = arguments[2].type;

            if (!(range_col_type->isValueRepresentedByInteger() && range_col_type->getSizeOfValueInMemory() <= sizeof(Int64)))
                throw Exception{"Illegal type " + range_col_type->getName() + " of third argument of function "
                    + getName() + ", must be convertible to Int64.",
                    ErrorCodes::ILLEGAL_COLUMN};
        }

        if (dictionary_key_type == DictionaryKeyType::simple)
        {
            if (!key_column_type.isUInt64())
@@ -217,7 +239,7 @@ private:
            return dictionary->hasKeys(key_columns, key_types);
        }
        else
-            throw Exception(ErrorCodes::NOT_IMPLEMENTED, "Has not supported for range dictionary", dictionary->getDictionaryID().getNameForLogs());
+            return dictionary->hasKeys({key_column, range_col}, {std::make_shared<DataTypeUInt64>(), range_col_type});
    }

    mutable FunctionDictHelper helper;
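The dictHas change uses a common IFunction idiom for optional arguments: declare getNumberOfArguments() = 0 together with isVariadic() = true, then validate the count by hand (two arguments for ordinary dictionaries, three when a range dictionary needs its range column). A toy sketch of just that mechanism, with a hypothetical mini-interface rather than the real IFunction:

#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

/// Toy version of the IFunction arity contract:
/// 0 + variadic means "the function checks the argument count itself".
struct Function
{
    virtual ~Function() = default;
    virtual size_t getNumberOfArguments() const = 0;
    virtual bool isVariadic() const { return false; }
    virtual void checkArguments(const std::vector<std::string> & args) const = 0;
};

struct DictHasLike : Function
{
    size_t getNumberOfArguments() const override { return 0; } /// unknown up front
    bool isVariadic() const override { return true; }
    void checkArguments(const std::vector<std::string> & args) const override
    {
        if (args.size() < 2)
            throw std::invalid_argument("expected at least (dict_name, key)");
        if (args.size() > 3)
            throw std::invalid_argument("expected at most (dict_name, key, range)");
    }
};

int main()
{
    DictHasLike f;
    f.checkArguments({"dict", "key"});          /// ok: plain dictionary
    f.checkArguments({"dict", "key", "range"}); /// ok: range dictionary
    try { f.checkArguments({"dict"}); }
    catch (const std::exception & e) { std::cout << e.what() << "\n"; }
}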
360 src/Functions/geometryConverters.h Normal file
@@ -0,0 +1,360 @@
#pragma once

#include <Core/ColumnWithTypeAndName.h>
#include <Core/Types.h>

#include <boost/geometry/geometries/geometries.hpp>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>

#include <Columns/ColumnsNumber.h>
#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Common/NaNUtils.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/IDataType.h>
#include <DataTypes/DataTypeCustomGeo.h>
#include <IO/WriteHelpers.h>
#include <Interpreters/castColumn.h>

#include <cmath>
#include <common/logger_useful.h>

namespace DB
{

namespace ErrorCodes
{
    extern const int BAD_ARGUMENTS;
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}

template <typename Point>
using Ring = boost::geometry::model::ring<Point>;

template <typename Point>
using Polygon = boost::geometry::model::polygon<Point>;

template <typename Point>
using MultiPolygon = boost::geometry::model::multi_polygon<Polygon<Point>>;

using CartesianPoint = boost::geometry::model::d2::point_xy<Float64>;
using CartesianRing = Ring<CartesianPoint>;
using CartesianPolygon = Polygon<CartesianPoint>;
using CartesianMultiPolygon = MultiPolygon<CartesianPoint>;

using SphericalPoint = boost::geometry::model::point<Float64, 2, boost::geometry::cs::spherical_equatorial<boost::geometry::degree>>;
using SphericalRing = Ring<SphericalPoint>;
using SphericalPolygon = Polygon<SphericalPoint>;
using SphericalMultiPolygon = MultiPolygon<SphericalPoint>;

/**
 * Class which converts a Column of type Tuple(Float64, Float64) to a vector of boost points.
 * Components are (x, y) in the Cartesian case and (lon, lat) in the Spherical case.
 */
template <typename Point>
struct ColumnToPointsConverter
{
    static std::vector<Point> convert(ColumnPtr col)
    {
        const auto * tuple = typeid_cast<const ColumnTuple *>(col.get());
        const auto & tuple_columns = tuple->getColumns();

        const auto * x_data = typeid_cast<const ColumnFloat64 *>(tuple_columns[0].get());
        const auto * y_data = typeid_cast<const ColumnFloat64 *>(tuple_columns[1].get());

        const auto * first_container = x_data->getData().data();
        const auto * second_container = y_data->getData().data();

        std::vector<Point> answer(col->size());

        for (size_t i = 0; i < col->size(); ++i)
        {
            const Float64 first = first_container[i];
            const Float64 second = second_container[i];

            if (isNaN(first) || isNaN(second))
                throw Exception("Point's component must not be NaN", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

            if (isinf(first) || isinf(second))
                throw Exception("Point's component must not be infinite", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);

            answer[i] = Point(first, second);
        }

        return answer;
    }
};


template <typename Point>
struct ColumnToRingsConverter
{
    static std::vector<Ring<Point>> convert(ColumnPtr col)
    {
        const IColumn::Offsets & offsets = typeid_cast<const ColumnArray &>(*col).getOffsets();
        size_t prev_offset = 0;
        std::vector<Ring<Point>> answer;
        answer.reserve(offsets.size());
        auto tmp = ColumnToPointsConverter<Point>::convert(typeid_cast<const ColumnArray &>(*col).getDataPtr());
        for (size_t offset : offsets)
        {
            answer.emplace_back(tmp.begin() + prev_offset, tmp.begin() + offset);
            prev_offset = offset;
        }
        return answer;
    }
};


template <typename Point>
struct ColumnToPolygonsConverter
{
    static std::vector<Polygon<Point>> convert(ColumnPtr col)
    {
        const IColumn::Offsets & offsets = typeid_cast<const ColumnArray &>(*col).getOffsets();
        std::vector<Polygon<Point>> answer(offsets.size());
        auto all_rings = ColumnToRingsConverter<Point>::convert(typeid_cast<const ColumnArray &>(*col).getDataPtr());

        size_t prev_offset = 0;
        for (size_t iter = 0; iter < offsets.size(); ++iter)
        {
            const auto current_array_size = offsets[iter] - prev_offset;
            answer[iter].outer() = std::move(all_rings[prev_offset]);
            answer[iter].inners().reserve(current_array_size);
            for (size_t inner_holes = prev_offset + 1; inner_holes < offsets[iter]; ++inner_holes)
                answer[iter].inners().emplace_back(std::move(all_rings[inner_holes]));
            prev_offset = offsets[iter];
        }

        return answer;
    }
};


template <typename Point>
struct ColumnToMultiPolygonsConverter
{
    static std::vector<MultiPolygon<Point>> convert(ColumnPtr col)
    {
        const IColumn::Offsets & offsets = typeid_cast<const ColumnArray &>(*col).getOffsets();
        size_t prev_offset = 0;
        std::vector<MultiPolygon<Point>> answer(offsets.size());

        auto all_polygons = ColumnToPolygonsConverter<Point>::convert(typeid_cast<const ColumnArray &>(*col).getDataPtr());

        for (size_t iter = 0; iter < offsets.size(); ++iter)
        {
            for (size_t polygon_iter = prev_offset; polygon_iter < offsets[iter]; ++polygon_iter)
                answer[iter].emplace_back(std::move(all_polygons[polygon_iter]));
            prev_offset = offsets[iter];
        }

        return answer;
    }
};
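All four converters repeat one decoding step: ClickHouse stores a nested array as a flat data column plus cumulative end offsets, and each offsets-to-slices pass peels one level of nesting (points to rings, rings to polygons, polygons to multipolygons). A standalone sketch of that decoding with plain vectors in place of IColumn:

#include <iostream>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;
using Ring = std::vector<Point>;

/// ClickHouse-style Array encoding: `flat` holds all elements back to back,
/// `offsets[i]` is the cumulative end position of row i.
std::vector<Ring> decodeRings(const std::vector<Point> & flat, const std::vector<size_t> & offsets)
{
    std::vector<Ring> rings;
    rings.reserve(offsets.size());
    size_t prev_offset = 0;
    for (size_t offset : offsets)
    {
        rings.emplace_back(flat.begin() + prev_offset, flat.begin() + offset);
        prev_offset = offset;
    }
    return rings;
}

int main()
{
    /// Two rings: a triangle (3 points) and a square (4 points) -> offsets {3, 7}.
    std::vector<Point> flat = {{0, 0}, {1, 0}, {0, 1},  {0, 0}, {2, 0}, {2, 2}, {0, 2}};
    std::vector<size_t> offsets = {3, 7};
    for (const auto & ring : decodeRings(flat, offsets))
        std::cout << "ring with " << ring.size() << " points\n";
}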
/// To serialize Spherical or Cartesian point (a pair of numbers in both cases).
template <typename Point>
class PointSerializer
{
public:
    PointSerializer()
        : first(ColumnFloat64::create())
        , second(ColumnFloat64::create())
        , first_container(first->getData())
        , second_container(second->getData())
    {}

    explicit PointSerializer(size_t n)
        : first(ColumnFloat64::create(n))
        , second(ColumnFloat64::create(n))
        , first_container(first->getData())
        , second_container(second->getData())
    {}

    void add(const Point & point)
    {
        first_container.emplace_back(point.template get<0>());
        second_container.emplace_back(point.template get<1>());
    }

    ColumnPtr finalize()
    {
        Columns columns(2);
        columns[0] = std::move(first);
        columns[1] = std::move(second);

        return ColumnTuple::create(columns);
    }

private:
    ColumnFloat64::MutablePtr first;
    ColumnFloat64::MutablePtr second;

    ColumnFloat64::Container & first_container;
    ColumnFloat64::Container & second_container;
};

/// Serialize Point, Ring as Ring
template <typename Point>
class RingSerializer
{
public:
    RingSerializer()
        : offsets(ColumnUInt64::create())
    {}

    explicit RingSerializer(size_t n)
        : offsets(ColumnUInt64::create(n))
    {}

    void add(const Ring<Point> & ring)
    {
        size += ring.size();
        offsets->insertValue(size);
        for (const auto & point : ring)
            point_serializer.add(point);
    }

    ColumnPtr finalize()
    {
        return ColumnArray::create(point_serializer.finalize(), std::move(offsets));
    }

private:
    size_t size = 0;
    PointSerializer<Point> point_serializer;
    ColumnUInt64::MutablePtr offsets;
};

/// Serialize Point, Ring, Polygon as Polygon
template <typename Point>
class PolygonSerializer
{
public:
    PolygonSerializer()
        : offsets(ColumnUInt64::create())
    {}

    explicit PolygonSerializer(size_t n)
        : offsets(ColumnUInt64::create(n))
    {}

    void add(const Ring<Point> & ring)
    {
        size++;
        offsets->insertValue(size);
        ring_serializer.add(ring);
    }

    void add(const Polygon<Point> & polygon)
    {
        /// Outer ring + all inner rings (holes).
        size += 1 + polygon.inners().size();
        offsets->insertValue(size);
        ring_serializer.add(polygon.outer());
        for (const auto & ring : polygon.inners())
            ring_serializer.add(ring);
    }

    ColumnPtr finalize()
    {
        return ColumnArray::create(ring_serializer.finalize(), std::move(offsets));
    }

private:
    size_t size = 0;
    RingSerializer<Point> ring_serializer;
    ColumnUInt64::MutablePtr offsets;
};

/// Serialize Point, Ring, Polygon, MultiPolygon as MultiPolygon
template <typename Point>
class MultiPolygonSerializer
{
public:
    MultiPolygonSerializer()
        : offsets(ColumnUInt64::create())
    {}

    explicit MultiPolygonSerializer(size_t n)
        : offsets(ColumnUInt64::create(n))
    {}

    void add(const Ring<Point> & ring)
    {
        size++;
        offsets->insertValue(size);
        polygon_serializer.add(ring);
    }

    void add(const Polygon<Point> & polygon)
    {
        size++;
        offsets->insertValue(size);
        polygon_serializer.add(polygon);
    }

    void add(const MultiPolygon<Point> & multi_polygon)
    {
        size += multi_polygon.size();
        offsets->insertValue(size);
        for (const auto & polygon : multi_polygon)
        {
            polygon_serializer.add(polygon);
        }
    }

    ColumnPtr finalize()
    {
        return ColumnArray::create(polygon_serializer.finalize(), std::move(offsets));
    }

private:
    size_t size = 0;
    PolygonSerializer<Point> polygon_serializer;
    ColumnUInt64::MutablePtr offsets;
};


template <typename PType>
struct ConverterType
{
    using Type = PType;
};

template <typename Point, typename F>
static void callOnGeometryDataType(DataTypePtr type, F && f)
{
    /// There is no Point type, because for most of geometry functions it is useless.
    if (DataTypeCustomPointSerialization::nestedDataType()->equals(*type))
        return f(ConverterType<ColumnToPointsConverter<Point>>());
    else if (DataTypeCustomRingSerialization::nestedDataType()->equals(*type))
        return f(ConverterType<ColumnToRingsConverter<Point>>());
    else if (DataTypeCustomPolygonSerialization::nestedDataType()->equals(*type))
        return f(ConverterType<ColumnToPolygonsConverter<Point>>());
    else if (DataTypeCustomMultiPolygonSerialization::nestedDataType()->equals(*type))
        return f(ConverterType<ColumnToMultiPolygonsConverter<Point>>());
    throw Exception(fmt::format("Unknown geometry type {}", type->getName()), ErrorCodes::BAD_ARGUMENTS);
}


template <typename Point, typename F>
static void callOnTwoGeometryDataTypes(DataTypePtr left_type, DataTypePtr right_type, F && func)
{
    return callOnGeometryDataType<Point>(left_type, [&](const auto & left_types)
    {
        using LeftConverterType = std::decay_t<decltype(left_types)>;

        return callOnGeometryDataType<Point>(right_type, [&](const auto & right_types)
        {
            using RightConverterType = std::decay_t<decltype(right_types)>;

            return func(LeftConverterType(), RightConverterType());
        });
    });
}

}
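callOnTwoGeometryDataTypes is tag dispatch with nested generic lambdas: each runtime type check picks a compile-time converter type, and nesting two dispatches covers every left/right combination without an N-by-N switch. A standalone sketch of the same trick with simplified tags:

#include <iostream>
#include <stdexcept>
#include <string>
#include <type_traits>

struct RingTag {};
struct PolygonTag {};

/// Map a runtime name onto a compile-time tag and call f with it.
template <typename F>
void callOnGeometryName(const std::string & name, F && f)
{
    if (name == "Ring")    return f(RingTag{});
    if (name == "Polygon") return f(PolygonTag{});
    throw std::invalid_argument("unknown geometry type " + name);
}

/// Nesting two dispatches enumerates all combinations at compile time.
template <typename F>
void callOnTwoGeometryNames(const std::string & left, const std::string & right, F && f)
{
    callOnGeometryName(left, [&](auto l)
    {
        callOnGeometryName(right, [&](auto r) { f(l, r); });
    });
}

int main()
{
    callOnTwoGeometryNames("Ring", "Polygon", [](auto l, auto r)
    {
        /// Inside the lambda, decltype(l)/decltype(r) are the concrete tag types.
        std::cout << std::is_same_v<decltype(l), RingTag> << " "
                  << std::is_same_v<decltype(r), PolygonTag> << "\n"; /// prints "1 1"
    });
}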
@@ -13,6 +13,7 @@
#include <Columns/ColumnsNumber.h>
#include <Common/ObjectPool.h>
#include <Common/ProfileEvents.h>
#include <common/arithmeticOverflow.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeTuple.h>
@@ -425,7 +426,14 @@ private:
    {
        out_container.reserve(end - begin);
        for (size_t i = begin; i < end; ++i)
        {
            Int64 result = 0;
            if (common::mulOverflow(static_cast<Int64>(x_data[i]), static_cast<Int64>(y_data[i]), result))
                throw Exception("The coordinates of the point are such that subsequent calculations cannot be performed correctly. " \
                    "Most likely they are very large in modulus.", ErrorCodes::BAD_ARGUMENTS);

            out_container.emplace_back(x_data[i], y_data[i]);
        }
    }

    void parseConstPolygonWithoutHolesFromSingleColumn(const IColumn & column, size_t i, Polygon & out_polygon) const
@@ -540,7 +548,7 @@ private:
        }
    }

-    void parseConstPolygon(const ColumnsWithTypeAndName & arguments, Polygon & out_polygon) const
+    void NO_SANITIZE_UNDEFINED parseConstPolygon(const ColumnsWithTypeAndName & arguments, Polygon & out_polygon) const
    {
        if (arguments.size() == 2)
            parseConstPolygonFromSingleColumn(arguments, out_polygon);
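The overflow guard matters because the point-in-polygon grid preprocessing multiplies coordinates as Int64, and silent signed overflow would be undefined behavior. common::mulOverflow (from common/arithmeticOverflow.h) reports whether the product wrapped; to my understanding it wraps the compiler builtin, so an equivalent standalone check looks like this:

#include <cstdint>
#include <iostream>

/// Returns true when a * b does not fit into int64_t; `result` receives
/// the wrapped value either way (same contract as __builtin_mul_overflow).
bool mulOverflow(int64_t a, int64_t b, int64_t & result)
{
    return __builtin_mul_overflow(a, b, &result);
}

int main()
{
    int64_t r = 0;
    std::cout << mulOverflow(1'000'000, 1'000'000, r) << "\n";            /// 0: fits, r == 10^12
    std::cout << mulOverflow(int64_t{1} << 40, int64_t{1} << 40, r) << "\n"; /// 1: 2^80 overflows
}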
107 src/Functions/polygonArea.cpp Normal file
@@ -0,0 +1,107 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <string>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}

template <typename Point>
class FunctionPolygonArea : public IFunction
{
public:
    static inline const char * name;

    explicit FunctionPolygonArea() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonArea>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 1;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return std::make_shared<DataTypeFloat64>();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        auto res_column = ColumnFloat64::create();
        auto & res_data = res_column->getData();
        res_data.reserve(input_rows_count);

        callOnGeometryDataType<Point>(arguments[0].type, [&] (const auto & type)
        {
            using TypeConverter = std::decay_t<decltype(type)>;
            using Converter = typename TypeConverter::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, Converter>)
                throw Exception(fmt::format("The argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto geometries = Converter::convert(arguments[0].column->convertToFullColumnIfConst());

                for (size_t i = 0; i < input_rows_count; i++)
                    res_data.emplace_back(boost::geometry::area(geometries[i]));
            }
        });

        return res_column;
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};

template <>
const char * FunctionPolygonArea<CartesianPoint>::name = "polygonAreaCartesian";

template <>
const char * FunctionPolygonArea<SphericalPoint>::name = "polygonAreaSpherical";


void registerFunctionPolygonArea(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonArea<CartesianPoint>>();
    factory.registerFunction<FunctionPolygonArea<SphericalPoint>>();
}

}
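One subtlety of polygonAreaCartesian: unlike the binary polygon functions below, the loop above does not call boost::geometry::correct first, so ring orientation matters and boost returns a signed area. A standalone sketch (assuming boost's default clockwise polygon type):

#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Polygon = bg::model::polygon<Point>; /// clockwise, closed by default

int main()
{
    Polygon poly;
    /// Counter-clockwise unit square fed into a clockwise polygon type.
    bg::read_wkt("POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))", poly);

    std::cout << bg::area(poly) << "\n"; /// -1: wrong orientation flips the sign

    bg::correct(poly);                   /// reverse rings to the declared orientation
    std::cout << bg::area(poly) << "\n"; /// 1
}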
105 src/Functions/polygonConvexHull.cpp Normal file
@@ -0,0 +1,105 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <string>

namespace DB
{

namespace ErrorCodes
{
    extern const int BAD_ARGUMENTS;
}

template <typename Point>
class FunctionPolygonConvexHull : public IFunction
{
public:
    static const char * name;

    explicit FunctionPolygonConvexHull() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonConvexHull>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 1;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return DataTypeCustomPolygonSerialization::nestedDataType();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        PolygonSerializer<Point> serializer;

        callOnGeometryDataType<Point>(arguments[0].type, [&] (const auto & type)
        {
            using TypeConverter = std::decay_t<decltype(type)>;
            using Converter = typename TypeConverter::Type;

            if constexpr (std::is_same_v<Converter, ColumnToPointsConverter<Point>>)
                throw Exception(fmt::format("The argument of function {} must not be a Point", getName()), ErrorCodes::BAD_ARGUMENTS);
            else
            {
                auto geometries = Converter::convert(arguments[0].column->convertToFullColumnIfConst());

                for (size_t i = 0; i < input_rows_count; i++)
                {
                    Polygon<Point> convex_hull{};
                    boost::geometry::convex_hull(geometries[i], convex_hull);
                    serializer.add(convex_hull);
                }
            }
        });

        return serializer.finalize();
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};


template <>
const char * FunctionPolygonConvexHull<CartesianPoint>::name = "polygonConvexHullCartesian";


void registerFunctionPolygonConvexHull(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonConvexHull<CartesianPoint>>();
}

}
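Only the Cartesian variant is registered here, presumably because boost::geometry::convex_hull is not available for the spherical coordinate system used by the other functions. A standalone sketch of what the per-row call computes:

#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Polygon = bg::model::polygon<Point>;

int main()
{
    /// A concave polygon: the vertex (0.2, 0.5) pokes inward and disappears in the hull.
    Polygon concave, hull;
    bg::read_wkt("POLYGON((0 0, 0.2 0.5, 0 1, 1 1, 1 0, 0 0))", concave);
    bg::correct(concave);

    bg::convex_hull(concave, hull);
    std::cout << bg::wkt(hull) << "\n"; /// a convex quadrilateral without the notch
}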
107 src/Functions/polygonPerimeter.cpp Normal file
@@ -0,0 +1,107 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <string>

namespace DB
{
namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}


template <typename Point>
class FunctionPolygonPerimeter : public IFunction
{
public:
    static const char * name;

    explicit FunctionPolygonPerimeter() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonPerimeter>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 1;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return std::make_shared<DataTypeFloat64>();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        auto res_column = ColumnFloat64::create();
        auto & res_data = res_column->getData();
        res_data.reserve(input_rows_count);

        callOnGeometryDataType<Point>(arguments[0].type, [&] (const auto & type)
        {
            using TypeConverter = std::decay_t<decltype(type)>;
            using Converter = typename TypeConverter::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, Converter>)
                throw Exception(fmt::format("The argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto geometries = Converter::convert(arguments[0].column->convertToFullColumnIfConst());

                for (size_t i = 0; i < input_rows_count; i++)
                    res_data.emplace_back(boost::geometry::perimeter(geometries[i]));
            }
        });

        return res_column;
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};

template <>
const char * FunctionPolygonPerimeter<CartesianPoint>::name = "polygonPerimeterCartesian";

template <>
const char * FunctionPolygonPerimeter<SphericalPoint>::name = "polygonPerimeterSpherical";


void registerFunctionPolygonPerimeter(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonPerimeter<CartesianPoint>>();
    factory.registerFunction<FunctionPolygonPerimeter<SphericalPoint>>();
}


}
117 src/Functions/polygonsDistance.cpp Normal file
@@ -0,0 +1,117 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <utility>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}

template <typename Point>
class FunctionPolygonsDistance : public IFunction
{
public:
    static inline const char * name;

    explicit FunctionPolygonsDistance() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonsDistance>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 2;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return std::make_shared<DataTypeFloat64>();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        auto res_column = ColumnFloat64::create();
        auto & res_data = res_column->getData();
        res_data.reserve(input_rows_count);

        callOnTwoGeometryDataTypes<Point>(arguments[0].type, arguments[1].type, [&](const auto & left_type, const auto & right_type)
        {
            using LeftConverterType = std::decay_t<decltype(left_type)>;
            using RightConverterType = std::decay_t<decltype(right_type)>;

            using LeftConverter = typename LeftConverterType::Type;
            using RightConverter = typename RightConverterType::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, LeftConverter> || std::is_same_v<ColumnToPointsConverter<Point>, RightConverter>)
                throw Exception(fmt::format("Any argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto first = LeftConverter::convert(arguments[0].column->convertToFullColumnIfConst());
                auto second = RightConverter::convert(arguments[1].column->convertToFullColumnIfConst());

                for (size_t i = 0; i < input_rows_count; i++)
                {
                    boost::geometry::correct(first[i]);
                    boost::geometry::correct(second[i]);

                    res_data.emplace_back(boost::geometry::distance(first[i], second[i]));
                }
            }
        });

        return res_column;
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};

template <>
const char * FunctionPolygonsDistance<CartesianPoint>::name = "polygonsDistanceCartesian";

template <>
const char * FunctionPolygonsDistance<SphericalPoint>::name = "polygonsDistanceSpherical";


void registerFunctionPolygonsDistance(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonsDistance<CartesianPoint>>();
    factory.registerFunction<FunctionPolygonsDistance<SphericalPoint>>();
}


}
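The two registered variants read the same Tuple(Float64, Float64) input very differently: cartesian distance is plain Euclidean, while spherical points are (lon, lat) in degrees and, as far as I know, boost's default spherical strategy returns the central angle in radians on a unit sphere. A standalone contrast on two points:

#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>

namespace bg = boost::geometry;
using CartesianPoint = bg::model::d2::point_xy<double>;
using SphericalPoint = bg::model::point<double, 2,
    bg::cs::spherical_equatorial<bg::degree>>;

int main()
{
    /// Same numbers, different geometry.
    CartesianPoint c1(0, 0), c2(90, 0);
    SphericalPoint s1(0, 0), s2(90, 0); /// (lon, lat) in degrees

    std::cout << bg::distance(c1, c2) << "\n"; /// 90: Euclidean distance
    std::cout << bg::distance(s1, s2) << "\n"; /// ~1.5708: a quarter circle, in radians
}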
114 src/Functions/polygonsEquals.cpp Normal file
@@ -0,0 +1,114 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <utility>

namespace DB
{
namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}

template <typename Point>
class FunctionPolygonsEquals : public IFunction
{
public:
    static const char * name;

    explicit FunctionPolygonsEquals() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonsEquals>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 2;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return std::make_shared<DataTypeUInt8>();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        auto res_column = ColumnUInt8::create();
        auto & res_data = res_column->getData();
        res_data.reserve(input_rows_count);

        callOnTwoGeometryDataTypes<Point>(arguments[0].type, arguments[1].type, [&](const auto & left_type, const auto & right_type)
        {
            using LeftConverterType = std::decay_t<decltype(left_type)>;
            using RightConverterType = std::decay_t<decltype(right_type)>;

            using LeftConverter = typename LeftConverterType::Type;
            using RightConverter = typename RightConverterType::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, LeftConverter> || std::is_same_v<ColumnToPointsConverter<Point>, RightConverter>)
                throw Exception(fmt::format("Any argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto first = LeftConverter::convert(arguments[0].column->convertToFullColumnIfConst());
                auto second = RightConverter::convert(arguments[1].column->convertToFullColumnIfConst());

                for (size_t i = 0; i < input_rows_count; i++)
                {
                    boost::geometry::correct(first[i]);
                    boost::geometry::correct(second[i]);

                    /// Main work here.
                    res_data.emplace_back(boost::geometry::equals(first[i], second[i]));
                }
            }
        });

        return res_column;
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};


template <>
const char * FunctionPolygonsEquals<CartesianPoint>::name = "polygonsEqualsCartesian";


void registerFunctionPolygonsEquals(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonsEquals<CartesianPoint>>();
}

}
122 src/Functions/polygonsIntersection.cpp Normal file
@@ -0,0 +1,122 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnConst.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <utility>
#include <chrono>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}

template <typename Point>
class FunctionPolygonsIntersection : public IFunction
{
public:
    static inline const char * name;

    explicit FunctionPolygonsIntersection() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonsIntersection>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 2;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        /// The intersection of any two figures can easily be represented as a MultiPolygon.
        return DataTypeCustomMultiPolygonSerialization::nestedDataType();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        MultiPolygonSerializer<Point> serializer;

        callOnTwoGeometryDataTypes<Point>(arguments[0].type, arguments[1].type, [&](const auto & left_type, const auto & right_type)
        {
            using LeftConverterType = std::decay_t<decltype(left_type)>;
            using RightConverterType = std::decay_t<decltype(right_type)>;

            using LeftConverter = typename LeftConverterType::Type;
            using RightConverter = typename RightConverterType::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, LeftConverter> || std::is_same_v<ColumnToPointsConverter<Point>, RightConverter>)
                throw Exception(fmt::format("Any argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto first = LeftConverter::convert(arguments[0].column->convertToFullColumnIfConst());
                auto second = RightConverter::convert(arguments[1].column->convertToFullColumnIfConst());

                /// We are not interested in some pitfalls in third-party libraries
                /// NOLINTNEXTLINE(clang-analyzer-core.uninitialized.Assign)
                for (size_t i = 0; i < input_rows_count; ++i)
                {
                    /// Orient the polygons correctly.
                    boost::geometry::correct(first[i]);
                    boost::geometry::correct(second[i]);

                    MultiPolygon<Point> intersection{};
                    /// Main work here.
                    boost::geometry::intersection(first[i], second[i], intersection);

                    serializer.add(intersection);
                }
            }
        });

        return serializer.finalize();
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};


template <>
const char * FunctionPolygonsIntersection<CartesianPoint>::name = "polygonsIntersectionCartesian";

template <>
const char * FunctionPolygonsIntersection<SphericalPoint>::name = "polygonsIntersectionSpherical";


void registerFunctionPolygonsIntersection(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonsIntersection<CartesianPoint>>();
    factory.registerFunction<FunctionPolygonsIntersection<SphericalPoint>>();
}

}
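The return type is the design decision worth noticing: intersecting two figures can produce zero, one, or several disjoint pieces, which is exactly what a MultiPolygon models, so every row serializes one MultiPolygon. A standalone sketch where the intersection is a single square of area 1:

#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>
#include <boost/geometry/geometries/multi_polygon.hpp>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Polygon = bg::model::polygon<Point>;
using MultiPolygon = bg::model::multi_polygon<Polygon>;

int main()
{
    Polygon a, b;
    bg::read_wkt("POLYGON((0 0, 0 2, 2 2, 2 0, 0 0))", a);
    bg::read_wkt("POLYGON((1 1, 1 3, 3 3, 3 1, 1 1))", b);
    bg::correct(a);
    bg::correct(b);

    MultiPolygon out; /// may hold 0, 1 or many pieces
    bg::intersection(a, b, out);

    std::cout << out.size() << " piece(s), area " << bg::area(out) << "\n"; /// 1 piece, area 1
}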
116 src/Functions/polygonsSymDifference.cpp Normal file
@@ -0,0 +1,116 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnConst.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <utility>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}


template <typename Point>
class FunctionPolygonsSymDifference : public IFunction
{
public:
    static const char * name;

    explicit FunctionPolygonsSymDifference() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonsSymDifference>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 2;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return DataTypeCustomMultiPolygonSerialization::nestedDataType();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        MultiPolygonSerializer<Point> serializer;

        callOnTwoGeometryDataTypes<Point>(arguments[0].type, arguments[1].type, [&](const auto & left_type, const auto & right_type)
        {
            using LeftConverterType = std::decay_t<decltype(left_type)>;
            using RightConverterType = std::decay_t<decltype(right_type)>;

            using LeftConverter = typename LeftConverterType::Type;
            using RightConverter = typename RightConverterType::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, LeftConverter> || std::is_same_v<ColumnToPointsConverter<Point>, RightConverter>)
                throw Exception(fmt::format("Any argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto first = LeftConverter::convert(arguments[0].column->convertToFullColumnIfConst());
                auto second = RightConverter::convert(arguments[1].column->convertToFullColumnIfConst());

                /// NOLINTNEXTLINE(clang-analyzer-core.uninitialized.Assign)
                for (size_t i = 0; i < input_rows_count; i++)
                {
                    boost::geometry::correct(first[i]);
                    boost::geometry::correct(second[i]);

                    MultiPolygon<Point> sym_difference{};
                    boost::geometry::sym_difference(first[i], second[i], sym_difference);

                    serializer.add(sym_difference);
                }
            }
        });

        return serializer.finalize();
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};

template <>
const char * FunctionPolygonsSymDifference<CartesianPoint>::name = "polygonsSymDifferenceCartesian";

template <>
const char * FunctionPolygonsSymDifference<SphericalPoint>::name = "polygonsSymDifferenceSpherical";

void registerFunctionPolygonsSymDifference(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonsSymDifference<CartesianPoint>>();
    factory.registerFunction<FunctionPolygonsSymDifference<SphericalPoint>>();
}

}
120 src/Functions/polygonsUnion.cpp Normal file
@@ -0,0 +1,120 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnConst.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <string>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}


template <typename Point>
class FunctionPolygonsUnion : public IFunction
{
public:
    static inline const char * name;

    explicit FunctionPolygonsUnion() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonsUnion>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 2;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return DataTypeCustomMultiPolygonSerialization::nestedDataType();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        MultiPolygonSerializer<Point> serializer;

        callOnTwoGeometryDataTypes<Point>(arguments[0].type, arguments[1].type, [&](const auto & left_type, const auto & right_type)
        {
            using LeftConverterType = std::decay_t<decltype(left_type)>;
            using RightConverterType = std::decay_t<decltype(right_type)>;

            using LeftConverter = typename LeftConverterType::Type;
            using RightConverter = typename RightConverterType::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, LeftConverter> || std::is_same_v<ColumnToPointsConverter<Point>, RightConverter>)
                throw Exception(fmt::format("Any argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto first = LeftConverter::convert(arguments[0].column->convertToFullColumnIfConst());
                auto second = RightConverter::convert(arguments[1].column->convertToFullColumnIfConst());

                /// We are not interested in some pitfalls in third-party libraries
                /// NOLINTNEXTLINE(clang-analyzer-core.uninitialized.Assign)
                for (size_t i = 0; i < input_rows_count; i++)
                {
                    /// Orient the polygons correctly.
                    boost::geometry::correct(first[i]);
                    boost::geometry::correct(second[i]);

                    MultiPolygon<Point> polygons_union{};
                    /// Main work here.
                    boost::geometry::union_(first[i], second[i], polygons_union);

                    serializer.add(polygons_union);
                }
            }
        });

        return serializer.finalize();
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};

template <>
const char * FunctionPolygonsUnion<CartesianPoint>::name = "polygonsUnionCartesian";

template <>
const char * FunctionPolygonsUnion<SphericalPoint>::name = "polygonsUnionSpherical";


void registerFunctionPolygonsUnion(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonsUnion<CartesianPoint>>();
    factory.registerFunction<FunctionPolygonsUnion<SphericalPoint>>();
}

}
119
src/Functions/polygonsWithin.cpp
Normal file
119
src/Functions/polygonsWithin.cpp
Normal file
@ -0,0 +1,119 @@
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

#include <common/logger_useful.h>

#include <Columns/ColumnArray.h>
#include <Columns/ColumnTuple.h>
#include <Columns/ColumnConst.h>
#include <Columns/ColumnsNumber.h>
#include <DataTypes/DataTypesNumber.h>
#include <DataTypes/DataTypeArray.h>
#include <DataTypes/DataTypeTuple.h>
#include <DataTypes/DataTypeCustomGeo.h>

#include <memory>
#include <utility>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}


template <typename Point>
class FunctionPolygonsWithin : public IFunction
{
public:
    static inline const char * name;

    explicit FunctionPolygonsWithin() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionPolygonsWithin>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return false;
    }

    size_t getNumberOfArguments() const override
    {
        return 2;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return std::make_shared<DataTypeUInt8>();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        auto res_column = ColumnUInt8::create();
        auto & res_data = res_column->getData();
        res_data.reserve(input_rows_count);

        callOnTwoGeometryDataTypes<Point>(arguments[0].type, arguments[1].type, [&](const auto & left_type, const auto & right_type)
        {
            using LeftConverterType = std::decay_t<decltype(left_type)>;
            using RightConverterType = std::decay_t<decltype(right_type)>;

            using LeftConverter = typename LeftConverterType::Type;
            using RightConverter = typename RightConverterType::Type;

            if constexpr (std::is_same_v<ColumnToPointsConverter<Point>, LeftConverter> || std::is_same_v<ColumnToPointsConverter<Point>, RightConverter>)
                throw Exception(fmt::format("Any argument of function {} must not be Point", getName()), ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
            else
            {
                auto first = LeftConverter::convert(arguments[0].column->convertToFullColumnIfConst());
                auto second = RightConverter::convert(arguments[1].column->convertToFullColumnIfConst());

                /// NOLINTNEXTLINE(clang-analyzer-core.uninitialized.Assign)
                for (size_t i = 0; i < input_rows_count; i++)
                {
                    boost::geometry::correct(first[i]);
                    boost::geometry::correct(second[i]);

                    res_data.emplace_back(boost::geometry::within(first[i], second[i]));
                }
            }
        });

        return res_column;
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};


template <>
const char * FunctionPolygonsWithin<CartesianPoint>::name = "polygonsWithinCartesian";

template <>
const char * FunctionPolygonsWithin<SphericalPoint>::name = "polygonsWithinSpherical";


void registerFunctionPolygonsWithin(FunctionFactory & factory)
{
    factory.registerFunction<FunctionPolygonsWithin<CartesianPoint>>();
    factory.registerFunction<FunctionPolygonsWithin<SphericalPoint>>();
}

}
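boost::geometry::within(), which does the per-row work above, is directional: it reports whether the first geometry lies inside the second. A small self-contained sketch (not from the commit) of that asymmetry:

```cpp
#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Polygon = bg::model::polygon<Point>;

int main()
{
    Polygon inner;
    Polygon outer;
    bg::read_wkt("POLYGON((1 1, 1 2, 2 2, 2 1, 1 1))", inner);
    bg::read_wkt("POLYGON((0 0, 0 3, 3 3, 3 0, 0 0))", outer);
    bg::correct(inner);
    bg::correct(outer);

    /// within() is not symmetric: swap the arguments and the answer flips.
    std::cout << std::boolalpha
              << bg::within(inner, outer) << ' '   // true
              << bg::within(outer, inner) << '\n'; // false
}
```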
105
src/Functions/readWkt.cpp
Normal file
105
src/Functions/readWkt.cpp
Normal file
@ -0,0 +1,105 @@
#include <DataTypes/DataTypeString.h>
#include <DataTypes/DataTypeCustomGeo.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/geometryConverters.h>
#include <Columns/ColumnString.h>

#include <string>
#include <memory>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
}


template <class DataType, class Geometry, class Serializer, class NameHolder>
class FunctionReadWkt : public IFunction
{
public:
    explicit FunctionReadWkt() = default;

    static constexpr const char * name = NameHolder::name;

    String getName() const override
    {
        return name;
    }

    size_t getNumberOfArguments() const override
    {
        return 1;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
    {
        if (checkAndGetDataType<DataTypeString>(arguments[0].get()) == nullptr)
        {
            throw Exception("First argument should be String",
                ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
        }

        return DataType::nestedDataType();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        const auto * column_string = checkAndGetColumn<ColumnString>(arguments[0].column.get());

        Serializer serializer;
        Geometry geometry;

        for (size_t i = 0; i < input_rows_count; i++)
        {
            const auto & str = column_string->getDataAt(i).toString();
            boost::geometry::read_wkt(str, geometry);
            serializer.add(geometry);
        }

        return serializer.finalize();
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionReadWkt<DataType, Geometry, Serializer, NameHolder>>();
    }
};

struct ReadWktPointNameHolder
{
    static constexpr const char * name = "readWktPoint";
};

struct ReadWktRingNameHolder
{
    static constexpr const char * name = "readWktRing";
};

struct ReadWktPolygonNameHolder
{
    static constexpr const char * name = "readWktPolygon";
};

struct ReadWktMultiPolygonNameHolder
{
    static constexpr const char * name = "readWktMultiPolygon";
};

void registerFunctionReadWkt(FunctionFactory & factory)
{
    factory.registerFunction<FunctionReadWkt<DataTypeCustomPointSerialization, CartesianPoint, PointSerializer<CartesianPoint>, ReadWktPointNameHolder>>();
    factory.registerFunction<FunctionReadWkt<DataTypeCustomRingSerialization, CartesianRing, RingSerializer<CartesianPoint>, ReadWktRingNameHolder>>();
    factory.registerFunction<FunctionReadWkt<DataTypeCustomPolygonSerialization, CartesianPolygon, PolygonSerializer<CartesianPoint>, ReadWktPolygonNameHolder>>();
    factory.registerFunction<FunctionReadWkt<DataTypeCustomMultiPolygonSerialization, CartesianMultiPolygon, MultiPolygonSerializer<CartesianPoint>, ReadWktMultiPolygonNameHolder>>();
}

}
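Each readWkt* function fixes its output geometry type at registration time, and boost::geometry::read_wkt() parses into whatever model it is handed, throwing on mismatched input. A standalone sketch of the underlying parser for three of the shapes registered above (WKT literals invented for illustration; note that boost writes and reads a bare ring with the POLYGON tag):

```cpp
#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/ring.hpp>
#include <boost/geometry/geometries/polygon.hpp>
#include <boost/geometry/geometries/multi_polygon.hpp>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Ring = bg::model::ring<Point>;
using Polygon = bg::model::polygon<Point>;
using MultiPolygon = bg::model::multi_polygon<Polygon>;

int main()
{
    Point p;
    bg::read_wkt("POINT(1.5 2.5)", p);

    Ring r;
    bg::read_wkt("POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))", r);

    MultiPolygon mp;
    bg::read_wkt("MULTIPOLYGON(((0 0, 0 1, 1 1, 1 0, 0 0)))", mp);

    /// One parsed coordinate, the ring's vertex count, and the number of polygons.
    std::cout << bg::get<0>(p) << ' ' << r.size() << ' ' << mp.size() << '\n';
}
```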
@@ -10,9 +10,21 @@ class FunctionFactory;
void registerFunctionGeoDistance(FunctionFactory & factory);
void registerFunctionPointInEllipses(FunctionFactory & factory);
void registerFunctionPointInPolygon(FunctionFactory & factory);
+void registerFunctionPolygonsIntersection(FunctionFactory & factory);
+void registerFunctionPolygonsUnion(FunctionFactory & factory);
+void registerFunctionPolygonArea(FunctionFactory & factory);
+void registerFunctionPolygonConvexHull(FunctionFactory & factory);
+void registerFunctionPolygonsSymDifference(FunctionFactory & factory);
+void registerFunctionPolygonsEquals(FunctionFactory & factory);
+void registerFunctionPolygonsDistance(FunctionFactory & factory);
+void registerFunctionPolygonsWithin(FunctionFactory & factory);
+void registerFunctionPolygonPerimeter(FunctionFactory & factory);
void registerFunctionGeohashEncode(FunctionFactory & factory);
void registerFunctionGeohashDecode(FunctionFactory & factory);
void registerFunctionGeohashesInBox(FunctionFactory & factory);
+void registerFunctionWkt(FunctionFactory & factory);
+void registerFunctionReadWkt(FunctionFactory & factory);
+void registerFunctionSvg(FunctionFactory & factory);

#if USE_H3
void registerFunctionGeoToH3(FunctionFactory &);
@@ -36,9 +48,21 @@ void registerFunctionsGeo(FunctionFactory & factory)
    registerFunctionGeoDistance(factory);
    registerFunctionPointInEllipses(factory);
    registerFunctionPointInPolygon(factory);
+    registerFunctionPolygonsIntersection(factory);
+    registerFunctionPolygonsUnion(factory);
+    registerFunctionPolygonArea(factory);
+    registerFunctionPolygonConvexHull(factory);
+    registerFunctionPolygonsSymDifference(factory);
+    registerFunctionPolygonsEquals(factory);
+    registerFunctionPolygonsDistance(factory);
+    registerFunctionPolygonsWithin(factory);
+    registerFunctionPolygonPerimeter(factory);
    registerFunctionGeohashEncode(factory);
    registerFunctionGeohashDecode(factory);
    registerFunctionGeohashesInBox(factory);
+    registerFunctionWkt(factory);
+    registerFunctionReadWkt(factory);
+    registerFunctionSvg(factory);

#if USE_H3
    registerFunctionGeoToH3(factory);
105 src/Functions/svg.cpp Normal file
@@ -0,0 +1,105 @@
#include <DataTypes/DataTypeString.h>
#include <Functions/FunctionFactory.h>
#include <Functions/FunctionHelpers.h>
#include <Functions/geometryConverters.h>
#include <Columns/ColumnString.h>

#include <string>
#include <memory>

namespace DB
{

namespace ErrorCodes
{
    extern const int ILLEGAL_TYPE_OF_ARGUMENT;
    extern const int TOO_MANY_ARGUMENTS_FOR_FUNCTION;
    extern const int TOO_FEW_ARGUMENTS_FOR_FUNCTION;
}

class FunctionSvg : public IFunction
{
public:
    static inline const char * name = "svg";

    explicit FunctionSvg() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionSvg>();
    }

    String getName() const override
    {
        return name;
    }

    bool isVariadic() const override
    {
        return true;
    }

    size_t getNumberOfArguments() const override
    {
        return 2;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes & arguments) const override
    {
        if (arguments.size() > 2)
        {
            throw Exception("Too many arguments", ErrorCodes::TOO_MANY_ARGUMENTS_FOR_FUNCTION);
        }
        else if (arguments.empty())
        {
            throw Exception("Too few arguments", ErrorCodes::TOO_FEW_ARGUMENTS_FOR_FUNCTION);
        }
        else if (arguments.size() == 2 && checkAndGetDataType<DataTypeString>(arguments[1].get()) == nullptr)
        {
            throw Exception("Second argument should be String", ErrorCodes::ILLEGAL_TYPE_OF_ARGUMENT);
        }

        return std::make_shared<DataTypeString>();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        auto res_column = ColumnString::create();
        bool has_style = arguments.size() > 1;
        ColumnPtr style;
        if (has_style)
            style = arguments[1].column;

        callOnGeometryDataType<CartesianPoint>(arguments[0].type, [&] (const auto & type)
        {
            using TypeConverter = std::decay_t<decltype(type)>;
            using Converter = typename TypeConverter::Type;

            auto figures = Converter::convert(arguments[0].column->convertToFullColumnIfConst());

            for (size_t i = 0; i < input_rows_count; i++)
            {
                std::stringstream str; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
                boost::geometry::correct(figures[i]);
                str << boost::geometry::svg(figures[i], has_style ? style->getDataAt(i).toString() : "");
                std::string serialized = str.str();
                res_column->insertData(serialized.c_str(), serialized.size());
            }
        }
        );

        return res_column;
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};

void registerFunctionSvg(FunctionFactory & factory)
{
    factory.registerFunction<FunctionSvg>();
}

}
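The heavy lifting in FunctionSvg is the boost::geometry::svg() stream manipulator; the optional second argument is passed through verbatim as the SVG style string. A minimal sketch of just that call, using the same headers as the file above (not part of the commit; the style string is illustrative):

```cpp
#include <iostream>
#include <sstream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Polygon = bg::model::polygon<Point>;

int main()
{
    Polygon poly;
    bg::read_wkt("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))", poly);
    bg::correct(poly);

    /// bg::svg() emits one SVG element; the second argument becomes its style
    /// attribute, exactly how the style column is threaded through above.
    std::stringstream out;
    out << bg::svg(poly, "fill:steelblue;stroke:black");
    std::cout << out.str() << '\n';
}
```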
74 src/Functions/wkt.cpp Normal file
@@ -0,0 +1,74 @@
#include <DataTypes/DataTypeString.h>
#include <Functions/FunctionFactory.h>
#include <Functions/geometryConverters.h>
#include <Columns/ColumnString.h>

#include <string>
#include <memory>

namespace DB
{

class FunctionWkt : public IFunction
{
public:
    static inline const char * name = "wkt";

    explicit FunctionWkt() = default;

    static FunctionPtr create(const Context &)
    {
        return std::make_shared<FunctionWkt>();
    }

    String getName() const override
    {
        return name;
    }

    size_t getNumberOfArguments() const override
    {
        return 1;
    }

    DataTypePtr getReturnTypeImpl(const DataTypes &) const override
    {
        return std::make_shared<DataTypeString>();
    }

    ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & /*result_type*/, size_t input_rows_count) const override
    {
        auto res_column = ColumnString::create();

        callOnGeometryDataType<CartesianPoint>(arguments[0].type, [&] (const auto & type)
        {
            using TypeConverter = std::decay_t<decltype(type)>;
            using Converter = typename TypeConverter::Type;

            auto figures = Converter::convert(arguments[0].column->convertToFullColumnIfConst());

            for (size_t i = 0; i < input_rows_count; i++)
            {
                std::stringstream str; // STYLE_CHECK_ALLOW_STD_STRING_STREAM
                str << boost::geometry::wkt(figures[i]);
                std::string serialized = str.str();
                res_column->insertData(serialized.c_str(), serialized.size());
            }
        }
        );

        return res_column;
    }

    bool useDefaultImplementationForConstants() const override
    {
        return true;
    }
};

void registerFunctionWkt(FunctionFactory & factory)
{
    factory.registerFunction<FunctionWkt>();
}

}
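wkt() is the inverse of the readWkt* family: boost::geometry::wkt() streams a geometry back out as text, which is all the loop above does per row. A round-trip sketch (illustrative only):

```cpp
#include <iostream>
#include <sstream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>

namespace bg = boost::geometry;
using Point = bg::model::d2::point_xy<double>;
using Polygon = bg::model::polygon<Point>;

int main()
{
    Polygon poly;
    bg::read_wkt("POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))", poly);

    /// Render the parsed geometry back to WKT text.
    std::stringstream out;
    out << bg::wkt(poly);
    std::cout << out.str() << '\n';  // POLYGON((0 0,0 1,1 1,1 0,0 0))
}
```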
@@ -377,6 +377,15 @@ SRCS(
    plus.cpp
    pointInEllipses.cpp
    pointInPolygon.cpp
+    polygonArea.cpp
+    polygonConvexHull.cpp
+    polygonPerimeter.cpp
+    polygonsDistance.cpp
+    polygonsEquals.cpp
+    polygonsIntersection.cpp
+    polygonsSymDifference.cpp
+    polygonsUnion.cpp
+    polygonsWithin.cpp
    position.cpp
    positionCaseInsensitive.cpp
    positionCaseInsensitiveUTF8.cpp
@@ -389,6 +398,7 @@ SRCS(
    randomPrintableASCII.cpp
    randomString.cpp
    randomStringUTF8.cpp
+    readWkt.cpp
    regexpQuoteMeta.cpp
    registerFunctions.cpp
    registerFunctionsArithmetic.cpp
@@ -447,6 +457,7 @@ SRCS(
    subtractSeconds.cpp
    subtractWeeks.cpp
    subtractYears.cpp
+    svg.cpp
    tan.cpp
    tanh.cpp
    tcpPort.cpp
@@ -525,6 +536,7 @@ SRCS(
    visitParamExtractString.cpp
    visitParamExtractUInt.cpp
    visitParamHas.cpp
+    wkt.cpp
    yandexConsistentHash.cpp
    yesterday.cpp
@@ -45,7 +45,7 @@ private:
    }

public:
-    WriteBufferFromVector(VectorType & vector_)
+    explicit WriteBufferFromVector(VectorType & vector_)
        : WriteBuffer(reinterpret_cast<Position>(vector_.data()), vector_.size()), vector(vector_)
    {
        if (vector.empty())
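The only change in this hunk is the added `explicit`, which stops a bare vector from silently converting into a write buffer at call sites. A minimal sketch of the effect, using a hypothetical `Buffer` type rather than the real class:

```cpp
#include <vector>

struct Buffer
{
    /// With explicit, Buffer can only be created deliberately.
    explicit Buffer(std::vector<char> & v) : data(v) {}
    std::vector<char> & data;
};

void write(const Buffer &) {}

int main()
{
    std::vector<char> v;
    write(Buffer(v)); // OK: the conversion is spelled out
    // write(v);      // would compile without explicit; now rejected
}
```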
@@ -136,7 +136,7 @@ std::optional<size_t> getIdentMembership(const ASTIdentifier & ident, const std:
    std::optional<size_t> table_pos = IdentifierSemantic::getMembership(ident);
    if (table_pos)
        return table_pos;
-    return IdentifierSemantic::chooseTableColumnMatch(ident, tables);
+    return IdentifierSemantic::chooseTableColumnMatch(ident, tables, true);
}

std::optional<size_t> getIdentsMembership(const ASTPtr ast,
@@ -854,6 +854,13 @@ JoinPtr SelectQueryExpressionAnalyzer::makeTableJoin(
            return join;
        }
    }
+    else
+    {
+        const ColumnsWithTypeAndName & right_sample_columns = subquery_for_join.sample_block.getColumnsWithTypeAndName();
+        bool need_convert = syntax->analyzed_join->applyJoinKeyConvert(left_sample_columns, right_sample_columns);
+        if (need_convert)
+            subquery_for_join.addJoinActions(std::make_shared<ExpressionActions>(syntax->analyzed_join->rightConvertingActions()));
+    }

    return subquery_for_join.join;
}
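The added branch applies join-key conversion on the right-hand side when the key types of the two sides disagree, so that equal values land in the same hash bucket. A hand-written illustration of the idea (hypothetical helper, not the ClickHouse API):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

/// If the left key is Int64 and the right key is Int32, the right side must be
/// widened to the common type before the hash join builds its table; otherwise
/// equal keys would hash differently and never match.
std::vector<int64_t> convertRightKey(const std::vector<int32_t> & right_key)
{
    std::vector<int64_t> converted;
    converted.reserve(right_key.size());
    for (int32_t v : right_key)
        converted.push_back(static_cast<int64_t>(v));  // the "converting actions"
    return converted;
}

int main()
{
    std::vector<int32_t> right{1, 2, 3};
    std::cout << convertRightKey(right).size() << '\n';
}
```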
@@ -1,4 +1,5 @@
#include <Interpreters/InterpreterGrantQuery.h>
+#include <Interpreters/QueryLog.h>
#include <Parsers/ASTGrantQuery.h>
#include <Parsers/ASTRolesOrUsersSet.h>
#include <Interpreters/Context.h>
@@ -209,4 +210,13 @@ void InterpreterGrantQuery::updateRoleFromQuery(Role & role, const ASTGrantQuery
    updateFromQueryImpl(role, query, roles_to_grant_or_revoke);
}

+void InterpreterGrantQuery::extendQueryLogElemImpl(QueryLogElement & elem, const ASTPtr & /*ast*/, const Context &) const
+{
+    auto & query = query_ptr->as<ASTGrantQuery &>();
+    if (query.kind == Kind::GRANT)
+        elem.query_kind = "Grant";
+    else if (query.kind == Kind::REVOKE)
+        elem.query_kind = "Revoke";
+}
+
}
@@ -21,6 +21,7 @@ public:

    static void updateUserFromQuery(User & user, const ASTGrantQuery & query);
    static void updateRoleFromQuery(Role & role, const ASTGrantQuery & query);
+    void extendQueryLogElemImpl(QueryLogElement &, const ASTPtr &, const Context &) const override;

private:
    ASTPtr query_ptr;
@@ -754,4 +754,9 @@ AccessRightsElements InterpreterSystemQuery::getRequiredAccessForDDLOnCluster()
    return required_access;
}

+void InterpreterSystemQuery::extendQueryLogElemImpl(QueryLogElement & elem, const ASTPtr & /*ast*/, const Context &) const
+{
+    elem.query_kind = "System";
+}
+
}
@@ -56,6 +56,8 @@ private:

    AccessRightsElements getRequiredAccessForDDLOnCluster() const;
    void startStopAction(StorageActionBlockType action_type, bool start);
+
+    void extendQueryLogElemImpl(QueryLogElement &, const ASTPtr &, const Context &) const override;
};
@@ -16,6 +16,12 @@
#include <shared_mutex>
#include <utility>

+namespace zkutil
+{
+    class ZooKeeper;
+    using ZooKeeperPtr = std::shared_ptr<ZooKeeper>;
+}
+
namespace DB
{
@@ -11,6 +11,7 @@
#include <Storages/MergeTree/ReplicatedFetchList.h>
#include <Common/CurrentMetrics.h>
#include <Common/NetException.h>
+#include <IO/createReadBufferFromFileBase.h>
#include <ext/scope_guard.h>

#include <Poco/File.h>
@@ -36,6 +37,8 @@ namespace ErrorCodes
    extern const int INSECURE_PATH;
    extern const int CORRUPTED_DATA;
    extern const int LOGICAL_ERROR;
+    extern const int S3_ERROR;
+    extern const int INCORRECT_PART_TYPE;
}

namespace DataPartsExchange
@@ -48,6 +51,7 @@ constexpr auto REPLICATION_PROTOCOL_VERSION_WITH_PARTS_SIZE_AND_TTL_INFOS = 2;
constexpr auto REPLICATION_PROTOCOL_VERSION_WITH_PARTS_TYPE = 3;
constexpr auto REPLICATION_PROTOCOL_VERSION_WITH_PARTS_DEFAULT_COMPRESSION = 4;
constexpr auto REPLICATION_PROTOCOL_VERSION_WITH_PARTS_UUID = 5;
+constexpr auto REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY = 6;


std::string getEndpointId(const std::string & node_id)
@@ -112,7 +116,7 @@ void Service::processQuery(const HTMLForm & params, ReadBuffer & /*body*/, Write
    }

    /// We pretend to work as older server version, to be sure that client will correctly process our version
-    response.addCookie({"server_protocol_version", toString(std::min(client_protocol_version, REPLICATION_PROTOCOL_VERSION_WITH_PARTS_UUID))});
+    response.addCookie({"server_protocol_version", toString(std::min(client_protocol_version, REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY))});

    ++total_sends;
    SCOPE_EXIT({--total_sends;});
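The cookie change above is the usual min-of-versions handshake: the server never advertises more than the client claimed to support, so either side may lag a release behind. Condensed into a standalone sketch (constant named after the diff, value taken from it):

```cpp
#include <algorithm>
#include <cassert>

constexpr int SERVER_MAX_PROTOCOL = 6;  /// REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY

int negotiate(int client_protocol_version)
{
    /// Never answer with a version the client cannot parse.
    return std::min(client_protocol_version, SERVER_MAX_PROTOCOL);
}

int main()
{
    assert(negotiate(4) == 4);  /// old client: the server downgrades itself
    assert(negotiate(7) == 6);  /// newer client: capped at the server's maximum
}
```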
@@ -148,7 +152,30 @@ void Service::processQuery(const HTMLForm & params, ReadBuffer & /*body*/, Write
        sendPartFromMemory(part, out);
    else
    {
-        sendPartFromDisk(part, out, client_protocol_version);
+        bool try_use_s3_copy = false;
+
+        if (data_settings->allow_s3_zero_copy_replication
+            && client_protocol_version >= REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY)
+        { /// if source and destination are in the same S3 storage we try to use S3 CopyObject request first
+            int send_s3_metadata = parse<int>(params.get("send_s3_metadata", "0"));
+            if (send_s3_metadata == 1)
+            {
+                auto disk = part->volume->getDisk();
+                if (disk->getType() == DB::DiskType::Type::S3)
+                {
+                    try_use_s3_copy = true;
+                }
+            }
+        }
+        if (try_use_s3_copy)
+        {
+            response.addCookie({"send_s3_metadata", "1"});
+            sendPartS3Metadata(part, out);
+        }
+        else
+        {
+            sendPartFromDisk(part, out, client_protocol_version);
+        }
    }
}
catch (const NetException &)
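The branch added here gates the zero-copy path on four independent conditions before falling back to streaming the part from disk. The same logic condensed into one predicate (a sketch, not the real signature):

```cpp
#include <cassert>

enum class DiskType { Local, S3 };

/// Hand-written condensation of the branch above: S3 metadata is sent only when
/// every gate holds; otherwise the server streams the part from disk as before.
bool shouldSendS3Metadata(bool allow_s3_zero_copy_replication,
                          int client_protocol_version,
                          bool client_sent_send_s3_metadata,
                          DiskType part_disk_type)
{
    const int REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY = 6;
    return allow_s3_zero_copy_replication
        && client_protocol_version >= REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY
        && client_sent_send_s3_metadata
        && part_disk_type == DiskType::S3;
}

int main()
{
    assert(shouldSendS3Metadata(true, 6, true, DiskType::S3));
    assert(!shouldSendS3Metadata(true, 5, true, DiskType::S3));    /// old client: stream instead
    assert(!shouldSendS3Metadata(true, 6, true, DiskType::Local)); /// part is not on S3
}
```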
@@ -229,6 +256,55 @@ void Service::sendPartFromDisk(const MergeTreeData::DataPartPtr & part, WriteBuf
    part->checksums.checkEqual(data_checksums, false);
}

+void Service::sendPartS3Metadata(const MergeTreeData::DataPartPtr & part, WriteBuffer & out)
+{
+    /// We'll take a list of files from the list of checksums.
+    MergeTreeData::DataPart::Checksums checksums = part->checksums;
+    /// Add files that are not in the checksum list.
+    auto file_names_without_checksums = part->getFileNamesWithoutChecksums();
+    for (const auto & file_name : file_names_without_checksums)
+        checksums.files[file_name] = {};
+
+    auto disk = part->volume->getDisk();
+    if (disk->getType() != DB::DiskType::Type::S3)
+        throw Exception("S3 disk is not S3 anymore", ErrorCodes::LOGICAL_ERROR);
+
+    part->storage.lockSharedData(*part);
+
+    String part_id = part->getUniqueId();
+    writeStringBinary(part_id, out);
+
+    writeBinary(checksums.files.size(), out);
+    for (const auto & it : checksums.files)
+    {
+        String file_name = it.first;
+
+        String metadata_file = disk->getPath() + part->getFullRelativePath() + file_name;
+
+        Poco::File metadata(metadata_file);
+
+        if (!metadata.exists())
+            throw Exception("S3 metadata '" + file_name + "' is not exists", ErrorCodes::CORRUPTED_DATA);
+        if (!metadata.isFile())
+            throw Exception("S3 metadata '" + file_name + "' is not a file", ErrorCodes::CORRUPTED_DATA);
+        UInt64 file_size = metadata.getSize();
+
+        writeStringBinary(it.first, out);
+        writeBinary(file_size, out);
+
+        auto file_in = createReadBufferFromFileBase(metadata_file, 0, 0, 0, DBMS_DEFAULT_BUFFER_SIZE);
+        HashingWriteBuffer hashing_out(out);
+        copyData(*file_in, hashing_out, blocker.getCounter());
+        if (blocker.isCancelled())
+            throw Exception("Transferring part to replica was cancelled", ErrorCodes::ABORTED);
+
+        if (hashing_out.count() != file_size)
+            throw Exception("Unexpected size of file " + metadata_file, ErrorCodes::BAD_SIZE_OF_FILE_IN_DATA_PART);
+
+        writePODBinary(hashing_out.getHash(), out);
+    }
+}
+
MergeTreeData::DataPartPtr Service::findPart(const String & name)
{
    /// It is important to include PreCommitted and Outdated parts here because remote replicas cannot reliably
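sendPartS3Metadata() frames every file as [name][size][payload][hash] on the wire. The same framing in toy form (a sketch with plain std streams and an additive checksum standing in for the real HashingWriteBuffer):

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

/// For every file the sender emits the name, the byte count, the payload, and
/// a checksum over the payload, so the receiver can verify each file in turn.
void sendFile(std::ostream & out, const std::string & name, const std::string & payload)
{
    uint64_t size = payload.size();
    uint64_t checksum = 0;
    for (unsigned char c : payload)
        checksum += c;  /// trivial stand-in for the 128-bit hash

    out << name << '\n' << size << '\n';
    out.write(payload.data(), payload.size());
    out << '\n' << checksum << '\n';
}

int main()
{
    std::ostringstream out;
    sendFile(out, "checksums.txt", "metadata bytes");
    return out.str().empty() ? 1 : 0;
}
```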
@@ -252,7 +328,10 @@ MergeTreeData::MutableDataPartPtr Fetcher::fetchPart(
    const String & password,
    const String & interserver_scheme,
    bool to_detached,
-    const String & tmp_prefix_)
+    const String & tmp_prefix_,
+    std::optional<CurrentlySubmergingEmergingTagger> * tagger_ptr,
+    bool try_use_s3_copy,
+    const DiskPtr disk_s3)
{
    if (blocker.isCancelled())
        throw Exception("Fetching of part was cancelled", ErrorCodes::ABORTED);
@@ -269,10 +348,36 @@ MergeTreeData::MutableDataPartPtr Fetcher::fetchPart(
    {
        {"endpoint", getEndpointId(replica_path)},
        {"part", part_name},
-        {"client_protocol_version", toString(REPLICATION_PROTOCOL_VERSION_WITH_PARTS_UUID)},
+        {"client_protocol_version", toString(REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY)},
        {"compress", "false"}
    });

+    if (try_use_s3_copy && disk_s3 && disk_s3->getType() != DB::DiskType::Type::S3)
+        throw Exception("Try to fetch shared s3 part on non-s3 disk", ErrorCodes::LOGICAL_ERROR);
+
+    Disks disks_s3;
+
+    if (!data_settings->allow_s3_zero_copy_replication)
+        try_use_s3_copy = false;
+
+    if (try_use_s3_copy)
+    {
+        if (disk_s3)
+            disks_s3.push_back(disk_s3);
+        else
+        {
+            disks_s3 = data.getDisksByType(DiskType::Type::S3);
+
+            if (disks_s3.empty())
+                try_use_s3_copy = false;
+        }
+    }
+
+    if (try_use_s3_copy)
+    {
+        uri.addQueryParameter("send_s3_metadata", "1");
+    }
+
    Poco::Net::HTTPBasicCredentials creds{};
    if (!user.empty())
    {
@@ -293,6 +398,44 @@ MergeTreeData::MutableDataPartPtr Fetcher::fetchPart(

    int server_protocol_version = parse<int>(in.getResponseCookie("server_protocol_version", "0"));

+    int send_s3 = parse<int>(in.getResponseCookie("send_s3_metadata", "0"));
+
+    if (send_s3 == 1)
+    {
+        if (server_protocol_version < REPLICATION_PROTOCOL_VERSION_WITH_PARTS_S3_COPY)
+            throw Exception("Got 'send_s3_metadata' cookie with old protocol version", ErrorCodes::LOGICAL_ERROR);
+        if (!try_use_s3_copy)
+            throw Exception("Got 'send_s3_metadata' cookie when was not requested", ErrorCodes::LOGICAL_ERROR);
+
+        size_t sum_files_size = 0;
+        readBinary(sum_files_size, in);
+        IMergeTreeDataPart::TTLInfos ttl_infos;
+        /// Skip ttl infos, not required for S3 metadata
+        String ttl_infos_string;
+        readBinary(ttl_infos_string, in);
+        String part_type = "Wide";
+        readStringBinary(part_type, in);
+        if (part_type == "InMemory")
+            throw Exception("Got 'send_s3_metadata' cookie for in-memory part", ErrorCodes::INCORRECT_PART_TYPE);
+
+        UUID part_uuid = UUIDHelpers::Nil;
+        if (server_protocol_version >= REPLICATION_PROTOCOL_VERSION_WITH_PARTS_UUID)
+            readUUIDText(part_uuid, in);
+
+        try
+        {
+            return downloadPartToS3(part_name, replica_path, to_detached, tmp_prefix_, std::move(disks_s3), in);
+        }
+        catch (const Exception & e)
+        {
+            if (e.code() != ErrorCodes::S3_ERROR)
+                throw;
+            /// Try again but without S3 copy
+            return fetchPart(metadata_snapshot, part_name, replica_path, host, port, timeouts,
+                user, password, interserver_scheme, to_detached, tmp_prefix_, nullptr, false);
+        }
+    }
+
    ReservationPtr reservation;
    size_t sum_files_size = 0;
    if (server_protocol_version >= REPLICATION_PROTOCOL_VERSION_WITH_PARTS_SIZE)
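Note the recovery strategy in the catch block above: an S3_ERROR during the zero-copy download triggers exactly one full retry with try_use_s3_copy disabled, while every other exception propagates. The shape of that fallback, reduced to a sketch with invented stand-ins:

```cpp
#include <stdexcept>
#include <string>

struct S3Error : std::runtime_error { using std::runtime_error::runtime_error; };

std::string fetchOverNetwork() { return "part streamed byte by byte"; }
std::string fetchViaS3CopyObject() { throw S3Error("bucket mismatch"); }

/// A failure on the fast path degrades to the slow-but-universal path instead
/// of failing the fetch; any error other than the S3-specific one is rethrown.
std::string fetchPart(bool try_use_s3_copy)
{
    if (try_use_s3_copy)
    {
        try
        {
            return fetchViaS3CopyObject();
        }
        catch (const S3Error &)
        {
            return fetchPart(/*try_use_s3_copy=*/ false);
        }
    }
    return fetchOverNetwork();
}

int main()
{
    return fetchPart(true).empty() ? 1 : 0;
}
```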
@@ -306,10 +449,18 @@ MergeTreeData::MutableDataPartPtr Fetcher::fetchPart(
        ReadBufferFromString ttl_infos_buffer(ttl_infos_string);
        assertString("ttl format version: 1\n", ttl_infos_buffer);
        ttl_infos.read(ttl_infos_buffer);
-        reservation = data.reserveSpacePreferringTTLRules(metadata_snapshot, sum_files_size, ttl_infos, std::time(nullptr), 0, true);
+        reservation
+            = data.balancedReservation(metadata_snapshot, sum_files_size, 0, part_name, part_info, {}, tagger_ptr, &ttl_infos, true);
+        if (!reservation)
+            reservation
+                = data.reserveSpacePreferringTTLRules(metadata_snapshot, sum_files_size, ttl_infos, std::time(nullptr), 0, true);
    }
    else
-        reservation = data.reserveSpace(sum_files_size);
+    {
+        reservation = data.balancedReservation(metadata_snapshot, sum_files_size, 0, part_name, part_info, {}, tagger_ptr, nullptr);
+        if (!reservation)
+            reservation = data.reserveSpace(sum_files_size);
+    }
}
else
{
@@ -462,6 +613,100 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPartToDisk(
    return new_data_part;
}

+MergeTreeData::MutableDataPartPtr Fetcher::downloadPartToS3(
+    const String & part_name,
+    const String & replica_path,
+    bool to_detached,
+    const String & tmp_prefix_,
+    const Disks & disks_s3,
+    PooledReadWriteBufferFromHTTP & in
+    )
+{
+    if (disks_s3.empty())
+        throw Exception("No S3 disks anymore", ErrorCodes::LOGICAL_ERROR);
+
+    String part_id;
+    readStringBinary(part_id, in);
+
+    DiskPtr disk = disks_s3[0];
+
+    for (const auto & disk_s3 : disks_s3)
+    {
+        if (disk_s3->checkUniqueId(part_id))
+        {
+            disk = disk_s3;
+            break;
+        }
+    }
+
+    static const String TMP_PREFIX = "tmp_fetch_";
+    String tmp_prefix = tmp_prefix_.empty() ? TMP_PREFIX : tmp_prefix_;
+
+    String part_relative_path = String(to_detached ? "detached/" : "") + tmp_prefix + part_name;
+    String part_download_path = data.getRelativeDataPath() + part_relative_path + "/";
+
+    if (disk->exists(part_download_path))
+        throw Exception("Directory " + fullPath(disk, part_download_path) + " already exists.", ErrorCodes::DIRECTORY_ALREADY_EXISTS);
+
+    CurrentMetrics::Increment metric_increment{CurrentMetrics::ReplicatedFetch};
+
+    disk->createDirectories(part_download_path);
+
+    size_t files;
+    readBinary(files, in);
+
+    auto volume = std::make_shared<SingleDiskVolume>("volume_" + part_name, disk);
+    MergeTreeData::MutableDataPartPtr new_data_part = data.createPart(part_name, volume, part_relative_path);
+
+    for (size_t i = 0; i < files; ++i)
+    {
+        String file_name;
+        UInt64 file_size;
+
+        readStringBinary(file_name, in);
+        readBinary(file_size, in);
+
+        String data_path = new_data_part->getFullRelativePath() + file_name;
+        String metadata_file = fullPath(disk, data_path);
+
+        {
+            auto file_out = std::make_unique<WriteBufferFromFile>(metadata_file, DBMS_DEFAULT_BUFFER_SIZE, -1, 0666, nullptr, 0);
+
+            HashingWriteBuffer hashing_out(*file_out);
+
+            copyData(in, hashing_out, file_size, blocker.getCounter());
+
+            if (blocker.isCancelled())
+            {
+                /// NOTE The is_cancelled flag also makes sense to check every time you read over the network,
+                /// performing a poll with a not very large timeout.
+                /// And now we check it only between read chunks (in the `copyData` function).
+                disk->removeSharedRecursive(part_download_path, true);
+                throw Exception("Fetching of part was cancelled", ErrorCodes::ABORTED);
+            }
+
+            MergeTreeDataPartChecksum::uint128 expected_hash;
+            readPODBinary(expected_hash, in);
+
+            if (expected_hash != hashing_out.getHash())
+            {
+                throw Exception("Checksum mismatch for file " + metadata_file + " transferred from " + replica_path,
+                    ErrorCodes::CHECKSUM_DOESNT_MATCH);
+            }
+        }
+    }
+
+    assertEOF(in);
+
+    new_data_part->is_temp = true;
+    new_data_part->modification_time = time(nullptr);
+    new_data_part->loadColumnsChecksumsIndexes(true, false);
+
+    new_data_part->storage.lockSharedData(*new_data_part);
+
+    return new_data_part;
+}
+
}

}
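downloadPartToS3() validates every file by hashing bytes as they are written (the HashingWriteBuffer wrapper) and comparing against the hash the sender appended. The same write-through-hash pattern in toy form (FNV-1a standing in for the real 128-bit hash):

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

/// Toy stand-in for the HashingWriteBuffer pattern: hash bytes as they pass
/// through to the destination, then compare with the hash sent by the peer.
struct HashingWriter
{
    std::ostringstream & out;
    uint64_t hash = 1469598103934665603ULL;  /// FNV-1a offset basis

    void write(const char * data, size_t size)
    {
        for (size_t i = 0; i < size; ++i)
            hash = (hash ^ static_cast<unsigned char>(data[i])) * 1099511628211ULL;
        out.write(data, size);
    }
};

void receiveFile(std::ostringstream & dest, const std::string & bytes, uint64_t expected_hash)
{
    HashingWriter writer{dest};
    writer.write(bytes.data(), bytes.size());
    if (writer.hash != expected_hash)
        throw std::runtime_error("Checksum mismatch for transferred file");
}

int main()
{
    /// The sender would have computed the same hash over the same bytes.
    std::ostringstream scratch;
    HashingWriter reference{scratch};
    reference.write("s3 metadata", 11);

    std::ostringstream dest;
    receiveFile(dest, "s3 metadata", reference.hash);
}
```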
@@ -9,6 +9,12 @@
#include <IO/ReadWriteBufferFromHTTP.h>


+namespace zkutil
+{
+    class ZooKeeper;
+    using ZooKeeperPtr = std::shared_ptr<ZooKeeper>;
+}
+
namespace DB
{
@@ -32,6 +38,7 @@ private:
    MergeTreeData::DataPartPtr findPart(const String & name);
    void sendPartFromMemory(const MergeTreeData::DataPartPtr & part, WriteBuffer & out);
    void sendPartFromDisk(const MergeTreeData::DataPartPtr & part, WriteBuffer & out, int client_protocol_version);
+    void sendPartS3Metadata(const MergeTreeData::DataPartPtr & part, WriteBuffer & out);

    /// StorageReplicatedMergeTree::shutdown() waits for all parts exchange handlers to finish,
    /// so Service will never access dangling reference to storage
@@ -58,7 +65,10 @@ public:
        const String & password,
        const String & interserver_scheme,
        bool to_detached = false,
-        const String & tmp_prefix_ = "");
+        const String & tmp_prefix_ = "",
+        std::optional<CurrentlySubmergingEmergingTagger> * tagger_ptr = nullptr,
+        bool try_use_s3_copy = true,
+        const DiskPtr disk_s3 = nullptr);

    /// You need to stop the data transfer.
    ActionBlocker blocker;
@@ -80,6 +90,14 @@ private:
        ReservationPtr reservation,
        PooledReadWriteBufferFromHTTP & in);

+    MergeTreeData::MutableDataPartPtr downloadPartToS3(
+        const String & part_name,
+        const String & replica_path,
+        bool to_detached,
+        const String & tmp_prefix_,
+        const Disks & disks_s3,
+        PooledReadWriteBufferFromHTTP & in);
+
    MergeTreeData & data;
    Poco::Logger * log;
};
@@ -9,8 +9,10 @@
#include <Storages/MergeTree/MergeTreeData.h>
#include <Storages/MergeTree/localBackup.h>
#include <Storages/MergeTree/checkDataPart.h>
+#include <Storages/StorageReplicatedMergeTree.h>
#include <Common/StringUtils/StringUtils.h>
#include <Common/escapeForFileName.h>
+#include <Common/ZooKeeper/ZooKeeper.h>
#include <Common/CurrentMetrics.h>
#include <common/JSON.h>
#include <common/logger_useful.h>
@@ -35,6 +37,7 @@ namespace CurrentMetrics

namespace DB
{

namespace ErrorCodes
{
    extern const int DIRECTORY_ALREADY_EXISTS;
@@ -404,7 +407,7 @@ void IMergeTreeDataPart::removeIfNeeded()
        }
    }

-    remove();
+    remove(false);

    if (state == State::DeleteOnDestroy)
    {
@@ -934,7 +937,8 @@ void IMergeTreeDataPart::loadColumns(bool require)
    {
        /// We can get list of columns only from columns.txt in compact parts.
        if (require || part_type == Type::COMPACT)
-            throw Exception("No columns.txt in part " + name, ErrorCodes::NO_FILE_IN_DATA_PART);
+            throw Exception("No columns.txt in part " + name + ", expected path " + path + " on drive " + volume->getDisk()->getName(),
+                ErrorCodes::NO_FILE_IN_DATA_PART);

        /// If there is no file with a list of columns, write it down.
        for (const NameAndTypePair & column : metadata_snapshot->getColumns().getAllPhysical())
@@ -1015,10 +1019,12 @@ void IMergeTreeDataPart::renameTo(const String & new_relative_path, bool remove_
    SyncGuardPtr sync_guard;
    if (storage.getSettings()->fsync_part_directory)
        sync_guard = volume->getDisk()->getDirectorySyncGuard(to);

+    storage.lockSharedData(*this);
}


-void IMergeTreeDataPart::remove() const
+void IMergeTreeDataPart::remove(bool keep_s3) const
{
    if (!isStoredOnDisk())
        return;
@@ -1048,7 +1054,7 @@ void IMergeTreeDataPart::remove() const

    try
    {
-        volume->getDisk()->removeRecursive(to + "/");
+        volume->getDisk()->removeSharedRecursive(to + "/", keep_s3);
    }
    catch (...)
    {
@@ -1071,7 +1077,7 @@ void IMergeTreeDataPart::remove() const
    if (checksums.empty())
    {
        /// If the part is not completely written, we cannot use fast path by listing files.
-        volume->getDisk()->removeRecursive(to + "/");
+        volume->getDisk()->removeSharedRecursive(to + "/", keep_s3);
    }
    else
    {
@@ -1084,16 +1090,16 @@ void IMergeTreeDataPart::remove() const
# pragma GCC diagnostic ignored "-Wunused-variable"
#endif
        for (const auto & [file, _] : checksums.files)
-            volume->getDisk()->removeFile(to + "/" + file);
+            volume->getDisk()->removeSharedFile(to + "/" + file, keep_s3);
#if !__clang__
# pragma GCC diagnostic pop
#endif

        for (const auto & file : {"checksums.txt", "columns.txt"})
-            volume->getDisk()->removeFile(to + "/" + file);
+            volume->getDisk()->removeSharedFile(to + "/" + file, keep_s3);

-        volume->getDisk()->removeFileIfExists(to + "/" + DEFAULT_COMPRESSION_CODEC_FILE_NAME);
-        volume->getDisk()->removeFileIfExists(to + "/" + DELETE_ON_DESTROY_MARKER_FILE_NAME);
+        volume->getDisk()->removeSharedFileIfExists(to + "/" + DEFAULT_COMPRESSION_CODEC_FILE_NAME, keep_s3);
+        volume->getDisk()->removeSharedFileIfExists(to + "/" + DELETE_ON_DESTROY_MARKER_FILE_NAME, keep_s3);

        volume->getDisk()->removeDirectory(to);
    }
@@ -1103,7 +1109,7 @@ void IMergeTreeDataPart::remove() const

        LOG_ERROR(storage.log, "Cannot quickly remove directory {} by removing files; fallback to recursive removal. Reason: {}", fullPath(volume->getDisk(), to), getCurrentExceptionMessage(false));

-        volume->getDisk()->removeRecursive(to + "/");
+        volume->getDisk()->removeSharedRecursive(to + "/", keep_s3);
    }
}
}
@@ -1168,7 +1174,6 @@ void IMergeTreeDataPart::makeCloneOnDisk(const DiskPtr & disk, const String & di
        disk->removeRecursive(path_to_clone + relative_path + '/');
    }
    disk->createDirectories(path_to_clone);

    volume->getDisk()->copy(getFullRelativePath(), disk, path_to_clone);
    volume->getDisk()->removeFileIfExists(path_to_clone + '/' + DELETE_ON_DESTROY_MARKER_FILE_NAME);
}
@@ -1305,6 +1310,21 @@ bool IMergeTreeDataPart::checkAllTTLCalculated(const StorageMetadataPtr & metada
    return true;
}

+String IMergeTreeDataPart::getUniqueId() const
+{
+    String id;
+
+    auto disk = volume->getDisk();
+
+    if (disk->getType() == DB::DiskType::Type::S3)
+        id = disk->getUniqueId(getFullRelativePath() + "checksums.txt");
+
+    if (id.empty())
+        throw Exception("Can't get unique S3 object", ErrorCodes::LOGICAL_ERROR);
+
+    return id;
+}
+
bool isCompactPart(const MergeTreeDataPartPtr & data_part)
{
    return (data_part && data_part->getType() == MergeTreeDataPartType::COMPACT);
@@ -1321,3 +1341,4 @@ bool isInMemoryPart(const MergeTreeDataPartPtr & data_part)
}

}
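Taken together, the new remove(bool keep_s3) overload and the lockSharedData calls implement the ownership rule of zero-copy replication: local metadata files always go away, while the shared S3 blobs survive as long as some other replica still references them. The contract condensed into a sketch (names invented; the real reference tracking lives in ZooKeeper via lockSharedData/StorageReplicatedMergeTree):

```cpp
#include <iostream>

/// Local files are always removed; shared blobs only when no other replica
/// still holds a reference to the same S3 objects.
void removePart(bool stored_on_shared_storage, int replicas_still_referencing)
{
    bool keep_shared_data = stored_on_shared_storage && replicas_still_referencing > 0;

    std::cout << "remove local metadata: yes\n";
    std::cout << "remove shared blobs: " << (keep_shared_data ? "no" : "yes") << '\n';
}

int main()
{
    removePart(true, 1); /// another replica uses the same S3 objects: keep them
    removePart(true, 0); /// last owner: the blobs are deleted too
}
```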
Some files were not shown because too many files have changed in this diff