Merge branch 'master' into alexey-sm-DOCSUP-7099-translate-runningConcurrency

2024-11-21 23:21:59 +00:00 · 2021-03-16 22:02:11 +03:00 · 2021-03-16 22:02:11 +03:00 · 64539452d9
commit 64539452d9
parent 1ae127eb51 848bb59175
1319 changed files with 60462 additions and 28468 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,2 +1,4 @@
 contrib/* linguist-vendored
 *.h linguist-language=C++
+# to avoid frequent conflicts
+tests/queries/0_stateless/arcadia_skip_list.txt text merge=union
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,159 @@
+## ClickHouse release 21.3
+
+### ClickHouse release v21.3, 2021-03-12
+
+#### Backward Incompatible Change
+
+* Now it's not allowed to create MergeTree tables in old syntax with table TTL because it's just ignored. Attach of old tables is still possible. [#20282](https://github.com/ClickHouse/ClickHouse/pull/20282) ([alesapin](https://github.com/alesapin)).
+* Now all case-insensitive function names will be rewritten to their canonical representations. This is needed for projection query routing (the upcoming feature). [#20174](https://github.com/ClickHouse/ClickHouse/pull/20174) ([Amos Bird](https://github.com/amosbird)).
+* Fix creation of `TTL` in cases when its expression is a function and is the same as the `ORDER BY` key. Now it's allowed to set custom aggregation to primary key columns in `TTL` with `GROUP BY`. Backward incompatible: for primary key columns that are not in `GROUP BY` and aren't set explicitly, the function `any` is now applied instead of `max` when TTL is expired. Also, if you use TTL with `WHERE` or `GROUP BY`, you can see exceptions at merges while making a rolling update. [#15450](https://github.com/ClickHouse/ClickHouse/pull/15450) ([Anton Popov](https://github.com/CurtizJ)).
+
+#### New Feature
+
+* Add file engine settings: `engine_file_empty_if_not_exists` and `engine_file_truncate_on_insert`. [#20620](https://github.com/ClickHouse/ClickHouse/pull/20620) ([M0r64n](https://github.com/M0r64n)).
+* Add aggregate function `deltaSum` for summing the differences between consecutive rows. [#20057](https://github.com/ClickHouse/ClickHouse/pull/20057) ([Russ Frank](https://github.com/rf)).
+* New `event_time_microseconds` column in `system.part_log` table. [#20027](https://github.com/ClickHouse/ClickHouse/pull/20027) ([Bharat Nallan](https://github.com/bharatnc)).
+* Added `timezoneOffset(datetime)` function which will give the offset from UTC in seconds. This closes [#issue:19850](https://github.com/ClickHouse/ClickHouse/issues/19850). [#19962](https://github.com/ClickHouse/ClickHouse/pull/19962) ([keenwolf](https://github.com/keen-wolf)).
+* Add setting `insert_shard_id` to support insert data into specific shard from distributed table. [#19961](https://github.com/ClickHouse/ClickHouse/pull/19961) ([flynn](https://github.com/ucasFL)).
+* Function `reinterpretAs` updated to support big integers. Fixes [#19691](https://github.com/ClickHouse/ClickHouse/issues/19691). [#19858](https://github.com/ClickHouse/ClickHouse/pull/19858) ([Maksim Kita](https://github.com/kitaisreal)).
+* Added Server Side Encryption Customer Keys (the `x-amz-server-side-encryption-customer-(key/md5)` header) support in S3 client. See [the link](https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerSideEncryptionCustomerKeys.html). Closes [#19428](https://github.com/ClickHouse/ClickHouse/issues/19428). [#19748](https://github.com/ClickHouse/ClickHouse/pull/19748) ([Vladimir Chebotarev](https://github.com/excitoon)).
+* Added `implicit_key` option for `executable` dictionary source. It allows avoiding printing the key for every record if records come in the same order as the input keys. Implements [#14527](https://github.com/ClickHouse/ClickHouse/issues/14527). [#19677](https://github.com/ClickHouse/ClickHouse/pull/19677) ([Maksim Kita](https://github.com/kitaisreal)).
+* Add quota types `query_selects` and `query_inserts`. [#19603](https://github.com/ClickHouse/ClickHouse/pull/19603) ([JackyWoo](https://github.com/JackyWoo)).
+* Add function `extractTextFromHTML` [#19600](https://github.com/ClickHouse/ClickHouse/pull/19600) ([zlx19950903](https://github.com/zlx19950903)), ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Tables with `MergeTree*` engine now have two new table-level settings for query concurrency control. Setting `max_concurrent_queries` limits the number of concurrently executed queries which are related to this table. Setting `min_marks_to_honor_max_concurrent_queries` applies the previous setting only if the query reads at least this number of marks. [#19544](https://github.com/ClickHouse/ClickHouse/pull/19544) ([Amos Bird](https://github.com/amosbird)).
+* Added `file` function to read file from user_files directory as a String. This is different from the `file` table function. This implements [#issue:18851](https://github.com/ClickHouse/ClickHouse/issues/18851). [#19204](https://github.com/ClickHouse/ClickHouse/pull/19204) ([keenwolf](https://github.com/keen-wolf)).
+
+#### Experimental feature
+
+* Add experimental `Replicated` database engine. It replicates DDL queries across multiple hosts. [#16193](https://github.com/ClickHouse/ClickHouse/pull/16193) ([tavplubix](https://github.com/tavplubix)).
+* Introduce experimental support for window functions, enabled with `allow_experimental_window_functions = 1`. This is a preliminary, alpha-quality implementation that is not suitable for production use and will change in backward-incompatible ways in future releases. Please see [the documentation](https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/sql-reference/window-functions/index.md#experimental-window-functions) for the list of supported features. [#20337](https://github.com/ClickHouse/ClickHouse/pull/20337) ([Alexander Kuzmenkov](https://github.com/akuzm)).
+* Add the ability to backup/restore metadata files for DiskS3. [#18377](https://github.com/ClickHouse/ClickHouse/pull/18377) ([Pavel Kovalenko](https://github.com/Jokser)).
+
+#### Performance Improvement
+
+* Hedged requests for remote queries. When the setting `use_hedged_requests` is enabled (off by default), allow establishing multiple connections with different replicas for a query. A new connection is opened if the existing connection(s) with replica(s) were not established within `hedged_connection_timeout` or no data was received within `receive_data_timeout`. The query uses the first connection that sends a non-empty progress packet (or data packet, if `allow_changing_replica_until_first_data_packet` is set); other connections are cancelled. Queries with `max_parallel_replicas > 1` are supported. [#19291](https://github.com/ClickHouse/ClickHouse/pull/19291) ([Kruglov Pavel](https://github.com/Avogar)). This can significantly reduce tail latencies on very large clusters.
+* Added support for `PREWHERE` (and enable the corresponding optimization) when tables have row-level security expressions specified. [#19576](https://github.com/ClickHouse/ClickHouse/pull/19576) ([Denis Glazachev](https://github.com/traceon)).
+* The setting `distributed_aggregation_memory_efficient` is enabled by default. It will lower memory usage and improve performance of distributed queries. [#20599](https://github.com/ClickHouse/ClickHouse/pull/20599) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Improve performance of GROUP BY multiple fixed size keys. [#20472](https://github.com/ClickHouse/ClickHouse/pull/20472) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Improve performance of aggregate functions by more strict aliasing. [#19946](https://github.com/ClickHouse/ClickHouse/pull/19946) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Speed up reading from `Memory` tables in extreme cases (when reading speed is in order of 50 GB/sec) by simplification of pipeline and (consequently) less lock contention in pipeline scheduling. [#20468](https://github.com/ClickHouse/ClickHouse/pull/20468) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Partially reimplement HTTP server so that it makes fewer copies of incoming and outgoing data. It gives up to 1.5 times performance improvement on inserting long records over HTTP. [#19516](https://github.com/ClickHouse/ClickHouse/pull/19516) ([Ivan](https://github.com/abyss7)).
+* Add `compress` setting for `Memory` tables. If it's enabled the table will use less RAM. On some machines and datasets it can also work faster on SELECT, but it is not always the case. This closes [#20093](https://github.com/ClickHouse/ClickHouse/issues/20093). Note: there are reasons why Memory tables can work slower than MergeTree: (1) lack of compression (2) static size of blocks (3) lack of indices and prewhere... [#20168](https://github.com/ClickHouse/ClickHouse/pull/20168) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Slightly better code in aggregation. [#20978](https://github.com/ClickHouse/ClickHouse/pull/20978) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add back `intDiv`/`modulo` specializations for better performance. This fixes [#21293](https://github.com/ClickHouse/ClickHouse/issues/21293). The regression was introduced in https://github.com/ClickHouse/ClickHouse/pull/18145. [#21307](https://github.com/ClickHouse/ClickHouse/pull/21307) ([Amos Bird](https://github.com/amosbird)).
+* Do not squash blocks too much on INSERT SELECT if inserting into Memory table. In previous versions inefficient data representation was created in Memory table after INSERT SELECT. This closes [#13052](https://github.com/ClickHouse/ClickHouse/issues/13052). [#20169](https://github.com/ClickHouse/ClickHouse/pull/20169) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix at least one case when DataType parser may have exponential complexity (found by fuzzer). This closes [#20096](https://github.com/ClickHouse/ClickHouse/issues/20096). [#20132](https://github.com/ClickHouse/ClickHouse/pull/20132) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Parallelize SELECT with FINAL for single part with level > 0 when `do_not_merge_across_partitions_select_final` setting is 1. [#19375](https://github.com/ClickHouse/ClickHouse/pull/19375) ([Kruglov Pavel](https://github.com/Avogar)).
+* Fill only requested columns when querying `system.parts` and `system.parts_columns`. Closes [#19570](https://github.com/ClickHouse/ClickHouse/issues/19570). [#21035](https://github.com/ClickHouse/ClickHouse/pull/21035) ([Anmol Arora](https://github.com/anmolarora)).
+* Perform algebraic optimizations of arithmetic expressions inside `avg` aggregate function. Closes [#20092](https://github.com/ClickHouse/ClickHouse/issues/20092). [#20183](https://github.com/ClickHouse/ClickHouse/pull/20183) ([flynn](https://github.com/ucasFL)).
+
+#### Improvement
+
+* Case-insensitive compression methods for table functions. Also fixed LZMA compression method which was checked in upper case. [#21416](https://github.com/ClickHouse/ClickHouse/pull/21416) ([Vladimir Chebotarev](https://github.com/excitoon)).
+* Add two settings to delay or throw error during insertion when there are too many inactive parts. This is useful when server fails to clean up parts quickly enough. [#20178](https://github.com/ClickHouse/ClickHouse/pull/20178) ([Amos Bird](https://github.com/amosbird)).
+* Provide better compatibility with MySQL clients: (1) MySQL JDBC, (2) mycli. [#21367](https://github.com/ClickHouse/ClickHouse/pull/21367) ([Amos Bird](https://github.com/amosbird)).
+* Forbid to drop a column if it's referenced by materialized view. Closes [#21164](https://github.com/ClickHouse/ClickHouse/issues/21164). [#21303](https://github.com/ClickHouse/ClickHouse/pull/21303) ([flynn](https://github.com/ucasFL)).
+* MySQL dictionary source will now retry unexpected connection failures (Lost connection to MySQL server during query) which sometimes happen on SSL/TLS connections. [#21237](https://github.com/ClickHouse/ClickHouse/pull/21237) ([Alexander Kazakov](https://github.com/Akazz)).
+* Usability improvement: more consistent `DateTime64` parsing: recognize the case when unix timestamp with subsecond resolution is specified as scaled integer (like `1111111111222` instead of `1111111111.222`). This closes [#13194](https://github.com/ClickHouse/ClickHouse/issues/13194). [#21053](https://github.com/ClickHouse/ClickHouse/pull/21053) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Do only merging of sorted blocks on initiator with distributed_group_by_no_merge. [#20882](https://github.com/ClickHouse/ClickHouse/pull/20882) ([Azat Khuzhin](https://github.com/azat)).
+* When loading config for a MySQL source, ClickHouse will now randomize the list of replicas with the same priority to ensure round-robin logic when picking a MySQL endpoint. This closes [#20629](https://github.com/ClickHouse/ClickHouse/issues/20629). [#20632](https://github.com/ClickHouse/ClickHouse/pull/20632) ([Alexander Kazakov](https://github.com/Akazz)).
+* Function 'reinterpretAs(x, Type)' renamed into 'reinterpret(x, Type)'. [#20611](https://github.com/ClickHouse/ClickHouse/pull/20611) ([Maksim Kita](https://github.com/kitaisreal)).
+* Support vhost for RabbitMQ engine [#20576](https://github.com/ClickHouse/ClickHouse/issues/20576). [#20596](https://github.com/ClickHouse/ClickHouse/pull/20596) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Improved serialization for data types combined of Arrays and Tuples. Improved matching enum data types to protobuf enum type. Fixed serialization of the `Map` data type. Omitted values are now set by default. [#20506](https://github.com/ClickHouse/ClickHouse/pull/20506) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fixed race between execution of distributed DDL tasks and cleanup of DDL queue. Now DDL task cannot be removed from ZooKeeper if there are active workers. Fixes [#20016](https://github.com/ClickHouse/ClickHouse/issues/20016). [#20448](https://github.com/ClickHouse/ClickHouse/pull/20448) ([tavplubix](https://github.com/tavplubix)).
+* Make FQDN and other DNS related functions work correctly in alpine images. [#20336](https://github.com/ClickHouse/ClickHouse/pull/20336) ([filimonov](https://github.com/filimonov)).
+* Do not allow early constant folding of explicitly forbidden functions. [#20303](https://github.com/ClickHouse/ClickHouse/pull/20303) ([Azat Khuzhin](https://github.com/azat)).
+* Implicit conversion from integer to Decimal type might succeed even if the integer value does not fit into the Decimal type. Now it throws `ARGUMENT_OUT_OF_BOUND`. [#20232](https://github.com/ClickHouse/ClickHouse/pull/20232) ([tavplubix](https://github.com/tavplubix)).
+* Lockless `SYSTEM FLUSH DISTRIBUTED`. [#20215](https://github.com/ClickHouse/ClickHouse/pull/20215) ([Azat Khuzhin](https://github.com/azat)).
+* Normalize count(constant), sum(1) to count(). This is needed for projection query routing. [#20175](https://github.com/ClickHouse/ClickHouse/pull/20175) ([Amos Bird](https://github.com/amosbird)).
+* Support all native integer types in bitmap functions. [#20171](https://github.com/ClickHouse/ClickHouse/pull/20171) ([Amos Bird](https://github.com/amosbird)).
+* Updated `CacheDictionary`, `ComplexCacheDictionary`, `SSDCacheDictionary`, `SSDComplexKeyDictionary` to use LRUHashMap as underlying index. [#20164](https://github.com/ClickHouse/ClickHouse/pull/20164) ([Maksim Kita](https://github.com/kitaisreal)).
+* The setting `access_management` is now configurable on startup by providing `CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT`, defaults to disabled (`0`) which was the prior value. [#20139](https://github.com/ClickHouse/ClickHouse/pull/20139) ([Marquitos](https://github.com/sonirico)).
+* Fix toDateTime64(toDate()/toDateTime()) for DateTime64 - Implement DateTime64 clamping to match DateTime behaviour. [#20131](https://github.com/ClickHouse/ClickHouse/pull/20131) ([Azat Khuzhin](https://github.com/azat)).
+* Quota improvements: SHOW TABLES is now considered as one query in the quota calculations, not two queries. SYSTEM queries now consume quota. Fix calculation of interval's end in quota consumption. [#20106](https://github.com/ClickHouse/ClickHouse/pull/20106) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Supports `path IN (set)` expressions for `system.zookeeper` table. [#20105](https://github.com/ClickHouse/ClickHouse/pull/20105) ([小路](https://github.com/nicelulu)).
+* Show full details of `MaterializeMySQL` tables in `system.tables`. [#20051](https://github.com/ClickHouse/ClickHouse/pull/20051) ([Stig Bakken](https://github.com/stigsb)).
+* Fix data race in executable dictionary that was possible only on misuse (when the script returns data ignoring its input). [#20045](https://github.com/ClickHouse/ClickHouse/pull/20045) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* The value of MYSQL_OPT_RECONNECT option can now be controlled by "opt_reconnect" parameter in the config section of mysql replica. [#19998](https://github.com/ClickHouse/ClickHouse/pull/19998) ([Alexander Kazakov](https://github.com/Akazz)).
+* If user calls `JSONExtract` function with `Float32` type requested, allow inaccurate conversion to the result type. For example the number `0.1` in JSON is double precision and is not representable in Float32, but the user still wants to get it. Previous versions return 0 for non-Nullable type and NULL for Nullable type to indicate that conversion is imprecise. The logic was 100% correct but it was surprising to users and leading to questions. This closes [#13962](https://github.com/ClickHouse/ClickHouse/issues/13962). [#19960](https://github.com/ClickHouse/ClickHouse/pull/19960) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add conversion of block structure for INSERT into Distributed tables if it does not match. [#19947](https://github.com/ClickHouse/ClickHouse/pull/19947) ([Azat Khuzhin](https://github.com/azat)).
+* Improvement for the `system.distributed_ddl_queue` table. Initialize MaxDDLEntryID to the last value after restarting. Before this PR, MaxDDLEntryID will remain zero until a new DDLTask is processed. [#19924](https://github.com/ClickHouse/ClickHouse/pull/19924) ([Amos Bird](https://github.com/amosbird)).
+* Show `MaterializeMySQL` tables in `system.parts`. [#19770](https://github.com/ClickHouse/ClickHouse/pull/19770) ([Stig Bakken](https://github.com/stigsb)).
+* Add separate config directive for `Buffer` profile. [#19721](https://github.com/ClickHouse/ClickHouse/pull/19721) ([Azat Khuzhin](https://github.com/azat)).
+* Move conditions that are not related to JOIN to WHERE clause. [#18720](https://github.com/ClickHouse/ClickHouse/issues/18720). [#19685](https://github.com/ClickHouse/ClickHouse/pull/19685) ([hexiaoting](https://github.com/hexiaoting)).
+* Add ability to throttle INSERT into Distributed based on amount of pending bytes for async send (`bytes_to_delay_insert`/`max_delay_to_insert` and `bytes_to_throw_insert` settings for `Distributed` engine have been added). [#19673](https://github.com/ClickHouse/ClickHouse/pull/19673) ([Azat Khuzhin](https://github.com/azat)).
+* Fix some rare cases when write errors can be ignored in destructors. [#19451](https://github.com/ClickHouse/ClickHouse/pull/19451) ([Azat Khuzhin](https://github.com/azat)).
+* Print inline frames in stack traces for fatal errors. [#19317](https://github.com/ClickHouse/ClickHouse/pull/19317) ([Ivan](https://github.com/abyss7)).
+
+#### Bug Fix
+
+* Fix redundant reconnects to ZooKeeper and the possibility of two active sessions for a single clickhouse server. Both problems were introduced in #14678. [#21264](https://github.com/ClickHouse/ClickHouse/pull/21264) ([alesapin](https://github.com/alesapin)).
+* Fix error `Bad cast from type ... to DB::ColumnLowCardinality` while inserting into table with `LowCardinality` column from `Values` format. Fixes #21140 [#21357](https://github.com/ClickHouse/ClickHouse/pull/21357) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix a deadlock in `ALTER DELETE` mutations for non replicated MergeTree table engines when the predicate contains the table itself. Fixes [#20558](https://github.com/ClickHouse/ClickHouse/issues/20558). [#21477](https://github.com/ClickHouse/ClickHouse/pull/21477) ([alesapin](https://github.com/alesapin)).
+* Fix SIGSEGV for distributed queries on failures. [#21434](https://github.com/ClickHouse/ClickHouse/pull/21434) ([Azat Khuzhin](https://github.com/azat)).
+* Now `ALTER MODIFY COLUMN` queries will correctly affect changes in partition key, skip indices, TTLs, and so on. Fixes [#13675](https://github.com/ClickHouse/ClickHouse/issues/13675). [#21334](https://github.com/ClickHouse/ClickHouse/pull/21334) ([alesapin](https://github.com/alesapin)).
+* Fix bug with `join_use_nulls` and joining `TOTALS` from subqueries. This closes [#19362](https://github.com/ClickHouse/ClickHouse/issues/19362) and [#21137](https://github.com/ClickHouse/ClickHouse/issues/21137). [#21248](https://github.com/ClickHouse/ClickHouse/pull/21248) ([vdimir](https://github.com/vdimir)).
+* Fix crash in `EXPLAIN` for query with `UNION`. Fixes [#20876](https://github.com/ClickHouse/ClickHouse/issues/20876), [#21170](https://github.com/ClickHouse/ClickHouse/issues/21170). [#21246](https://github.com/ClickHouse/ClickHouse/pull/21246) ([flynn](https://github.com/ucasFL)).
+* Now mutations are allowed only for table engines that support them (MergeTree family, Memory, MaterializedView). Other engines will report a clearer error. Fixes [#21168](https://github.com/ClickHouse/ClickHouse/issues/21168). [#21183](https://github.com/ClickHouse/ClickHouse/pull/21183) ([alesapin](https://github.com/alesapin)).
+* Fixes [#21112](https://github.com/ClickHouse/ClickHouse/issues/21112). Fixed bug that could cause duplicates with insert query (if one of the callbacks came a little too late). [#21138](https://github.com/ClickHouse/ClickHouse/pull/21138) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix `input_format_null_as_default` so that it takes effect when types are nullable. This fixes [#21116](https://github.com/ClickHouse/ClickHouse/issues/21116). [#21121](https://github.com/ClickHouse/ClickHouse/pull/21121) ([Amos Bird](https://github.com/amosbird)).
+* Fix bug related to casting Tuple to Map. Closes [#21029](https://github.com/ClickHouse/ClickHouse/issues/21029). [#21120](https://github.com/ClickHouse/ClickHouse/pull/21120) ([hexiaoting](https://github.com/hexiaoting)).
+* Fix the metadata leak when the Replicated*MergeTree with custom (non default) ZooKeeper cluster is dropped. [#21119](https://github.com/ClickHouse/ClickHouse/pull/21119) ([fastio](https://github.com/fastio)).
+* Fix type mismatch issue when using LowCardinality keys in joinGet. This fixes [#21114](https://github.com/ClickHouse/ClickHouse/issues/21114). [#21117](https://github.com/ClickHouse/ClickHouse/pull/21117) ([Amos Bird](https://github.com/amosbird)).
+* Fix `default_replica_path` and `default_replica_name` values having no effect on Replicated(*)MergeTree engines when the engine needs to specify other parameters. [#21060](https://github.com/ClickHouse/ClickHouse/pull/21060) ([mxzlxy](https://github.com/mxzlxy)).
+* Out of bound memory access was possible when formatting specifically crafted out of range value of type `DateTime64`. This closes [#20494](https://github.com/ClickHouse/ClickHouse/issues/20494). This closes [#20543](https://github.com/ClickHouse/ClickHouse/issues/20543). [#21023](https://github.com/ClickHouse/ClickHouse/pull/21023) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Block parallel insertions into storage join. [#21009](https://github.com/ClickHouse/ClickHouse/pull/21009) ([vdimir](https://github.com/vdimir)).
+* Fixed behaviour where `ALTER MODIFY COLUMN` created a mutation that would knowingly fail. [#21007](https://github.com/ClickHouse/ClickHouse/pull/21007) ([Anton Popov](https://github.com/CurtizJ)).
+* Closes [#9969](https://github.com/ClickHouse/ClickHouse/issues/9969). Fixed Brotli http compression error, which reproduced for large data sizes, slightly complicated structure and with json output format. Update Brotli to the latest version to include the "fix rare access to uninitialized data in ring-buffer". [#20991](https://github.com/ClickHouse/ClickHouse/pull/20991) ([Kseniia Sumarokova](https://github.com/kssenii)).
+* Fix 'Empty task was returned from async task queue' on query cancellation. [#20881](https://github.com/ClickHouse/ClickHouse/pull/20881) ([Azat Khuzhin](https://github.com/azat)).
+* `USE database;` query did not work when using MySQL 5.7 client to connect to ClickHouse server, it's fixed. Fixes [#18926](https://github.com/ClickHouse/ClickHouse/issues/18926). [#20878](https://github.com/ClickHouse/ClickHouse/pull/20878) ([tavplubix](https://github.com/tavplubix)).
+* Fix usage of `-Distinct` combinator with `-State` combinator in aggregate functions. [#20866](https://github.com/ClickHouse/ClickHouse/pull/20866) ([Anton Popov](https://github.com/CurtizJ)).
+* Fix subquery with union distinct and limit clause. Closes [#20597](https://github.com/ClickHouse/ClickHouse/issues/20597). [#20610](https://github.com/ClickHouse/ClickHouse/pull/20610) ([flynn](https://github.com/ucasFL)).
+* Fixed inconsistent behavior of dictionary in case of queries where we look for absent keys in dictionary. [#20578](https://github.com/ClickHouse/ClickHouse/pull/20578) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Fix the number of threads for scalar subqueries and subqueries for index (after [#19007](https://github.com/ClickHouse/ClickHouse/issues/19007) single thread was always used). Fixes [#20457](https://github.com/ClickHouse/ClickHouse/issues/20457), [#20512](https://github.com/ClickHouse/ClickHouse/issues/20512). [#20550](https://github.com/ClickHouse/ClickHouse/pull/20550) ([Nikolai Kochetov](https://github.com/KochetovNicolai)).
+* Fix crash which could happen if an unknown packet was received from a remote query (was introduced in [#17868](https://github.com/ClickHouse/ClickHouse/issues/17868)). [#20547](https://github.com/ClickHouse/ClickHouse/pull/20547) ([Azat Khuzhin](https://github.com/azat)).
+* Add proper checks while parsing directory names for async INSERT (fixes SIGSEGV). [#20498](https://github.com/ClickHouse/ClickHouse/pull/20498) ([Azat Khuzhin](https://github.com/azat)).
+* Fix function `transform` not working properly for floating point keys. Closes [#20460](https://github.com/ClickHouse/ClickHouse/issues/20460). [#20479](https://github.com/ClickHouse/ClickHouse/pull/20479) ([flynn](https://github.com/ucasFL)).
+* Fix infinite loop when propagating WITH aliases to subqueries. This fixes [#20388](https://github.com/ClickHouse/ClickHouse/issues/20388). [#20476](https://github.com/ClickHouse/ClickHouse/pull/20476) ([Amos Bird](https://github.com/amosbird)).
+* Fix abnormal server termination when http client goes away. [#20464](https://github.com/ClickHouse/ClickHouse/pull/20464) ([Azat Khuzhin](https://github.com/azat)).
+* Fix `LOGICAL_ERROR` for `join_use_nulls=1` when JOIN contains const from SELECT. [#20461](https://github.com/ClickHouse/ClickHouse/pull/20461) ([Azat Khuzhin](https://github.com/azat)).
+* Check if table function `view` is used in expression list and throw an error. This fixes [#20342](https://github.com/ClickHouse/ClickHouse/issues/20342). [#20350](https://github.com/ClickHouse/ClickHouse/pull/20350) ([Amos Bird](https://github.com/amosbird)).
+* Avoid invalid dereference in RANGE_HASHED() dictionary. [#20345](https://github.com/ClickHouse/ClickHouse/pull/20345) ([Azat Khuzhin](https://github.com/azat)).
+* Fix null dereference with `join_use_nulls=1`. [#20344](https://github.com/ClickHouse/ClickHouse/pull/20344) ([Azat Khuzhin](https://github.com/azat)).
+* Fix incorrect result of binary operations between two constant decimals of different scale. Fixes [#20283](https://github.com/ClickHouse/ClickHouse/issues/20283). [#20339](https://github.com/ClickHouse/ClickHouse/pull/20339) ([Maksim Kita](https://github.com/kitaisreal)).
+* Fix too frequent retries of failed background tasks for `ReplicatedMergeTree` table engines family. This could lead to too verbose logging and increased CPU load. Fixes [#20203](https://github.com/ClickHouse/ClickHouse/issues/20203). [#20335](https://github.com/ClickHouse/ClickHouse/pull/20335) ([alesapin](https://github.com/alesapin)).
+* Restrict to `DROP` or `RENAME` version column of `*CollapsingMergeTree` and `ReplacingMergeTree` table engines. [#20300](https://github.com/ClickHouse/ClickHouse/pull/20300) ([alesapin](https://github.com/alesapin)).
+* Fixed the behavior when in case of broken JSON we tried to read the whole file into memory which leads to exception from the allocator. Fixes [#19719](https://github.com/ClickHouse/ClickHouse/issues/19719). [#20286](https://github.com/ClickHouse/ClickHouse/pull/20286) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)).
+* Fix exception during vertical merge for `MergeTree` table engines family which don't allow to perform vertical merges. Fixes [#20259](https://github.com/ClickHouse/ClickHouse/issues/20259). [#20279](https://github.com/ClickHouse/ClickHouse/pull/20279) ([alesapin](https://github.com/alesapin)).
+* Fix rare server crash on config reload during the shutdown. Fixes [#19689](https://github.com/ClickHouse/ClickHouse/issues/19689). [#20224](https://github.com/ClickHouse/ClickHouse/pull/20224) ([alesapin](https://github.com/alesapin)).
+* Fix CTE when using in INSERT SELECT. This fixes [#20187](https://github.com/ClickHouse/ClickHouse/issues/20187), fixes [#20195](https://github.com/ClickHouse/ClickHouse/issues/20195). [#20211](https://github.com/ClickHouse/ClickHouse/pull/20211) ([Amos Bird](https://github.com/amosbird)).
+* Fixes [#19314](https://github.com/ClickHouse/ClickHouse/issues/19314). [#20156](https://github.com/ClickHouse/ClickHouse/pull/20156) ([Ivan](https://github.com/abyss7)).
+* Fix `toMinute` function to handle special timezones correctly. [#20149](https://github.com/ClickHouse/ClickHouse/pull/20149) ([keenwolf](https://github.com/keen-wolf)).
+* Fix server crash after query with `if` function with `Tuple` type of then/else branches result. `Tuple` type must contain `Array` or another complex type. Fixes [#18356](https://github.com/ClickHouse/ClickHouse/issues/18356). [#20133](https://github.com/ClickHouse/ClickHouse/pull/20133) ([alesapin](https://github.com/alesapin)).
+* The `MongoDB` table engine now establishes connection only when it's going to read data. `ATTACH TABLE` won't try to connect anymore. [#20110](https://github.com/ClickHouse/ClickHouse/pull/20110) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Bugfix in StorageJoin. [#20079](https://github.com/ClickHouse/ClickHouse/pull/20079) ([vdimir](https://github.com/vdimir)).
+* Fix the case when calculating modulo of division of negative number by small divisor, the resulting data type was not large enough to accommodate the negative result. This closes [#20052](https://github.com/ClickHouse/ClickHouse/issues/20052). [#20067](https://github.com/ClickHouse/ClickHouse/pull/20067) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* MaterializeMySQL: Fix replication for statements that update several tables. [#20066](https://github.com/ClickHouse/ClickHouse/pull/20066) ([Håvard Kvålen](https://github.com/havardk)).
+* Prevent "Connection refused" in docker during initialization script execution. [#20012](https://github.com/ClickHouse/ClickHouse/pull/20012) ([filimonov](https://github.com/filimonov)).
+* `EmbeddedRocksDB` is an experimental storage. Fix the issue with lack of proper type checking. Simplified code. This closes [#19967](https://github.com/ClickHouse/ClickHouse/issues/19967). [#19972](https://github.com/ClickHouse/ClickHouse/pull/19972) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix a segfault in function `fromModifiedJulianDay` when the argument type is `Nullable(T)` for any integral types other than Int32. [#19959](https://github.com/ClickHouse/ClickHouse/pull/19959) ([PHO](https://github.com/depressed-pho)).
+* BloomFilter index crash fix. Fixes [#19757](https://github.com/ClickHouse/ClickHouse/issues/19757). [#19884](https://github.com/ClickHouse/ClickHouse/pull/19884) ([Maksim Kita](https://github.com/kitaisreal)).
+* Deadlock was possible if system.text_log is enabled. This fixes [#19874](https://github.com/ClickHouse/ClickHouse/issues/19874). [#19875](https://github.com/ClickHouse/ClickHouse/pull/19875) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix starting the server with tables having default expressions containing dictGet(). Allow getting return type of dictGet() without loading dictionary. [#19805](https://github.com/ClickHouse/ClickHouse/pull/19805) ([Vitaly Baranov](https://github.com/vitlibar)).
+* Fix clickhouse-client abort exception while executing only `select`. [#19790](https://github.com/ClickHouse/ClickHouse/pull/19790) ([taiyang-li](https://github.com/taiyang-li)).
+* Fix a bug where moving pieces to the destination table might fail when multiple clickhouse-copiers are launched. [#19743](https://github.com/ClickHouse/ClickHouse/pull/19743) ([madianjun](https://github.com/mdianjun)).
+* Background thread which executes `ON CLUSTER` queries might hang waiting for dropped replicated table to do something. It's fixed. [#19684](https://github.com/ClickHouse/ClickHouse/pull/19684) ([yiguolei](https://github.com/yiguolei)).
+
+#### Build/Testing/Packaging Improvement
+
+* Allow to build ClickHouse with AVX-2 enabled globally. It gives slight performance benefits on modern CPUs. Not recommended for production and will not be supported as official build for now. [#20180](https://github.com/ClickHouse/ClickHouse/pull/20180) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Fix some of the issues found by Coverity. See [#19964](https://github.com/ClickHouse/ClickHouse/issues/19964). [#20010](https://github.com/ClickHouse/ClickHouse/pull/20010) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Allow starting up with a modified binary under gdb. In previous versions, if you set a breakpoint in gdb before start, the server would refuse to start up due to a failed integrity check. [#21258](https://github.com/ClickHouse/ClickHouse/pull/21258) ([alexey-milovidov](https://github.com/alexey-milovidov)).
+* Add a test for different compression methods in Kafka. [#21111](https://github.com/ClickHouse/ClickHouse/pull/21111) ([filimonov](https://github.com/filimonov)).
+* Fixed port clash from test_storage_kerberized_hdfs test. [#19974](https://github.com/ClickHouse/ClickHouse/pull/19974) ([Ilya Yatsishin](https://github.com/qoega)).
+* Print `stdout` and `stderr` to log when failed to start docker in integration tests. Before this PR there was a very short error message in this case which didn't help to investigate the problems. [#20631](https://github.com/ClickHouse/ClickHouse/pull/20631) ([Vitaly Baranov](https://github.com/vitlibar)).
+
+
 ## ClickHouse release 21.2

 ### ClickHouse release v21.2.2.8-stable, 2021-02-07
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@ -155,7 +155,6 @@ option(ENABLE_TESTS "Provide unit_test_dbms target with Google.Test unit tests"

 if (OS_LINUX AND NOT UNBUNDLED AND MAKE_STATIC_LIBRARIES AND NOT SPLIT_SHARED_LIBRARIES AND CMAKE_VERSION VERSION_GREATER "3.9.0")
    # Only for Linux, x86_64.
-    # Implies ${ENABLE_FASTMEMCPY}
    option(GLIBC_COMPATIBILITY "Enable compatibility with older glibc libraries." ON)
 elseif(GLIBC_COMPATIBILITY)
    message (${RECONFIGURE_MESSAGE_LEVEL} "Glibc compatibility cannot be enabled in current configuration")
@ -169,7 +168,7 @@ endif ()
 set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -rdynamic")

 if (OS_LINUX)
-    find_program (OBJCOPY_PATH NAMES "llvm-objcopy" "llvm-objcopy-11" "llvm-objcopy-10" "llvm-objcopy-9" "llvm-objcopy-8" "objcopy")
+    find_program (OBJCOPY_PATH NAMES "llvm-objcopy" "llvm-objcopy-12" "llvm-objcopy-11" "llvm-objcopy-10" "llvm-objcopy-9" "llvm-objcopy-8" "objcopy")
    if (OBJCOPY_PATH)
        message(STATUS "Using objcopy: ${OBJCOPY_PATH}.")

@ -241,9 +240,7 @@ else()
    message(STATUS "Disabling compiler -pipe option (have only ${AVAILABLE_PHYSICAL_MEMORY} mb of memory)")
 endif()

-if(NOT DISABLE_CPU_OPTIMIZE)
-    include(cmake/cpu_features.cmake)
-endif()
+include(cmake/cpu_features.cmake)

 option(ARCH_NATIVE "Add -march=native compiler flag")

@ -331,7 +328,7 @@ if (COMPILER_CLANG)
    endif ()

    # Always prefer llvm tools when using clang. For instance, we cannot use GNU ar when llvm LTO is enabled
-    find_program (LLVM_AR_PATH NAMES "llvm-ar" "llvm-ar-11" "llvm-ar-10" "llvm-ar-9" "llvm-ar-8")
+    find_program (LLVM_AR_PATH NAMES "llvm-ar" "llvm-ar-12" "llvm-ar-11" "llvm-ar-10" "llvm-ar-9" "llvm-ar-8")

    if (LLVM_AR_PATH)
        message(STATUS "Using llvm-ar: ${LLVM_AR_PATH}.")
@ -340,7 +337,7 @@ if (COMPILER_CLANG)
        message(WARNING "Cannot find llvm-ar. System ar will be used instead. It does not work with ThinLTO.")
    endif ()

-    find_program (LLVM_RANLIB_PATH NAMES "llvm-ranlib" "llvm-ranlib-11" "llvm-ranlib-10" "llvm-ranlib-9" "llvm-ranlib-8")
+    find_program (LLVM_RANLIB_PATH NAMES "llvm-ranlib" "llvm-ranlib-12" "llvm-ranlib-11" "llvm-ranlib-10" "llvm-ranlib-9" "llvm-ranlib-8")

    if (LLVM_RANLIB_PATH)
        message(STATUS "Using llvm-ranlib: ${LLVM_RANLIB_PATH}.")
@ -536,7 +533,7 @@ macro (add_executable target)
    # explicitly acquire and interpose malloc symbols by clickhouse_malloc
    # if GLIBC_COMPATIBILITY is ON and ENABLE_THINLTO is on, then provide the memcpy symbol explicitly to neutralize thinlto's libcall generation.
    if (GLIBC_COMPATIBILITY AND ENABLE_THINLTO)
-        _add_executable (${ARGV} $<TARGET_OBJECTS:clickhouse_malloc> $<TARGET_OBJECTS:clickhouse_memcpy>)
+        _add_executable (${ARGV} $<TARGET_OBJECTS:clickhouse_malloc> $<TARGET_OBJECTS:memcpy>)
    else ()
        _add_executable (${ARGV} $<TARGET_OBJECTS:clickhouse_malloc>)
    endif ()
--- a/base/common/CMakeLists.txt
+++ b/base/common/CMakeLists.txt
@ -74,7 +74,6 @@ target_link_libraries (common
        ${CITYHASH_LIBRARIES}
        boost::headers_only
        boost::system
-        FastMemcpy
        Poco::Net
        Poco::Net::SSL
        Poco::Util
--- a/base/common/defines.h
+++ b/base/common/defines.h
@ -76,6 +76,16 @@
 #    endif
 #endif

+#if !defined(UNDEFINED_BEHAVIOR_SANITIZER)
+#    if defined(__has_feature)
+#        if __has_feature(undefined_behavior_sanitizer)
+#            define UNDEFINED_BEHAVIOR_SANITIZER 1
+#        endif
+#    elif defined(__UNDEFINED_BEHAVIOR_SANITIZER__)
+#        define UNDEFINED_BEHAVIOR_SANITIZER 1
+#    endif
+#endif
+
 #if defined(ADDRESS_SANITIZER)
 #    define BOOST_USE_ASAN 1
 #    define BOOST_USE_UCONTEXT 1
--- a/base/common/tests/CMakeLists.txt
+++ b/base/common/tests/CMakeLists.txt
@ -11,7 +11,7 @@ set(PLATFORM_LIBS ${CMAKE_DL_LIBS})
 target_link_libraries (date_lut2 PRIVATE common ${PLATFORM_LIBS})
 target_link_libraries (date_lut3 PRIVATE common ${PLATFORM_LIBS})
 target_link_libraries (date_lut_default_timezone PRIVATE common ${PLATFORM_LIBS})
-target_link_libraries (local_date_time_comparison PRIVATE common)
+target_link_libraries (local_date_time_comparison PRIVATE common ${PLATFORM_LIBS})
 target_link_libraries (realloc-perf PRIVATE common)
 add_check(local_date_time_comparison)

--- a/base/common/wide_integer_impl.h
+++ b/base/common/wide_integer_impl.h
@ -249,15 +249,15 @@ struct integer<Bits, Signed>::_impl
            return;
        }

-        const T alpha = t / max_int;
+        const T alpha = t / static_cast<T>(max_int);

-        if (alpha <= max_int)
+        if (alpha <= static_cast<T>(max_int))
            self = static_cast<uint64_t>(alpha);
        else // max(double) / 2^64 will surely contain less than 52 precision bits, so speed up computations.
            set_multiplier<double>(self, alpha);

        self *= max_int;
-        self += static_cast<uint64_t>(t - alpha * max_int); // += b_i
+        self += static_cast<uint64_t>(t - alpha * static_cast<T>(max_int)); // += b_i
    }

    constexpr static void wide_integer_from_bultin(integer<Bits, Signed>& self, double rhs) noexcept {
@ -275,7 +275,7 @@ struct integer<Bits, Signed>::_impl
            "On your system long double has less than 64 precision bits,"
            "which may result in UB when initializing double from int64_t");

-        if ((rhs > 0 && rhs < max_int) || (rhs < 0 && rhs > min_int))
+        if ((rhs > 0 && rhs < static_cast<long double>(max_int)) || (rhs < 0 && rhs > static_cast<long double>(min_int)))
        {
            self = static_cast<int64_t>(rhs);
            return;
--- a/base/glibc-compatibility/CMakeLists.txt
+++ b/base/glibc-compatibility/CMakeLists.txt
@ -1,5 +1,8 @@
 if (GLIBC_COMPATIBILITY)
-    set (ENABLE_FASTMEMCPY ON)
+    add_subdirectory(memcpy)
+    if(TARGET memcpy)
+        set(MEMCPY_LIBRARY memcpy)
+    endif()

    enable_language(ASM)
    include(CheckIncludeFile)
@ -27,13 +30,6 @@ if (GLIBC_COMPATIBILITY)
        list(APPEND glibc_compatibility_sources musl/getentropy.c)
    endif()

-    if (NOT ARCH_ARM)
-        # clickhouse_memcpy don't support ARCH_ARM, see https://github.com/ClickHouse/ClickHouse/issues/18951
-        add_library (clickhouse_memcpy OBJECT
-            ${ClickHouse_SOURCE_DIR}/contrib/FastMemcpy/memcpy_wrapper.c
-        )
-    endif()
-
    # Need to omit frame pointers to match the performance of glibc
    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fomit-frame-pointer")

@ -51,15 +47,16 @@ if (GLIBC_COMPATIBILITY)
        target_compile_options(glibc-compatibility PRIVATE -fPIC)
    endif ()

-    target_link_libraries(global-libs INTERFACE glibc-compatibility)
+    target_link_libraries(global-libs INTERFACE glibc-compatibility ${MEMCPY_LIBRARY})

    install(
-        TARGETS glibc-compatibility
+        TARGETS glibc-compatibility ${MEMCPY_LIBRARY}
        EXPORT global
        ARCHIVE DESTINATION lib
    )

    message (STATUS "Some symbols from glibc will be replaced for compatibility")
+
 elseif (YANDEX_OFFICIAL_BUILD)
    message (WARNING "Option GLIBC_COMPATIBILITY must be turned on for production builds.")
 endif ()
--- a/base/glibc-compatibility/memcpy/CMakeLists.txt
+++ b/base/glibc-compatibility/memcpy/CMakeLists.txt
@ -0,0 +1,8 @@
+if (ARCH_AMD64)
+    add_library(memcpy STATIC memcpy.cpp)
+
+    # We allow to include memcpy.h from user code for better inlining.
+    target_include_directories(memcpy PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>)
+
+    target_compile_options(memcpy PRIVATE -fno-builtin-memcpy)
+endif ()
--- a/base/glibc-compatibility/memcpy/memcpy.cpp
+++ b/base/glibc-compatibility/memcpy/memcpy.cpp
@ -0,0 +1,6 @@
+#include "memcpy.h"
+
+extern "C" void * memcpy(void * __restrict dst, const void * __restrict src, size_t size)
+{
+    return inline_memcpy(dst, src, size);
+}
--- a/base/glibc-compatibility/memcpy/memcpy.h
+++ b/base/glibc-compatibility/memcpy/memcpy.h
@ -0,0 +1,217 @@
+#include <cstddef>
+
+#include <emmintrin.h>
+
+
+/** Custom memcpy implementation for ClickHouse.
+  * It has the following benefits over using glibc's implementation:
+  * 1. Avoiding dependency on specific version of glibc's symbol, like memcpy@@GLIBC_2.14 for portability.
+  * 2. Avoiding indirect call via PLT due to shared linking, that can be less efficient.
+  * 3. It's possible to include this header and call inline_memcpy directly for better inlining or interprocedural analysis.
+  * 4. Better results on our performance tests on current CPUs: up to 25% on some queries and up to 0.7%..1% in average across all queries.
+  *
+  * Writing our own memcpy is extremely difficult for the following reasons:
+  * 1. The optimal variant depends on the specific CPU model.
+  * 2. The optimal variant depends on the distribution of size arguments.
+  * 3. It depends on the number of threads copying data concurrently.
+  * 4. It also depends on how the calling code is using the copied data and how the different memcpy calls are related to each other.
+  * The vast range of scenarios makes proper testing especially difficult.
+  * When writing our own memcpy there is a risk to overoptimize it
+  * on non-representative microbenchmarks while making real-world use cases actually worse.
+  *
+  * Most of the benchmarks for memcpy on the internet are wrong.
+  *
+  * Let's look at the details:
+  *
+  * For small size, the order of branches in code is important.
+  * There are variants with specific order of branches (like here or in glibc)
+  * or with jump table (in asm code see example from Cosmopolitan libc:
+  * https://github.com/jart/cosmopolitan/blob/de09bec215675e9b0beb722df89c6f794da74f3f/libc/nexgen32e/memcpy.S#L61)
+  * or with Duff's device in C (see https://github.com/skywind3000/FastMemcpy/)
+  *
+  * It's also important how to copy uneven sizes.
+  * Almost every implementation, including this, is using two overlapping movs.
+  *
+  * It is important to disable -ftree-loop-distribute-patterns when compiling the memcpy implementation,
+  * otherwise the compiler can replace internal loops with a call to memcpy, which will lead to infinite recursion.
+  *
+  * For larger sizes it's important to choose the instructions used:
+  * - SSE or AVX or AVX-512;
+  * - rep movsb;
+  * Performance will depend on the size threshold, on the CPU model, on the "erms" flag
+  * ("Enhanced REP MOVSB" - it indicates that performance of "rep movsb" is decent for large sizes)
+  * https://stackoverflow.com/questions/43343231/enhanced-rep-movsb-for-memcpy
+  *
+  * Using AVX-512 can be bad due to throttling.
+  * Using AVX can be bad if most code is using SSE due to switching penalty
+  * (it also depends on the usage of "vzeroupper" instruction).
+  * But in some cases AVX gives a win.
+  *
+  * It also depends on how many times the loop will be unrolled.
+  * We are unrolling the loop 8 times (by the number of available registers), but it is not always the best.
+  *
+  * It also depends on the usage of aligned or unaligned loads/stores.
+  * We are using unaligned loads and aligned stores.
+  *
+  * It also depends on the usage of prefetch instructions. It makes sense on some Intel CPUs but can slow down performance on AMD.
+  * Setting up correct offset for prefetching is non-obvious.
+  *
+  * Non-temporal (cache bypassing) stores can be used for very large sizes (more than a half of L3 cache).
+  * But the exact threshold is unclear - when doing memcpy from multiple threads the optimal threshold can be lower,
+  * because L3 cache is shared (and L2 cache is partially shared).
+  *
+  * Very large size of memcpy typically indicates suboptimal (not cache friendly) algorithms in code or unrealistic scenarios,
+  * so we don't pay attention to using non-temporal stores.
+  *
+  * On recent Intel CPUs, the presence of "erms" makes "rep movsb" the most beneficial,
+  * even compared to non-temporal aligned unrolled stores with the widest registers.
+  *
+  * memcpy can be written in asm, C or C++. The latter can also use inline asm.
+  * The asm implementation can be better to make sure that compiler won't make the code worse,
+  * to ensure the order of branches, the code layout, the usage of all required registers.
+  * But if it is located in separate translation unit, inlining will not be possible
+  * (inline asm can be used to overcome this limitation).
+  * Sometimes C or C++ code can be further optimized by compiler.
+  * For example, clang is capable of replacing SSE intrinsics with AVX code if -mavx is used.
+  *
+  * Please note that the compiler can replace plain code with memcpy and vice versa.
+  * - memcpy with compile-time known small size is replaced with simple instructions without a call to memcpy;
+  *   it is controlled by -fbuiltin-memcpy and can be manually ensured by calling __builtin_memcpy.
+  *   This is often used to implement unaligned load/store without undefined behaviour in C++.
+  * - a loop with copying bytes can be recognized and replaced by a call to memcpy;
+  *   it is controlled by -ftree-loop-distribute-patterns.
+  * - also note that a loop with copying bytes can be unrolled, peeled and vectorized that will give you
+  *   inline code somewhat similar to a decent implementation of memcpy.
+  *
+  * This description is up to date as of Mar 2021.
+  *
+  * How to test the memcpy implementation for performance:
+  * 1. Test on real production workload.
+  * 2. For synthetic tests, see utils/memcpy-bench, but make sure you do your best to cover the wide range of scenarios.
+  *
+  * TODO: Add self-tuning memcpy with bayesian bandits algorithm for large sizes.
+  * See https://habr.com/en/company/yandex/blog/457612/
+  */
+
+
+static inline void * inline_memcpy(void * __restrict dst_, const void * __restrict src_, size_t size)
+{
+    /// We will use pointer arithmetic, so char pointer will be used.
+    /// Note that __restrict makes sense (otherwise compiler will reload data from memory
+    /// instead of using the value of registers due to possible aliasing).
+    char * __restrict dst = reinterpret_cast<char * __restrict>(dst_);
+    const char * __restrict src = reinterpret_cast<const char * __restrict>(src_);
+
+    /// Standard memcpy returns the original value of dst. It is rarely used but we have to do it.
+    /// If you use memcpy with small but non-constant sizes, you can call inline_memcpy directly
+    /// for inlining and removing this single instruction.
+    void * ret = dst;
+
+tail:
+    /// Small sizes and tails after the loop for large sizes.
+    /// The order of branches is important but in fact the optimal order depends on the distribution of sizes in your application.
+    /// This order of branches is from the disassembly of glibc's code.
+    /// We copy chunks of possibly uneven size with two overlapping movs.
+    /// Example: to copy 5 bytes [0, 1, 2, 3, 4] we will copy tail [1, 2, 3, 4] first and then head [0, 1, 2, 3].
+    if (size <= 16)
+    {
+        if (size >= 8)
+        {
+            /// Chunks of 8..16 bytes.
+            __builtin_memcpy(dst + size - 8, src + size - 8, 8);
+            __builtin_memcpy(dst, src, 8);
+        }
+        else if (size >= 4)
+        {
+            /// Chunks of 4..7 bytes.
+            __builtin_memcpy(dst + size - 4, src + size - 4, 4);
+            __builtin_memcpy(dst, src, 4);
+        }
+        else if (size >= 2)
+        {
+            /// Chunks of 2..3 bytes.
+            __builtin_memcpy(dst + size - 2, src + size - 2, 2);
+            __builtin_memcpy(dst, src, 2);
+        }
+        else if (size >= 1)
+        {
+            /// A single byte.
+            *dst = *src;
+        }
+        /// No bytes remaining.
+    }
+    else
+    {
+        /// Medium and large sizes.
+        if (size <= 128)
+        {
+            /// Medium size, not enough for full loop unrolling.
+
+            /// We will copy the last 16 bytes.
+            _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + size - 16), _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + size - 16)));
+
+            /// Then we will copy every 16 bytes from the beginning in a loop.
+            /// The last loop iteration will possibly overwrite some part of already copied last 16 bytes.
+            /// This is Ok, similar to the code for small sizes above.
+            while (size > 16)
+            {
+                _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
+                dst += 16;
+                src += 16;
+                size -= 16;
+            }
+        }
+        else
+        {
+            /// Large size with fully unrolled loop.
+
+            /// Align destination to 16 bytes boundary.
+            size_t padding = (16 - (reinterpret_cast<size_t>(dst) & 15)) & 15;
+
+            /// If not aligned - we will copy first 16 bytes with unaligned stores.
+            if (padding > 0)
+            {
+                __m128i head = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
+                _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), head);
+                dst += padding;
+                src += padding;
+                size -= padding;
+            }
+
+            /// Aligned unrolled copy. We will use half of available SSE registers.
+            /// It's not possible to have both src and dst aligned.
+            /// So, we will use aligned stores and unaligned loads.
+            __m128i c0, c1, c2, c3, c4, c5, c6, c7;
+
+            while (size >= 128)
+            {
+                c0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 0);
+                c1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 1);
+                c2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 2);
+                c3 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 3);
+                c4 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 4);
+                c5 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 5);
+                c6 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 6);
+                c7 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src) + 7);
+                src += 128;
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 0), c0);
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 1), c1);
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 2), c2);
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 3), c3);
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 4), c4);
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 5), c5);
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 6), c6);
+                _mm_store_si128((reinterpret_cast<__m128i*>(dst) + 7), c7);
+                dst += 128;
+
+                size -= 128;
+            }
+
+            /// The last remaining 0..127 bytes will be processed as usual.
+            goto tail;
+        }
+    }
+
+    return ret;
+}
+
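Reviewer sketch (not part of the commit): two tricks in `memcpy.h` above are easy to misread, so here they are in isolation. The first copies an uneven chunk with two overlapping moves, tail first and then head. The second computes how many bytes to skip so the destination is aligned to 16 bytes. The helper names `copy_4_to_8` and `padding_to_16` are hypothetical and exist only for illustration. `std::memcpy` is used instead of `__builtin_memcpy` for portability, on the assumption that the compiler lowers a constant-size 4-byte copy to a plain move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not from the commit: copies 4..8 bytes using two
// possibly overlapping 4-byte moves. The buffers themselves must not overlap.
inline void copy_4_to_8(char * dst, const char * src, std::size_t size)
{
    assert(size >= 4 && size <= 8);
    std::memcpy(dst + size - 4, src + size - 4, 4);  // tail: the last 4 bytes
    std::memcpy(dst, src, 4);                        // head: the first 4 bytes
}

// Hypothetical helper: number of bytes to advance so that addr is 16-byte aligned.
// Returns 0 when addr is already aligned, which is why the outer "& 15" is needed.
inline std::size_t padding_to_16(std::uintptr_t addr)
{
    return (16 - (addr & 15)) & 15;
}
```

For a 5-byte copy the two moves cover bytes [1, 4] and [0, 3]. The overlap in the middle is written twice with the same values, which is harmless and avoids a byte-by-byte loop.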
--- a/base/mysqlxx/Pool.cpp
+++ b/base/mysqlxx/Pool.cpp
@ -174,9 +174,11 @@ Pool::Entry Pool::tryGet()
        /// Fixme: There is a race condition here b/c we do not synchronize with Pool::Entry's copy-assignment operator
        if (connection_ptr->ref_count == 0)
        {
-            Entry res(connection_ptr, this);
-            if (res.tryForceConnected())  /// Tries to reestablish connection as well
-                return res;
+            {
+                Entry res(connection_ptr, this);
+                if (res.tryForceConnected())  /// Tries to reestablish connection as well
+                    return res;
+            }

            logger.debug("(%s): Idle connection to MySQL server cannot be recovered, dropping it.", getDescription());

--- a/base/readpassphrase/CMakeLists.txt
+++ b/base/readpassphrase/CMakeLists.txt
@ -4,5 +4,5 @@
 add_library(readpassphrase readpassphrase.c)

 set_target_properties(readpassphrase PROPERTIES LINKER_LANGUAGE C)
-target_compile_options(readpassphrase PRIVATE -Wno-unused-result -Wno-reserved-id-macro)
+target_compile_options(readpassphrase PRIVATE -Wno-unused-result -Wno-reserved-id-macro -Wno-disabled-macro-expansion)
 target_include_directories(readpassphrase PUBLIC .)
--- a/base/readpassphrase/readpassphrase.c
+++ b/base/readpassphrase/readpassphrase.c
@ -94,7 +94,7 @@ restart:
    if (input != STDIN_FILENO && tcgetattr(input, &oterm) == 0) {
        memcpy(&term, &oterm, sizeof(term));
        if (!(flags & RPP_ECHO_ON))
-            term.c_lflag &= ~(ECHO | ECHONL);
+            term.c_lflag &= ~((unsigned int) (ECHO | ECHONL));
 #ifdef VSTATUS
        if (term.c_cc[VSTATUS] != _POSIX_VDISABLE)
            term.c_cc[VSTATUS] = _POSIX_VDISABLE;
--- a/cmake/autogenerated_versions.txt
+++ b/cmake/autogenerated_versions.txt
@ -1,9 +1,9 @@
 # This strings autochanged from release_lib.sh:
-SET(VERSION_REVISION 54448)
+SET(VERSION_REVISION 54449)
 SET(VERSION_MAJOR 21)
-SET(VERSION_MINOR 3)
+SET(VERSION_MINOR 4)
 SET(VERSION_PATCH 1)
-SET(VERSION_GITHASH ef72ba7349f230321750c13ee63b49a11a7c0adc)
-SET(VERSION_DESCRIBE v21.3.1.1-prestable)
-SET(VERSION_STRING 21.3.1.1)
+SET(VERSION_GITHASH af2135ef9dc72f16fa4f229b731262c3f0a8bbdc)
+SET(VERSION_DESCRIBE v21.4.1.1-prestable)
+SET(VERSION_STRING 21.4.1.1)
 # end of autochange
--- a/cmake/find/krb5.cmake
+++ b/cmake/find/krb5.cmake
@ -5,8 +5,8 @@ if (NOT EXISTS "${ClickHouse_SOURCE_DIR}/contrib/krb5/README")
    set (ENABLE_KRB5 0)
 endif ()

-if (NOT CMAKE_SYSTEM_NAME MATCHES "Linux")
-    message (WARNING "krb5 disabled in non-Linux environments")
+if (NOT CMAKE_SYSTEM_NAME MATCHES "Linux" AND NOT (CMAKE_SYSTEM_NAME MATCHES "Darwin" AND NOT CMAKE_CROSSCOMPILING))
+    message (WARNING "krb5 disabled in non-Linux and non-native-Darwin environments")
    set (ENABLE_KRB5 0)
 endif ()

--- a/cmake/tools.cmake
+++ b/cmake/tools.cmake
@ -75,8 +75,13 @@ if (OS_LINUX AND NOT LINKER_NAME)
 endif ()

 if (LINKER_NAME)
-    set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fuse-ld=${LINKER_NAME}")
-    set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fuse-ld=${LINKER_NAME}")
+    if (COMPILER_CLANG AND (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 12.0.0 OR CMAKE_CXX_COMPILER_VERSION VERSION_EQUAL 12.0.0))
+        set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} --ld-path=${LINKER_NAME}")
+        set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} --ld-path=${LINKER_NAME}")
+    else ()
+        set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fuse-ld=${LINKER_NAME}")
+        set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -fuse-ld=${LINKER_NAME}")
+    endif ()

    message(STATUS "Using custom linker by name: ${LINKER_NAME}")
 endif ()
--- a/contrib/CMakeLists.txt
+++ b/contrib/CMakeLists.txt
@ -32,12 +32,12 @@ endif()

 set_property(DIRECTORY PROPERTY EXCLUDE_FROM_ALL 1)

+add_subdirectory (abseil-cpp-cmake)
 add_subdirectory (antlr4-runtime-cmake)
 add_subdirectory (boost-cmake)
 add_subdirectory (cctz-cmake)
 add_subdirectory (consistent-hashing)
 add_subdirectory (dragonbox-cmake)
-add_subdirectory (FastMemcpy)
 add_subdirectory (hyperscan-cmake)
 add_subdirectory (jemalloc-cmake)
 add_subdirectory (libcpuid-cmake)
--- a/contrib/FastMemcpy/CMakeLists.txt
+++ b/contrib/FastMemcpy/CMakeLists.txt
@ -1,28 +0,0 @@
-option (ENABLE_FASTMEMCPY "Enable FastMemcpy library (only internal)" ${ENABLE_LIBRARIES})
-
-if (NOT OS_LINUX OR ARCH_AARCH64)
-    set (ENABLE_FASTMEMCPY OFF)
-endif ()
-
-if (ENABLE_FASTMEMCPY)
-    set (LIBRARY_DIR ${ClickHouse_SOURCE_DIR}/contrib/FastMemcpy)
-
-    set (SRCS
-        ${LIBRARY_DIR}/FastMemcpy.c
-
-        memcpy_wrapper.c
-    )
-
-    add_library (FastMemcpy ${SRCS})
-    target_include_directories (FastMemcpy PUBLIC ${LIBRARY_DIR})
-
-    target_compile_definitions(FastMemcpy PUBLIC USE_FASTMEMCPY=1)
-
-    message (STATUS "Using FastMemcpy")
-else ()
-    add_library (FastMemcpy INTERFACE)
-
-    target_compile_definitions(FastMemcpy INTERFACE USE_FASTMEMCPY=0)
-
-    message (STATUS "Not using FastMemcpy")
-endif ()
--- a/contrib/FastMemcpy/FastMemcpy.c
+++ b/contrib/FastMemcpy/FastMemcpy.c
@ -1,220 +0,0 @@
-//=====================================================================
-//
-// FastMemcpy.c - skywind3000@163.com, 2015
-//
-// feature:
-// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc4.9)
-//
-//=====================================================================
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <time.h>
-
-#if (defined(_WIN32) || defined(WIN32))
-#include <windows.h>
-#include <mmsystem.h>
-#ifdef _MSC_VER
-#pragma comment(lib, "winmm.lib")
-#endif
-#elif defined(__unix)
-#include <sys/time.h>
-#include <unistd.h>
-#else
-#error it can only be compiled under windows or unix
-#endif
-
-#include "FastMemcpy.h"
-
-unsigned int gettime()
-{
-	#if (defined(_WIN32) || defined(WIN32))
-	return timeGetTime();
-	#else
-	static struct timezone tz={ 0,0 };
-	struct timeval time;
-	gettimeofday(&time,&tz);
-	return (time.tv_sec * 1000 + time.tv_usec / 1000);
-	#endif
-}
-
-void sleepms(unsigned int millisec)
-{
-#if defined(_WIN32) || defined(WIN32)
-	Sleep(millisec);
-#else
-	usleep(millisec * 1000);
-#endif
-}
-
-
-void benchmark(int dstalign, int srcalign, size_t size, int times)
-{
-	char *DATA1 = (char*)malloc(size + 64);
-	char *DATA2 = (char*)malloc(size + 64);
-	size_t LINEAR1 = ((size_t)DATA1);
-	size_t LINEAR2 = ((size_t)DATA2);
-	char *ALIGN1 = (char*)(((64 - (LINEAR1 & 63)) & 63) + LINEAR1);
-	char *ALIGN2 = (char*)(((64 - (LINEAR2 & 63)) & 63) + LINEAR2);
-	char *dst = (dstalign)? ALIGN1 : (ALIGN1 + 1);
-	char *src = (srcalign)? ALIGN2 : (ALIGN2 + 3);
-	unsigned int t1, t2;
-	int k;
-	
-	sleepms(100);
-	t1 = gettime();
-	for (k = times; k > 0; k--) {
-		memcpy(dst, src, size);
-	}
-	t1 = gettime() - t1;
-	sleepms(100);
-	t2 = gettime();
-	for (k = times; k > 0; k--) {
-		memcpy_fast(dst, src, size);
-	}
-	t2 = gettime() - t2;
-
-	free(DATA1);
-	free(DATA2);
-
-	printf("result(dst %s, src %s): memcpy_fast=%dms memcpy=%d ms\n",  
-		dstalign? "aligned" : "unalign", 
-		srcalign? "aligned" : "unalign", (int)t2, (int)t1);
-}
-
-
-void bench(int copysize, int times)
-{
-	printf("benchmark(size=%d bytes, times=%d):\n", copysize, times);
-	benchmark(1, 1, copysize, times);
-	benchmark(1, 0, copysize, times);
-	benchmark(0, 1, copysize, times);
-	benchmark(0, 0, copysize, times);
-	printf("\n");
-}
-
-
-void random_bench(int maxsize, int times)
-{
-	static char A[11 * 1024 * 1024 + 2];
-	static char B[11 * 1024 * 1024 + 2];
-	static int random_offsets[0x10000];
-	static int random_sizes[0x8000];
-	unsigned int i, p1, p2;
-	unsigned int t1, t2;
-	for (i = 0; i < 0x10000; i++) {	// generate random offsets
-		random_offsets[i] = rand() % (10 * 1024 * 1024 + 1);
-	}
-	for (i = 0; i < 0x8000; i++) {	// generate random sizes
-		random_sizes[i] = 1 + rand() % maxsize;
-	}
-	sleepms(100);
-	t1 = gettime();
-	for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
-		int offset1 = random_offsets[(p1++) & 0xffff];
-		int offset2 = random_offsets[(p1++) & 0xffff];
-		int size = random_sizes[(p2++) & 0x7fff];
-		memcpy(A + offset1, B + offset2, size);
-	}
-	t1 = gettime() - t1;
-	sleepms(100);
-	t2 = gettime();
-	for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
-		int offset1 = random_offsets[(p1++) & 0xffff];
-		int offset2 = random_offsets[(p1++) & 0xffff];
-		int size = random_sizes[(p2++) & 0x7fff];
-		memcpy_fast(A + offset1, B + offset2, size);
-	}
-	t2 = gettime() - t2;
-	printf("benchmark random access:\n");
-	printf("memcpy_fast=%dms memcpy=%dms\n\n", (int)t2, (int)t1);
-}
-
-
-#ifdef _MSC_VER
-#pragma comment(lib, "winmm.lib")
-#endif
-
-int main(void)
-{
-	bench(32, 0x1000000);
-	bench(64, 0x1000000);
-	bench(512, 0x800000);
-	bench(1024, 0x400000);
-	bench(4096, 0x80000);
-	bench(8192, 0x40000);
-	bench(1024 * 1024 * 1, 0x800);
-	bench(1024 * 1024 * 4, 0x200);
-	bench(1024 * 1024 * 8, 0x100);
-	
-	random_bench(2048, 8000000);
-
-	return 0;
-}
-
-
-
-
-/*
-benchmark(size=32 bytes, times=16777216):
-result(dst aligned, src aligned): memcpy_fast=78ms memcpy=260 ms
-result(dst aligned, src unalign): memcpy_fast=78ms memcpy=250 ms
-result(dst unalign, src aligned): memcpy_fast=78ms memcpy=266 ms
-result(dst unalign, src unalign): memcpy_fast=78ms memcpy=234 ms
-
-benchmark(size=64 bytes, times=16777216):
-result(dst aligned, src aligned): memcpy_fast=109ms memcpy=281 ms
-result(dst aligned, src unalign): memcpy_fast=109ms memcpy=328 ms
-result(dst unalign, src aligned): memcpy_fast=109ms memcpy=343 ms
-result(dst unalign, src unalign): memcpy_fast=93ms memcpy=344 ms
-
-benchmark(size=512 bytes, times=8388608):
-result(dst aligned, src aligned): memcpy_fast=125ms memcpy=218 ms
-result(dst aligned, src unalign): memcpy_fast=156ms memcpy=484 ms
-result(dst unalign, src aligned): memcpy_fast=172ms memcpy=546 ms
-result(dst unalign, src unalign): memcpy_fast=172ms memcpy=515 ms
-
-benchmark(size=1024 bytes, times=4194304):
-result(dst aligned, src aligned): memcpy_fast=109ms memcpy=172 ms
-result(dst aligned, src unalign): memcpy_fast=187ms memcpy=453 ms
-result(dst unalign, src aligned): memcpy_fast=172ms memcpy=437 ms
-result(dst unalign, src unalign): memcpy_fast=156ms memcpy=452 ms
-
-benchmark(size=4096 bytes, times=524288):
-result(dst aligned, src aligned): memcpy_fast=62ms memcpy=78 ms
-result(dst aligned, src unalign): memcpy_fast=109ms memcpy=202 ms
-result(dst unalign, src aligned): memcpy_fast=94ms memcpy=203 ms
-result(dst unalign, src unalign): memcpy_fast=110ms memcpy=218 ms
-
-benchmark(size=8192 bytes, times=262144):
-result(dst aligned, src aligned): memcpy_fast=62ms memcpy=78 ms
-result(dst aligned, src unalign): memcpy_fast=78ms memcpy=202 ms
-result(dst unalign, src aligned): memcpy_fast=78ms memcpy=203 ms
-result(dst unalign, src unalign): memcpy_fast=94ms memcpy=203 ms
-
-benchmark(size=1048576 bytes, times=2048):
-result(dst aligned, src aligned): memcpy_fast=203ms memcpy=191 ms
-result(dst aligned, src unalign): memcpy_fast=219ms memcpy=281 ms
-result(dst unalign, src aligned): memcpy_fast=218ms memcpy=328 ms
-result(dst unalign, src unalign): memcpy_fast=218ms memcpy=312 ms
-
-benchmark(size=4194304 bytes, times=512):
-result(dst aligned, src aligned): memcpy_fast=312ms memcpy=406 ms
-result(dst aligned, src unalign): memcpy_fast=296ms memcpy=421 ms
-result(dst unalign, src aligned): memcpy_fast=312ms memcpy=468 ms
-result(dst unalign, src unalign): memcpy_fast=297ms memcpy=452 ms
-
-benchmark(size=8388608 bytes, times=256):
-result(dst aligned, src aligned): memcpy_fast=281ms memcpy=452 ms
-result(dst aligned, src unalign): memcpy_fast=280ms memcpy=468 ms
-result(dst unalign, src aligned): memcpy_fast=298ms memcpy=514 ms
-result(dst unalign, src unalign): memcpy_fast=344ms memcpy=472 ms
-
-benchmark random access:
-memcpy_fast=515ms memcpy=1014ms
-
-*/
-
-
-
-
--- a/contrib/FastMemcpy/FastMemcpy.h
+++ b/contrib/FastMemcpy/FastMemcpy.h
@ -1,694 +0,0 @@
-//=====================================================================
-//
-// FastMemcpy.c - skywind3000@163.com, 2015
-//
-// feature:
-// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc5.1)
-//
-//=====================================================================
-#ifndef __FAST_MEMCPY_H__
-#define __FAST_MEMCPY_H__
-
-#include <stddef.h>
-#include <stdint.h>
-#include <emmintrin.h>
-
-
-//---------------------------------------------------------------------
-// force inline for compilers
-//---------------------------------------------------------------------
-#ifndef INLINE
-#ifdef __GNUC__
-#if (__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1))
-    #define INLINE         __inline__ __attribute__((always_inline))
-#else
-    #define INLINE         __inline__
-#endif
-#elif defined(_MSC_VER)
-	#define INLINE __forceinline
-#elif (defined(__BORLANDC__) || defined(__WATCOMC__))
-    #define INLINE __inline
-#else
-    #define INLINE
-#endif
-#endif
-
-typedef __attribute__((__aligned__(1))) uint16_t uint16_unaligned_t;
-typedef __attribute__((__aligned__(1))) uint32_t uint32_unaligned_t;
-typedef __attribute__((__aligned__(1))) uint64_t uint64_unaligned_t;
-
-//---------------------------------------------------------------------
-// fast copy for different sizes
-//---------------------------------------------------------------------
-static INLINE void memcpy_sse2_16(void *dst, const void *src) {
-	__m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
-	_mm_storeu_si128(((__m128i*)dst) + 0, m0);
-}
-
-static INLINE void memcpy_sse2_32(void *dst, const void *src) {
-	__m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
-	__m128i m1 = _mm_loadu_si128(((const __m128i*)src) + 1);
-	_mm_storeu_si128(((__m128i*)dst) + 0, m0);
-	_mm_storeu_si128(((__m128i*)dst) + 1, m1);
-}
-
-static INLINE void memcpy_sse2_64(void *dst, const void *src) {
-	__m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
-	__m128i m1 = _mm_loadu_si128(((const __m128i*)src) + 1);
-	__m128i m2 = _mm_loadu_si128(((const __m128i*)src) + 2);
-	__m128i m3 = _mm_loadu_si128(((const __m128i*)src) + 3);
-	_mm_storeu_si128(((__m128i*)dst) + 0, m0);
-	_mm_storeu_si128(((__m128i*)dst) + 1, m1);
-	_mm_storeu_si128(((__m128i*)dst) + 2, m2);
-	_mm_storeu_si128(((__m128i*)dst) + 3, m3);
-}
-
-static INLINE void memcpy_sse2_128(void *dst, const void *src) {
-	__m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
-	__m128i m1 = _mm_loadu_si128(((const __m128i*)src) + 1);
-	__m128i m2 = _mm_loadu_si128(((const __m128i*)src) + 2);
-	__m128i m3 = _mm_loadu_si128(((const __m128i*)src) + 3);
-	__m128i m4 = _mm_loadu_si128(((const __m128i*)src) + 4);
-	__m128i m5 = _mm_loadu_si128(((const __m128i*)src) + 5);
-	__m128i m6 = _mm_loadu_si128(((const __m128i*)src) + 6);
-	__m128i m7 = _mm_loadu_si128(((const __m128i*)src) + 7);
-	_mm_storeu_si128(((__m128i*)dst) + 0, m0);
-	_mm_storeu_si128(((__m128i*)dst) + 1, m1);
-	_mm_storeu_si128(((__m128i*)dst) + 2, m2);
-	_mm_storeu_si128(((__m128i*)dst) + 3, m3);
-	_mm_storeu_si128(((__m128i*)dst) + 4, m4);
-	_mm_storeu_si128(((__m128i*)dst) + 5, m5);
-	_mm_storeu_si128(((__m128i*)dst) + 6, m6);
-	_mm_storeu_si128(((__m128i*)dst) + 7, m7);
-}
-
-
-//---------------------------------------------------------------------
-// tiny memory copy with jump table optimized
-//---------------------------------------------------------------------
-/// Attribute is used to avoid an error with undefined behaviour sanitizer
-/// ../contrib/FastMemcpy/FastMemcpy.h:91:56: runtime error: applying zero offset to null pointer
-/// Found by 01307_orc_output_format.sh, cause - ORCBlockInputFormat and external ORC library.
-__attribute__((__no_sanitize__("undefined"))) static INLINE void *memcpy_tiny(void *dst, const void *src, size_t size) {
-	unsigned char *dd = ((unsigned char*)dst) + size;
-	const unsigned char *ss = ((const unsigned char*)src) + size;
-
-	switch (size) {
-	case 64:
-		memcpy_sse2_64(dd - 64, ss - 64);
-	case 0:
-		break;
-
-	case 65:
-		memcpy_sse2_64(dd - 65, ss - 65);
-	case 1:
-		dd[-1] = ss[-1];
-		break;
-
-	case 66:
-		memcpy_sse2_64(dd - 66, ss - 66);
-	case 2:
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 67:
-		memcpy_sse2_64(dd - 67, ss - 67);
-	case 3:
-		*((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3));
-		dd[-1] = ss[-1];
-		break;
-
-	case 68:
-		memcpy_sse2_64(dd - 68, ss - 68);
-	case 4:
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 69:
-		memcpy_sse2_64(dd - 69, ss - 69);
-	case 5:
-		*((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5));
-		dd[-1] = ss[-1];
-		break;
-
-	case 70:
-		memcpy_sse2_64(dd - 70, ss - 70);
-	case 6:
-		*((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6));
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 71:
-		memcpy_sse2_64(dd - 71, ss - 71);
-	case 7:
-		*((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 72:
-		memcpy_sse2_64(dd - 72, ss - 72);
-	case 8:
-		*((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8));
-		break;
-
-	case 73:
-		memcpy_sse2_64(dd - 73, ss - 73);
-	case 9:
-		*((uint64_unaligned_t*)(dd - 9)) = *((uint64_unaligned_t*)(ss - 9));
-		dd[-1] = ss[-1];
-		break;
-
-	case 74:
-		memcpy_sse2_64(dd - 74, ss - 74);
-	case 10:
-		*((uint64_unaligned_t*)(dd - 10)) = *((uint64_unaligned_t*)(ss - 10));
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 75:
-		memcpy_sse2_64(dd - 75, ss - 75);
-	case 11:
-		*((uint64_unaligned_t*)(dd - 11)) = *((uint64_unaligned_t*)(ss - 11));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 76:
-		memcpy_sse2_64(dd - 76, ss - 76);
-	case 12:
-		*((uint64_unaligned_t*)(dd - 12)) = *((uint64_unaligned_t*)(ss - 12));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 77:
-		memcpy_sse2_64(dd - 77, ss - 77);
-	case 13:
-		*((uint64_unaligned_t*)(dd - 13)) = *((uint64_unaligned_t*)(ss - 13));
-		*((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5));
-		dd[-1] = ss[-1];
-		break;
-
-	case 78:
-		memcpy_sse2_64(dd - 78, ss - 78);
-	case 14:
-		*((uint64_unaligned_t*)(dd - 14)) = *((uint64_unaligned_t*)(ss - 14));
-		*((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8));
-		break;
-
-	case 79:
-		memcpy_sse2_64(dd - 79, ss - 79);
-	case 15:
-		*((uint64_unaligned_t*)(dd - 15)) = *((uint64_unaligned_t*)(ss - 15));
-		*((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8));
-		break;
-
-	case 80:
-		memcpy_sse2_64(dd - 80, ss - 80);
-	case 16:
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 81:
-		memcpy_sse2_64(dd - 81, ss - 81);
-	case 17:
-		memcpy_sse2_16(dd - 17, ss - 17);
-		dd[-1] = ss[-1];
-		break;
-
-	case 82:
-		memcpy_sse2_64(dd - 82, ss - 82);
-	case 18:
-		memcpy_sse2_16(dd - 18, ss - 18);
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 83:
-		memcpy_sse2_64(dd - 83, ss - 83);
-	case 19:
-		memcpy_sse2_16(dd - 19, ss - 19);
-		*((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3));
-		dd[-1] = ss[-1];
-		break;
-
-	case 84:
-		memcpy_sse2_64(dd - 84, ss - 84);
-	case 20:
-		memcpy_sse2_16(dd - 20, ss - 20);
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 85:
-		memcpy_sse2_64(dd - 85, ss - 85);
-	case 21:
-		memcpy_sse2_16(dd - 21, ss - 21);
-		*((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5));
-		dd[-1] = ss[-1];
-		break;
-
-	case 86:
-		memcpy_sse2_64(dd - 86, ss - 86);
-	case 22:
-		memcpy_sse2_16(dd - 22, ss - 22);
-		*((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6));
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 87:
-		memcpy_sse2_64(dd - 87, ss - 87);
-	case 23:
-		memcpy_sse2_16(dd - 23, ss - 23);
-		*((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 88:
-		memcpy_sse2_64(dd - 88, ss - 88);
-	case 24:
-		memcpy_sse2_16(dd - 24, ss - 24);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 89:
-		memcpy_sse2_64(dd - 89, ss - 89);
-	case 25:
-		memcpy_sse2_16(dd - 25, ss - 25);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 90:
-		memcpy_sse2_64(dd - 90, ss - 90);
-	case 26:
-		memcpy_sse2_16(dd - 26, ss - 26);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 91:
-		memcpy_sse2_64(dd - 91, ss - 91);
-	case 27:
-		memcpy_sse2_16(dd - 27, ss - 27);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 92:
-		memcpy_sse2_64(dd - 92, ss - 92);
-	case 28:
-		memcpy_sse2_16(dd - 28, ss - 28);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 93:
-		memcpy_sse2_64(dd - 93, ss - 93);
-	case 29:
-		memcpy_sse2_16(dd - 29, ss - 29);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 94:
-		memcpy_sse2_64(dd - 94, ss - 94);
-	case 30:
-		memcpy_sse2_16(dd - 30, ss - 30);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 95:
-		memcpy_sse2_64(dd - 95, ss - 95);
-	case 31:
-		memcpy_sse2_16(dd - 31, ss - 31);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 96:
-		memcpy_sse2_64(dd - 96, ss - 96);
-	case 32:
-		memcpy_sse2_32(dd - 32, ss - 32);
-		break;
-
-	case 97:
-		memcpy_sse2_64(dd - 97, ss - 97);
-	case 33:
-		memcpy_sse2_32(dd - 33, ss - 33);
-		dd[-1] = ss[-1];
-		break;
-
-	case 98:
-		memcpy_sse2_64(dd - 98, ss - 98);
-	case 34:
-		memcpy_sse2_32(dd - 34, ss - 34);
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 99:
-		memcpy_sse2_64(dd - 99, ss - 99);
-	case 35:
-		memcpy_sse2_32(dd - 35, ss - 35);
-		*((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3));
-		dd[-1] = ss[-1];
-		break;
-
-	case 100:
-		memcpy_sse2_64(dd - 100, ss - 100);
-	case 36:
-		memcpy_sse2_32(dd - 36, ss - 36);
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 101:
-		memcpy_sse2_64(dd - 101, ss - 101);
-	case 37:
-		memcpy_sse2_32(dd - 37, ss - 37);
-		*((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5));
-		dd[-1] = ss[-1];
-		break;
-
-	case 102:
-		memcpy_sse2_64(dd - 102, ss - 102);
-	case 38:
-		memcpy_sse2_32(dd - 38, ss - 38);
-		*((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6));
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 103:
-		memcpy_sse2_64(dd - 103, ss - 103);
-	case 39:
-		memcpy_sse2_32(dd - 39, ss - 39);
-		*((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 104:
-		memcpy_sse2_64(dd - 104, ss - 104);
-	case 40:
-		memcpy_sse2_32(dd - 40, ss - 40);
-		*((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8));
-		break;
-
-	case 105:
-		memcpy_sse2_64(dd - 105, ss - 105);
-	case 41:
-		memcpy_sse2_32(dd - 41, ss - 41);
-		*((uint64_unaligned_t*)(dd - 9)) = *((uint64_unaligned_t*)(ss - 9));
-		dd[-1] = ss[-1];
-		break;
-
-	case 106:
-		memcpy_sse2_64(dd - 106, ss - 106);
-	case 42:
-		memcpy_sse2_32(dd - 42, ss - 42);
-		*((uint64_unaligned_t*)(dd - 10)) = *((uint64_unaligned_t*)(ss - 10));
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 107:
-		memcpy_sse2_64(dd - 107, ss - 107);
-	case 43:
-		memcpy_sse2_32(dd - 43, ss - 43);
-		*((uint64_unaligned_t*)(dd - 11)) = *((uint64_unaligned_t*)(ss - 11));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 108:
-		memcpy_sse2_64(dd - 108, ss - 108);
-	case 44:
-		memcpy_sse2_32(dd - 44, ss - 44);
-		*((uint64_unaligned_t*)(dd - 12)) = *((uint64_unaligned_t*)(ss - 12));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 109:
-		memcpy_sse2_64(dd - 109, ss - 109);
-	case 45:
-		memcpy_sse2_32(dd - 45, ss - 45);
-		*((uint64_unaligned_t*)(dd - 13)) = *((uint64_unaligned_t*)(ss - 13));
-		*((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5));
-		dd[-1] = ss[-1];
-		break;
-
-	case 110:
-		memcpy_sse2_64(dd - 110, ss - 110);
-	case 46:
-		memcpy_sse2_32(dd - 46, ss - 46);
-		*((uint64_unaligned_t*)(dd - 14)) = *((uint64_unaligned_t*)(ss - 14));
-		*((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8));
-		break;
-
-	case 111:
-		memcpy_sse2_64(dd - 111, ss - 111);
-	case 47:
-		memcpy_sse2_32(dd - 47, ss - 47);
-		*((uint64_unaligned_t*)(dd - 15)) = *((uint64_unaligned_t*)(ss - 15));
-		*((uint64_unaligned_t*)(dd - 8)) = *((uint64_unaligned_t*)(ss - 8));
-		break;
-
-	case 112:
-		memcpy_sse2_64(dd - 112, ss - 112);
-	case 48:
-		memcpy_sse2_32(dd - 48, ss - 48);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 113:
-		memcpy_sse2_64(dd - 113, ss - 113);
-	case 49:
-		memcpy_sse2_32(dd - 49, ss - 49);
-		memcpy_sse2_16(dd - 17, ss - 17);
-		dd[-1] = ss[-1];
-		break;
-
-	case 114:
-		memcpy_sse2_64(dd - 114, ss - 114);
-	case 50:
-		memcpy_sse2_32(dd - 50, ss - 50);
-		memcpy_sse2_16(dd - 18, ss - 18);
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 115:
-		memcpy_sse2_64(dd - 115, ss - 115);
-	case 51:
-		memcpy_sse2_32(dd - 51, ss - 51);
-		memcpy_sse2_16(dd - 19, ss - 19);
-		*((uint16_unaligned_t*)(dd - 3)) = *((uint16_unaligned_t*)(ss - 3));
-		dd[-1] = ss[-1];
-		break;
-
-	case 116:
-		memcpy_sse2_64(dd - 116, ss - 116);
-	case 52:
-		memcpy_sse2_32(dd - 52, ss - 52);
-		memcpy_sse2_16(dd - 20, ss - 20);
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 117:
-		memcpy_sse2_64(dd - 117, ss - 117);
-	case 53:
-		memcpy_sse2_32(dd - 53, ss - 53);
-		memcpy_sse2_16(dd - 21, ss - 21);
-		*((uint32_unaligned_t*)(dd - 5)) = *((uint32_unaligned_t*)(ss - 5));
-		dd[-1] = ss[-1];
-		break;
-
-	case 118:
-		memcpy_sse2_64(dd - 118, ss - 118);
-	case 54:
-		memcpy_sse2_32(dd - 54, ss - 54);
-		memcpy_sse2_16(dd - 22, ss - 22);
-		*((uint32_unaligned_t*)(dd - 6)) = *((uint32_unaligned_t*)(ss - 6));
-		*((uint16_unaligned_t*)(dd - 2)) = *((uint16_unaligned_t*)(ss - 2));
-		break;
-
-	case 119:
-		memcpy_sse2_64(dd - 119, ss - 119);
-	case 55:
-		memcpy_sse2_32(dd - 55, ss - 55);
-		memcpy_sse2_16(dd - 23, ss - 23);
-		*((uint32_unaligned_t*)(dd - 7)) = *((uint32_unaligned_t*)(ss - 7));
-		*((uint32_unaligned_t*)(dd - 4)) = *((uint32_unaligned_t*)(ss - 4));
-		break;
-
-	case 120:
-		memcpy_sse2_64(dd - 120, ss - 120);
-	case 56:
-		memcpy_sse2_32(dd - 56, ss - 56);
-		memcpy_sse2_16(dd - 24, ss - 24);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 121:
-		memcpy_sse2_64(dd - 121, ss - 121);
-	case 57:
-		memcpy_sse2_32(dd - 57, ss - 57);
-		memcpy_sse2_16(dd - 25, ss - 25);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 122:
-		memcpy_sse2_64(dd - 122, ss - 122);
-	case 58:
-		memcpy_sse2_32(dd - 58, ss - 58);
-		memcpy_sse2_16(dd - 26, ss - 26);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 123:
-		memcpy_sse2_64(dd - 123, ss - 123);
-	case 59:
-		memcpy_sse2_32(dd - 59, ss - 59);
-		memcpy_sse2_16(dd - 27, ss - 27);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 124:
-		memcpy_sse2_64(dd - 124, ss - 124);
-	case 60:
-		memcpy_sse2_32(dd - 60, ss - 60);
-		memcpy_sse2_16(dd - 28, ss - 28);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 125:
-		memcpy_sse2_64(dd - 125, ss - 125);
-	case 61:
-		memcpy_sse2_32(dd - 61, ss - 61);
-		memcpy_sse2_16(dd - 29, ss - 29);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 126:
-		memcpy_sse2_64(dd - 126, ss - 126);
-	case 62:
-		memcpy_sse2_32(dd - 62, ss - 62);
-		memcpy_sse2_16(dd - 30, ss - 30);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 127:
-		memcpy_sse2_64(dd - 127, ss - 127);
-	case 63:
-		memcpy_sse2_32(dd - 63, ss - 63);
-		memcpy_sse2_16(dd - 31, ss - 31);
-		memcpy_sse2_16(dd - 16, ss - 16);
-		break;
-
-	case 128:
-		memcpy_sse2_128(dd - 128, ss - 128);
-		break;
-	}
-
-	return dst;
-}
-
-
-//---------------------------------------------------------------------
-// main routine
-//---------------------------------------------------------------------
-static void* memcpy_fast(void *destination, const void *source, size_t size)
-{
-	unsigned char *dst = (unsigned char*)destination;
-	const unsigned char *src = (const unsigned char*)source;
-	static size_t cachesize = 0x200000; // L2-cache size
-	size_t padding;
-
-	// small memory copy
-	if (size <= 128) {
-		return memcpy_tiny(dst, src, size);
-	}
-
-	// align destination to 16 bytes boundary
-	padding = (16 - (((size_t)dst) & 15)) & 15;
-
-	if (padding > 0) {
-		__m128i head = _mm_loadu_si128((const __m128i*)src);
-		_mm_storeu_si128((__m128i*)dst, head);
-		dst += padding;
-		src += padding;
-		size -= padding;
-	}
-
-	// medium size copy
-	if (size <= cachesize) {
-		__m128i c0, c1, c2, c3, c4, c5, c6, c7;
-
-		for (; size >= 128; size -= 128) {
-			c0 = _mm_loadu_si128(((const __m128i*)src) + 0);
-			c1 = _mm_loadu_si128(((const __m128i*)src) + 1);
-			c2 = _mm_loadu_si128(((const __m128i*)src) + 2);
-			c3 = _mm_loadu_si128(((const __m128i*)src) + 3);
-			c4 = _mm_loadu_si128(((const __m128i*)src) + 4);
-			c5 = _mm_loadu_si128(((const __m128i*)src) + 5);
-			c6 = _mm_loadu_si128(((const __m128i*)src) + 6);
-			c7 = _mm_loadu_si128(((const __m128i*)src) + 7);
-			_mm_prefetch((const char*)(src + 256), _MM_HINT_NTA);
-			src += 128;
-			_mm_store_si128((((__m128i*)dst) + 0), c0);
-			_mm_store_si128((((__m128i*)dst) + 1), c1);
-			_mm_store_si128((((__m128i*)dst) + 2), c2);
-			_mm_store_si128((((__m128i*)dst) + 3), c3);
-			_mm_store_si128((((__m128i*)dst) + 4), c4);
-			_mm_store_si128((((__m128i*)dst) + 5), c5);
-			_mm_store_si128((((__m128i*)dst) + 6), c6);
-			_mm_store_si128((((__m128i*)dst) + 7), c7);
-			dst += 128;
-		}
-	}
-	else {		// big memory copy
-		__m128i c0, c1, c2, c3, c4, c5, c6, c7;
-
-		_mm_prefetch((const char*)(src), _MM_HINT_NTA);
-
-		if ((((size_t)src) & 15) == 0) {	// source aligned
-			for (; size >= 128; size -= 128) {
-				c0 = _mm_load_si128(((const __m128i*)src) + 0);
-				c1 = _mm_load_si128(((const __m128i*)src) + 1);
-				c2 = _mm_load_si128(((const __m128i*)src) + 2);
-				c3 = _mm_load_si128(((const __m128i*)src) + 3);
-				c4 = _mm_load_si128(((const __m128i*)src) + 4);
-				c5 = _mm_load_si128(((const __m128i*)src) + 5);
-				c6 = _mm_load_si128(((const __m128i*)src) + 6);
-				c7 = _mm_load_si128(((const __m128i*)src) + 7);
-				_mm_prefetch((const char*)(src + 256), _MM_HINT_NTA);
-				src += 128;
-				_mm_stream_si128((((__m128i*)dst) + 0), c0);
-				_mm_stream_si128((((__m128i*)dst) + 1), c1);
-				_mm_stream_si128((((__m128i*)dst) + 2), c2);
-				_mm_stream_si128((((__m128i*)dst) + 3), c3);
-				_mm_stream_si128((((__m128i*)dst) + 4), c4);
-				_mm_stream_si128((((__m128i*)dst) + 5), c5);
-				_mm_stream_si128((((__m128i*)dst) + 6), c6);
-				_mm_stream_si128((((__m128i*)dst) + 7), c7);
-				dst += 128;
-			}
-		}
-		else {							// source unaligned
-			for (; size >= 128; size -= 128) {
-				c0 = _mm_loadu_si128(((const __m128i*)src) + 0);
-				c1 = _mm_loadu_si128(((const __m128i*)src) + 1);
-				c2 = _mm_loadu_si128(((const __m128i*)src) + 2);
-				c3 = _mm_loadu_si128(((const __m128i*)src) + 3);
-				c4 = _mm_loadu_si128(((const __m128i*)src) + 4);
-				c5 = _mm_loadu_si128(((const __m128i*)src) + 5);
-				c6 = _mm_loadu_si128(((const __m128i*)src) + 6);
-				c7 = _mm_loadu_si128(((const __m128i*)src) + 7);
-				_mm_prefetch((const char*)(src + 256), _MM_HINT_NTA);
-				src += 128;
-				_mm_stream_si128((((__m128i*)dst) + 0), c0);
-				_mm_stream_si128((((__m128i*)dst) + 1), c1);
-				_mm_stream_si128((((__m128i*)dst) + 2), c2);
-				_mm_stream_si128((((__m128i*)dst) + 3), c3);
-				_mm_stream_si128((((__m128i*)dst) + 4), c4);
-				_mm_stream_si128((((__m128i*)dst) + 5), c5);
-				_mm_stream_si128((((__m128i*)dst) + 6), c6);
-				_mm_stream_si128((((__m128i*)dst) + 7), c7);
-				dst += 128;
-			}
-		}
-		_mm_sfence();
-	}
-
-	memcpy_tiny(dst, src, size);
-
-	return destination;
-}
-
-
-#endif
--- a/contrib/FastMemcpy/FastMemcpy_Avx.c
+++ b/contrib/FastMemcpy/FastMemcpy_Avx.c
@ -1,171 +0,0 @@
-//=====================================================================
-//
-// FastMemcpy.c - skywind3000@163.com, 2015
-//
-// feature:
-// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc4.9)
-//
-//=====================================================================
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <time.h>
-#include <assert.h>
-
-#if (defined(_WIN32) || defined(WIN32))
-#include <windows.h>
-#include <mmsystem.h>
-#ifdef _MSC_VER
-#pragma comment(lib, "winmm.lib")
-#endif
-#elif defined(__unix)
-#include <sys/time.h>
-#include <unistd.h>
-#else
-#error it can only be compiled under windows or unix
-#endif
-
-#include "FastMemcpy_Avx.h"
-
-
-unsigned int gettime()
-{
-	#if (defined(_WIN32) || defined(WIN32))
-	return timeGetTime();
-	#else
-	static struct timezone tz={ 0,0 };
-	struct timeval time;
-	gettimeofday(&time,&tz);
-	return (time.tv_sec * 1000 + time.tv_usec / 1000);
-	#endif
-}
-
-void sleepms(unsigned int millisec)
-{
-#if defined(_WIN32) || defined(WIN32)
-	Sleep(millisec);
-#else
-	usleep(millisec * 1000);
-#endif
-}
-
-
-
-void benchmark(int dstalign, int srcalign, size_t size, int times)
-{
-	char *DATA1 = (char*)malloc(size + 64);
-	char *DATA2 = (char*)malloc(size + 64);
-	size_t LINEAR1 = ((size_t)DATA1);
-	size_t LINEAR2 = ((size_t)DATA2);
-	char *ALIGN1 = (char*)(((64 - (LINEAR1 & 63)) & 63) + LINEAR1);
-	char *ALIGN2 = (char*)(((64 - (LINEAR2 & 63)) & 63) + LINEAR2);
-	char *dst = (dstalign)? ALIGN1 : (ALIGN1 + 1);
-	char *src = (srcalign)? ALIGN2 : (ALIGN2 + 3);
-	unsigned int t1, t2;
-	int k;
-	
-	sleepms(100);
-	t1 = gettime();
-	for (k = times; k > 0; k--) {
-		memcpy(dst, src, size);
-	}
-	t1 = gettime() - t1;
-	sleepms(100);
-	t2 = gettime();
-	for (k = times; k > 0; k--) {
-		memcpy_fast(dst, src, size);
-	}
-	t2 = gettime() - t2;
-
-	free(DATA1);
-	free(DATA2);
-
-	printf("result(dst %s, src %s): memcpy_fast=%dms memcpy=%d ms\n",  
-		dstalign? "aligned" : "unalign", 
-		srcalign? "aligned" : "unalign", (int)t2, (int)t1);
-}
-
-
-void bench(int copysize, int times)
-{
-	printf("benchmark(size=%d bytes, times=%d):\n", copysize, times);
-	benchmark(1, 1, copysize, times);
-	benchmark(1, 0, copysize, times);
-	benchmark(0, 1, copysize, times);
-	benchmark(0, 0, copysize, times);
-	printf("\n");
-}
-
-
-void random_bench(int maxsize, int times)
-{
-	static char A[11 * 1024 * 1024 + 2];
-	static char B[11 * 1024 * 1024 + 2];
-	static int random_offsets[0x10000];
-	static int random_sizes[0x8000];
-	unsigned int i, p1, p2;
-	unsigned int t1, t2;
-	for (i = 0; i < 0x10000; i++) {	// generate random offsets
-		random_offsets[i] = rand() % (10 * 1024 * 1024 + 1);
-	}
-	for (i = 0; i < 0x8000; i++) {	// generate random sizes
-		random_sizes[i] = 1 + rand() % maxsize;
-	}
-	sleepms(100);
-	t1 = gettime();
-	for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
-		int offset1 = random_offsets[(p1++) & 0xffff];
-		int offset2 = random_offsets[(p1++) & 0xffff];
-		int size = random_sizes[(p2++) & 0x7fff];
-		memcpy(A + offset1, B + offset2, size);
-	}
-	t1 = gettime() - t1;
-	sleepms(100);
-	t2 = gettime();
-	for (p1 = 0, p2 = 0, i = 0; i < times; i++) {
-		int offset1 = random_offsets[(p1++) & 0xffff];
-		int offset2 = random_offsets[(p1++) & 0xffff];
-		int size = random_sizes[(p2++) & 0x7fff];
-		memcpy_fast(A + offset1, B + offset2, size);
-	}
-	t2 = gettime() - t2;
-	printf("benchmark random access:\n");
-	printf("memcpy_fast=%dms memcpy=%dms\n\n", (int)t2, (int)t1);
-}
-
-
-#ifdef _MSC_VER
-#pragma comment(lib, "winmm.lib")
-#endif
-
-int main(void)
-{
-#if 1
-	bench(32, 0x1000000);
-	bench(64, 0x1000000);
-	bench(512, 0x800000);
-	bench(1024, 0x400000);
-#endif
-	bench(4096, 0x80000);
-	bench(8192, 0x40000);
-#if 1
-	bench(1024 * 1024 * 1, 0x800);
-	bench(1024 * 1024 * 4, 0x200);
-#endif
-	bench(1024 * 1024 * 8, 0x100);
-	
-	random_bench(2048, 8000000);
-
-	return 0;
-}
-
-
-
-
-/*
-
-*/
-
-
-
-
--- a/contrib/FastMemcpy/FastMemcpy_Avx.h
+++ b/contrib/FastMemcpy/FastMemcpy_Avx.h
@ -1,492 +0,0 @@
-//=====================================================================
-//
-// FastMemcpy.c - skywind3000@163.com, 2015
-//
-// feature:
-// 50% speed up in avg. vs standard memcpy (tested in vc2012/gcc5.1)
-//
-//=====================================================================
-#ifndef __FAST_MEMCPY_H__
-#define __FAST_MEMCPY_H__
-
-#include <stddef.h>
-#include <stdint.h>
-#include <immintrin.h>
-
-
-//---------------------------------------------------------------------
-// force inline for compilers
-//---------------------------------------------------------------------
-#ifndef INLINE
-#ifdef __GNUC__
-#if (__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1))
-    #define INLINE         __inline__ __attribute__((always_inline))
-#else
-    #define INLINE         __inline__
-#endif
-#elif defined(_MSC_VER)
-	#define INLINE __forceinline
-#elif (defined(__BORLANDC__) || defined(__WATCOMC__))
-    #define INLINE __inline
-#else
-    #define INLINE 
-#endif
-#endif
-
-
-
-//---------------------------------------------------------------------
-// fast copy for different sizes
-//---------------------------------------------------------------------
-static INLINE void memcpy_avx_16(void *dst, const void *src) {
-#if 1
-	__m128i m0 = _mm_loadu_si128(((const __m128i*)src) + 0);
-	_mm_storeu_si128(((__m128i*)dst) + 0, m0);
-#else
-	*((uint64_t*)((char*)dst + 0)) = *((uint64_t*)((const char*)src + 0));
-	*((uint64_t*)((char*)dst + 8)) = *((uint64_t*)((const char*)src + 8));
-#endif
-}
-
-static INLINE void memcpy_avx_32(void *dst, const void *src) {
-	__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
-	_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
-}
-
-static INLINE void memcpy_avx_64(void *dst, const void *src) {
-	__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
-	__m256i m1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
-	_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
-	_mm256_storeu_si256(((__m256i*)dst) + 1, m1);
-}
-
-static INLINE void memcpy_avx_128(void *dst, const void *src) {
-	__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
-	__m256i m1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
-	__m256i m2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
-	__m256i m3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
-	_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
-	_mm256_storeu_si256(((__m256i*)dst) + 1, m1);
-	_mm256_storeu_si256(((__m256i*)dst) + 2, m2);
-	_mm256_storeu_si256(((__m256i*)dst) + 3, m3);
-}
-
-static INLINE void memcpy_avx_256(void *dst, const void *src) {
-	__m256i m0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
-	__m256i m1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
-	__m256i m2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
-	__m256i m3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
-	__m256i m4 = _mm256_loadu_si256(((const __m256i*)src) + 4);
-	__m256i m5 = _mm256_loadu_si256(((const __m256i*)src) + 5);
-	__m256i m6 = _mm256_loadu_si256(((const __m256i*)src) + 6);
-	__m256i m7 = _mm256_loadu_si256(((const __m256i*)src) + 7);
-	_mm256_storeu_si256(((__m256i*)dst) + 0, m0);
-	_mm256_storeu_si256(((__m256i*)dst) + 1, m1);
-	_mm256_storeu_si256(((__m256i*)dst) + 2, m2);
-	_mm256_storeu_si256(((__m256i*)dst) + 3, m3);
-	_mm256_storeu_si256(((__m256i*)dst) + 4, m4);
-	_mm256_storeu_si256(((__m256i*)dst) + 5, m5);
-	_mm256_storeu_si256(((__m256i*)dst) + 6, m6);
-	_mm256_storeu_si256(((__m256i*)dst) + 7, m7);
-}
-
-
-//---------------------------------------------------------------------
-// tiny memory copy with jump table optimized
-//---------------------------------------------------------------------
-static INLINE void *memcpy_tiny(void *dst, const void *src, size_t size) {
-	unsigned char *dd = ((unsigned char*)dst) + size;
-	const unsigned char *ss = ((const unsigned char*)src) + size;
-
-	switch (size) { 
-	case 128: memcpy_avx_128(dd - 128, ss - 128);
-	case 0:  break;
-	case 129: memcpy_avx_128(dd - 129, ss - 129);
-	case 1: dd[-1] = ss[-1]; break;
-	case 130: memcpy_avx_128(dd - 130, ss - 130);
-	case 2: *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
-	case 131: memcpy_avx_128(dd - 131, ss - 131);
-	case 3: *((uint16_t*)(dd - 3)) = *((uint16_t*)(ss - 3)); dd[-1] = ss[-1]; break;
-	case 132: memcpy_avx_128(dd - 132, ss - 132);
-	case 4: *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 133: memcpy_avx_128(dd - 133, ss - 133);
-	case 5: *((uint32_t*)(dd - 5)) = *((uint32_t*)(ss - 5)); dd[-1] = ss[-1]; break;
-	case 134: memcpy_avx_128(dd - 134, ss - 134);
-	case 6: *((uint32_t*)(dd - 6)) = *((uint32_t*)(ss - 6)); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
-	case 135: memcpy_avx_128(dd - 135, ss - 135);
-	case 7: *((uint32_t*)(dd - 7)) = *((uint32_t*)(ss - 7)); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 136: memcpy_avx_128(dd - 136, ss - 136);
-	case 8: *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 137: memcpy_avx_128(dd - 137, ss - 137);
-	case 9: *((uint64_t*)(dd - 9)) = *((uint64_t*)(ss - 9)); dd[-1] = ss[-1]; break;
-	case 138: memcpy_avx_128(dd - 138, ss - 138);
-	case 10: *((uint64_t*)(dd - 10)) = *((uint64_t*)(ss - 10)); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
-	case 139: memcpy_avx_128(dd - 139, ss - 139);
-	case 11: *((uint64_t*)(dd - 11)) = *((uint64_t*)(ss - 11)); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 140: memcpy_avx_128(dd - 140, ss - 140);
-	case 12: *((uint64_t*)(dd - 12)) = *((uint64_t*)(ss - 12)); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 141: memcpy_avx_128(dd - 141, ss - 141);
-	case 13: *((uint64_t*)(dd - 13)) = *((uint64_t*)(ss - 13)); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 142: memcpy_avx_128(dd - 142, ss - 142);
-	case 14: *((uint64_t*)(dd - 14)) = *((uint64_t*)(ss - 14)); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 143: memcpy_avx_128(dd - 143, ss - 143);
-	case 15: *((uint64_t*)(dd - 15)) = *((uint64_t*)(ss - 15)); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 144: memcpy_avx_128(dd - 144, ss - 144);
-	case 16: memcpy_avx_16(dd - 16, ss - 16); break;
-	case 145: memcpy_avx_128(dd - 145, ss - 145);
-	case 17: memcpy_avx_16(dd - 17, ss - 17); dd[-1] = ss[-1]; break;
-	case 146: memcpy_avx_128(dd - 146, ss - 146);
-	case 18: memcpy_avx_16(dd - 18, ss - 18); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
-	case 147: memcpy_avx_128(dd - 147, ss - 147);
-	case 19: memcpy_avx_16(dd - 19, ss - 19); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 148: memcpy_avx_128(dd - 148, ss - 148);
-	case 20: memcpy_avx_16(dd - 20, ss - 20); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 149: memcpy_avx_128(dd - 149, ss - 149);
-	case 21: memcpy_avx_16(dd - 21, ss - 21); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 150: memcpy_avx_128(dd - 150, ss - 150);
-	case 22: memcpy_avx_16(dd - 22, ss - 22); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 151: memcpy_avx_128(dd - 151, ss - 151);
-	case 23: memcpy_avx_16(dd - 23, ss - 23); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 152: memcpy_avx_128(dd - 152, ss - 152);
-	case 24: memcpy_avx_16(dd - 24, ss - 24); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 153: memcpy_avx_128(dd - 153, ss - 153);
-	case 25: memcpy_avx_16(dd - 25, ss - 25); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 154: memcpy_avx_128(dd - 154, ss - 154);
-	case 26: memcpy_avx_16(dd - 26, ss - 26); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 155: memcpy_avx_128(dd - 155, ss - 155);
-	case 27: memcpy_avx_16(dd - 27, ss - 27); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 156: memcpy_avx_128(dd - 156, ss - 156);
-	case 28: memcpy_avx_16(dd - 28, ss - 28); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 157: memcpy_avx_128(dd - 157, ss - 157);
-	case 29: memcpy_avx_16(dd - 29, ss - 29); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 158: memcpy_avx_128(dd - 158, ss - 158);
-	case 30: memcpy_avx_16(dd - 30, ss - 30); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 159: memcpy_avx_128(dd - 159, ss - 159);
-	case 31: memcpy_avx_16(dd - 31, ss - 31); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 160: memcpy_avx_128(dd - 160, ss - 160);
-	case 32: memcpy_avx_32(dd - 32, ss - 32); break;
-	case 161: memcpy_avx_128(dd - 161, ss - 161);
-	case 33: memcpy_avx_32(dd - 33, ss - 33); dd[-1] = ss[-1]; break;
-	case 162: memcpy_avx_128(dd - 162, ss - 162);
-	case 34: memcpy_avx_32(dd - 34, ss - 34); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
-	case 163: memcpy_avx_128(dd - 163, ss - 163);
-	case 35: memcpy_avx_32(dd - 35, ss - 35); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 164: memcpy_avx_128(dd - 164, ss - 164);
-	case 36: memcpy_avx_32(dd - 36, ss - 36); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 165: memcpy_avx_128(dd - 165, ss - 165);
-	case 37: memcpy_avx_32(dd - 37, ss - 37); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 166: memcpy_avx_128(dd - 166, ss - 166);
-	case 38: memcpy_avx_32(dd - 38, ss - 38); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 167: memcpy_avx_128(dd - 167, ss - 167);
-	case 39: memcpy_avx_32(dd - 39, ss - 39); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 168: memcpy_avx_128(dd - 168, ss - 168);
-	case 40: memcpy_avx_32(dd - 40, ss - 40); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 169: memcpy_avx_128(dd - 169, ss - 169);
-	case 41: memcpy_avx_32(dd - 41, ss - 41); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 170: memcpy_avx_128(dd - 170, ss - 170);
-	case 42: memcpy_avx_32(dd - 42, ss - 42); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 171: memcpy_avx_128(dd - 171, ss - 171);
-	case 43: memcpy_avx_32(dd - 43, ss - 43); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 172: memcpy_avx_128(dd - 172, ss - 172);
-	case 44: memcpy_avx_32(dd - 44, ss - 44); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 173: memcpy_avx_128(dd - 173, ss - 173);
-	case 45: memcpy_avx_32(dd - 45, ss - 45); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 174: memcpy_avx_128(dd - 174, ss - 174);
-	case 46: memcpy_avx_32(dd - 46, ss - 46); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 175: memcpy_avx_128(dd - 175, ss - 175);
-	case 47: memcpy_avx_32(dd - 47, ss - 47); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 176: memcpy_avx_128(dd - 176, ss - 176);
-	case 48: memcpy_avx_32(dd - 48, ss - 48); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 177: memcpy_avx_128(dd - 177, ss - 177);
-	case 49: memcpy_avx_32(dd - 49, ss - 49); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 178: memcpy_avx_128(dd - 178, ss - 178);
-	case 50: memcpy_avx_32(dd - 50, ss - 50); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 179: memcpy_avx_128(dd - 179, ss - 179);
-	case 51: memcpy_avx_32(dd - 51, ss - 51); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 180: memcpy_avx_128(dd - 180, ss - 180);
-	case 52: memcpy_avx_32(dd - 52, ss - 52); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 181: memcpy_avx_128(dd - 181, ss - 181);
-	case 53: memcpy_avx_32(dd - 53, ss - 53); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 182: memcpy_avx_128(dd - 182, ss - 182);
-	case 54: memcpy_avx_32(dd - 54, ss - 54); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 183: memcpy_avx_128(dd - 183, ss - 183);
-	case 55: memcpy_avx_32(dd - 55, ss - 55); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 184: memcpy_avx_128(dd - 184, ss - 184);
-	case 56: memcpy_avx_32(dd - 56, ss - 56); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 185: memcpy_avx_128(dd - 185, ss - 185);
-	case 57: memcpy_avx_32(dd - 57, ss - 57); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 186: memcpy_avx_128(dd - 186, ss - 186);
-	case 58: memcpy_avx_32(dd - 58, ss - 58); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 187: memcpy_avx_128(dd - 187, ss - 187);
-	case 59: memcpy_avx_32(dd - 59, ss - 59); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 188: memcpy_avx_128(dd - 188, ss - 188);
-	case 60: memcpy_avx_32(dd - 60, ss - 60); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 189: memcpy_avx_128(dd - 189, ss - 189);
-	case 61: memcpy_avx_32(dd - 61, ss - 61); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 190: memcpy_avx_128(dd - 190, ss - 190);
-	case 62: memcpy_avx_32(dd - 62, ss - 62); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 191: memcpy_avx_128(dd - 191, ss - 191);
-	case 63: memcpy_avx_32(dd - 63, ss - 63); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 192: memcpy_avx_128(dd - 192, ss - 192);
-	case 64: memcpy_avx_64(dd - 64, ss - 64); break;
-	case 193: memcpy_avx_128(dd - 193, ss - 193);
-	case 65: memcpy_avx_64(dd - 65, ss - 65); dd[-1] = ss[-1]; break;
-	case 194: memcpy_avx_128(dd - 194, ss - 194);
-	case 66: memcpy_avx_64(dd - 66, ss - 66); *((uint16_t*)(dd - 2)) = *((uint16_t*)(ss - 2)); break;
-	case 195: memcpy_avx_128(dd - 195, ss - 195);
-	case 67: memcpy_avx_64(dd - 67, ss - 67); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 196: memcpy_avx_128(dd - 196, ss - 196);
-	case 68: memcpy_avx_64(dd - 68, ss - 68); *((uint32_t*)(dd - 4)) = *((uint32_t*)(ss - 4)); break;
-	case 197: memcpy_avx_128(dd - 197, ss - 197);
-	case 69: memcpy_avx_64(dd - 69, ss - 69); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 198: memcpy_avx_128(dd - 198, ss - 198);
-	case 70: memcpy_avx_64(dd - 70, ss - 70); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 199: memcpy_avx_128(dd - 199, ss - 199);
-	case 71: memcpy_avx_64(dd - 71, ss - 71); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 200: memcpy_avx_128(dd - 200, ss - 200);
-	case 72: memcpy_avx_64(dd - 72, ss - 72); *((uint64_t*)(dd - 8)) = *((uint64_t*)(ss - 8)); break;
-	case 201: memcpy_avx_128(dd - 201, ss - 201);
-	case 73: memcpy_avx_64(dd - 73, ss - 73); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 202: memcpy_avx_128(dd - 202, ss - 202);
-	case 74: memcpy_avx_64(dd - 74, ss - 74); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 203: memcpy_avx_128(dd - 203, ss - 203);
-	case 75: memcpy_avx_64(dd - 75, ss - 75); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 204: memcpy_avx_128(dd - 204, ss - 204);
-	case 76: memcpy_avx_64(dd - 76, ss - 76); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 205: memcpy_avx_128(dd - 205, ss - 205);
-	case 77: memcpy_avx_64(dd - 77, ss - 77); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 206: memcpy_avx_128(dd - 206, ss - 206);
-	case 78: memcpy_avx_64(dd - 78, ss - 78); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 207: memcpy_avx_128(dd - 207, ss - 207);
-	case 79: memcpy_avx_64(dd - 79, ss - 79); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 208: memcpy_avx_128(dd - 208, ss - 208);
-	case 80: memcpy_avx_64(dd - 80, ss - 80); memcpy_avx_16(dd - 16, ss - 16); break;
-	case 209: memcpy_avx_128(dd - 209, ss - 209);
-	case 81: memcpy_avx_64(dd - 81, ss - 81); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 210: memcpy_avx_128(dd - 210, ss - 210);
-	case 82: memcpy_avx_64(dd - 82, ss - 82); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 211: memcpy_avx_128(dd - 211, ss - 211);
-	case 83: memcpy_avx_64(dd - 83, ss - 83); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 212: memcpy_avx_128(dd - 212, ss - 212);
-	case 84: memcpy_avx_64(dd - 84, ss - 84); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 213: memcpy_avx_128(dd - 213, ss - 213);
-	case 85: memcpy_avx_64(dd - 85, ss - 85); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 214: memcpy_avx_128(dd - 214, ss - 214);
-	case 86: memcpy_avx_64(dd - 86, ss - 86); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 215: memcpy_avx_128(dd - 215, ss - 215);
-	case 87: memcpy_avx_64(dd - 87, ss - 87); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 216: memcpy_avx_128(dd - 216, ss - 216);
-	case 88: memcpy_avx_64(dd - 88, ss - 88); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 217: memcpy_avx_128(dd - 217, ss - 217);
-	case 89: memcpy_avx_64(dd - 89, ss - 89); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 218: memcpy_avx_128(dd - 218, ss - 218);
-	case 90: memcpy_avx_64(dd - 90, ss - 90); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 219: memcpy_avx_128(dd - 219, ss - 219);
-	case 91: memcpy_avx_64(dd - 91, ss - 91); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 220: memcpy_avx_128(dd - 220, ss - 220);
-	case 92: memcpy_avx_64(dd - 92, ss - 92); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 221: memcpy_avx_128(dd - 221, ss - 221);
-	case 93: memcpy_avx_64(dd - 93, ss - 93); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 222: memcpy_avx_128(dd - 222, ss - 222);
-	case 94: memcpy_avx_64(dd - 94, ss - 94); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 223: memcpy_avx_128(dd - 223, ss - 223);
-	case 95: memcpy_avx_64(dd - 95, ss - 95); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 224: memcpy_avx_128(dd - 224, ss - 224);
-	case 96: memcpy_avx_64(dd - 96, ss - 96); memcpy_avx_32(dd - 32, ss - 32); break;
-	case 225: memcpy_avx_128(dd - 225, ss - 225);
-	case 97: memcpy_avx_64(dd - 97, ss - 97); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 226: memcpy_avx_128(dd - 226, ss - 226);
-	case 98: memcpy_avx_64(dd - 98, ss - 98); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 227: memcpy_avx_128(dd - 227, ss - 227);
-	case 99: memcpy_avx_64(dd - 99, ss - 99); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 228: memcpy_avx_128(dd - 228, ss - 228);
-	case 100: memcpy_avx_64(dd - 100, ss - 100); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 229: memcpy_avx_128(dd - 229, ss - 229);
-	case 101: memcpy_avx_64(dd - 101, ss - 101); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 230: memcpy_avx_128(dd - 230, ss - 230);
-	case 102: memcpy_avx_64(dd - 102, ss - 102); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 231: memcpy_avx_128(dd - 231, ss - 231);
-	case 103: memcpy_avx_64(dd - 103, ss - 103); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 232: memcpy_avx_128(dd - 232, ss - 232);
-	case 104: memcpy_avx_64(dd - 104, ss - 104); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 233: memcpy_avx_128(dd - 233, ss - 233);
-	case 105: memcpy_avx_64(dd - 105, ss - 105); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 234: memcpy_avx_128(dd - 234, ss - 234);
-	case 106: memcpy_avx_64(dd - 106, ss - 106); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 235: memcpy_avx_128(dd - 235, ss - 235);
-	case 107: memcpy_avx_64(dd - 107, ss - 107); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 236: memcpy_avx_128(dd - 236, ss - 236);
-	case 108: memcpy_avx_64(dd - 108, ss - 108); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 237: memcpy_avx_128(dd - 237, ss - 237);
-	case 109: memcpy_avx_64(dd - 109, ss - 109); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 238: memcpy_avx_128(dd - 238, ss - 238);
-	case 110: memcpy_avx_64(dd - 110, ss - 110); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 239: memcpy_avx_128(dd - 239, ss - 239);
-	case 111: memcpy_avx_64(dd - 111, ss - 111); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 240: memcpy_avx_128(dd - 240, ss - 240);
-	case 112: memcpy_avx_64(dd - 112, ss - 112); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 241: memcpy_avx_128(dd - 241, ss - 241);
-	case 113: memcpy_avx_64(dd - 113, ss - 113); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 242: memcpy_avx_128(dd - 242, ss - 242);
-	case 114: memcpy_avx_64(dd - 114, ss - 114); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 243: memcpy_avx_128(dd - 243, ss - 243);
-	case 115: memcpy_avx_64(dd - 115, ss - 115); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 244: memcpy_avx_128(dd - 244, ss - 244);
-	case 116: memcpy_avx_64(dd - 116, ss - 116); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 245: memcpy_avx_128(dd - 245, ss - 245);
-	case 117: memcpy_avx_64(dd - 117, ss - 117); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 246: memcpy_avx_128(dd - 246, ss - 246);
-	case 118: memcpy_avx_64(dd - 118, ss - 118); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 247: memcpy_avx_128(dd - 247, ss - 247);
-	case 119: memcpy_avx_64(dd - 119, ss - 119); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 248: memcpy_avx_128(dd - 248, ss - 248);
-	case 120: memcpy_avx_64(dd - 120, ss - 120); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 249: memcpy_avx_128(dd - 249, ss - 249);
-	case 121: memcpy_avx_64(dd - 121, ss - 121); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 250: memcpy_avx_128(dd - 250, ss - 250);
-	case 122: memcpy_avx_64(dd - 122, ss - 122); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 251: memcpy_avx_128(dd - 251, ss - 251);
-	case 123: memcpy_avx_64(dd - 123, ss - 123); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 252: memcpy_avx_128(dd - 252, ss - 252);
-	case 124: memcpy_avx_64(dd - 124, ss - 124); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 253: memcpy_avx_128(dd - 253, ss - 253);
-	case 125: memcpy_avx_64(dd - 125, ss - 125); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 254: memcpy_avx_128(dd - 254, ss - 254);
-	case 126: memcpy_avx_64(dd - 126, ss - 126); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 255: memcpy_avx_128(dd - 255, ss - 255);
-	case 127: memcpy_avx_64(dd - 127, ss - 127); memcpy_avx_64(dd - 64, ss - 64); break;
-	case 256: memcpy_avx_256(dd - 256, ss - 256); break;
-	}
-
-	return dst;
-}
-
-
-//---------------------------------------------------------------------
-// main routine
-//---------------------------------------------------------------------
-static void* memcpy_fast(void *destination, const void *source, size_t size)
-{
-	unsigned char *dst = (unsigned char*)destination;
-	const unsigned char *src = (const unsigned char*)source;
-	static size_t cachesize = 0x200000; // L3-cache size
-	size_t padding;
-
-	// small memory copy
-	if (size <= 256) {
-		memcpy_tiny(dst, src, size);
-		_mm256_zeroupper();
-		return destination;
-	}
-
-	// align destination to 16 bytes boundary
-	padding = (32 - (((size_t)dst) & 31)) & 31;
-
-#if 0
-	if (padding > 0) {
-		__m256i head = _mm256_loadu_si256((const __m256i*)src);
-		_mm256_storeu_si256((__m256i*)dst, head);
-		dst += padding;
-		src += padding;
-		size -= padding;
-	}
-#else
-	__m256i head = _mm256_loadu_si256((const __m256i*)src);
-	_mm256_storeu_si256((__m256i*)dst, head);
-	dst += padding;
-	src += padding;
-	size -= padding;
-#endif
-
-	// medium size copy
-	if (size <= cachesize) {
-		__m256i c0, c1, c2, c3, c4, c5, c6, c7;
-
-		for (; size >= 256; size -= 256) {
-			c0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
-			c1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
-			c2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
-			c3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
-			c4 = _mm256_loadu_si256(((const __m256i*)src) + 4);
-			c5 = _mm256_loadu_si256(((const __m256i*)src) + 5);
-			c6 = _mm256_loadu_si256(((const __m256i*)src) + 6);
-			c7 = _mm256_loadu_si256(((const __m256i*)src) + 7);
-			_mm_prefetch((const char*)(src + 512), _MM_HINT_NTA);
-			src += 256;
-			_mm256_storeu_si256((((__m256i*)dst) + 0), c0);
-			_mm256_storeu_si256((((__m256i*)dst) + 1), c1);
-			_mm256_storeu_si256((((__m256i*)dst) + 2), c2);
-			_mm256_storeu_si256((((__m256i*)dst) + 3), c3);
-			_mm256_storeu_si256((((__m256i*)dst) + 4), c4);
-			_mm256_storeu_si256((((__m256i*)dst) + 5), c5);
-			_mm256_storeu_si256((((__m256i*)dst) + 6), c6);
-			_mm256_storeu_si256((((__m256i*)dst) + 7), c7);
-			dst += 256;
-		}
-	}
-	else {		// big memory copy
-		__m256i c0, c1, c2, c3, c4, c5, c6, c7;
-		/* __m256i c0, c1, c2, c3, c4, c5, c6, c7; */
-
-		_mm_prefetch((const char*)(src), _MM_HINT_NTA);
-
-		if ((((size_t)src) & 31) == 0) {	// source aligned
-			for (; size >= 256; size -= 256) {
-				c0 = _mm256_load_si256(((const __m256i*)src) + 0);
-				c1 = _mm256_load_si256(((const __m256i*)src) + 1);
-				c2 = _mm256_load_si256(((const __m256i*)src) + 2);
-				c3 = _mm256_load_si256(((const __m256i*)src) + 3);
-				c4 = _mm256_load_si256(((const __m256i*)src) + 4);
-				c5 = _mm256_load_si256(((const __m256i*)src) + 5);
-				c6 = _mm256_load_si256(((const __m256i*)src) + 6);
-				c7 = _mm256_load_si256(((const __m256i*)src) + 7);
-				_mm_prefetch((const char*)(src + 512), _MM_HINT_NTA);
-				src += 256;
-				_mm256_stream_si256((((__m256i*)dst) + 0), c0);
-				_mm256_stream_si256((((__m256i*)dst) + 1), c1);
-				_mm256_stream_si256((((__m256i*)dst) + 2), c2);
-				_mm256_stream_si256((((__m256i*)dst) + 3), c3);
-				_mm256_stream_si256((((__m256i*)dst) + 4), c4);
-				_mm256_stream_si256((((__m256i*)dst) + 5), c5);
-				_mm256_stream_si256((((__m256i*)dst) + 6), c6);
-				_mm256_stream_si256((((__m256i*)dst) + 7), c7);
-				dst += 256;
-			}
-		}
-		else {							// source unaligned
-			for (; size >= 256; size -= 256) {
-				c0 = _mm256_loadu_si256(((const __m256i*)src) + 0);
-				c1 = _mm256_loadu_si256(((const __m256i*)src) + 1);
-				c2 = _mm256_loadu_si256(((const __m256i*)src) + 2);
-				c3 = _mm256_loadu_si256(((const __m256i*)src) + 3);
-				c4 = _mm256_loadu_si256(((const __m256i*)src) + 4);
-				c5 = _mm256_loadu_si256(((const __m256i*)src) + 5);
-				c6 = _mm256_loadu_si256(((const __m256i*)src) + 6);
-				c7 = _mm256_loadu_si256(((const __m256i*)src) + 7);
-				_mm_prefetch((const char*)(src + 512), _MM_HINT_NTA);
-				src += 256;
-				_mm256_stream_si256((((__m256i*)dst) + 0), c0);
-				_mm256_stream_si256((((__m256i*)dst) + 1), c1);
-				_mm256_stream_si256((((__m256i*)dst) + 2), c2);
-				_mm256_stream_si256((((__m256i*)dst) + 3), c3);
-				_mm256_stream_si256((((__m256i*)dst) + 4), c4);
-				_mm256_stream_si256((((__m256i*)dst) + 5), c5);
-				_mm256_stream_si256((((__m256i*)dst) + 6), c6);
-				_mm256_stream_si256((((__m256i*)dst) + 7), c7);
-				dst += 256;
-			}
-		}
-		_mm_sfence();
-	}
-
-	memcpy_tiny(dst, src, size);
-	_mm256_zeroupper();
-
-	return destination;
-}
-
-
-#endif
-
-
-
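Note on the removed `memcpy_tiny`: each `case` addresses the data from the end of the buffer (`dd`, `ss`) and covers odd sizes with two fixed-width moves that are allowed to overlap. Size 13, for example, is handled as one 8-byte move at offset `-13` and another at offset `-8`. The sketch below illustrates that idea for sizes 8 to 16 only. The helper name `copy_8_to_16` is hypothetical and does not appear in the removed file. It also swaps the original pointer casts for fixed-size `memcpy` calls, which avoids unaligned and type-punned accesses and which compilers commonly lower to plain 8-byte loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the removed FastMemcpy.h).
 * Copies n bytes, 8 <= n <= 16, with two 8-byte moves: one anchored at
 * the start of the region and one anchored at its end. For n < 16 the
 * two moves overlap in the middle, so no per-size branch is needed. */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    assert(n >= 8 && n <= 16);

    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    uint64_t head, tail;

    /* Load both words before storing anything. */
    memcpy(&head, s, 8);          /* bytes [0, 8)     */
    memcpy(&tail, s + n - 8, 8);  /* bytes [n - 8, n) */

    memcpy(d, &head, 8);
    memcpy(d + n - 8, &tail, 8);
}
```

The removed switch applies the same pattern at wider granularities (16, 32 and 64 bytes), and its fall-through cases for sizes above 128 first copy a 128-byte block and then reuse the case for `size - 128`.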
--- a/contrib/FastMemcpy/LICENSE
+++ b/contrib/FastMemcpy/LICENSE
@ -1,22 +0,0 @@
-The MIT License (MIT)
-
-Copyright (c) 2015 Linwei
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
-
--- a/contrib/FastMemcpy/README.md
+++ b/contrib/FastMemcpy/README.md
@ -1,20 +0,0 @@
-Internal implementation of `memcpy` function.
-
-It has the following advantages over `libc`-supplied implementation:
- it is linked statically, so the function is called directly, not through a `PLT` (procedure lookup table of shared library);
- it is linked statically, so the function can have position-dependent code;
- your binaries will not depend on `glibc`'s memcpy, that forces dependency on specific symbol version like `memcpy@@GLIBC_2.14` and consequently on specific version of `glibc` library;
- you can include `memcpy.h` directly and the function has the chance to be inlined, which is beneficial for small but unknown at compile time sizes of memory regions;
- this version of `memcpy` pretend to be faster (in our benchmarks, the difference is within few percents).
-
-Currently it uses the implementation from **Linwei** (skywind3000@163.com).
-Look at https://www.zhihu.com/question/35172305 for discussion.
-
-Drawbacks:
- only use SSE 2, doesn't use wider (AVX, AVX 512) vector registers when available;
- no CPU dispatching; doesn't take into account actual cache size.
-
-Also worth to look at:
- simple implementation from Facebook: https://github.com/facebook/folly/blob/master/folly/memcpy.S
- implementation from Agner Fog: http://www.agner.org/optimize/
- glibc source code.
--- a/contrib/FastMemcpy/memcpy_wrapper.c
+++ b/contrib/FastMemcpy/memcpy_wrapper.c
@ -1,6 +0,0 @@
-#include "FastMemcpy.h"
-
-void * memcpy(void * __restrict destination, const void * __restrict source, size_t size)
-{
-    return memcpy_fast(destination, source, size);
-}
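Note on the removed `memcpy_fast` that this wrapper forwarded to: for sizes above 256 it copies the first 32 bytes with unaligned loads and stores, then advances `dst` and `src` by `padding = (32 - (dst & 31)) & 31` so that the destination is 32-byte aligned for the main loops. The source comment says "16 bytes", but the mask of 31 aligns to 32. The unconditional 32-byte head copy is safe there because `size > 256` at that point. A minimal sketch of the padding arithmetic follows. The names `padding_to_32` and `align_up_32` are hypothetical, and `uintptr_t` is used here in place of the original `size_t` cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes to skip so that p lands on the
 * next 32-byte boundary. Returns 0 when p is already aligned, because
 * the outer "& 31" folds the value 32 back to 0. */
static size_t padding_to_32(const void *p)
{
    return (32 - ((uintptr_t)p & 31)) & 31;
}

/* Hypothetical helper: advance p to the next 32-byte boundary.
 * The caller must ensure at least 31 bytes remain past p. */
static const unsigned char *align_up_32(const unsigned char *p)
{
    const unsigned char *q = p + padding_to_32(p);
    assert(((uintptr_t)q & 31) == 0);
    return q;
}
```

Since the padding is always below 32, the bytes skipped by the pointer advance are exactly the ones already written by the 32-byte head copy, so no data is missed.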
--- a/contrib/NuRaft
+++ b/contrib/NuRaft
@ -1 +1 @@
-Subproject commit 9a0d78de4b90546368d954b6434f0e9a823e8d80
+Subproject commit 3d3683e77753cfe015a05fae95ddf418e19f59e1
--- a/contrib/abseil-cpp-cmake/CMakeLists.txt
+++ b/contrib/abseil-cpp-cmake/CMakeLists.txt
@ -0,0 +1,18 @@
+set(ABSL_ROOT_DIR "${ClickHouse_SOURCE_DIR}/contrib/abseil-cpp")
+if(NOT EXISTS "${ABSL_ROOT_DIR}/CMakeLists.txt")
+  message(FATAL_ERROR " submodule third_party/abseil-cpp is missing. To fix try run: \n git submodule update --init --recursive")
+endif()
+add_subdirectory("${ABSL_ROOT_DIR}" "${ClickHouse_BINARY_DIR}/contrib/abseil-cpp")
+
+add_library(abseil_swiss_tables INTERFACE)
+
+target_link_libraries(abseil_swiss_tables INTERFACE
+  absl::flat_hash_map
+  absl::flat_hash_set
+)
+
+get_target_property(FLAT_HASH_MAP_INCLUDE_DIR absl::flat_hash_map INTERFACE_INCLUDE_DIRECTORIES)
+target_include_directories (abseil_swiss_tables SYSTEM BEFORE INTERFACE ${FLAT_HASH_MAP_INCLUDE_DIR})
+
+get_target_property(FLAT_HASH_SET_INCLUDE_DIR absl::flat_hash_set INTERFACE_INCLUDE_DIRECTORIES)
+target_include_directories (abseil_swiss_tables SYSTEM BEFORE INTERFACE ${FLAT_HASH_SET_INCLUDE_DIR})
--- a/contrib/boringssl
+++ b/contrib/boringssl
@ -1 +1 @@
-Subproject commit 8b2bf912ba04823cfe9e7e8f5bb60cb7f6252449
+Subproject commit fd9ce1a0406f571507068b9555d0b545b8a18332
--- a/contrib/cassandra
+++ b/contrib/cassandra
@ -1 +1 @@
-Subproject commit b446d7eb68e6962f431e2b3771313bfe9a2bbd93
+Subproject commit c097fb5c7e63cc430016d9a8b240d8e63fbefa52
--- a/contrib/googletest
+++ b/contrib/googletest
@ -1 +1 @@
-Subproject commit 356f2d264a485db2fcc50ec1c672e0d37b6cb39b
+Subproject commit e7e591764baba0a0c3c9ad0014430e7a27331d16
--- a/contrib/grpc-cmake/CMakeLists.txt
+++ b/contrib/grpc-cmake/CMakeLists.txt
@ -39,11 +39,6 @@ set(_gRPC_SSL_LIBRARIES ${OPENSSL_LIBRARIES})

 # Use abseil-cpp from ClickHouse contrib, not from gRPC third_party.
 set(gRPC_ABSL_PROVIDER "clickhouse" CACHE STRING "" FORCE)
-set(ABSL_ROOT_DIR "${ClickHouse_SOURCE_DIR}/contrib/abseil-cpp")
-if(NOT EXISTS "${ABSL_ROOT_DIR}/CMakeLists.txt")
-  message(FATAL_ERROR " grpc: submodule third_party/abseil-cpp is missing. To fix try run: \n git submodule update --init --recursive")
-endif()
-add_subdirectory("${ABSL_ROOT_DIR}" "${ClickHouse_BINARY_DIR}/contrib/abseil-cpp")

 # Choose to build static or shared library for c-ares.
 if (MAKE_STATIC_LIBRARIES)
--- a/contrib/krb5-cmake/CMakeLists.txt
+++ b/contrib/krb5-cmake/CMakeLists.txt
@ -474,13 +474,6 @@ add_custom_command(
    WORKING_DIRECTORY "${KRB5_SOURCE_DIR}/util/et"
 )

-add_custom_target(
-    CREATE_COMPILE_ET ALL
-    DEPENDS ${KRB5_SOURCE_DIR}/util/et/compile_et
-    COMMENT "creating compile_et"
-    VERBATIM
-)
-
 file(GLOB_RECURSE ET_FILES
    "${KRB5_SOURCE_DIR}/*.et"
 )
@ -531,7 +524,7 @@ add_custom_command(


 add_custom_target(
-    ERROR_MAP_H ALL
+    ERROR_MAP_H
    DEPENDS ${KRB5_SOURCE_DIR}/lib/gssapi/krb5/error_map.h
    COMMENT "generating error_map.h"
    VERBATIM
@ -544,14 +537,14 @@ add_custom_command(
 )

 add_custom_target(
-    ERRMAP_H ALL
+    ERRMAP_H
    DEPENDS ${KRB5_SOURCE_DIR}/lib/gssapi/generic/errmap.h
    COMMENT "generating errmap.h"
    VERBATIM
 )

 add_custom_target(
-    KRB_5_H ALL
+    KRB_5_H
    DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/include/krb5/krb5.h
    COMMENT "generating krb5.h"
    VERBATIM
@ -564,15 +557,19 @@ add_dependencies(
    ERRMAP_H
    ERROR_MAP_H
    KRB_5_H
-    )
+)

 preprocess_et(processed_et_files ${ET_FILES})

-add_custom_command(
-    OUTPUT ${KRB5_SOURCE_DIR}/lib/gssapi/generic/errmap.h
-    COMMAND perl -w -I../../../util  ../../../util/gen.pl bimap errmap.h NAME=mecherrmap LEFT=OM_uint32 RIGHT=struct\ mecherror LEFTPRINT=print_OM_uint32 RIGHTPRINT=mecherror_print LEFTCMP=cmp_OM_uint32 RIGHTCMP=mecherror_cmp
-    WORKING_DIRECTORY "${KRB5_SOURCE_DIR}/lib/gssapi/generic"
-)
+if(CMAKE_SYSTEM_NAME MATCHES "Darwin")
+    add_custom_command(
+        OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/include_private/kcmrpc.h ${CMAKE_CURRENT_BINARY_DIR}/include_private/kcmrpc.c
+        COMMAND mig -header kcmrpc.h -user kcmrpc.c -sheader /dev/null -server /dev/null -I${KRB5_SOURCE_DIR}/lib/krb5/ccache ${KRB5_SOURCE_DIR}/lib/krb5/ccache/kcmrpc.defs
+        WORKING_DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/include_private"
+    )
+
+    list(APPEND ALL_SRCS ${CMAKE_CURRENT_BINARY_DIR}/include_private/kcmrpc.c)
+endif()

 target_sources(${KRB5_LIBRARY} PRIVATE
    ${ALL_SRCS}
@ -604,6 +601,25 @@ file(COPY ${KRB5_SOURCE_DIR}/util/et/com_err.h
    DESTINATION ${CMAKE_CURRENT_BINARY_DIR}/include/
 )

+file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/osconf.h
+    DESTINATION ${CMAKE_CURRENT_BINARY_DIR}/include_private/
+)
+
+file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/profile.h
+    DESTINATION ${CMAKE_CURRENT_BINARY_DIR}/include_private/
+)
+
+string(TOLOWER "${CMAKE_SYSTEM_NAME}" _system_name)
+
+file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/autoconf_${_system_name}.h
+    DESTINATION ${CMAKE_CURRENT_BINARY_DIR}/include_private/
+)
+
+file(RENAME
+    ${CMAKE_CURRENT_BINARY_DIR}/include_private/autoconf_${_system_name}.h
+    ${CMAKE_CURRENT_BINARY_DIR}/include_private/autoconf.h
+)
+
 file(MAKE_DIRECTORY
    ${CMAKE_CURRENT_BINARY_DIR}/include/krb5
 )
@ -633,7 +649,7 @@ target_include_directories(${KRB5_LIBRARY} PUBLIC
 )

 target_include_directories(${KRB5_LIBRARY} PRIVATE
-    ${CMAKE_CURRENT_SOURCE_DIR}  #for autoconf.h
+    ${CMAKE_CURRENT_BINARY_DIR}/include_private # For autoconf.h and other generated headers.
    ${KRB5_SOURCE_DIR}
    ${KRB5_SOURCE_DIR}/include
    ${KRB5_SOURCE_DIR}/lib/gssapi/mechglue
--- a/contrib/krb5-cmake/autoconf_darwin.h
+++ b/contrib/krb5-cmake/autoconf_darwin.h
@ -0,0 +1,764 @@
+/* include/autoconf.h.  Generated from autoconf.h.in by configure.  */
+/* include/autoconf.h.in.  Generated from configure.in by autoheader.  */
+
+
+#ifndef KRB5_AUTOCONF_H
+#define KRB5_AUTOCONF_H
+
+
+/* Define if AES-NI support is enabled */
+/* #undef AESNI */
+
+/* Define if socket can't be bound to 0.0.0.0 */
+/* #undef BROKEN_STREAMS_SOCKETS */
+
+/* Define if va_list objects can be simply copied by assignment. */
+/* #undef CAN_COPY_VA_LIST */
+
+/* Define to reduce code size even if it means more cpu usage */
+/* #undef CONFIG_SMALL */
+
+/* Define if __attribute__((constructor)) works */
+#define CONSTRUCTOR_ATTR_WORKS 1
+
+/* Define to default ccache name */
+#define DEFCCNAME "FILE:/tmp/krb5cc_%{uid}"
+
+/* Define to default client keytab name */
+#define DEFCKTNAME "FILE:/etc/krb5/user/%{euid}/client.keytab"
+
+/* Define to default keytab name */
+#define DEFKTNAME "FILE:/etc/krb5.keytab"
+
+/* Define if library initialization should be delayed until first use */
+#define DELAY_INITIALIZER 1
+
+/* Define if __attribute__((destructor)) works */
+#define DESTRUCTOR_ATTR_WORKS 1
+
+/* Define to disable PKINIT plugin support */
+#define DISABLE_PKINIT 1
+
+/* Define if LDAP KDB support within the Kerberos library (mainly ASN.1 code)
+   should be enabled. */
+/* #undef ENABLE_LDAP */
+
+/* Define if translation functions should be used. */
+/* #undef ENABLE_NLS */
+
+/* Define if thread support enabled */
+#define ENABLE_THREADS 1
+
+/* Define as return type of endrpcent */
+#define ENDRPCENT_TYPE void
+
+/* Define if Fortuna PRNG is selected */
+#define FORTUNA 1
+
+/* Define to the type of elements in the array set by `getgroups'. Usually
+   this is either `int' or `gid_t'. */
+#define GETGROUPS_T gid_t
+
+/* Define if gethostbyname_r returns int rather than struct hostent * */
+/* #undef GETHOSTBYNAME_R_RETURNS_INT */
+
+/* Type of getpeername second argument. */
+#define GETPEERNAME_ARG3_TYPE GETSOCKNAME_ARG3_TYPE
+
+/* Define if getpwnam_r exists but takes only 4 arguments (e.g., POSIX draft 6
+   implementations like some Solaris releases). */
+/* #undef GETPWNAM_R_4_ARGS */
+
+/* Define if getpwnam_r returns an int */
+#define GETPWNAM_R_RETURNS_INT 1
+
+/* Define if getpwuid_r exists but takes only 4 arguments (e.g., POSIX draft 6
+   implementations like some Solaris releases). */
+/* #undef GETPWUID_R_4_ARGS */
+
+/* Define if getservbyname_r returns int rather than struct servent * */
+/* #undef GETSERVBYNAME_R_RETURNS_INT */
+
+/* Type of pointer target for argument 3 to getsockname */
+#define GETSOCKNAME_ARG3_TYPE socklen_t
+
+/* Define if gmtime_r returns int instead of struct tm pointer, as on old
+   HP-UX systems. */
+/* #undef GMTIME_R_RETURNS_INT */
+
+/* Define if va_copy macro or function is available. */
+#define HAS_VA_COPY 1
+
+/* Define to 1 if you have the `access' function. */
+#define HAVE_ACCESS 1
+
+/* Define to 1 if you have the <alloca.h> header file. */
+#define HAVE_ALLOCA_H 1
+
+/* Define to 1 if you have the <arpa/inet.h> header file. */
+#define HAVE_ARPA_INET_H 1
+
+/* Define to 1 if you have the `bswap16' function. */
+/* #undef HAVE_BSWAP16 */
+
+/* Define to 1 if you have the `bswap64' function. */
+/* #undef HAVE_BSWAP64 */
+
+/* Define to 1 if bswap_16 is available via byteswap.h */
+/* #undef HAVE_BSWAP_16 */
+
+/* Define to 1 if bswap_64 is available via byteswap.h */
+/* #undef HAVE_BSWAP_64 */
+
+/* Define if bt_rseq is available, for recursive btree traversal. */
+#define HAVE_BT_RSEQ 1
+
+/* Define to 1 if you have the <byteswap.h> header file. */
+/* #undef HAVE_BYTESWAP_H */
+
+/* Define to 1 if you have the `chmod' function. */
+#define HAVE_CHMOD 1
+
+/* Define if cmocka library is available. */
+/* #undef HAVE_CMOCKA */
+
+/* Define to 1 if you have the `compile' function. */
+/* #undef HAVE_COMPILE */
+
+/* Define if com_err has compatible gettext support */
+#define HAVE_COM_ERR_INTL 1
+
+/* Define to 1 if you have the <cpuid.h> header file. */
+/* #undef HAVE_CPUID_H */
+
+/* Define to 1 if you have the `daemon' function. */
+#define HAVE_DAEMON 1
+
+/* Define to 1 if you have the declaration of `strerror_r', and to 0 if you
+   don't. */
+#define HAVE_DECL_STRERROR_R 1
+
+/* Define to 1 if you have the <dirent.h> header file, and it defines `DIR'.
+   */
+#define HAVE_DIRENT_H 1
+
+/* Define to 1 if you have the <dlfcn.h> header file. */
+#define HAVE_DLFCN_H 1
+
+/* Define to 1 if you have the `dn_skipname' function. */
+#define HAVE_DN_SKIPNAME 1
+
+/* Define to 1 if you have the <endian.h> header file. */
+/* #undef HAVE_ENDIAN_H */
+
+/* Define to 1 if you have the <errno.h> header file. */
+#define HAVE_ERRNO_H 1
+
+/* Define to 1 if you have the `fchmod' function. */
+#define HAVE_FCHMOD 1
+
+/* Define to 1 if you have the <fcntl.h> header file. */
+#define HAVE_FCNTL_H 1
+
+/* Define to 1 if you have the `flock' function. */
+#define HAVE_FLOCK 1
+
+/* Define to 1 if you have the `fnmatch' function. */
+#define HAVE_FNMATCH 1
+
+/* Define to 1 if you have the <fnmatch.h> header file. */
+#define HAVE_FNMATCH_H 1
+
+/* Define if you have the getaddrinfo function */
+#define HAVE_GETADDRINFO 1
+
+/* Define to 1 if you have the `getcwd' function. */
+#define HAVE_GETCWD 1
+
+/* Define to 1 if you have the `getenv' function. */
+#define HAVE_GETENV 1
+
+/* Define to 1 if you have the `geteuid' function. */
+#define HAVE_GETEUID 1
+
+/* Define if gethostbyname_r exists and its return type is known */
+/* #undef HAVE_GETHOSTBYNAME_R */
+
+/* Define to 1 if you have the `getnameinfo' function. */
+#define HAVE_GETNAMEINFO 1
+
+/* Define if system getopt should be used. */
+#define HAVE_GETOPT 1
+
+/* Define if system getopt_long should be used. */
+#define HAVE_GETOPT_LONG 1
+
+/* Define if getpwnam_r is available and useful. */
+#define HAVE_GETPWNAM_R 1
+
+/* Define if getpwuid_r is available and useful. */
+#define HAVE_GETPWUID_R 1
+
+/* Define if getservbyname_r exists and its return type is known */
+/* #undef HAVE_GETSERVBYNAME_R */
+
+/* Have the gettimeofday function */
+#define HAVE_GETTIMEOFDAY 1
+
+/* Define to 1 if you have the `getusershell' function. */
+#define HAVE_GETUSERSHELL 1
+
+/* Define to 1 if you have the `gmtime_r' function. */
+#define HAVE_GMTIME_R 1
+
+/* Define to 1 if you have the <ifaddrs.h> header file. */
+#define HAVE_IFADDRS_H 1
+
+/* Define to 1 if you have the `inet_ntop' function. */
+#define HAVE_INET_NTOP 1
+
+/* Define to 1 if you have the `inet_pton' function. */
+#define HAVE_INET_PTON 1
+
+/* Define to 1 if the system has the type `int16_t'. */
+#define HAVE_INT16_T 1
+
+/* Define to 1 if the system has the type `int32_t'. */
+#define HAVE_INT32_T 1
+
+/* Define to 1 if the system has the type `int8_t'. */
+#define HAVE_INT8_T 1
+
+/* Define to 1 if you have the <inttypes.h> header file. */
+#define HAVE_INTTYPES_H 1
+
+/* Define to 1 if you have the <keyutils.h> header file. */
+/* #undef HAVE_KEYUTILS_H */
+
+/* Define to 1 if you have the <lber.h> header file. */
+/* #undef HAVE_LBER_H */
+
+/* Define to 1 if you have the <ldap.h> header file. */
+/* #undef HAVE_LDAP_H */
+
+/* Define to 1 if you have the `crypto' library (-lcrypto). */
+#define HAVE_LIBCRYPTO 1
+
+/* Define if building with libedit. */
+/* #undef HAVE_LIBEDIT */
+
+/* Define to 1 if you have the `nsl' library (-lnsl). */
+/* #undef HAVE_LIBNSL */
+
+/* Define to 1 if you have the `resolv' library (-lresolv). */
+#define HAVE_LIBRESOLV 1
+
+/* Define to 1 if you have the `socket' library (-lsocket). */
+/* #undef HAVE_LIBSOCKET */
+
+/* Define if the util library is available */
+#define HAVE_LIBUTIL 1
+
+/* Define to 1 if you have the <limits.h> header file. */
+#define HAVE_LIMITS_H 1
+
+/* Define to 1 if you have the `localtime_r' function. */
+#define HAVE_LOCALTIME_R 1
+
+/* Define to 1 if you have the <machine/byte_order.h> header file. */
+#define HAVE_MACHINE_BYTE_ORDER_H 1
+
+/* Define to 1 if you have the <machine/endian.h> header file. */
+#define HAVE_MACHINE_ENDIAN_H 1
+
+/* Define to 1 if you have the <memory.h> header file. */
+#define HAVE_MEMORY_H 1
+
+/* Define to 1 if you have the `mkstemp' function. */
+#define HAVE_MKSTEMP 1
+
+/* Define to 1 if you have the <ndir.h> header file, and it defines `DIR'. */
+/* #undef HAVE_NDIR_H */
+
+/* Define to 1 if you have the <netdb.h> header file. */
+#define HAVE_NETDB_H 1
+
+/* Define if netdb.h declares h_errno */
+#define HAVE_NETDB_H_H_ERRNO 1
+
+/* Define to 1 if you have the <netinet/in.h> header file. */
+#define HAVE_NETINET_IN_H 1
+
+/* Define to 1 if you have the `ns_initparse' function. */
+#define HAVE_NS_INITPARSE 1
+
+/* Define to 1 if you have the `ns_name_uncompress' function. */
+#define HAVE_NS_NAME_UNCOMPRESS 1
+
+/* Define if OpenSSL supports cms. */
+#define HAVE_OPENSSL_CMS 1
+
+/* Define to 1 if you have the <paths.h> header file. */
+#define HAVE_PATHS_H 1
+
+/* Define if persistent keyrings are supported */
+/* #undef HAVE_PERSISTENT_KEYRING */
+
+/* Define to 1 if you have the <poll.h> header file. */
+#define HAVE_POLL_H 1
+
+/* Define if #pragma weak references work */
+/* #undef HAVE_PRAGMA_WEAK_REF */
+
+/* Define if you have POSIX threads libraries and header files. */
+#define HAVE_PTHREAD 1
+
+/* Define to 1 if you have the `pthread_once' function. */
+#define HAVE_PTHREAD_ONCE 1
+
+/* Have PTHREAD_PRIO_INHERIT. */
+#define HAVE_PTHREAD_PRIO_INHERIT 1
+
+/* Define to 1 if you have the `pthread_rwlock_init' function. */
+#define HAVE_PTHREAD_RWLOCK_INIT 1
+
+/* Define if pthread_rwlock_init is provided in the thread library. */
+#define HAVE_PTHREAD_RWLOCK_INIT_IN_THREAD_LIB 1
+
+/* Define to 1 if you have the <pwd.h> header file. */
+#define HAVE_PWD_H 1
+
+/* Define if building with GNU Readline. */
+/* #undef HAVE_READLINE */
+
+/* Define if regcomp exists and functions */
+#define HAVE_REGCOMP 1
+
+/* Define to 1 if you have the `regexec' function. */
+#define HAVE_REGEXEC 1
+
+/* Define to 1 if you have the <regexpr.h> header file. */
+/* #undef HAVE_REGEXPR_H */
+
+/* Define to 1 if you have the <regex.h> header file. */
+#define HAVE_REGEX_H 1
+
+/* Define to 1 if you have the `res_nclose' function. */
+#define HAVE_RES_NCLOSE 1
+
+/* Define to 1 if you have the `res_ndestroy' function. */
+#define HAVE_RES_NDESTROY 1
+
+/* Define to 1 if you have the `res_ninit' function. */
+#define HAVE_RES_NINIT 1
+
+/* Define to 1 if you have the `res_nsearch' function. */
+#define HAVE_RES_NSEARCH 1
+
+/* Define to 1 if you have the `res_search' function */
+#define HAVE_RES_SEARCH 1
+
+/* Define to 1 if you have the `re_comp' function. */
+/* #undef HAVE_RE_COMP */
+
+/* Define to 1 if you have the `re_exec' function. */
+/* #undef HAVE_RE_EXEC */
+
+/* Define to 1 if you have the <sasl/sasl.h> header file. */
+/* #undef HAVE_SASL_SASL_H */
+
+/* Define if struct sockaddr contains sa_len */
+#define HAVE_SA_LEN 1
+
+/* Define to 1 if you have the `setegid' function. */
+#define HAVE_SETEGID 1
+
+/* Define to 1 if you have the `setenv' function. */
+#define HAVE_SETENV 1
+
+/* Define to 1 if you have the `seteuid' function. */
+#define HAVE_SETEUID 1
+
+/* Define if setluid provided in OSF/1 security library */
+/* #undef HAVE_SETLUID */
+
+/* Define to 1 if you have the `setregid' function. */
+#define HAVE_SETREGID 1
+
+/* Define to 1 if you have the `setresgid' function. */
+/* #undef HAVE_SETRESGID */
+
+/* Define to 1 if you have the `setresuid' function. */
+/* #undef HAVE_SETRESUID */
+
+/* Define to 1 if you have the `setreuid' function. */
+#define HAVE_SETREUID 1
+
+/* Define to 1 if you have the `setsid' function. */
+#define HAVE_SETSID 1
+
+/* Define to 1 if you have the `setvbuf' function. */
+#define HAVE_SETVBUF 1
+
+/* Define if there is a socklen_t type. If not, probably use size_t */
+#define HAVE_SOCKLEN_T 1
+
+/* Define to 1 if you have the `srand' function. */
+#define HAVE_SRAND 1
+
+/* Define to 1 if you have the `srand48' function. */
+#define HAVE_SRAND48 1
+
+/* Define to 1 if you have the `srandom' function. */
+#define HAVE_SRANDOM 1
+
+/* Define to 1 if the system has the type `ssize_t'. */
+#define HAVE_SSIZE_T 1
+
+/* Define to 1 if you have the `stat' function. */
+#define HAVE_STAT 1
+
+/* Define to 1 if you have the <stddef.h> header file. */
+#define HAVE_STDDEF_H 1
+
+/* Define to 1 if you have the <stdint.h> header file. */
+#define HAVE_STDINT_H 1
+
+/* Define to 1 if you have the <stdlib.h> header file. */
+#define HAVE_STDLIB_H 1
+
+/* Define to 1 if you have the `step' function. */
+/* #undef HAVE_STEP */
+
+/* Define to 1 if you have the `strchr' function. */
+#define HAVE_STRCHR 1
+
+/* Define to 1 if you have the `strdup' function. */
+#define HAVE_STRDUP 1
+
+/* Define to 1 if you have the `strerror' function. */
+#define HAVE_STRERROR 1
+
+/* Define to 1 if you have the `strerror_r' function. */
+#define HAVE_STRERROR_R 1
+
+/* Define to 1 if you have the <strings.h> header file. */
+#define HAVE_STRINGS_H 1
+
+/* Define to 1 if you have the <string.h> header file. */
+#define HAVE_STRING_H 1
+
+/* Define to 1 if you have the `strlcpy' function. */
+#define HAVE_STRLCPY 1
+
+/* Define to 1 if you have the `strptime' function. */
+#define HAVE_STRPTIME 1
+
+/* Define to 1 if the system has the type `struct cmsghdr'. */
+#define HAVE_STRUCT_CMSGHDR 1
+
+/* Define if there is a struct if_laddrconf. */
+/* #undef HAVE_STRUCT_IF_LADDRCONF */
+
+/* Define to 1 if the system has the type `struct in6_pktinfo'. */
+#define HAVE_STRUCT_IN6_PKTINFO 1
+
+/* Define to 1 if the system has the type `struct in_pktinfo'. */
+#define HAVE_STRUCT_IN_PKTINFO 1
+
+/* Define if there is a struct lifconf. */
+/* #undef HAVE_STRUCT_LIFCONF */
+
+/* Define to 1 if the system has the type `struct rt_msghdr'. */
+#define HAVE_STRUCT_RT_MSGHDR 1
+
+/* Define to 1 if the system has the type `struct sockaddr_storage'. */
+#define HAVE_STRUCT_SOCKADDR_STORAGE 1
+
+/* Define to 1 if `st_mtimensec' is a member of `struct stat'. */
+/* #undef HAVE_STRUCT_STAT_ST_MTIMENSEC */
+
+/* Define to 1 if `st_mtimespec.tv_nsec' is a member of `struct stat'. */
+#define HAVE_STRUCT_STAT_ST_MTIMESPEC_TV_NSEC 1
+
+/* Define to 1 if `st_mtim.tv_nsec' is a member of `struct stat'. */
+/* #undef HAVE_STRUCT_STAT_ST_MTIM_TV_NSEC */
+
+/* Define to 1 if you have the <sys/bswap.h> header file. */
+/* #undef HAVE_SYS_BSWAP_H */
+
+/* Define to 1 if you have the <sys/dir.h> header file, and it defines `DIR'.
+   */
+/* #undef HAVE_SYS_DIR_H */
+
+/* Define if sys_errlist in libc */
+#define HAVE_SYS_ERRLIST 1
+
+/* Define to 1 if you have the <sys/file.h> header file. */
+#define HAVE_SYS_FILE_H 1
+
+/* Define to 1 if you have the <sys/filio.h> header file. */
+#define HAVE_SYS_FILIO_H 1
+
+/* Define to 1 if you have the <sys/ndir.h> header file, and it defines `DIR'.
+   */
+/* #undef HAVE_SYS_NDIR_H */
+
+/* Define to 1 if you have the <sys/param.h> header file. */
+#define HAVE_SYS_PARAM_H 1
+
+/* Define to 1 if you have the <sys/select.h> header file. */
+#define HAVE_SYS_SELECT_H 1
+
+/* Define to 1 if you have the <sys/socket.h> header file. */
+#define HAVE_SYS_SOCKET_H 1
+
+/* Define to 1 if you have the <sys/sockio.h> header file. */
+#define HAVE_SYS_SOCKIO_H 1
+
+/* Define to 1 if you have the <sys/stat.h> header file. */
+#define HAVE_SYS_STAT_H 1
+
+/* Define to 1 if you have the <sys/time.h> header file. */
+#define HAVE_SYS_TIME_H 1
+
+/* Define to 1 if you have the <sys/types.h> header file. */
+#define HAVE_SYS_TYPES_H 1
+
+/* Define to 1 if you have the <sys/uio.h> header file. */
+#define HAVE_SYS_UIO_H 1
+
+/* Define if tcl.h found */
+/* #undef HAVE_TCL_H */
+
+/* Define if tcl/tcl.h found */
+/* #undef HAVE_TCL_TCL_H */
+
+/* Define to 1 if you have the `timegm' function. */
+#define HAVE_TIMEGM 1
+
+/* Define to 1 if you have the <time.h> header file. */
+#define HAVE_TIME_H 1
+
+/* Define to 1 if you have the <unistd.h> header file. */
+#define HAVE_UNISTD_H 1
+
+/* Define to 1 if you have the `unsetenv' function. */
+#define HAVE_UNSETENV 1
+
+/* Define to 1 if the system has the type `u_char'. */
+#define HAVE_U_CHAR 1
+
+/* Define to 1 if the system has the type `u_int'. */
+#define HAVE_U_INT 1
+
+/* Define to 1 if the system has the type `u_int16_t'. */
+#define HAVE_U_INT16_T 1
+
+/* Define to 1 if the system has the type `u_int32_t'. */
+#define HAVE_U_INT32_T 1
+
+/* Define to 1 if the system has the type `u_int8_t'. */
+#define HAVE_U_INT8_T 1
+
+/* Define to 1 if the system has the type `u_long'. */
+#define HAVE_U_LONG 1
+
+/* Define to 1 if you have the `vasprintf' function. */
+#define HAVE_VASPRINTF 1
+
+/* Define to 1 if you have the `vsnprintf' function. */
+#define HAVE_VSNPRINTF 1
+
+/* Define to 1 if you have the `vsprintf' function. */
+#define HAVE_VSPRINTF 1
+
+/* Define to 1 if the system has the type `__int128_t'. */
+#define HAVE___INT128_T 1
+
+/* Define to 1 if the system has the type `__uint128_t'. */
+#define HAVE___UINT128_T 1
+
+/* Define if errno.h declares perror */
+/* #undef HDR_HAS_PERROR */
+
+/* May need to be defined to enable IPv6 support, for example on IRIX */
+/* #undef INET6 */
+
+/* Define if MIT Project Athena default configuration should be used */
+/* #undef KRB5_ATHENA_COMPAT */
+
+/* Define for DNS support of locating realms and KDCs */
+#undef KRB5_DNS_LOOKUP
+
+/* Define to enable DNS lookups of Kerberos realm names */
+/* #undef KRB5_DNS_LOOKUP_REALM */
+
+/* Define if the KDC should return only vague error codes to clients */
+/* #undef KRBCONF_VAGUE_ERRORS */
+
+/* define if the system header files are missing prototype for daemon() */
+#define NEED_DAEMON_PROTO 1
+
+/* Define if in6addr_any is not defined in libc */
+#define NEED_INSIXADDR_ANY 1
+
+/* define if the system header files are missing prototype for
+   ss_execute_command() */
+/* #undef NEED_SS_EXECUTE_COMMAND_PROTO */
+
+/* define if the system header files are missing prototype for strptime() */
+/* #undef NEED_STRPTIME_PROTO */
+
+/* define if the system header files are missing prototype for swab() */
+/* #undef NEED_SWAB_PROTO */
+
+/* Define if need to declare sys_errlist */
+/* #undef NEED_SYS_ERRLIST */
+
+/* define if the system header files are missing prototype for vasprintf() */
+/* #undef NEED_VASPRINTF_PROTO */
+
+/* Define if the KDC should use no lookaside cache */
+/* #undef NOCACHE */
+
+/* Define if references to pthread routines should be non-weak. */
+/* #undef NO_WEAK_PTHREADS */
+
+/* Define if lex produces code with yylineno */
+/* #undef NO_YYLINENO */
+
+/* Define to the address where bug reports for this package should be sent. */
+#define PACKAGE_BUGREPORT "krb5-bugs@mit.edu"
+
+/* Define to the full name of this package. */
+#define PACKAGE_NAME "Kerberos 5"
+
+/* Define to the full name and version of this package. */
+#define PACKAGE_STRING "Kerberos 5 1.17.1"
+
+/* Define to the one symbol short name of this package. */
+#define PACKAGE_TARNAME "krb5"
+
+/* Define to the home page for this package. */
+#define PACKAGE_URL ""
+
+/* Define to the version of this package. */
+#define PACKAGE_VERSION "1.17.1"
+
+/* Define if setjmp indicates POSIX interface */
+#define POSIX_SETJMP 1
+
+/* Define if POSIX signal handling is used */
+#define POSIX_SIGNALS 1
+
+/* Define if POSIX signal handlers are used */
+#define POSIX_SIGTYPE 1
+
+/* Define if termios.h exists and tcsetattr exists */
+#define POSIX_TERMIOS 1
+
+/* Define to necessary symbol if this constant uses a non-standard name on
+   your system. */
+/* #undef PTHREAD_CREATE_JOINABLE */
+
+/* Define as the return type of signal handlers (`int' or `void'). */
+#define RETSIGTYPE void
+
+/* Define as return type of setrpcent */
+#define SETRPCENT_TYPE void
+
+/* The size of `size_t', as computed by sizeof. */
+#define SIZEOF_SIZE_T 8
+
+/* The size of `time_t', as computed by sizeof. */
+#define SIZEOF_TIME_T 8
+
+/* Define to use OpenSSL for SPAKE preauth */
+#define SPAKE_OPENSSL 1
+
+/* Define for static plugin linkage */
+/* #undef STATIC_PLUGINS */
+
+/* Define to 1 if you have the ANSI C header files. */
+#define STDC_HEADERS 1
+
+/* Define to 1 if strerror_r returns char *. */
+/* #undef STRERROR_R_CHAR_P */
+
+/* Define if sys_errlist is defined in errno.h */
+#define SYS_ERRLIST_DECLARED 1
+
+/* Define to 1 if you can safely include both <sys/time.h> and <time.h>. */
+#define TIME_WITH_SYS_TIME 1
+
+/* Define if no TLS implementation is selected */
+/* #undef TLS_IMPL_NONE */
+
+/* Define if TLS implementation is OpenSSL */
+#define TLS_IMPL_OPENSSL 1
+
+/* Define if you have dirent.h functionality */
+#define USE_DIRENT_H 1
+
+/* Define if dlopen should be used */
+#define USE_DLOPEN 1
+
+/* Define if the keyring ccache should be enabled */
+/* #undef USE_KEYRING_CCACHE */
+
+/* Define if link-time options for library finalization will be used */
+/* #undef USE_LINKER_FINI_OPTION */
+
+/* Define if link-time options for library initialization will be used */
+/* #undef USE_LINKER_INIT_OPTION */
+
+/* Define if sigprocmask should be used */
+#define USE_SIGPROCMASK 1
+
+/* Define if wait takes int as an argument */
+#define WAIT_USES_INT 1
+
+/* Define to 1 if `lex' declares `yytext' as a `char *' by default, not a
+   `char[]'. */
+#define YYTEXT_POINTER 1
+
+/* Define to enable extensions in glibc */
+#define _GNU_SOURCE 1
+
+/* Define to enable C11 extensions */
+#define __STDC_WANT_LIB_EXT1__ 1
+
+/* Define to empty if `const' does not conform to ANSI C. */
+/* #undef const */
+
+/* Define to `int' if <sys/types.h> doesn't define. */
+/* #undef gid_t */
+
+/* Define to `__inline__' or `__inline' if that's what the C compiler
+   calls it, or to nothing if 'inline' is not supported under any name.  */
+#ifndef __cplusplus
+/* #undef inline */
+#endif
+
+/* Define krb5_sigtype to type of signal handler */
+#define krb5_sigtype void
+
+/* Define to `int' if <sys/types.h> does not define. */
+/* #undef mode_t */
+
+/* Define to `long int' if <sys/types.h> does not define. */
+/* #undef off_t */
+
+/* Define to `long' if <sys/types.h> does not define. */
+/* #undef time_t */
+
+/* Define to `int' if <sys/types.h> doesn't define. */
+/* #undef uid_t */
+
+
+#if defined(__GNUC__) && !defined(inline)
+/* Silence gcc pedantic warnings about ANSI C.  */
+# define inline __inline__
+#endif
+#endif /* KRB5_AUTOCONF_H */
--- a/contrib/krb5-cmake/autoconf_linux.h
+++ b/contrib/krb5-cmake/autoconf_linux.h
--- a/contrib/mariadb-connector-c
+++ b/contrib/mariadb-connector-c
@ -1 +1 @@
-Subproject commit 21f451d4d3157ffed31ec60a8b76c407190e66bd
+Subproject commit f4476ee7311b35b593750f6ae2cbdb62a4006374
--- a/contrib/poco
+++ b/contrib/poco
@ -1 +1 @@
-Subproject commit fbaaba4a02e29987b8c584747a496c79528f125f
+Subproject commit 83beecccb09eec0c9fd2669cacea03ede1d9f138
--- a/debian/changelog
+++ b/debian/changelog
@ -1,5 +1,5 @@
-clickhouse (21.3.1.1) unstable; urgency=low
+clickhouse (21.4.1.1) unstable; urgency=low

  * Modified source code

- -- clickhouse-release <clickhouse-release@yandex-team.ru>  Mon, 01 Feb 2021 12:50:53 +0300
+ -- clickhouse-release <clickhouse-release@yandex-team.ru>  Sat, 06 Mar 2021 14:43:27 +0300
--- a/docker/client/Dockerfile
+++ b/docker/client/Dockerfile
@ -1,7 +1,7 @@
 FROM ubuntu:18.04

 ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
-ARG version=21.3.1.*
+ARG version=21.4.1.*

 RUN apt-get update \
    && apt-get install --yes --no-install-recommends \
--- a/docker/server/Dockerfile
+++ b/docker/server/Dockerfile
@ -1,7 +1,7 @@
 FROM ubuntu:20.04

 ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
-ARG version=21.3.1.*
+ARG version=21.4.1.*
 ARG gosu_ver=1.10

 # user/group precreated explicitly with fixed uid/gid on purpose.
--- a/docker/test/Dockerfile
+++ b/docker/test/Dockerfile
@ -1,7 +1,7 @@
 FROM ubuntu:18.04

 ARG repository="deb https://repo.clickhouse.tech/deb/stable/ main/"
-ARG version=21.3.1.*
+ARG version=21.4.1.*

 RUN apt-get update && \
    apt-get install -y apt-transport-https dirmngr && \
--- a/docker/test/fasttest/run.sh
+++ b/docker/test/fasttest/run.sh
@ -151,6 +151,7 @@ function clone_submodules
        cd "$FASTTEST_SOURCE"

        SUBMODULES_TO_UPDATE=(
+            contrib/abseil-cpp
            contrib/antlr4-runtime
            contrib/boost
            contrib/zlib-ng
--- a/docker/test/performance-comparison/config/config.d/user_files.xml
+++ b/docker/test/performance-comparison/config/config.d/user_files.xml
@ -5,5 +5,6 @@
    <!-- Path to configuration file with users, access rights, profiles of settings, quotas. -->
    <users_config>users.xml</users_config>

-    <access_control_path>/var/lib/clickhouse/access/</access_control_path>
+    <!-- Path to directory where users created by SQL commands are stored. -->
+    <access_control_path>access/</access_control_path>
 </yandex>
--- a/docker/test/split_build_smoke_test/Dockerfile
+++ b/docker/test/split_build_smoke_test/Dockerfile
@ -2,5 +2,6 @@
 FROM yandex/clickhouse-binary-builder

 COPY run.sh /run.sh
+COPY process_split_build_smoke_test_result.py /

 CMD /run.sh
--- a/docker/test/split_build_smoke_test/process_split_build_smoke_test_result.py
+++ b/docker/test/split_build_smoke_test/process_split_build_smoke_test_result.py
@ -0,0 +1,61 @@
+#!/usr/bin/env python3
+
+import os
+import logging
+import argparse
+import csv
+
+RESULT_LOG_NAME = "run.log"
+
+def process_result(result_folder):
+
+    status = "success"
+    description = 'Server started and responded'
+    summary = [("Smoke test", "OK")]
+    with open(os.path.join(result_folder, RESULT_LOG_NAME), 'r') as run_log:
+        lines = run_log.read().split('\n')
+        if not lines or lines[0].strip() != 'OK':
+            status = "failure"
+            logging.info("Lines are not OK: %s", '\n'.join(lines))
+            summary = [("Smoke test", "FAIL")]
+            description = 'Server failed to respond, see result in logs'
+
+    result_logs = []
+    server_log_path = os.path.join(result_folder, "clickhouse-server.log")
+    stderr_log_path = os.path.join(result_folder, "stderr.log")
+    client_stderr_log_path = os.path.join(result_folder, "clientstderr.log")
+
+    if os.path.exists(server_log_path):
+        result_logs.append(server_log_path)
+
+    if os.path.exists(stderr_log_path):
+        result_logs.append(stderr_log_path)
+
+    if os.path.exists(client_stderr_log_path):
+        result_logs.append(client_stderr_log_path)
+
+    return status, description, summary, result_logs
+
+
+def write_results(results_file, status_file, results, status):
+    with open(results_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerows(results)
+    with open(status_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerow(status)
+
+
+if __name__ == "__main__":
+    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of split build smoke test")
+    parser.add_argument("--in-results-dir", default='/test_output/')
+    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
+    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
+    args = parser.parse_args()
+
+    state, description, test_results, logs = process_result(args.in_results_dir)
+    logging.info("Result parsed")
+    status = (state, description)
+    write_results(args.out_results_file, args.out_status_file, test_results, status)
+    logging.info("Result written")
--- a/docker/test/split_build_smoke_test/run.sh
+++ b/docker/test/split_build_smoke_test/run.sh
@ -5,16 +5,18 @@ set -x
 install_and_run_server() {
    mkdir /unpacked
    tar -xzf /package_folder/shared_build.tgz -C /unpacked --strip 1
-    LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-server --config /unpacked/config/config.xml >/var/log/clickhouse-server/stderr.log 2>&1 &
+    LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-server --config /unpacked/config/config.xml >/test_output/stderr.log 2>&1 &
 }

 run_client() {
    for i in {1..100}; do
        sleep 1
-        LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-client --query "select 'OK'" 2>/var/log/clickhouse-server/clientstderr.log && break
+        LD_LIBRARY_PATH=/unpacked /unpacked/clickhouse-client --query "select 'OK'" > /test_output/run.log 2> /test_output/clientstderr.log && break
        [[ $i == 100 ]] && echo 'FAIL'
    done
 }

 install_and_run_server
 run_client
+mv /var/log/clickhouse-server/clickhouse-server.log /test_output/clickhouse-server.log
+/process_split_build_smoke_test_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
--- a/docker/test/sqlancer/Dockerfile
+++ b/docker/test/sqlancer/Dockerfile
@ -1,7 +1,7 @@
 # docker build -t yandex/clickhouse-sqlancer-test .
 FROM ubuntu:20.04

-RUN apt-get update --yes && env DEBIAN_FRONTEND=noninteractive apt-get install wget unzip git openjdk-14-jdk maven --yes --no-install-recommends
+RUN apt-get update --yes && env DEBIAN_FRONTEND=noninteractive apt-get install wget unzip git openjdk-14-jdk maven python3 --yes --no-install-recommends

 RUN wget https://github.com/sqlancer/sqlancer/archive/master.zip -O /sqlancer.zip
 RUN mkdir /sqlancer && \
@ -10,4 +10,5 @@ RUN mkdir /sqlancer && \
 RUN cd /sqlancer/sqlancer-master && mvn package -DskipTests

 COPY run.sh /
+COPY process_sqlancer_result.py /
 CMD ["/bin/bash", "/run.sh"]
--- a/docker/test/sqlancer/process_sqlancer_result.py
+++ b/docker/test/sqlancer/process_sqlancer_result.py
@ -0,0 +1,74 @@
+#!/usr/bin/env python3
+
+import os
+import logging
+import argparse
+import csv
+
+
+def process_result(result_folder):
+    status = "success"
+    summary = []
+    paths = []
+    tests = ["TLPWhere", "TLPGroupBy", "TLPHaving", "TLPWhereGroupBy", "TLPDistinct", "TLPAggregate"]
+
+    for test in tests:
+        err_path = '{}/{}.err'.format(result_folder, test)
+        out_path = '{}/{}.out'.format(result_folder, test)
+        if not os.path.exists(err_path):
+            logging.info("No output err on path %s", err_path)
+            summary.append((test, "SKIPPED"))
+        elif not os.path.exists(out_path):
+            logging.info("No output log on path %s", out_path)
+        else:
+            paths.append(err_path)
+            paths.append(out_path)
+            with open(err_path, 'r') as f:
+                if 'AssertionError' in f.read():
+                    summary.append((test, "FAIL"))
+                else:
+                    summary.append((test, "OK"))
+
+    logs_path = '{}/logs.tar.gz'.format(result_folder)
+    if not os.path.exists(logs_path):
+        logging.info("No logs tar on path %s", logs_path)
+    else:
+        paths.append(logs_path)
+    stdout_path = '{}/stdout.log'.format(result_folder)
+    if not os.path.exists(stdout_path):
+        logging.info("No stdout log on path %s", stdout_path)
+    else:
+        paths.append(stdout_path)
+    stderr_path = '{}/stderr.log'.format(result_folder)
+    if not os.path.exists(stderr_path):
+        logging.info("No stderr log on path %s", stderr_path)
+    else:
+        paths.append(stderr_path)
+
+    description = "SQLancer test run. See report"
+
+    return status, description, summary, paths
+
+
+def write_results(results_file, status_file, results, status):
+    with open(results_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerows(results)
+    with open(status_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerow(status)
+
+
+if __name__ == "__main__":
+    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of sqlancer test")
+    parser.add_argument("--in-results-dir", default='/test_output/')
+    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
+    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
+    args = parser.parse_args()
+
+    state, description, test_results, logs = process_result(args.in_results_dir)
+    logging.info("Result parsed")
+    status = (state, description)
+    write_results(args.out_results_file, args.out_status_file, test_results, status)
+    logging.info("Result written")
--- a/docker/test/sqlancer/run.sh
+++ b/docker/test/sqlancer/run.sh
@ -29,4 +29,5 @@ tail -n 1000 /var/log/clickhouse-server/stderr.log > /test_output/stderr.log
 tail -n 1000 /var/log/clickhouse-server/stdout.log > /test_output/stdout.log
 tail -n 1000 /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log

+/process_sqlancer_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
 ls /test_output
--- a/docker/test/stateful/run.sh
+++ b/docker/test/stateful/run.sh
@ -65,3 +65,11 @@ if [[ -n "$USE_DATABASE_REPLICATED" ]] && [[ "$USE_DATABASE_REPLICATED" -eq 1 ]]
 fi

 clickhouse-test --testname --shard --zookeeper --no-stateless --hung-check --print-time "$SKIP_LIST_OPT" "${ADDITIONAL_OPTIONS[@]}" "$SKIP_TESTS_OPTION" 2>&1 | ts '%Y-%m-%d %H:%M:%S' | tee test_output/test_result.txt
+
+./process_functional_tests_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
+
+pigz < /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log.gz ||:
+mv /var/log/clickhouse-server/stderr.log /test_output/ ||:
+if [[ -n "$WITH_COVERAGE" ]] && [[ "$WITH_COVERAGE" -eq 1 ]]; then
+    tar -chf /test_output/clickhouse_coverage.tar.gz /profraw ||:
+fi
--- a/docker/test/stateless/Dockerfile
+++ b/docker/test/stateless/Dockerfile
@ -46,4 +46,5 @@ ENV NUM_TRIES=1
 ENV MAX_RUN_TIME=0

 COPY run.sh /
+COPY process_functional_tests_result.py /
 CMD ["/bin/bash", "/run.sh"]
--- a/docker/test/stateless/process_functional_tests_result.py
+++ b/docker/test/stateless/process_functional_tests_result.py
@ -0,0 +1,118 @@
+#!/usr/bin/env python3
+
+import os
+import logging
+import argparse
+import csv
+
+OK_SIGN = "[ OK "
+FAIL_SING = "[ FAIL "
+TIMEOUT_SING = "[ Timeout! "
+UNKNOWN_SIGN = "[ UNKNOWN "
+SKIPPED_SIGN = "[ SKIPPED "
+HUNG_SIGN = "Found hung queries in processlist"
+
+def process_test_log(log_path):
+    total = 0
+    skipped = 0
+    unknown = 0
+    failed = 0
+    success = 0
+    hung = False
+    test_results = []
+    with open(log_path, 'r') as test_file:
+        for line in test_file:
+            line = line.strip()
+            if HUNG_SIGN in line:
+                hung = True
+            if any(sign in line for sign in (OK_SIGN, FAIL_SING, UNKNOWN_SIGN, SKIPPED_SIGN)):
+                test_name = line.split(' ')[2].split(':')[0]
+
+                test_time = ''
+                try:
+                    time_token = line.split(']')[1].strip().split()[0]
+                    float(time_token)
+                    test_time = time_token
+                except (IndexError, ValueError):  # no time token, or it is not a number
+                    pass
+
+                total += 1
+                if TIMEOUT_SING in line:
+                    failed += 1
+                    test_results.append((test_name, "Timeout", test_time))
+                elif FAIL_SING in line:
+                    failed += 1
+                    test_results.append((test_name, "FAIL", test_time))
+                elif UNKNOWN_SIGN in line:
+                    unknown += 1
+                    test_results.append((test_name, "FAIL", test_time))
+                elif SKIPPED_SIGN in line:
+                    skipped += 1
+                    test_results.append((test_name, "SKIPPED", test_time))
+                else:
+                    success += int(OK_SIGN in line)
+                    test_results.append((test_name, "OK", test_time))
+    return total, skipped, unknown, failed, success, hung, test_results
+
+def process_result(result_path):
+    test_results = []
+    state = "success"
+    description = ""
+    files = os.listdir(result_path)
+    if files:
+        logging.info("Found files in result folder: %s", ','.join(files))
+        result_path = os.path.join(result_path, 'test_result.txt')
+    else:
+        result_path = None
+        description = "No output log"
+        state = "error"
+
+    if result_path and os.path.exists(result_path):
+        total, skipped, unknown, failed, success, hung, test_results = process_test_log(result_path)
+        is_flaky_check = 1 < int(os.environ.get('NUM_TRIES', 1))
+        # If no tests passed (success == 0), it indicates an error (e.g. the server did not start or crashed immediately).
+        # This is acceptable for "flaky checks": they may contain a single test that is marked as skipped.
+        if failed != 0 or unknown != 0 or (success == 0 and (not is_flaky_check)):
+            state = "failure"
+
+        if hung:
+            description = "Some queries hung, "
+            state = "failure"
+        else:
+            description = ""
+
+        description += "fail: {}, passed: {}".format(failed, success)
+        if skipped != 0:
+            description += ", skipped: {}".format(skipped)
+        if unknown != 0:
+            description += ", unknown: {}".format(unknown)
+    else:
+        state = "failure"
+        description = "Output log doesn't exist"
+        test_results = []
+
+    return state, description, test_results
+
+
+def write_results(results_file, status_file, results, status):
+    with open(results_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerows(results)
+    with open(status_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerow(status)
+
+
+if __name__ == "__main__":
+    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of functional tests")
+    parser.add_argument("--in-results-dir", default='/test_output/')
+    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
+    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
+    args = parser.parse_args()
+
+    state, description, test_results = process_result(args.in_results_dir)
+    logging.info("Result parsed")
+    status = (state, description)
+    write_results(args.out_results_file, args.out_status_file, test_results, status)
+    logging.info("Result written")
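Reviewer note: `process_test_log` relies on fixed token positions in each log line. The sketch below isolates that step so it can be checked by hand. The sample line is an assumption about what `clickhouse-test` prints once `ts` has prepended a date and a time; the helper name `parse_line` is hypothetical and not part of the commit.

```python
def parse_line(line):
    """Extract (test_name, test_time) with the same token positions as process_test_log."""
    line = line.strip()
    # Tokens 0 and 1 are the date and time added by `ts`; token 2 is "<test_name>:".
    test_name = line.split(' ')[2].split(':')[0]

    test_time = ''
    try:
        # The first token after the closing bracket is expected to be the run time.
        time_token = line.split(']')[1].strip().split()[0]
        float(time_token)
        test_time = time_token
    except (IndexError, ValueError):
        # No closing bracket, nothing after it, or a non-numeric token: leave the time empty.
        pass
    return test_name, test_time


# Assumed sample lines, not copied from a real run.
ok_line = "2021-03-12 10:00:00 00001_select_1: [ OK ] 0.05 sec."
skipped_line = "2021-03-12 10:00:01 00002_example: [ SKIPPED ] - no zookeeper"
print(parse_line(ok_line))
print(parse_line(skipped_line))
```

If the timestamp prefix ever changes (for example, `ts` is dropped from the pipeline), index `2` no longer points at the test name, so the positional split is the fragile part of this parser.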
--- a/docker/test/stateless/run.sh
+++ b/docker/test/stateless/run.sh
@ -72,5 +72,12 @@ export -f run_tests

 timeout "$MAX_RUN_TIME" bash -c run_tests ||:

+./process_functional_tests_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
+
+pigz < /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log.gz ||:
+mv /var/log/clickhouse-server/stderr.log /test_output/ ||:
+if [[ -n "$WITH_COVERAGE" ]] && [[ "$WITH_COVERAGE" -eq 1 ]]; then
+    tar -chf /test_output/clickhouse_coverage.tar.gz /profraw ||:
+fi
 tar -chf /test_output/text_log_dump.tar /var/lib/clickhouse/data/system/text_log ||:
 tar -chf /test_output/query_log_dump.tar /var/lib/clickhouse/data/system/query_log ||:
--- a/docker/test/stress/run.sh
+++ b/docker/test/stress/run.sh
@ -53,10 +53,14 @@ handle SIGBUS stop print
 handle SIGABRT stop print
 continue
 thread apply all backtrace
-continue
+detach
+quit
 " > script.gdb

-    gdb -batch -command script.gdb -p "$(cat /var/run/clickhouse-server/clickhouse-server.pid)" &
+    # FIXME Hung check may work incorrectly because of attached gdb
+    # 1. False positives are possible
+    # 2. We cannot attach another gdb to get stacktraces if some queries hung
+    gdb -batch -command script.gdb -p "$(cat /var/run/clickhouse-server/clickhouse-server.pid)" >> /test_output/gdb.log &
 }

 configure
@ -78,11 +82,56 @@ clickhouse-client --query "RENAME TABLE datasets.hits_v1 TO test.hits"
 clickhouse-client --query "RENAME TABLE datasets.visits_v1 TO test.visits"
 clickhouse-client --query "SHOW TABLES FROM test"

-./stress --hung-check --output-folder test_output --skip-func-tests "$SKIP_TESTS_OPTION" && echo "OK" > /test_output/script_exit_code.txt || echo "FAIL" > /test_output/script_exit_code.txt
+./stress --hung-check --output-folder test_output --skip-func-tests "$SKIP_TESTS_OPTION" \
+    && echo -e 'Test script exit code\tOK' >> /test_output/test_results.tsv \
+    || echo -e 'Test script failed\tFAIL' >> /test_output/test_results.tsv

 stop
-# TODO remove me when persistent snapshots will be ready
-rm -fr /var/lib/clickhouse/coordination ||:
 start

-clickhouse-client --query "SELECT 'Server successfuly started'" > /test_output/alive_check.txt || echo 'Server failed to start' > /test_output/alive_check.txt
+clickhouse-client --query "SELECT 'Server successfully started', 'OK'" >> /test_output/test_results.tsv \
+                       || echo -e 'Server failed to start\tFAIL' >> /test_output/test_results.tsv
+
+[ -f /var/log/clickhouse-server/clickhouse-server.log ] || echo -e "Server log does not exist\tFAIL"
+[ -f /var/log/clickhouse-server/stderr.log ] || echo -e "Stderr log does not exist\tFAIL"
+
+# Print Fatal log messages to stdout
+zgrep -Fa " <Fatal> " /var/log/clickhouse-server/clickhouse-server.log
+
+# Grep logs for sanitizer asserts, crashes and other critical errors
+
+# Sanitizer asserts
+zgrep -Fa "==================" /var/log/clickhouse-server/stderr.log >> /test_output/tmp
+zgrep -Fa "WARNING" /var/log/clickhouse-server/stderr.log >> /test_output/tmp
+zgrep -Fav "ASan doesn't fully support makecontext/swapcontext functions" /test_output/tmp > /dev/null \
+    && echo -e 'Sanitizer assert (in stderr.log)\tFAIL' >> /test_output/test_results.tsv \
+    || echo -e 'No sanitizer asserts\tOK' >> /test_output/test_results.tsv
+rm -f /test_output/tmp
+
+# Logical errors
+zgrep -Fa "Code: 49, e.displayText() = DB::Exception:" /var/log/clickhouse-server/clickhouse-server.log > /dev/null \
+    && echo -e 'Logical error thrown (see clickhouse-server.log)\tFAIL' >> /test_output/test_results.tsv \
+    || echo -e 'No logical errors\tOK' >> /test_output/test_results.tsv
+
+# Crash
+zgrep -Fa "########################################" /var/log/clickhouse-server/clickhouse-server.log > /dev/null \
+    && echo -e 'Killed by signal (in clickhouse-server.log)\tFAIL' >> /test_output/test_results.tsv \
+    || echo -e 'Not crashed\tOK' >> /test_output/test_results.tsv
+
+# It also checks for OOM or crash without stacktrace (printed by watchdog)
+zgrep -Fa " <Fatal> " /var/log/clickhouse-server/clickhouse-server.log > /dev/null \
+    && echo -e 'Fatal message in clickhouse-server.log\tFAIL' >> /test_output/test_results.tsv \
+    || echo -e 'No fatal messages in clickhouse-server.log\tOK' >> /test_output/test_results.tsv
+
+zgrep -Fa "########################################" /test_output/* > /dev/null \
+    && echo -e 'Killed by signal (output files)\tFAIL' >> /test_output/test_results.tsv
+
+# Put logs into /test_output/
+pigz < /var/log/clickhouse-server/clickhouse-server.log > /test_output/clickhouse-server.log.gz
+tar -chf /test_output/coordination.tar /var/lib/clickhouse/coordination ||:
+mv /var/log/clickhouse-server/stderr.log /test_output/
+tar -chf /test_output/query_log_dump.tar /var/lib/clickhouse/data/system/query_log ||:
+
+# Write check result into check_status.tsv
+clickhouse-local --structure "test String, res String" -q "SELECT 'failure', test FROM table WHERE res != 'OK' order by (lower(test) like '%hung%') LIMIT 1" < /test_output/test_results.tsv > /test_output/check_status.tsv
+[ -s /test_output/check_status.tsv ] || echo -e "success\tNo errors found" > /test_output/check_status.tsv
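Reviewer note: the final `clickhouse-local` query picks one failed row from `test_results.tsv`, and the `ORDER BY (lower(test) LIKE '%hung%')` clause sorts rows whose name mentions "hung" last, so any other failure is reported in preference to a hung check. A minimal Python sketch of that selection follows, assuming rows are `(test, res)` pairs; the function name `derive_status` is hypothetical. Unlike the SQL, which does not promise an order among ties, this sketch uses a stable sort.

```python
def derive_status(rows):
    """Mimic the check_status.tsv selection: first non-OK row, hung checks last."""
    failures = [(test, res) for test, res in rows if res != 'OK']
    if not failures:
        # Corresponds to the fallback echo when check_status.tsv is empty.
        return ('success', 'No errors found')
    # False sorts before True, so rows without "hung" in the name come first.
    failures.sort(key=lambda row: 'hung' in row[0].lower())
    return ('failure', failures[0][0])


rows = [
    ("Hung check failed", "FAIL"),
    ("No sanitizer asserts", "OK"),
    ("Logical error thrown (see clickhouse-server.log)", "FAIL"),
]
print(derive_status(rows))
```

The ordering suggests that a hung check is treated as the least specific diagnosis: a crash or logical error is the more likely root cause and is surfaced first.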
--- a/docker/test/stress/stress
+++ b/docker/test/stress/stress
@ -58,6 +58,37 @@ def run_func_test(cmd, output_prefix, num_processes, skip_tests_option, global_t
        time.sleep(0.5)
    return pipes

+def prepare_for_hung_check():
+    # FIXME this function should not exist, but...
+
+    # We attach gdb to clickhouse-server before running tests
+    # to print stacktraces of all crashes even if clickhouse cannot print them for some reason.
+    # However, it obstructs checking for hung queries.
+    logging.info("Will terminate gdb (if any)")
+    call("kill -TERM $(pidof gdb)", shell=True, stderr=STDOUT)
+
+    # Some tests set too low a memory limit for the default user and forget to reset it.
+    # That may cause SYSTEM queries to fail, so let's disable the memory limit.
+    call("clickhouse client --max_memory_usage_for_user=0 -q 'SELECT 1 FORMAT Null'", shell=True, stderr=STDOUT)
+
+    # Some tests execute SYSTEM STOP MERGES or similar queries.
+    # It may cause some ALTERs to hang.
+    # Possibly we should fix the tests and forbid such queries unless a table is specified.
+    call("clickhouse client -q 'SYSTEM START MERGES'", shell=True, stderr=STDOUT)
+    call("clickhouse client -q 'SYSTEM START DISTRIBUTED SENDS'", shell=True, stderr=STDOUT)
+    call("clickhouse client -q 'SYSTEM START TTL MERGES'", shell=True, stderr=STDOUT)
+    call("clickhouse client -q 'SYSTEM START MOVES'", shell=True, stderr=STDOUT)
+    call("clickhouse client -q 'SYSTEM START FETCHES'", shell=True, stderr=STDOUT)
+    call("clickhouse client -q 'SYSTEM START REPLICATED SENDS'", shell=True, stderr=STDOUT)
+    call("clickhouse client -q 'SYSTEM START REPLICATION QUEUES'", shell=True, stderr=STDOUT)
+
+    # Issue #21004, live views are experimental, so let's just suppress it
+    call("""clickhouse client -q "KILL QUERY WHERE upper(query) LIKE 'WATCH %'" """, shell=True, stderr=STDOUT)
+
+    # Wait for last queries to finish if any, not longer than 120 seconds
+    call("""clickhouse client -q "select sleepEachRow((
+            select maxOrDefault(120 - elapsed) + 1 from system.processes where query not like '%from system.processes%' and elapsed < 120
+            ) / 120) from numbers(120) format Null" """, shell=True, stderr=STDOUT)

 if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
@ -88,11 +119,14 @@ if __name__ == "__main__":

    logging.info("All processes finished")
    if args.hung_check:
+        prepare_for_hung_check()
        logging.info("Checking if some queries hung")
        cmd = "{} {} {}".format(args.test_cmd, "--hung-check", "00001_select_1")
        res = call(cmd, shell=True, stderr=STDOUT)
+        hung_check_status = "No queries hung\tOK\n"
        if res != 0:
            logging.info("Hung check failed with exit code {}".format(res))
-            sys.exit(1)
+            hung_check_status = "Hung check failed\tFAIL\n"
+        open(os.path.join(args.output_folder, "test_results.tsv"), 'w+').write(hung_check_status)

    logging.info("Stress test finished")
--- a/docker/test/style/Dockerfile
+++ b/docker/test/style/Dockerfile
@ -10,14 +10,6 @@ RUN apt-get update && env DEBIAN_FRONTEND=noninteractive apt-get install --yes \
    yamllint \
    && pip3 install codespell

-
-# For |& syntax
-SHELL ["bash", "-c"]
-
-CMD cd /ClickHouse/utils/check-style && \
-    ./check-style -n              |& tee /test_output/style_output.txt && \
-    ./check-typos                 |& tee /test_output/typos_output.txt && \
-    ./check-whitespaces -n        |& tee /test_output/whitespaces_output.txt && \
-    ./check-duplicate-includes.sh |& tee /test_output/duplicate_output.txt && \
-    ./shellcheck-run.sh           |& tee /test_output/shellcheck_output.txt && \
-    true
+COPY run.sh /
+COPY process_style_check_result.py /
+CMD ["/bin/bash", "/run.sh"]
--- a/docker/test/style/process_style_check_result.py
+++ b/docker/test/style/process_style_check_result.py
@ -0,0 +1,96 @@
+#!/usr/bin/env python3
+
+import os
+import logging
+import argparse
+import csv
+
+
+def process_result(result_folder):
+    status = "success"
+    description = ""
+    test_results = []
+
+    style_log_path = '{}/style_output.txt'.format(result_folder)
+    if not os.path.exists(style_log_path):
+        logging.info("No style check log on path %s", style_log_path)
+        return "exception", "No style check log", []
+    elif os.stat(style_log_path).st_size != 0:
+        description += "Style check failed. "
+        test_results.append(("Style check", "FAIL"))
+        status = "failure"  # Disabled for now
+    else:
+        test_results.append(("Style check", "OK"))
+
+    typos_log_path = '{}/typos_output.txt'.format(result_folder)
+    if not os.path.exists(typos_log_path):
+        logging.info("No typos check log on path %s", typos_log_path)
+        return "exception", "No typos check log", []
+    elif os.stat(typos_log_path).st_size != 0:
+        description += "Typos check failed. "
+        test_results.append(("Typos check", "FAIL"))
+        status = "failure"
+    else:
+        test_results.append(("Typos check", "OK"))
+
+    whitespaces_log_path = '{}/whitespaces_output.txt'.format(result_folder)
+    if not os.path.exists(whitespaces_log_path):
+        logging.info("No whitespaces check log on path %s", whitespaces_log_path)
+        return "exception", "No whitespaces check log", []
+    elif os.stat(whitespaces_log_path).st_size != 0:
+        description += "Whitespaces check failed. "
+        test_results.append(("Whitespaces check", "FAIL"))
+        status = "failure"
+    else:
+        test_results.append(("Whitespaces check", "OK"))
+
+    duplicate_log_path = '{}/duplicate_output.txt'.format(result_folder)
+    if not os.path.exists(duplicate_log_path):
+        logging.info("No header duplicates check log on path %s", duplicate_log_path)
+        return "exception", "No header duplicates check log", []
+    elif os.stat(duplicate_log_path).st_size != 0:
+        description += "Header duplicates check failed. "
+        test_results.append(("Header duplicates check", "FAIL"))
+        status = "failure"
+    else:
+        test_results.append(("Header duplicates check", "OK"))
+
+    shellcheck_log_path = '{}/shellcheck_output.txt'.format(result_folder)
+    if not os.path.exists(shellcheck_log_path):
+        logging.info("No shellcheck log on path %s", shellcheck_log_path)
+        return "exception", "No shellcheck log", []
+    elif os.stat(shellcheck_log_path).st_size != 0:
+        description += "Shellcheck check failed. "
+        test_results.append(("Shellcheck", "FAIL"))
+        status = "failure"
+    else:
+        test_results.append(("Shellcheck", "OK"))
+
+    if not description:
+        description += "Style check success"
+
+    return status, description, test_results
+
+
+def write_results(results_file, status_file, results, status):
+    with open(results_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerows(results)
+    with open(status_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerow(status)
+
+
+if __name__ == "__main__":
+    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of style check")
+    parser.add_argument("--in-results-dir", default='/test_output/')
+    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
+    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
+    args = parser.parse_args()
+
+    state, description, test_results = process_result(args.in_results_dir)
+    logging.info("Result parsed")
+    status = (state, description)
+    write_results(args.out_results_file, args.out_status_file, test_results, status)
+    logging.info("Result written")
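Reviewer note: the five blocks in `process_result` differ only in the check name and the log file name, which is where copy-paste slips in the path variable come from. A table-driven variant is sketched below as one possible shape, not as the committed implementation. The `CHECKS` list and the slightly simplified messages are assumptions made for illustration; the file names are the ones written by `run.sh`.

```python
import os
import tempfile

# (display name, log file name) pairs; file names match those produced by run.sh.
CHECKS = [
    ("Style check", "style_output.txt"),
    ("Typos check", "typos_output.txt"),
    ("Whitespaces check", "whitespaces_output.txt"),
    ("Header duplicates check", "duplicate_output.txt"),
    ("Shellcheck", "shellcheck_output.txt"),
]


def process_result(result_folder, checks=CHECKS):
    """Treat a missing log as an exception and a non-empty log as a failed check."""
    status = "success"
    description = ""
    test_results = []
    for name, file_name in checks:
        log_path = os.path.join(result_folder, file_name)
        if not os.path.exists(log_path):
            return "exception", "No {} log".format(name), []
        if os.stat(log_path).st_size != 0:
            description += "{} failed. ".format(name)
            test_results.append((name, "FAIL"))
            status = "failure"
        else:
            test_results.append((name, "OK"))
    return status, description or "Style check success", test_results


# Demo on a temporary folder where every check produced an empty log.
with tempfile.TemporaryDirectory() as folder:
    for _, file_name in CHECKS:
        open(os.path.join(folder, file_name), "w").close()
    print(process_result(folder))
```

With a single loop, each path is built and tested in one place, so a log-file check cannot accidentally reuse the path of another check.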
--- a/docker/test/style/run.sh
+++ b/docker/test/style/run.sh
@ -0,0 +1,9 @@
+#!/bin/bash
+
+cd /ClickHouse/utils/check-style || echo -e "failure\tRepo not found" > /test_output/check_status.tsv
+./check-style -n              |& tee /test_output/style_output.txt
+./check-typos                 |& tee /test_output/typos_output.txt
+./check-whitespaces -n        |& tee /test_output/whitespaces_output.txt
+./check-duplicate-includes.sh |& tee /test_output/duplicate_output.txt
+./shellcheck-run.sh           |& tee /test_output/shellcheck_output.txt
+/process_style_check_result.py || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
--- a/docker/test/testflows/runner/Dockerfile
+++ b/docker/test/testflows/runner/Dockerfile
@ -35,7 +35,7 @@ RUN apt-get update \
 ENV TZ=Europe/Moscow
 RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

-RUN pip3 install urllib3 testflows==1.6.72 docker-compose docker dicttoxml kazoo tzlocal
+RUN pip3 install urllib3 testflows==1.6.74 docker-compose docker dicttoxml kazoo tzlocal

 ENV DOCKER_CHANNEL stable
 ENV DOCKER_VERSION 17.09.1-ce
@ -61,6 +61,7 @@ RUN set -eux; \

 COPY modprobe.sh /usr/local/bin/modprobe
 COPY dockerd-entrypoint.sh /usr/local/bin/
+COPY process_testflows_result.py /usr/local/bin/

 RUN set -x \
  && addgroup --system dockremap \
@ -72,5 +73,5 @@ RUN set -x \
 VOLUME /var/lib/docker
 EXPOSE 2375
 ENTRYPOINT ["dockerd-entrypoint.sh"]
-CMD ["sh", "-c", "python3 regression.py --no-color -o classic --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json"]
+CMD ["sh", "-c", "python3 regression.py --no-color -o classic --local --clickhouse-binary-path ${CLICKHOUSE_TESTS_SERVER_BIN_PATH} --log test.log ${TESTFLOWS_OPTS}; cat test.log | tfs report results --format json > results.json; /usr/local/bin/process_testflows_result.py || echo -e 'failure\tCannot parse results' > check_status.tsv"]

--- a/docker/test/testflows/runner/process_testflows_result.py
+++ b/docker/test/testflows/runner/process_testflows_result.py
@ -0,0 +1,67 @@
+#!/usr/bin/env python3
+
+import os
+import logging
+import argparse
+import csv
+import json
+
+
+def process_result(result_folder):
+    json_path = os.path.join(result_folder, "results.json")
+    if not os.path.exists(json_path):
+        return "success", "No testflows in branch", [], []
+
+    test_binary_log = os.path.join(result_folder, "test.log")
+    with open(json_path) as source:
+        results = json.loads(source.read())
+
+    total_tests = 0
+    total_ok = 0
+    total_fail = 0
+    total_other = 0
+    test_results = []
+    for test in results["tests"]:
+        test_name = test['test']['test_name']
+        test_result = test['result']['result_type'].upper()
+        test_time = str(test['result']['message_rtime'])
+        total_tests += 1
+        if test_result == "OK":
+            total_ok += 1
+        elif test_result == "FAIL" or test_result == "ERROR":
+            total_fail += 1
+        else:
+            total_other += 1
+
+        test_results.append((test_name, test_result, test_time))
+    if total_fail != 0:
+        status = "failure"
+    else:
+        status = "success"
+
+    description = "failed: {}, passed: {}, other: {}".format(total_fail, total_ok, total_other)
+    return status, description, test_results, [json_path, test_binary_log]
+
+
+def write_results(results_file, status_file, results, status):
+    with open(results_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerows(results)
+    with open(status_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerow(status)
+
+if __name__ == "__main__":
+    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of Testflows tests")
+    parser.add_argument("--in-results-dir", default='./')
+    parser.add_argument("--out-results-file", default='./test_results.tsv')
+    parser.add_argument("--out-status-file", default='./check_status.tsv')
+    args = parser.parse_args()
+
+    state, description, test_results, logs = process_result(args.in_results_dir)
+    logging.info("Result parsed")
+    status = (state, description)
+    write_results(args.out_results_file, args.out_status_file, test_results, status)
+    logging.info("Result written")
+
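Reviewer note: `process_result` reads only three fields from each entry in `results.json`. The sketch below makes that implied shape explicit with a hand-written sample. The sample dictionary is an assumption inferred from the field accesses in the script, not real `tfs report results` output, and `summarize` is a hypothetical name.

```python
def summarize(results):
    """Count OK / FAIL+ERROR / other results the same way process_result does."""
    counts = {"ok": 0, "fail": 0, "other": 0}
    rows = []
    for test in results["tests"]:
        result_type = test["result"]["result_type"].upper()
        if result_type == "OK":
            counts["ok"] += 1
        elif result_type in ("FAIL", "ERROR"):
            counts["fail"] += 1
        else:
            counts["other"] += 1
        rows.append((test["test"]["test_name"], result_type, str(test["result"]["message_rtime"])))
    status = "failure" if counts["fail"] else "success"
    description = "failed: {fail}, passed: {ok}, other: {other}".format(**counts)
    return status, description, rows


# Hypothetical sample with only the fields the script touches.
sample = {
    "tests": [
        {"test": {"test_name": "/suite/a"}, "result": {"result_type": "OK", "message_rtime": 0.5}},
        {"test": {"test_name": "/suite/b"}, "result": {"result_type": "Error", "message_rtime": 1.25}},
        {"test": {"test_name": "/suite/c"}, "result": {"result_type": "XFail", "message_rtime": 0.1}},
    ]
}
print(summarize(sample))
```

Anything that is neither `OK` nor `FAIL`/`ERROR` lands in the "other" bucket and does not affect the overall status.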
--- a/docker/test/unit/Dockerfile
+++ b/docker/test/unit/Dockerfile
@ -5,6 +5,6 @@ ENV TZ=Europe/Moscow
 RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone
 RUN apt-get install gdb

-CMD service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test ''; \
-    gdb -q  -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms | tee test_output/test_result.txt
-
+COPY run.sh /
+COPY process_unit_tests_result.py /
+CMD ["/bin/bash", "/run.sh"]
--- a/docker/test/unit/process_unit_tests_result.py
+++ b/docker/test/unit/process_unit_tests_result.py
@ -0,0 +1,96 @@
+#!/usr/bin/env python3
+
+import os
+import logging
+import argparse
+import csv
+
+OK_SIGN = 'OK ]'
+FAILED_SIGN = 'FAILED  ]'
+SEGFAULT = 'Segmentation fault'
+SIGNAL = 'received signal SIG'
+PASSED = 'PASSED'
+
+def get_test_name(line):
+    elements = reversed(line.split(' '))
+    for element in elements:
+        if '(' not in element and ')' not in element:
+            return element
+    raise Exception("No test name in line '{}'".format(line))
+
+def process_result(result_folder):
+    summary = []
+    total_counter = 0
+    failed_counter = 0
+    result_log_path = '{}/test_result.txt'.format(result_folder)
+    if not os.path.exists(result_log_path):
+        logging.info("No output log on path %s", result_log_path)
+        return "exception", "No output log", []
+
+    status = "success"
+    description = ""
+    passed = False
+    with open(result_log_path, 'r') as test_result:
+        for line in test_result:
+            if OK_SIGN in line:
+                logging.info("Found ok line: '%s'", line)
+                test_name = get_test_name(line.strip())
+                logging.info("Test name: '%s'", test_name)
+                summary.append((test_name, "OK"))
+                total_counter += 1
+            elif FAILED_SIGN in line and 'listed below' not in line and 'ms)' in line:
+                logging.info("Found fail line: '%s'", line)
+                test_name = get_test_name(line.strip())
+                logging.info("Test name: '%s'", test_name)
+                summary.append((test_name, "FAIL"))
+                total_counter += 1
+                failed_counter += 1
+            elif SEGFAULT in line:
+                logging.info("Found segfault line: '%s'", line)
+                status = "failure"
+                description += "Segmentation fault. "
+                break
+            elif SIGNAL in line:
+                logging.info("Received signal line: '%s'", line)
+                status = "failure"
+                description += "Exit on signal. "
+                break
+            elif PASSED in line:
+                logging.info("PASSED record found: '%s'", line)
+                passed = True
+
+    if not passed:
+        status = "failure"
+        description += "PASSED record not found. "
+
+    if failed_counter != 0:
+        status = "failure"
+
+    if not description:
+        description += "fail: {}, passed: {}".format(failed_counter, total_counter - failed_counter)
+
+    return status, description, summary
+
+
+def write_results(results_file, status_file, results, status):
+    with open(results_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerows(results)
+    with open(status_file, 'w') as f:
+        out = csv.writer(f, delimiter='\t')
+        out.writerow(status)
+
+if __name__ == "__main__":
+    logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
+    parser = argparse.ArgumentParser(description="ClickHouse script for parsing results of unit tests")
+    parser.add_argument("--in-results-dir", default='/test_output/')
+    parser.add_argument("--out-results-file", default='/test_output/test_results.tsv')
+    parser.add_argument("--out-status-file", default='/test_output/check_status.tsv')
+    args = parser.parse_args()
+
+    state, description, test_results = process_result(args.in_results_dir)
+    logging.info("Result parsed")
+    status = (state, description)
+    write_results(args.out_results_file, args.out_status_file, test_results, status)
+    logging.info("Result written")
+
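Reviewer note: `get_test_name` scans the line from the right and returns the first token without parentheses, which skips the trailing `(N ms)` timing. The sketch below repeats the function and runs it on two lines written in the usual GoogleTest result layout; the test names in the samples are hypothetical.

```python
def get_test_name(line):
    """Return the last space-separated token that contains no parentheses."""
    elements = reversed(line.split(' '))
    for element in elements:
        if '(' not in element and ')' not in element:
            return element
    raise Exception("No test name in line '{}'".format(line))


# Assumed sample lines in GoogleTest style; the test names are made up.
ok_line = "[       OK ] Common.ExampleTest (5 ms)"
failed_line = "[  FAILED  ] Common.OtherTest (12 ms)"
print(get_test_name(ok_line))
print(get_test_name(failed_line))
```

A result line without the trailing timing still works, because the rightmost token is then the test name itself.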
--- a/docker/test/unit/run.sh
+++ b/docker/test/unit/run.sh
@ -0,0 +1,7 @@
+#!/bin/bash
+
+set -x
+
+service zookeeper start && sleep 7 && /usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 -create create /clickhouse_test '';
+gdb -q  -ex 'set print inferior-events off' -ex 'set confirm off' -ex 'set print thread-events off' -ex run -ex bt -ex quit --args ./unit_tests_dbms | tee test_output/test_result.txt
+./process_unit_tests_result.py  || echo -e "failure\tCannot parse results" > /test_output/check_status.tsv
--- a/docs/.gitignore
+++ b/docs/.gitignore
@ -0,0 +1 @@
+build
--- a/docs/_description_templates/template-engine.md
+++ b/docs/_description_templates/template-engine.md
@ -58,6 +58,6 @@ Result:

 Follow up with any text to clarify the example.

-## See Also {#see-also}
+**See Also**

 -   [link](#)
--- a/docs/_description_templates/template-function.md
+++ b/docs/_description_templates/template-function.md
@ -14,12 +14,12 @@ More text (Optional).

 **Arguments** (Optional)

-   `x` — Description. [Type name](relative/path/to/type/dscr.md#type).
-   `y` — Description. [Type name](relative/path/to/type/dscr.md#type).
+-   `x` — Description. Optional (only for optional arguments). Possible values: <values list>. Default value: <value>. [Type name](relative/path/to/type/dscr.md#type).
+-   `y` — Description. Optional (only for optional arguments). Possible values: <values list>. Default value: <value>. [Type name](relative/path/to/type/dscr.md#type).

 **Parameters** (Optional, only for parametric aggregate functions)

-   `z` — Description. [Type name](relative/path/to/type/dscr.md#type).
+-   `z` — Description. Optional (only for optional parameters). Possible values: <values list>. Default value: <value>. [Type name](relative/path/to/type/dscr.md#type).

 **Returned value(s)**

--- a/docs/_description_templates/template-server-setting.md
+++ b/docs/_description_templates/template-server-setting.md
@ -8,14 +8,14 @@ Possible value: ...

 Default value: ...

-Settings: (Optional)
+**Settings** (Optional)

 If the section contains several settings, list them here. Specify possible values and default values:

 -   setting_1 — Description.
 -   setting_2 — Description.

-**Example:**
+**Example**

 ```xml
 <server_setting_name>
--- a/docs/_description_templates/template-statement.md
+++ b/docs/_description_templates/template-statement.md
@ -1,14 +1,14 @@
-# Statement name (for example, SHOW USER)
+# Statement name (for example, SHOW USER) {#statement-name-in-lower-case}

 Brief description of what the statement does.

-Syntax:
+**Syntax**

 ```sql
 Syntax of the statement.
 ```

-## Other necessary sections of the description (Optional)
+## Other necessary sections of the description (Optional) {#anchor}

 Examples of descriptions with a complicated structure:

@ -17,7 +17,7 @@ Examples of descriptions with a complicated structure:
 - https://clickhouse.tech/docs/en/sql-reference/statements/select/join/


-## See Also (Optional)
+**See Also** (Optional)

 Links to related topics as a list.

--- a/docs/en/commercial/cloud.md
+++ b/docs/en/commercial/cloud.md
@ -29,6 +29,17 @@ toc_title: Cloud
 -   Cross-AZ scaling for performance and high availability
 -   Built-in monitoring and SQL query editor

+## Alibaba Cloud {#alibaba-cloud}
+
+Alibaba Cloud Managed Service for ClickHouse ([China site](https://www.aliyun.com/product/clickhouse); will be available on the international site in May 2021) provides the following key features:
+-   Highly reliable cloud disk storage engine based on Alibaba Cloud Apsara distributed system
+-   Expand capacity on demand without manual data migration
+-   Support single-node, single-replica, multi-node, and multi-replica architectures, and support hot and cold data tiering
+-   Support access allow-list, one-key recovery, multi-layer network security protection, cloud disk encryption
+-   Seamless integration with cloud log systems, databases, and data application tools
+-   Built-in monitoring and database management platform
+-   Professional database expert technical support and service
+
 ## Tencent Cloud {#tencent-cloud}

 [Tencent Managed Service for ClickHouse](https://cloud.tencent.com/product/cdwch) provides the following key features:
--- a/docs/en/development/build.md
+++ b/docs/en/development/build.md
@ -170,7 +170,7 @@ $ ./release
 Normally all tools of the ClickHouse bundle, such as `clickhouse-server`, `clickhouse-client` etc., are linked into a single static executable, `clickhouse`. This executable must be re-linked on every change, which might be slow. Two common ways to improve linking time are to use `lld` linker, and use the 'split' build configuration, which builds a separate binary for every tool, and further splits the code into serveral shared libraries. To enable these tweaks, pass the following flags to `cmake`:

 ```
-DCMAKE_C_FLAGS="-fuse-ld=lld" -DCMAKE_CXX_FLAGS="-fuse-ld=lld" -DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1
+-DCMAKE_C_FLAGS="--ld-path=lld" -DCMAKE_CXX_FLAGS="--ld-path=lld" -DUSE_STATIC_LIBRARIES=0 -DSPLIT_SHARED_LIBRARIES=1 -DCLICKHOUSE_SPLIT_BINARY=1
 ```

 ## You Don’t Have to Build ClickHouse {#you-dont-have-to-build-clickhouse}
--- a/docs/en/engines/table-engines/integrations/embedded-rocksdb.md
+++ b/docs/en/engines/table-engines/integrations/embedded-rocksdb.md
@ -39,4 +39,4 @@ ENGINE = EmbeddedRocksDB
 PRIMARY KEY key
 ```

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/embedded-rocksdb/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/embedded-rocksdb/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/hdfs.md
+++ b/docs/en/engines/table-engines/integrations/hdfs.md
@ -5,7 +5,7 @@ toc_title: HDFS

 # HDFS {#table_engines-hdfs}

-This engine provides integration with [Apache Hadoop](https://en.wikipedia.org/wiki/Apache_Hadoop) ecosystem by allowing to manage data on [HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html)via ClickHouse. This engine is similar
+This engine provides integration with the [Apache Hadoop](https://en.wikipedia.org/wiki/Apache_Hadoop) ecosystem by allowing you to manage data on [HDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) via ClickHouse. This engine is similar
 to the [File](../../../engines/table-engines/special/file.md#table_engines-file) and [URL](../../../engines/table-engines/special/url.md#table_engines-url) engines, but provides Hadoop-specific features.

 ## Usage {#usage}
@ -174,7 +174,7 @@ Similar to GraphiteMergeTree, the HDFS engine supports extended configuration us
 | dfs\_domain\_socket\_path                             | ""                      |


-[HDFS Configuration Reference ](https://hawq.apache.org/docs/userguide/2.3.0.0-incubating/reference/HDFSConfigurationParameterReference.html) might explain some parameters.
+[HDFS Configuration Reference](https://hawq.apache.org/docs/userguide/2.3.0.0-incubating/reference/HDFSConfigurationParameterReference.html) might explain some parameters.


 #### ClickHouse extras {#clickhouse-extras}
@ -185,7 +185,6 @@ Similar to GraphiteMergeTree, the HDFS engine supports extended configuration us
 |hadoop\_kerberos\_kinit\_command                       | kinit                   |

 #### Limitations {#limitations}
-
  * hadoop\_security\_kerberos\_ticket\_cache\_path can be global only, not user specific

 ## Kerberos support {#kerberos-support}
@ -207,4 +206,4 @@ If hadoop\_kerberos\_keytab, hadoop\_kerberos\_principal or hadoop\_kerberos\_ki

 -   [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns)

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/hdfs/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/hdfs/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/index.md
+++ b/docs/en/engines/table-engines/integrations/index.md
@ -18,3 +18,6 @@ List of supported integrations:
 -   [Kafka](../../../engines/table-engines/integrations/kafka.md)
 -   [EmbeddedRocksDB](../../../engines/table-engines/integrations/embedded-rocksdb.md)
 -   [RabbitMQ](../../../engines/table-engines/integrations/rabbitmq.md)
+-   [PostgreSQL](../../../engines/table-engines/integrations/postgresql.md)
+
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/jdbc.md
+++ b/docs/en/engines/table-engines/integrations/jdbc.md
@ -85,4 +85,4 @@ FROM jdbc_table

 -   [JDBC table function](../../../sql-reference/table-functions/jdbc.md).

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/jdbc/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/jdbc/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/kafka.md
+++ b/docs/en/engines/table-engines/integrations/kafka.md
@ -194,4 +194,4 @@ Example:
 -   [Virtual columns](../../../engines/table-engines/index.md#table_engines-virtual_columns)
 -   [background_schedule_pool_size](../../../operations/settings/settings.md#background_schedule_pool_size)

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/kafka/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/kafka/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/mongodb.md
+++ b/docs/en/engines/table-engines/integrations/mongodb.md
@ -54,4 +54,4 @@ SELECT COUNT() FROM mongo_table;
 └─────────┘
 ```

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/integrations/mongodb/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/mongodb/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/mysql.md
+++ b/docs/en/engines/table-engines/integrations/mysql.md
@ -24,6 +24,7 @@ The table structure can differ from the original MySQL table structure:

 -   Column names should be the same as in the original MySQL table, but you can use just some of these columns and in any order.
 -   Column types may differ from those in the original MySQL table. ClickHouse tries to [cast](../../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) values to the ClickHouse data types.
+-   The `external_table_functions_use_nulls` setting defines how to handle Nullable columns. The default is true. If false, the table function does not make columns Nullable and inserts default values instead of nulls. This also applies to null values inside array data types.

 **Engine Parameters**

@ -100,4 +101,4 @@ SELECT * FROM mysql_table
 -   [The ‘mysql’ table function](../../../sql-reference/table-functions/mysql.md)
 -   [Using MySQL as a source of external dictionary](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md#dicts-external_dicts_dict_sources-mysql)

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/mysql/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/mysql/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/odbc.md
+++ b/docs/en/engines/table-engines/integrations/odbc.md
@ -29,6 +29,7 @@ The table structure can differ from the source table structure:

 -   Column names should be the same as in the source table, but you can use just some of these columns and in any order.
 -   Column types may differ from those in the source table. ClickHouse tries to [cast](../../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) values to the ClickHouse data types.
+-   The `external_table_functions_use_nulls` setting defines how to handle Nullable columns. The default is true. If false, the table function does not make columns Nullable and inserts default values instead of nulls. This also applies to null values inside array data types.

 **Engine Parameters**

@ -127,4 +128,4 @@ SELECT * FROM odbc_t
 -   [ODBC external dictionaries](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md#dicts-external_dicts_dict_sources-odbc)
 -   [ODBC table function](../../../sql-reference/table-functions/odbc.md)

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/odbc/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/odbc/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/postgresql.md
+++ b/docs/en/engines/table-engines/integrations/postgresql.md
@ -0,0 +1,106 @@
+---
+toc_priority: 8
+toc_title: PostgreSQL
+---
+
+# PostgreSQL {#postgresql}
+
+The PostgreSQL engine allows you to perform `SELECT` queries on data that is stored on a remote PostgreSQL server.
+
+## Creating a Table {#creating-a-table}
+
+``` sql
+CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
+(
+    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1] [TTL expr1],
+    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2] [TTL expr2],
+    ...
+) ENGINE = PostgreSQL('host:port', 'database', 'table', 'user', 'password');
+```
+
+See a detailed description of the [CREATE TABLE](../../../sql-reference/statements/create/table.md#create-table-query) query.
+
+The table structure can differ from the original PostgreSQL table structure:
+
+-   Column names should be the same as in the original PostgreSQL table, but you can use just some of these columns and in any order.
+-   Column types may differ from those in the original PostgreSQL table. ClickHouse tries to [cast](../../../sql-reference/functions/type-conversion-functions.md#type_conversion_function-cast) values to the ClickHouse data types.
+-   The `external_table_functions_use_nulls` setting defines how to handle Nullable columns. The default is 1. If 0, the table function does not make columns Nullable and inserts default values instead of nulls. This also applies to null values inside array data types.
+
+**Engine Parameters**
+
+-   `host:port` — PostgreSQL server address.
+
+-   `database` — Remote database name.
+
+-   `table` — Remote table name.
+
+-   `user` — PostgreSQL user.
+
+-   `password` — User password.
+
+`SELECT` queries on the PostgreSQL side run as `COPY (SELECT ...) TO STDOUT` inside a read-only PostgreSQL transaction, with a commit after each `SELECT` query.
+
+Simple `WHERE` clauses such as `=, !=, >, >=, <, <=, IN` are executed on the PostgreSQL server.
+
+All joins, aggregations, sorting, `IN [ array ]` conditions and the `LIMIT` sampling constraint are executed in ClickHouse only after the query to PostgreSQL finishes.
+
+`INSERT` queries on the PostgreSQL side run as `COPY "table_name" (field1, field2, ... fieldN) FROM STDIN` inside a PostgreSQL transaction, with auto-commit after each `INSERT` statement.
+
+PostgreSQL Array types are converted into ClickHouse arrays.
+Be careful: in PostgreSQL, an array column declared as `type_name[]` may contain multidimensional arrays with different numbers of dimensions in different rows of the same column, but ClickHouse only allows multidimensional arrays with the same number of dimensions in all rows of the same column.
+
+## Usage Example {#usage-example}
+
+Table in PostgreSQL:
+
+``` text
+postgres=# CREATE TABLE "public"."test" (
+"int_id" SERIAL,
+"int_nullable" INT NULL DEFAULT NULL,
+"float" FLOAT NOT NULL,
+"str" VARCHAR(100) NOT NULL DEFAULT '',
+"float_nullable" FLOAT NULL DEFAULT NULL,
+PRIMARY KEY (int_id));
+
+CREATE TABLE
+
+postgres=# insert into test (int_id, str, "float") VALUES (1,'test',2);
+INSERT 0 1
+
+postgres=# select * from test;
+ int_id | int_nullable | float | str  | float_nullable
+--------+--------------+-------+------+----------------
+      1 |              |     2 | test |
+(1 row)
+```
+
+Table in ClickHouse, retrieving data from the PostgreSQL table created above:
+
+``` sql
+CREATE TABLE default.postgresql_table
+(
+    `float_nullable` Nullable(Float32),
+    `str` String,
+    `int_id` Int32
+)
+ENGINE = PostgreSQL('localhost:5432', 'public', 'test', 'postgres_user', 'postgres_password');
+```
+
+``` sql
+SELECT * FROM postgresql_table WHERE str IN ('test')
+```
+
+``` text
+┌─float_nullable─┬─str──┬─int_id─┐
+│           ᴺᵁᴸᴸ │ test │      1 │
+└────────────────┴──────┴────────┘
+1 rows in set. Elapsed: 0.019 sec.
+```
+
+
+## See Also {#see-also}
+
+-   [The ‘postgresql’ table function](../../../sql-reference/table-functions/postgresql.md)
+-   [Using PostgreSQL as a source of external dictionary](../../../sql-reference/dictionaries/external-dictionaries/external-dicts-dict-sources.md#dicts-external_dicts_dict_sources-postgresql)
+
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/postgresql/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/rabbitmq.md
+++ b/docs/en/engines/table-engines/integrations/rabbitmq.md
@ -163,3 +163,5 @@ Example:
 -   `_redelivered` - `redelivered` flag of the message.
 -   `_message_id` - messageID of the received message; non-empty if was set, when message was published.
 -   `_timestamp` - timestamp of the received message; non-empty if was set, when message was published.
+
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/rabbitmq/) <!--hide-->
--- a/docs/en/engines/table-engines/integrations/s3.md
+++ b/docs/en/engines/table-engines/integrations/s3.md
@ -6,11 +6,11 @@ toc_title: S3
 # S3 {#table_engines-s3}

 This engine provides integration with [Amazon S3](https://aws.amazon.com/s3/) ecosystem. This engine is similar
-to the [HDFS](../../../engines/table-engines/special/file.md#table_engines-hdfs) engine, but provides S3-specific features.
+to the [HDFS](../../../engines/table-engines/integrations/hdfs.md#table_engines-hdfs) engine, but provides S3-specific features.

 ## Usage {#usage}

-``` sql
+```sql
 ENGINE = S3(path, [aws_access_key_id, aws_secret_access_key,] format, structure, [compression])
 ```

@ -25,23 +25,23 @@ ENGINE = S3(path, [aws_access_key_id, aws_secret_access_key,] format, structure,

 **1.** Set up the `s3_engine_table` table:

-``` sql
+```sql
 CREATE TABLE s3_engine_table (name String, value UInt32) ENGINE=S3('https://storage.yandexcloud.net/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
 ```

 **2.** Fill file:

-``` sql
+```sql
 INSERT INTO s3_engine_table VALUES ('one', 1), ('two', 2), ('three', 3)
 ```

 **3.** Query the data:

-``` sql
+```sql
 SELECT * FROM s3_engine_table LIMIT 2
 ```

-``` text
+```text
 ┌─name─┬─value─┐
 │ one  │     1 │
 │ two  │     2 │
@ -69,7 +69,7 @@ Constructions with `{}` are similar to the [remote](../../../sql-reference/table

 **Example**

-1. Suppose we have several files in TSV format with the following URIs on HDFS:
+1. Suppose we have several files in CSV format with the following URIs on S3:

 -   ‘https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_1.csv’
 -   ‘https://storage.yandexcloud.net/my-test-bucket-768/some_prefix/some_file_2.csv’
@ -82,19 +82,19 @@ Constructions with `{}` are similar to the [remote](../../../sql-reference/table

 <!-- -->

-``` sql
+```sql
 CREATE TABLE table_with_range (name String, value UInt32) ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/some_file_{1..3}', 'CSV')
 ```

 3. Another way:

-``` sql
+```sql
 CREATE TABLE table_with_question_mark (name String, value UInt32) ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/some_file_?', 'CSV')
 ```

 4. Table consists of all the files in both directories (all files should satisfy format and schema described in query):

-``` sql
+```sql
 CREATE TABLE table_with_asterisk (name String, value UInt32) ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/{some,another}_prefix/*', 'CSV')
 ```

@ -105,7 +105,7 @@ CREATE TABLE table_with_asterisk (name String, value UInt32) ENGINE = S3('https:

 Create table with files named `file-000.csv`, `file-001.csv`, … , `file-999.csv`:

-``` sql
+```sql
 CREATE TABLE big_table (name String, value UInt32) ENGINE = S3('https://storage.yandexcloud.net/my-test-bucket-768/big_prefix/file-{000..999}.csv', 'CSV')
 ```

@ -124,7 +124,7 @@ The following settings can be set before query execution or placed into configur

 -   `s3_max_single_part_upload_size` — Default value is `64Mb`. The maximum size of object to upload using singlepart upload to S3.
 -   `s3_min_upload_part_size` — Default value is `512Mb`. The minimum size of part to upload during multipart upload to [S3 Multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html).
-   `s3_max_redirects` — Default value is `10`. Max number of S3 redirects hops allowed.
+-   `s3_max_redirects` — Default value is `10`. Maximum number of S3 HTTP redirect hops allowed.

 Security consideration: if malicious user can specify arbitrary S3 URLs, `s3_max_redirects` must be set to zero to avoid [SSRF](https://en.wikipedia.org/wiki/Server-side_request_forgery) attacks; or alternatively, `remote_host_filter` must be specified in server configuration.

@ -153,4 +153,4 @@ Example:
 </s3>
 ```

-[Original article](https://clickhouse.tech/docs/en/operations/table_engines/s3/) <!--hide-->
+[Original article](https://clickhouse.tech/docs/en/engines/table-engines/integrations/s3/) <!--hide-->
--- a/docs/en/engines/table-engines/mergetree-family/mergetree.md
+++ b/docs/en/engines/table-engines/mergetree-family/mergetree.md
@ -353,7 +353,7 @@ The `set` index can be used with all functions. Function subsets for other index
 | Function (operator) / Index                                                                                | primary key | minmax | ngrambf_v1 | tokenbf_v1 | bloom_filter |
 |------------------------------------------------------------------------------------------------------------|-------------|--------|-------------|-------------|---------------|
 | [equals (=, ==)](../../../sql-reference/functions/comparison-functions.md#function-equals)                 | ✔           | ✔      | ✔           | ✔           | ✔             |
-| [notEquals(!=, \<\>)](../../../sql-reference/functions/comparison-functions.md#function-notequals)         | ✔           | ✔      | ✔           | ✔           | ✔             |
+| [notEquals(!=, <>)](../../../sql-reference/functions/comparison-functions.md#function-notequals)         | ✔           | ✔      | ✔           | ✔           | ✔             |
 | [like](../../../sql-reference/functions/string-search-functions.md#function-like)                          | ✔           | ✔      | ✔           | ✔           | ✗             |
 | [notLike](../../../sql-reference/functions/string-search-functions.md#function-notlike)                    | ✔           | ✔      | ✔           | ✔           | ✗             |
 | [startsWith](../../../sql-reference/functions/string-functions.md#startswith)                              | ✔           | ✔      | ✔           | ✔           | ✗             |
@ -361,10 +361,10 @@ The `set` index can be used with all functions. Function subsets for other index
 | [multiSearchAny](../../../sql-reference/functions/string-search-functions.md#function-multisearchany)      | ✗           | ✗      | ✔           | ✗           | ✗             |
 | [in](../../../sql-reference/functions/in-functions.md#in-functions)                                        | ✔           | ✔      | ✔           | ✔           | ✔             |
 | [notIn](../../../sql-reference/functions/in-functions.md#in-functions)                                     | ✔           | ✔      | ✔           | ✔           | ✔             |
-| [less (\<)](../../../sql-reference/functions/comparison-functions.md#function-less)                        | ✔           | ✔      | ✗           | ✗           | ✗             |
-| [greater (\>)](../../../sql-reference/functions/comparison-functions.md#function-greater)                  | ✔           | ✔      | ✗           | ✗           | ✗             |
-| [lessOrEquals (\<=)](../../../sql-reference/functions/comparison-functions.md#function-lessorequals)       | ✔           | ✔      | ✗           | ✗           | ✗             |
-| [greaterOrEquals (\>=)](../../../sql-reference/functions/comparison-functions.md#function-greaterorequals) | ✔           | ✔      | ✗           | ✗           | ✗             |
+| [less (<)](../../../sql-reference/functions/comparison-functions.md#function-less)                        | ✔           | ✔      | ✗           | ✗           | ✗             |
+| [greater (>)](../../../sql-reference/functions/comparison-functions.md#function-greater)                  | ✔           | ✔      | ✗           | ✗           | ✗             |
+| [lessOrEquals (<=)](../../../sql-reference/functions/comparison-functions.md#function-lessorequals)       | ✔           | ✔      | ✗           | ✗           | ✗             |
+| [greaterOrEquals (>=)](../../../sql-reference/functions/comparison-functions.md#function-greaterorequals) | ✔           | ✔      | ✗           | ✗           | ✗             |
 | [empty](../../../sql-reference/functions/array-functions.md#function-empty)                                | ✔           | ✔      | ✗           | ✗           | ✗             |
 | [notEmpty](../../../sql-reference/functions/array-functions.md#function-notempty)                          | ✔           | ✔      | ✗           | ✗           | ✗             |
 | hasToken                                                                                                   | ✗           | ✗      | ✗           | ✔           | ✗             |
@ -529,7 +529,7 @@ CREATE TABLE table_for_aggregation
    y Int
 )
 ENGINE = MergeTree
-ORDER BY k1, k2
+ORDER BY (k1, k2)
 TTL d + INTERVAL 1 MONTH GROUP BY k1, k2 SET x = max(x), y = min(y);
 ```

@ -701,6 +701,32 @@ The `default` storage policy implies using only one volume, which consists of on

 The number of threads performing background moves of data parts can be changed by [background_move_pool_size](../../../operations/settings/settings.md#background_move_pool_size) setting.

+### Details {#details}
+
+For `MergeTree` tables, data gets to disk in different ways:
+
+-   As a result of an insert (`INSERT` query).
+-   During background merges and [mutations](../../../sql-reference/statements/alter/index.md#alter-mutations).
+-   When downloading from another replica.
+-   As a result of partition freezing [ALTER TABLE … FREEZE PARTITION](../../../sql-reference/statements/alter/partition.md#alter_freeze-partition).
+
+In all these cases except for mutations and partition freezing, a part is stored on a volume and a disk according to the given storage policy:
+
+1.  The first volume (in the order of definition) that has enough disk space for storing a part (`unreserved_space > current_part_size`) and allows for storing parts of a given size (`max_data_part_size_bytes > current_part_size`) is chosen.
+2.  Within this volume, the disk chosen is the one that follows the disk used for storing the previous chunk of data and that has more free space than the part size (`unreserved_space - keep_free_space_bytes > current_part_size`).
+
+Under the hood, mutations and partition freezing make use of [hard links](https://en.wikipedia.org/wiki/Hard_link). Hard links between different disks are not supported, therefore in such cases the resulting parts are stored on the same disks as the initial ones.
+
+In the background, parts are moved between volumes on the basis of the amount of free space (`move_factor` parameter) according to the order the volumes are declared in the configuration file.
+Data is never transferred from the last volume, and never into the first one. You can use the system tables [system.part_log](../../../operations/system-tables/part_log.md#system_tables-part-log) (field `type = MOVE_PART`) and [system.parts](../../../operations/system-tables/parts.md#system_tables-parts) (fields `path` and `disk`) to monitor background moves. Detailed information can also be found in server logs.
+
+A user can force a part or a partition to move from one volume to another with the query [ALTER TABLE … MOVE PART\|PARTITION … TO VOLUME\|DISK …](../../../sql-reference/statements/alter/partition.md#alter_move-partition). All the restrictions for background operations are taken into account. The query initiates a move on its own and does not wait for background operations to be completed. The user gets an error message if not enough free space is available or if any of the required conditions are not met.
+
+Moving data does not interfere with data replication. Therefore, different storage policies can be specified for the same table on different replicas.
+
+After the completion of background merges and mutations, old parts are removed only after a certain amount of time (`old_parts_lifetime`).
+During this time, they are not moved to other volumes or disks. Therefore, until the parts are finally removed, they are still taken into account for evaluation of the occupied disk space.
+
 ## Using S3 for Data Storage {#table_engine-mergetree-s3}

 `MergeTree` family table engines is able to store data to [S3](https://aws.amazon.com/s3/) using a disk with type `s3`.
@ -722,7 +748,6 @@ Configuration markup:
            </proxy>
            <connect_timeout_ms>10000</connect_timeout_ms>
            <request_timeout_ms>5000</request_timeout_ms>
-            <max_connections>100</max_connections>
            <retry_attempts>10</retry_attempts>
            <min_bytes_for_seek>1000</min_bytes_for_seek>
            <metadata_path>/var/lib/clickhouse/disks/s3/</metadata_path>
@ -745,7 +770,6 @@ Optional parameters:
 -   `proxy` — Proxy configuration for S3 endpoint. Each `uri` element inside `proxy` block should contain a proxy URL. 
 -   `connect_timeout_ms` — Socket connect timeout in milliseconds. Default value is `10 seconds`. 
 -   `request_timeout_ms` — Request timeout in milliseconds. Default value is `5 seconds`. 
-   `max_connections` — S3 connections pool size. Default value is `100`. 
 -   `retry_attempts` — Number of retry attempts in case of failed request. Default value is `10`. 
 -   `min_bytes_for_seek` — Minimal number of bytes to use seek operation instead of sequential read. Default value is `1 Mb`. 
 -   `metadata_path` — Path on local FS to store metadata files for S3. Default value is `/var/lib/clickhouse/disks/<disk_name>/`. 
@ -793,30 +817,4 @@ S3 disk can be configured as `main` or `cold` storage:

 In case of `cold` option a data can be moved to S3 if local disk free size will be smaller than `move_factor * disk_size` or by TTL move rule. 

-### Details {#details}
-
-In the case of `MergeTree` tables, data is getting to disk in different ways:
-
-   As a result of an insert (`INSERT` query).
-   During background merges and [mutations](../../../sql-reference/statements/alter/index.md#alter-mutations).
-   When downloading from another replica.
-   As a result of partition freezing [ALTER TABLE … FREEZE PARTITION](../../../sql-reference/statements/alter/partition.md#alter_freeze-partition).
-
-In all these cases except for mutations and partition freezing, a part is stored on a volume and a disk according to the given storage policy:
-
-1.  The first volume (in the order of definition) that has enough disk space for storing a part (`unreserved_space > current_part_size`) and allows for storing parts of a given size (`max_data_part_size_bytes > current_part_size`) is chosen.
-2.  Within this volume, that disk is chosen that follows the one, which was used for storing the previous chunk of data, and that has free space more than the part size (`unreserved_space - keep_free_space_bytes > current_part_size`).
-
-Under the hood, mutations and partition freezing make use of [hard links](https://en.wikipedia.org/wiki/Hard_link). Hard links between different disks are not supported, therefore in such cases the resulting parts are stored on the same disks as the initial ones.
-
-In the background, parts are moved between volumes on the basis of the amount of free space (`move_factor` parameter) according to the order the volumes are declared in the configuration file.
-Data is never transferred from the last one and into the first one. One may use system tables [system.part_log](../../../operations/system-tables/part_log.md#system_tables-part-log) (field `type = MOVE_PART`) and [system.parts](../../../operations/system-tables/parts.md#system_tables-parts) (fields `path` and `disk`) to monitor background moves. Also, the detailed information can be found in server logs.
-
-User can force moving a part or a partition from one volume to another using the query [ALTER TABLE … MOVE PART\|PARTITION … TO VOLUME\|DISK …](../../../sql-reference/statements/alter/partition.md#alter_move-partition), all the restrictions for background operations are taken into account. The query initiates a move on its own and does not wait for background operations to be completed. User will get an error message if not enough free space is available or if any of the required conditions are not met.
-
-Moving data does not interfere with data replication. Therefore, different storage policies can be specified for the same table on different replicas.
-
-After the completion of background merges and mutations, old parts are removed only after a certain amount of time (`old_parts_lifetime`).
-During this time, they are not moved to other volumes or disks. Therefore, until the parts are finally removed, they are still taken into account for evaluation of the occupied disk space.
-
 [Original article](https://clickhouse.tech/docs/ru/operations/table_engines/mergetree/) <!--hide-->
--- a/docs/en/engines/table-engines/special/distributed.md
+++ b/docs/en/engines/table-engines/special/distributed.md
@ -73,19 +73,18 @@ Clusters are set like this:
 ``` xml
 <remote_servers>
    <logs>
+        <!-- Inter-server per-cluster secret for Distributed queries
+             default: no secret (no authentication will be performed)
+
+             If set, then Distributed queries will be validated on shards, so at least:
+             - such cluster should exist on the shard,
+             - such cluster should have the same secret.
+
+             And also (and which is more important), the initial_user will
+             be used as current user for the query.
+        -->
+        <!-- <secret></secret> -->
        <shard>
-            <!-- Inter-server per-cluster secret for Distributed queries
-                 default: no secret (no authentication will be performed)
-
-                 If set, then Distributed queries will be validated on shards, so at least:
-                 - such cluster should exist on the shard,
-                 - such cluster should have the same secret.
-
-                 And also (and which is more important), the initial_user will
-                 be used as current user for the query.
-            -->
-            <!-- <secret></secret> -->
-
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
--- a/docs/en/getting-started/playground.md
+++ b/docs/en/getting-started/playground.md
@ -38,10 +38,10 @@ The queries are executed as a read-only user. It implies some limitations:

 The following settings are also enforced:

- [max_result_bytes=10485760](../operations/settings/query_complexity/#max-result-bytes)
- [max_result_rows=2000](../operations/settings/query_complexity/#setting-max_result_rows)
- [result_overflow_mode=break](../operations/settings/query_complexity/#result-overflow-mode)
- [max_execution_time=60000](../operations/settings/query_complexity/#max-execution-time)
+- [max_result_bytes=10485760](../operations/settings/query-complexity/#max-result-bytes)
+- [max_result_rows=2000](../operations/settings/query-complexity/#setting-max_result_rows)
+- [result_overflow_mode=break](../operations/settings/query-complexity/#result-overflow-mode)
+- [max_execution_time=60000](../operations/settings/query-complexity/#max-execution-time)

 ## Examples {#examples}

--- a/docs/en/interfaces/formats.md
+++ b/docs/en/interfaces/formats.md
@ -1254,7 +1254,7 @@ ClickHouse supports configurable precision of `Decimal` type. The `INSERT` query

 Unsupported Parquet data types: `DATE32`, `TIME32`, `FIXED_SIZE_BINARY`, `JSON`, `UUID`, `ENUM`.

-Data types of ClickHouse table columns can differ from the corresponding fields of the Parquet data inserted. When inserting data, ClickHouse interprets data types according to the table above and then [cast](../query_language/functions/type_conversion_functions/#type_conversion_function-cast) the data to that data type which is set for the ClickHouse table column.
+Data types of ClickHouse table columns can differ from the corresponding fields of the Parquet data inserted. When inserting data, ClickHouse interprets data types according to the table above and then [casts](../sql-reference/functions/type-conversion-functions/#type_conversion_function-cast) the data to the data type set for the ClickHouse table column.

 ### Inserting and Selecting Data {#inserting-and-selecting-data}

@ -1359,15 +1359,15 @@ When working with the `Regexp` format, you can use the following settings:
    - Escaped (similarly to [TSV](#tabseparated))
    - Quoted (similarly to [Values](#data-format-values))
    - Raw (extracts subpatterns as a whole, no escaping rules)
- `format_regexp_skip_unmatched` — [UInt8](../sql-reference/data-types/int-uint.md). Defines the need to throw an exeption in case the `format_regexp` expression does not match the imported data. Can be set to `0` or `1`. 
+- `format_regexp_skip_unmatched` — [UInt8](../sql-reference/data-types/int-uint.md). Defines whether to throw an exception if the `format_regexp` expression does not match the imported data. Can be set to `0` or `1`.

-**Usage** 
+**Usage**

-The regular expression from `format_regexp` setting is applied to every line of imported data. The number of subpatterns in the regular expression must be equal to the number of columns in imported dataset. 
+The regular expression from `format_regexp` setting is applied to every line of imported data. The number of subpatterns in the regular expression must be equal to the number of columns in imported dataset.

-Lines of the imported data must be separated by newline character `'\n'` or DOS-style newline `"\r\n"`. 
+Lines of the imported data must be separated by a newline character `'\n'` or a DOS-style newline `"\r\n"`.

-The content of every matched subpattern is parsed with the method of corresponding data type, according to `format_regexp_escaping_rule` setting. 
+The content of every matched subpattern is parsed with the method of the corresponding data type, according to the `format_regexp_escaping_rule` setting.

 If the regular expression does not match the line and `format_regexp_skip_unmatched` is set to 1, the line is silently skipped. If `format_regexp_skip_unmatched` is set to 0, an exception is thrown.

--- a/docs/en/interfaces/third-party/gui.md
+++ b/docs/en/interfaces/third-party/gui.md
@ -167,4 +167,21 @@ Features:

 [How to configure ClickHouse in Looker.](https://docs.looker.com/setup-and-management/database-config/clickhouse)

+### SeekTable {#seektable}
+
+[SeekTable](https://www.seektable.com) is a self-service BI tool for data exploration and operational reporting. SeekTable is available both as a cloud service and a self-hosted version. SeekTable reports may be embedded into any web-app.
+
+Features:
+
+-   Business user-friendly report builder.
+-   Powerful report parameters for SQL filtering and report-specific query customizations.
+-   Can connect to ClickHouse through either a native TCP/IP endpoint or an HTTP(S) interface (2 different drivers).
+-   The full power of the ClickHouse SQL dialect can be used in dimension/measure definitions.
+-   [Web API](https://www.seektable.com/help/web-api-integration) for automated report generation.
+-   Supports a report development flow with account data [backup/restore](https://www.seektable.com/help/self-hosted-backup-restore); the configuration of data models (cubes) and reports is human-readable XML and can be stored under version control.
+
+SeekTable is [free](https://www.seektable.com/help/cloud-pricing) for personal/individual usage.
+
+[How to configure ClickHouse connection in SeekTable.](https://www.seektable.com/help/clickhouse-pivot-table)
+
 [Original article](https://clickhouse.tech/docs/en/interfaces/third-party/gui/) <!--hide-->
--- a/docs/en/introduction/distinctive-features.md
+++ b/docs/en/introduction/distinctive-features.md
@ -7,9 +7,9 @@ toc_title: Distinctive Features

 ## True Column-Oriented Database Management System {#true-column-oriented-dbms}

-In a true column-oriented DBMS, no extra data is stored with the values. Among other things, this means that constant-length values must be supported, to avoid storing their length “number” next to the values. As an example, a billion UInt8-type values should consume around 1 GB uncompressed, or this strongly affects the CPU use. It is essential to store data compactly (without any “garbage”) even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
+In a real column-oriented DBMS, no extra data is stored with the values. Among other things, this means that constant-length values must be supported, to avoid storing their length “number” next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed; otherwise, CPU use is strongly affected. It is essential to store data compactly (without any “garbage”) even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.

-It is worth noting because there are systems that can store values of different columns separately, but that can’t effectively process analytical queries due to their optimization for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you would get throughput around a hundred thousand rows per second, but not hundreds of millions of rows per second.
+This is worth noting because there are systems that can store values of different columns separately but cannot process analytical queries effectively, because they are optimized for other scenarios. Examples are HBase, BigTable, Cassandra, and HyperTable. In these systems, you would get throughput of around a hundred thousand rows per second, but not hundreds of millions of rows per second.

 It’s also worth noting that ClickHouse is a database management system, not a single database. ClickHouse allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.

--- a/docs/en/operations/external-authenticators/ldap.md
+++ b/docs/en/operations/external-authenticators/ldap.md
@ -1,4 +1,4 @@
-# LDAP {#external-authenticators-ldap} 
+# LDAP {#external-authenticators-ldap}

 LDAP server can be used to authenticate ClickHouse users. There are two different approaches for doing this:

@ -87,14 +87,13 @@ Note, that user `my_user` refers to `my_ldap_server`. This LDAP server must be c

 When SQL-driven [Access Control and Account Management](../access-rights.md#access-control) is enabled in ClickHouse, users that are authenticated by LDAP servers can also be created using the [CREATE USER](../../sql-reference/statements/create/user.md#create-user-statement) statement.

-
 ```sql
-CREATE USER my_user IDENTIFIED WITH ldap_server BY 'my_ldap_server'
+CREATE USER my_user IDENTIFIED WITH ldap SERVER 'my_ldap_server'
 ```

 ## LDAP External User Directory {#ldap-external-user-directory}

-In addition to the locally defined users, a remote LDAP server can be used as a source of user definitions. In order to achieve this, specify previously defined LDAP server name (see [LDAP Server Definition](#ldap-server-definition)) in the `ldap` section inside the `users_directories` section of the `config.xml` file.
+In addition to the locally defined users, a remote LDAP server can be used as a source of user definitions. To achieve this, specify a previously defined LDAP server name (see [LDAP Server Definition](#ldap-server-definition)) in an `ldap` section inside the `users_directories` section of the `config.xml` file.

 At each login attempt, ClickHouse will try to find the user definition locally and authenticate it as usual, but if the user is not defined, ClickHouse will assume it exists in the external LDAP directory, and will try to "bind" to the specified DN at the LDAP server using the provided credentials. If successful, the user will be considered existing and authenticated. The user will be assigned roles from the list specified in the `roles` section. Additionally, LDAP "search" can be performed and results can be transformed and treated as role names and then be assigned to the user if the `role_mapping` section is also configured. All this implies that the SQL-driven [Access Control and Account Management](../access-rights.md#access-control) is enabled and roles are created using the [CREATE ROLE](../../sql-reference/statements/create/role.md#create-role-statement) statement.

@ -153,4 +152,3 @@ Parameters:
        - `prefix` - prefix, that will be expected to be in front of each string in the original
          list of strings returned by the LDAP search. Prefix will be removed from the original
          strings and resulting strings will be treated as local role names. Empty, by default.
-
--- a/docs/en/operations/system-tables/data_type_families.md
+++ b/docs/en/operations/system-tables/data_type_families.md
@ -1,6 +1,6 @@
 # system.data_type_families {#system_tables-data_type_families}

-Contains information about supported [data types](../../sql-reference/data-types/).
+Contains information about supported [data types](../../sql-reference/data-types/index.md).

 Columns:

--- a/docs/en/operations/system-tables/part_log.md
+++ b/docs/en/operations/system-tables/part_log.md
@ -17,7 +17,6 @@ The `system.part_log` table contains the following columns:
 -   `event_date` ([Date](../../sql-reference/data-types/date.md)) — Event date.
 -   `event_time` ([DateTime](../../sql-reference/data-types/datetime.md)) — Event time.
 -   `event_time_microseconds` ([DateTime64](../../sql-reference/data-types/datetime64.md)) — Event time with microseconds precision.
-
 -   `duration_ms` ([UInt64](../../sql-reference/data-types/int-uint.md)) — Duration.
 -   `database` ([String](../../sql-reference/data-types/string.md)) — Name of the database the data part is in.
 -   `table` ([String](../../sql-reference/data-types/string.md)) — Name of the table the data part is in.
--- a/docs/en/operations/system-tables/replication_queue.md
+++ b/docs/en/operations/system-tables/replication_queue.md
@ -70,12 +70,12 @@ num_tries:              36
 last_exception:         Code: 226, e.displayText() = DB::Exception: Marks file '/opt/clickhouse/data/merge/visits_v2/tmp_fetch_20201130_121373_121384_2/CounterID.mrk' doesn't exist (version 20.8.7.15 (official build))
 last_attempt_time:      2020-12-08 17:35:54
 num_postponed:          0
-postpone_reason:        
+postpone_reason:
 last_postpone_time:     1970-01-01 03:00:00
 ```

 **See Also**

-   [Managing ReplicatedMergeTree Tables](../../sql-reference/statements/system.md/#query-language-system-replicated)
+-   [Managing ReplicatedMergeTree Tables](../../sql-reference/statements/system.md#query-language-system-replicated)

 [Original article](https://clickhouse.tech/docs/en/operations/system_tables/replication_queue) <!--hide-->
--- a/docs/en/operations/system-tables/settings.md
+++ b/docs/en/operations/system-tables/settings.md
@ -48,5 +48,6 @@ SELECT * FROM system.settings WHERE changed AND name='load_balancing'
 -   [Settings](../../operations/settings/index.md#session-settings-intro)
 -   [Permissions for Queries](../../operations/settings/permissions-for-queries.md#settings_readonly)
 -   [Constraints on Settings](../../operations/settings/constraints-on-settings.md)
+-   [SHOW SETTINGS](../../sql-reference/statements/show.md#show-settings) statement

 [Original article](https://clickhouse.tech/docs/en/operations/system_tables/settings) <!--hide-->
--- a/docs/en/operations/system-tables/trace_log.md
+++ b/docs/en/operations/system-tables/trace_log.md
@ -52,4 +52,5 @@ trace:                   [371912858,371912789,371798468,371799717,371801313,3717
 size:                    5244400
 ```

- [Original article](https://clickhouse.tech/docs/en/operations/system_tables/trace_log) <!--hide-->
+ [Original article](https://clickhouse.tech/docs/en/operations/system-tables/trace_log) <!--hide-->
+ 
--- a/docs/en/sql-reference/aggregate-functions/combinators.md
+++ b/docs/en/sql-reference/aggregate-functions/combinators.md
@ -250,4 +250,3 @@ FROM people
 ```


-[Original article](https://clickhouse.tech/docs/en/query_language/agg_functions/combinators/) <!--hide-->
--- a/docs/en/sql-reference/aggregate-functions/index.md
+++ b/docs/en/sql-reference/aggregate-functions/index.md
@ -59,4 +59,3 @@ SELECT groupArray(y) FROM t_null_big
 `groupArray` does not include `NULL` in the resulting array.


-[Original article](https://clickhouse.tech/docs/en/query_language/agg_functions/) <!--hide-->
--- a/docs/en/sql-reference/aggregate-functions/parametric-functions.md
+++ b/docs/en/sql-reference/aggregate-functions/parametric-functions.md
@ -254,8 +254,8 @@ windowFunnel(window, [mode])(timestamp, cond1, cond2, ..., condN)
 **Parameters**

 -   `window` — Length of the sliding window. The unit of `window` depends on the `timestamp` itself and varies. Determined using the expression `timestamp of cond2 <= timestamp of cond1 + window`.
-   `mode` - It is an optional argument.
-    -   `'strict'` - When the `'strict'` is set, the windowFunnel() applies conditions only for the unique values.
+-   `mode` — It is an optional argument.
+    -   `'strict'` — When `'strict'` is set, `windowFunnel()` applies conditions only to unique values.

 **Returned value**

@ -336,14 +336,14 @@ retention(cond1, cond2, ..., cond32);

 **Arguments**

-   `cond` — an expression that returns a `UInt8` result (1 or 0).
+-   `cond` — An expression that returns a `UInt8` result (1 or 0).

 **Returned value**

 The array of 1 or 0.

-   1 — condition was met for the event.
-   0 — condition wasn’t met for the event.
+-   1 — Condition was met for the event.
+-   0 — Condition wasn’t met for the event.

 Type: `UInt8`.

@ -500,7 +500,6 @@ Problem: Generate a report that shows only keywords that produced at least 5 uni
 Solution: Write in the GROUP BY query SearchPhrase HAVING uniqUpTo(4)(UserID) >= 5
 ```

-[Original article](https://clickhouse.tech/docs/en/query_language/agg_functions/parametric_functions/) <!--hide-->

 ## sumMapFiltered(keys_to_keep)(keys, values) {#summapfilteredkeys-to-keepkeys-values}

--- a/docs/en/sql-reference/aggregate-functions/reference/avg.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/avg.md
@ -14,26 +14,19 @@ avg(x)

 **Arguments**

-   `x` — Values.
-
-`x` must be
-[Integer](../../../sql-reference/data-types/int-uint.md),
-[floating-point](../../../sql-reference/data-types/float.md), or 
-[Decimal](../../../sql-reference/data-types/decimal.md).
+-   `x` — input values, must be [Integer](../../../sql-reference/data-types/int-uint.md), [Float](../../../sql-reference/data-types/float.md), or [Decimal](../../../sql-reference/data-types/decimal.md).

 **Returned value**

- `NaN` if the supplied parameter is empty.
- Mean otherwise.
-
-**Return type** is always [Float64](../../../sql-reference/data-types/float.md).
+-   The arithmetic mean, always as [Float64](../../../sql-reference/data-types/float.md).
+-   `NaN` if the input parameter `x` is empty.

 **Example**

 Query:

 ``` sql
-SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5)
+SELECT avg(x) FROM values('x Int8', 0, 1, 2, 3, 4, 5);
 ```

 Result:
@ -46,11 +39,20 @@ Result:

 **Example**

+Create a temp table:
+
 Query:

 ``` sql
 CREATE table test (t UInt8) ENGINE = Memory;
-SELECT avg(t) FROM test
+```
+
+Get the arithmetic mean: 
+
+Query:
+
+``` sql
+SELECT avg(t) FROM test;
 ```

 Result:
@ -60,3 +62,5 @@ Result:
 │    nan │
 └────────┘
 ```
+
+[Original article](https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/reference/avg/) <!--hide-->
--- a/docs/en/sql-reference/aggregate-functions/reference/count.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/count.md
@ -7,8 +7,9 @@ toc_priority: 1
 Counts the number of rows or not-NULL values.

 ClickHouse supports the following syntaxes for `count`:
- `count(expr)` or `COUNT(DISTINCT expr)`.
- `count()` or `COUNT(*)`. The `count()` syntax is ClickHouse-specific.
+
+-   `count(expr)` or `COUNT(DISTINCT expr)`.
+-   `count()` or `COUNT(*)`. The `count()` syntax is ClickHouse-specific.

 **Arguments**

--- a/docs/en/sql-reference/aggregate-functions/reference/grouparrayinsertat.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/grouparrayinsertat.md
@ -9,7 +9,7 @@ Inserts a value into the array at the specified position.
 **Syntax**

 ``` sql
-groupArrayInsertAt(default_x, size)(x, pos);
+groupArrayInsertAt(default_x, size)(x, pos)
 ```

 If in one query several values are inserted into the same position, the function behaves in the following ways:
@ -21,8 +21,8 @@ If in one query several values are inserted into the same position, the function

 -   `x` — Value to be inserted. [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in one of the [supported data types](../../../sql-reference/data-types/index.md).
 -   `pos` — Position at which the specified element `x` is to be inserted. Index numbering in the array starts from zero. [UInt32](../../../sql-reference/data-types/int-uint.md#uint-ranges).
-   `default_x`— Default value for substituting in empty positions. Optional parameter. [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in the data type configured for the `x` parameter. If `default_x` is not defined, the [default values](../../../sql-reference/statements/create/table.md#create-default-values) are used.
-   `size`— Length of the resulting array. Optional parameter. When using this parameter, the default value `default_x` must be specified. [UInt32](../../../sql-reference/data-types/int-uint.md#uint-ranges).
+-   `default_x` — Default value for substituting in empty positions. Optional parameter. [Expression](../../../sql-reference/syntax.md#syntax-expressions) resulting in the data type configured for the `x` parameter. If `default_x` is not defined, the [default values](../../../sql-reference/statements/create/table.md#create-default-values) are used.
+-   `size` — Length of the resulting array. Optional parameter. When using this parameter, the default value `default_x` must be specified. [UInt32](../../../sql-reference/data-types/int-uint.md#uint-ranges).

 **Returned value**

--- a/docs/en/sql-reference/aggregate-functions/reference/groupbitmapor.md
+++ b/docs/en/sql-reference/aggregate-functions/reference/groupbitmapor.md
@ -14,7 +14,7 @@ groupBitmapOr(expr)

 `expr` – An expression that results in `AggregateFunction(groupBitmap, UInt*)` type.

-**Return value**
+**Returned value**

 Value of the `UInt64` type.
