Merge branch 'master' of github.com:yandex/ClickHouse

This commit is contained in:
BayoNet 2019-09-19 18:04:27 +03:00
commit cfcfcb7fc1
119 changed files with 2246 additions and 7711 deletions

.gitmodules (vendored)

@@ -103,3 +103,6 @@
[submodule "contrib/orc"]
path = contrib/orc
url = https://github.com/apache/orc
[submodule "contrib/sparsehash-c11"]
path = contrib/sparsehash-c11
url = https://github.com/sparsehash/sparsehash-c11.git

CHANGELOG.md

@@ -1,3 +1,280 @@
## ClickHouse release 19.14.3.3, 2019-09-10
### New Feature
* `WITH FILL` modifier for `ORDER BY`. (continuation of [#5069](https://github.com/yandex/ClickHouse/issues/5069); see the combined example after this list) [#6610](https://github.com/yandex/ClickHouse/pull/6610) ([Anton Popov](https://github.com/CurtizJ))
* `WITH TIES` modifier for `LIMIT`. (continuation of [#5069](https://github.com/yandex/ClickHouse/issues/5069)) [#6610](https://github.com/yandex/ClickHouse/pull/6610) ([Anton Popov](https://github.com/CurtizJ))
* Parse unquoted `NULL` literal as NULL (if setting `format_csv_unquoted_null_literal_as_null=1`). Initialize null fields with default values if the data type of the field is not nullable (if setting `input_format_null_as_default=1`). [#5990](https://github.com/yandex/ClickHouse/issues/5990) [#6055](https://github.com/yandex/ClickHouse/pull/6055) ([tavplubix](https://github.com/tavplubix))
* Support for wildcards in paths of table functions `file` and `hdfs`. If the path contains wildcards, the table will be readonly. Example of usage: `select * from hdfs('hdfs://hdfs1:9000/some_dir/another_dir/*/file{0..9}{0..9}')` and `select * from file('some_dir/{some_file,another_file,yet_another}.tsv', 'TSV', 'value UInt32')`. [#6092](https://github.com/yandex/ClickHouse/pull/6092) ([Olga Khvostikova](https://github.com/stavrolia))
* New `system.metric_log` table which stores values of `system.events` and `system.metrics` at a specified time interval. [#6363](https://github.com/yandex/ClickHouse/issues/6363) [#6467](https://github.com/yandex/ClickHouse/pull/6467) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)) [#6530](https://github.com/yandex/ClickHouse/pull/6530) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Allow to write ClickHouse text logs to `system.text_log` table. [#6037](https://github.com/yandex/ClickHouse/issues/6037) [#6103](https://github.com/yandex/ClickHouse/pull/6103) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov)) [#6164](https://github.com/yandex/ClickHouse/pull/6164) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Show private symbols in stack traces (this is done via parsing symbol tables of ELF files). Added information about file and line number in stack traces if debug info is present. Sped up symbol name lookup by indexing the symbols present in the program. Added new SQL functions for introspection: `demangle` and `addressToLine`. Renamed function `symbolizeAddress` to `addressToSymbol` for consistency. Function `addressToSymbol` returns the mangled name for performance reasons, so you have to apply `demangle`. Added setting `allow_introspection_functions` which is turned off by default. [#6201](https://github.com/yandex/ClickHouse/pull/6201) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Table function `values` (the name is case-insensitive). It allows reading from the `VALUES` list proposed in [#5984](https://github.com/yandex/ClickHouse/issues/5984). Example: `SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))`. [#6217](https://github.com/yandex/ClickHouse/issues/6217). [#6209](https://github.com/yandex/ClickHouse/pull/6209) ([dimarub2000](https://github.com/dimarub2000))
* Added an ability to alter storage settings. Syntax: `ALTER TABLE <table> MODIFY SETTING <setting> = <value>`. [#6366](https://github.com/yandex/ClickHouse/pull/6366) [#6669](https://github.com/yandex/ClickHouse/pull/6669) [#6685](https://github.com/yandex/ClickHouse/pull/6685) ([alesapin](https://github.com/alesapin))
* Support for removing detached parts. Syntax: `ALTER TABLE <table_name> DROP DETACHED PART '<part_id>'`. [#6158](https://github.com/yandex/ClickHouse/pull/6158) ([tavplubix](https://github.com/tavplubix))
* Table constraints. Allows adding constraints to a table definition which will be checked at insert. [#5273](https://github.com/yandex/ClickHouse/pull/5273) ([Gleb Novikov](https://github.com/NanoBjorn)) [#6652](https://github.com/yandex/ClickHouse/pull/6652) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Support for cascaded materialized views. [#6324](https://github.com/yandex/ClickHouse/pull/6324) ([Amos Bird](https://github.com/amosbird))
* Turn on query profiler by default to sample every query execution thread once a second. [#6283](https://github.com/yandex/ClickHouse/pull/6283) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Input format `ORC`. [#6454](https://github.com/yandex/ClickHouse/pull/6454) [#6703](https://github.com/yandex/ClickHouse/pull/6703) ([akonyaev90](https://github.com/akonyaev90))
* Added two new functions: `sigmoid` and `tanh` (that are useful for machine learning applications). [#6254](https://github.com/yandex/ClickHouse/pull/6254) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Functions `hasToken(haystack, token)` and `hasTokenCaseInsensitive(haystack, token)` to check if a given token is in the haystack. A token is a maximal-length substring between two non-alphanumeric ASCII characters (or the boundaries of the haystack). The token must be a constant string. Supported by the `tokenbf_v1` index specialization. [#6596](https://github.com/yandex/ClickHouse/pull/6596), [#6662](https://github.com/yandex/ClickHouse/pull/6662) ([Vasily Nemkov](https://github.com/Enmk))
* New function `neighbor(value, offset[, default_value])`. Allows reaching the previous/next value within a column in a block of data. [#5925](https://github.com/yandex/ClickHouse/pull/5925) ([Alex Krash](https://github.com/alex-krash)) [6685365ab8c5b74f9650492c88a012596eb1b0c6](https://github.com/yandex/ClickHouse/commit/6685365ab8c5b74f9650492c88a012596eb1b0c6) [341e2e4587a18065c2da1ca888c73389f48ce36c](https://github.com/yandex/ClickHouse/commit/341e2e4587a18065c2da1ca888c73389f48ce36c) [Alexey Milovidov](https://github.com/alexey-milovidov)
* Created a function `currentUser()`, returning the login of the authorized user. Added alias `user()` for compatibility with MySQL. [#6470](https://github.com/yandex/ClickHouse/pull/6470) ([Alex Krash](https://github.com/alex-krash))
* New aggregate functions `quantilesExactInclusive` and `quantilesExactExclusive` which were proposed in [#5885](https://github.com/yandex/ClickHouse/issues/5885). [#6477](https://github.com/yandex/ClickHouse/pull/6477) ([dimarub2000](https://github.com/dimarub2000))
* Function `bitmapRange(bitmap, range_begin, range_end)` which returns a new set with the specified range (not including `range_end`). [#6314](https://github.com/yandex/ClickHouse/pull/6314) ([Zhichang Yu](https://github.com/yuzhichang))
* Function `geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision)` which creates an array of precision-long strings of geohash boxes covering the provided area. [#6127](https://github.com/yandex/ClickHouse/pull/6127) ([Vasily Nemkov](https://github.com/Enmk))
* Implement support for INSERT query with `Kafka` tables. [#6012](https://github.com/yandex/ClickHouse/pull/6012) ([Ivan](https://github.com/abyss7))
* Added support for `_partition` and `_timestamp` virtual columns to Kafka engine. [#6400](https://github.com/yandex/ClickHouse/pull/6400) ([Ivan](https://github.com/abyss7))
* Possibility to remove sensitive data from `query_log`, server logs, and the process list with regexp-based rules. [#5710](https://github.com/yandex/ClickHouse/pull/5710) ([filimonov](https://github.com/filimonov))
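
Several of the new query features above can be combined in a short illustration. This is a hedged sketch for this release: exact results depend on the server version and settings, and it uses only built-in sources (`numbers` and the `values` table function), so no table setup is assumed.

```sql
-- VALUES table function (the name is case-insensitive): read from an inline list.
SELECT * FROM values('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'));

-- ORDER BY ... WITH FILL: fill gaps in the ordered column (0, 3, 6, 9 becomes 0..9).
SELECT number AS n FROM numbers(10) WHERE number % 3 = 0 ORDER BY n WITH FILL;

-- LIMIT ... WITH TIES: also return the rows that tie with the last returned row.
SELECT number % 3 AS n FROM numbers(9) ORDER BY n LIMIT 4 WITH TIES;

-- neighbor(): reach the previous value within a column in a block of data.
SELECT number, neighbor(number, -1, 0) AS prev FROM numbers(5);

-- currentUser() and its MySQL-compatibility alias user().
SELECT currentUser(), user();
```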
### Experimental Feature
* Input and output data format `Template`. It allows specifying a custom format string for input and output. [#4354](https://github.com/yandex/ClickHouse/issues/4354) [#6727](https://github.com/yandex/ClickHouse/pull/6727) ([tavplubix](https://github.com/tavplubix))
* Implementation of `LIVE VIEW` tables that were originally proposed in [#2898](https://github.com/yandex/ClickHouse/pull/2898), prepared in [#3925](https://github.com/yandex/ClickHouse/issues/3925), and then updated in [#5541](https://github.com/yandex/ClickHouse/issues/5541). See [#5541](https://github.com/yandex/ClickHouse/issues/5541) for a detailed description. [#5541](https://github.com/yandex/ClickHouse/issues/5541) ([vzakaznikov](https://github.com/vzakaznikov)) [#6425](https://github.com/yandex/ClickHouse/pull/6425) ([Nikolai Kochetov](https://github.com/KochetovNicolai)) [#6656](https://github.com/yandex/ClickHouse/pull/6656) ([vzakaznikov](https://github.com/vzakaznikov)) Note that the `LIVE VIEW` feature may be removed in future versions. A minimal usage sketch follows this list.
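
A minimal `LIVE VIEW` sketch, assuming the experimental setting is enabled; the table and view names here are hypothetical.

```sql
SET allow_experimental_live_view = 1;

CREATE TABLE src (x UInt64) ENGINE = MergeTree ORDER BY x;
CREATE LIVE VIEW lv AS SELECT sum(x) AS total FROM src;

INSERT INTO src VALUES (1), (2), (3);
SELECT * FROM lv;  -- the stored query result is kept up to date as src receives inserts
```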
### Bug Fix
* Fix segmentation fault when the table has skip indices and vertical merge happens. [#6723](https://github.com/yandex/ClickHouse/pull/6723) ([alesapin](https://github.com/alesapin))
* Fix per-column TTL with non-trivial column defaults. Previously, in case of a force TTL merge with the `OPTIMIZE ... FINAL` query, expired values were replaced by type defaults instead of user-specified column defaults. [#6796](https://github.com/yandex/ClickHouse/pull/6796) ([Anton Popov](https://github.com/CurtizJ))
* Fix Kafka messages duplication problem on normal server restart. [#6597](https://github.com/yandex/ClickHouse/pull/6597) ([Ivan](https://github.com/abyss7))
* Fixed infinite loop when reading Kafka messages. Do not pause/resume consumer on subscription at all - otherwise it may get paused indefinitely in some scenarios. [#6354](https://github.com/yandex/ClickHouse/pull/6354) ([Ivan](https://github.com/abyss7))
* Fix `Key expression contains comparison between inconvertible types` exception in `bitmapContains` function. [#6136](https://github.com/yandex/ClickHouse/issues/6136) [#6146](https://github.com/yandex/ClickHouse/issues/6146) [#6156](https://github.com/yandex/ClickHouse/pull/6156) ([dimarub2000](https://github.com/dimarub2000))
* Fix segfault with enabled `optimize_skip_unused_shards` and missing sharding key. [#6384](https://github.com/yandex/ClickHouse/pull/6384) ([Anton Popov](https://github.com/CurtizJ))
* Fixed wrong code in mutations that may lead to memory corruption. Fixed a segfault with a read of address `0x14c0` that could happen due to concurrent `DROP TABLE` and `SELECT` from `system.parts` or `system.parts_columns`. Fixed a race condition in the preparation of mutation queries. Fixed a deadlock caused by `OPTIMIZE` of Replicated tables and concurrent modification operations like ALTERs. [#6514](https://github.com/yandex/ClickHouse/pull/6514) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix bug introduced in query profiler which leads to endless recv from socket. [#6386](https://github.com/yandex/ClickHouse/pull/6386) ([alesapin](https://github.com/alesapin))
* Removed extra verbose logging in the MySQL interface. [#6389](https://github.com/yandex/ClickHouse/pull/6389) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Restored the ability to parse boolean settings from 'true' and 'false' in the configuration file. [#6278](https://github.com/yandex/ClickHouse/pull/6278) ([alesapin](https://github.com/alesapin))
* Fix crash in `quantile` and `median` function over `Nullable(Decimal128)`. [#6378](https://github.com/yandex/ClickHouse/pull/6378) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed a possible incomplete result returned by a `SELECT` query when the `WHERE` condition on the primary key contained a conversion to a Float type. It was caused by incorrect checking of monotonicity in the `toFloat` function. [#6248](https://github.com/yandex/ClickHouse/issues/6248) [#6374](https://github.com/yandex/ClickHouse/pull/6374) ([dimarub2000](https://github.com/dimarub2000))
* Check `max_expanded_ast_elements` setting for mutations. Clear mutations after `TRUNCATE TABLE`. [#6205](https://github.com/yandex/ClickHouse/pull/6205) ([Winter Zhang](https://github.com/zhang2014))
* Fix excessive CPU usage while executing `JSONExtractRaw` function over a boolean value. [#6208](https://github.com/yandex/ClickHouse/pull/6208) ([Vitaly Baranov](https://github.com/vitlibar))
* Fixed an issue when a long `ALTER UPDATE` or `ALTER DELETE` could prevent regular merges from running. Prevent mutations from executing if there are not enough free threads available. [#6502](https://github.com/yandex/ClickHouse/issues/6502) [#6617](https://github.com/yandex/ClickHouse/pull/6617) ([tavplubix](https://github.com/tavplubix))
* Fix JOIN results for key columns when used with `join_use_nulls`. Attach NULLs instead of column defaults. [#6249](https://github.com/yandex/ClickHouse/pull/6249) ([Artem Zuikov](https://github.com/4ertus2))
* Fix `JSONExtract` function while extracting a `Tuple` from JSON. [#6718](https://github.com/yandex/ClickHouse/pull/6718) ([Vitaly Baranov](https://github.com/vitlibar))
* Fix for data race in StorageMerge [#6717](https://github.com/yandex/ClickHouse/pull/6717) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix for skip indices with vertical merge and alter. Fix for `Bad size of marks file` exception. [#6594](https://github.com/yandex/ClickHouse/issues/6594) [#6713](https://github.com/yandex/ClickHouse/pull/6713) ([alesapin](https://github.com/alesapin))
* Fix rare crash in `ALTER MODIFY COLUMN` and vertical merge when one of merged/altered parts is empty (0 rows) [#6746](https://github.com/yandex/ClickHouse/issues/6746) [#6780](https://github.com/yandex/ClickHouse/pull/6780) ([alesapin](https://github.com/alesapin))
* Fixed wrong behaviour of `nullIf` function for constant arguments. [#6518](https://github.com/yandex/ClickHouse/pull/6518) ([Guillaume Tassery](https://github.com/YiuRULE)) [#6580](https://github.com/yandex/ClickHouse/pull/6580) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed bug in conversion of `LowCardinality` types in `AggregateFunctionFactory`. This fixes [#6257](https://github.com/yandex/ClickHouse/issues/6257). [#6281](https://github.com/yandex/ClickHouse/pull/6281) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fixed possible data loss after `ALTER DELETE` query on table with skipping index. [#6224](https://github.com/yandex/ClickHouse/issues/6224) [#6282](https://github.com/yandex/ClickHouse/pull/6282) ([Nikita Vasilev](https://github.com/nikvas0))
* Fix wrong behavior and possible segfaults in `topK` and `topKWeighted` aggregate functions. [#6404](https://github.com/yandex/ClickHouse/pull/6404) ([Anton Popov](https://github.com/CurtizJ))
* Fixed unsafe code around `getIdentifier` function. [#6401](https://github.com/yandex/ClickHouse/issues/6401) [#6409](https://github.com/yandex/ClickHouse/pull/6409) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed a bug in the MySQL wire protocol (used when connecting to ClickHouse from a MySQL client). Caused by a heap buffer overflow in `PacketPayloadWriteBuffer`. [#6212](https://github.com/yandex/ClickHouse/pull/6212) ([Yuriy Baranov](https://github.com/yurriy))
* Fixed memory leak in `bitmapSubsetInRange` function. [#6819](https://github.com/yandex/ClickHouse/pull/6819) ([Zhichang Yu](https://github.com/yuzhichang))
* Fix a rare bug when a mutation is executed after a granularity change. [#6816](https://github.com/yandex/ClickHouse/pull/6816) ([alesapin](https://github.com/alesapin))
* Allow protobuf message with all fields by default. [#6132](https://github.com/yandex/ClickHouse/pull/6132) ([Vitaly Baranov](https://github.com/vitlibar))
* Fixed a bug in the `nullIf` function when the second argument is `NULL`. [#6446](https://github.com/yandex/ClickHouse/pull/6446) ([Guillaume Tassery](https://github.com/YiuRULE))
* Fix a rare bug with wrong memory allocation/deallocation in complex-key cache dictionaries with string fields, which led to unbounded memory consumption (looks like a memory leak). The bug reproduced when the string size was a power of two starting from eight (8, 16, 32, etc.). [#6447](https://github.com/yandex/ClickHouse/pull/6447) ([alesapin](https://github.com/alesapin))
* Fixed Gorilla encoding on small sequences which caused exception `Cannot write after end of buffer`. [#6398](https://github.com/yandex/ClickHouse/issues/6398) [#6444](https://github.com/yandex/ClickHouse/pull/6444) ([Vasily Nemkov](https://github.com/Enmk))
* Fixed error with processing "timezone" in server configuration file. [#6709](https://github.com/yandex/ClickHouse/pull/6709) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Allow using non-nullable types in JOINs with `join_use_nulls` enabled. [#6705](https://github.com/yandex/ClickHouse/pull/6705) ([Artem Zuikov](https://github.com/4ertus2))
* Disable `Poco::AbstractConfiguration` substitutions in query in `clickhouse-client`. [#6706](https://github.com/yandex/ClickHouse/pull/6706) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed mismatched headers in streams that occurred when reading from an empty distributed table with sample and prewhere. [#6167](https://github.com/yandex/ClickHouse/issues/6167) ([Lixiang Qian](https://github.com/fancyqlx)) [#6823](https://github.com/yandex/ClickHouse/pull/6823) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Avoid deadlock in `REPLACE PARTITION`. [#6677](https://github.com/yandex/ClickHouse/pull/6677) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Query transformation for `MySQL`, `ODBC`, `JDBC` table functions now works properly for `SELECT WHERE` queries with multiple `AND` expressions. [#6381](https://github.com/yandex/ClickHouse/issues/6381) [#6676](https://github.com/yandex/ClickHouse/pull/6676) ([dimarub2000](https://github.com/dimarub2000))
* Fixed bug in function `arrayEnumerateUniqRanked`. [#6779](https://github.com/yandex/ClickHouse/pull/6779) ([proller](https://github.com/proller))
* Fixed parsing of `AggregateFunction` values embedded in a query. [#6575](https://github.com/yandex/ClickHouse/issues/6575) [#6773](https://github.com/yandex/ClickHouse/pull/6773) ([Zhichang Yu](https://github.com/yuzhichang))
* Fixed a possible segfault when using `arrayReduce` with constant arguments. [#6242](https://github.com/yandex/ClickHouse/issues/6242) [#6326](https://github.com/yandex/ClickHouse/pull/6326) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix inconsistent parts which can appear if a replica was restored after `DROP PARTITION`. [#6522](https://github.com/yandex/ClickHouse/issues/6522) [#6523](https://github.com/yandex/ClickHouse/pull/6523) ([tavplubix](https://github.com/tavplubix))
* Fix crash when casting types to `Decimal` that do not support it. Throw exception instead. [#6297](https://github.com/yandex/ClickHouse/pull/6297) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed hang in `JSONExtractRaw` function. [#6195](https://github.com/yandex/ClickHouse/issues/6195) [#6198](https://github.com/yandex/ClickHouse/pull/6198) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed crash when using `IN` clause with a subquery with a tuple. [#6125](https://github.com/yandex/ClickHouse/issues/6125) [#6550](https://github.com/yandex/ClickHouse/pull/6550) ([tavplubix](https://github.com/tavplubix))
* Fixed a regression when pushing to a materialized view. [#6415](https://github.com/yandex/ClickHouse/pull/6415) ([Ivan](https://github.com/abyss7))
* Fixed a possible inconsistent state of a table while executing a `DROP` query for a replicated table while ZooKeeper is not accessible. [#6045](https://github.com/yandex/ClickHouse/issues/6045) [#6413](https://github.com/yandex/ClickHouse/pull/6413) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Fix bug with incorrect skip indices serialization and aggregation with adaptive granularity. [#6594](https://github.com/yandex/ClickHouse/issues/6594). [#6748](https://github.com/yandex/ClickHouse/pull/6748) ([alesapin](https://github.com/alesapin))
* Fix `WITH ROLLUP` and `WITH CUBE` modifiers of `GROUP BY` with two-level aggregation. [#6225](https://github.com/yandex/ClickHouse/pull/6225) ([Anton Popov](https://github.com/CurtizJ))
* Improve error handling in cache dictionaries. [#6737](https://github.com/yandex/ClickHouse/pull/6737) ([Vitaly Baranov](https://github.com/vitlibar))
* Parquet: Fix reading boolean columns. [#6579](https://github.com/yandex/ClickHouse/pull/6579) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix bug with writing secondary indices marks with adaptive granularity. [#6126](https://github.com/yandex/ClickHouse/pull/6126) ([alesapin](https://github.com/alesapin))
* Fix initialization order during server startup. Since `StorageMergeTree::background_task_handle` is initialized in `startup()`, `MergeTreeBlockOutputStream::write()` may try to use it before initialization. Just check if it is initialized. [#6080](https://github.com/yandex/ClickHouse/pull/6080) ([Ivan](https://github.com/abyss7))
* Fixed crash in `extractAll()` function. [#6644](https://github.com/yandex/ClickHouse/pull/6644) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed wrong behaviour of `trim` functions family. [#6647](https://github.com/yandex/ClickHouse/pull/6647) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Clear the data buffer from a previous read operation that completed with an error. [#6026](https://github.com/yandex/ClickHouse/pull/6026) ([Nikolay](https://github.com/bopohaa))
* Fix bug with enabling adaptive granularity when creating new replica for Replicated*MergeTree table. [#6394](https://github.com/yandex/ClickHouse/issues/6394) [#6452](https://github.com/yandex/ClickHouse/pull/6452) ([alesapin](https://github.com/alesapin))
* Fixed a possible crash during server startup in case of an exception in `libunwind` during access to an uninitialized `ThreadStatus` structure. [#6456](https://github.com/yandex/ClickHouse/pull/6456) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Fixed data race in `system.parts` table and `ALTER` query. [#6245](https://github.com/yandex/ClickHouse/issues/6245). [#6513](https://github.com/yandex/ClickHouse/pull/6513) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix crash in `yandexConsistentHash` function. Found by fuzz test. [#6304](https://github.com/yandex/ClickHouse/issues/6304) [#6305](https://github.com/yandex/ClickHouse/pull/6305) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed the possibility of hanging queries when the server is overloaded and the global thread pool becomes nearly full. This had a higher chance of happening on clusters with a large number of shards (hundreds), because distributed queries allocate a thread per connection to each shard. For example, this issue may reproduce if a cluster of 330 shards is processing 30 concurrent distributed queries. This issue affects all versions starting from 19.2. [#6301](https://github.com/yandex/ClickHouse/pull/6301) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed logic of `arrayEnumerateUniqRanked` function. [#6423](https://github.com/yandex/ClickHouse/pull/6423) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix segfault when decoding symbol table. [#6603](https://github.com/yandex/ClickHouse/pull/6603) ([Amos Bird](https://github.com/amosbird))
* Fixed an irrelevant exception in the cast of `LowCardinality(Nullable)` to a non-Nullable column if it doesn't contain NULLs (e.g. in a query like `SELECT CAST(CAST('Hello' AS LowCardinality(Nullable(String))) AS String)`). [#6094](https://github.com/yandex/ClickHouse/issues/6094) [#6119](https://github.com/yandex/ClickHouse/pull/6119) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Removed extra quoting of description in `system.settings` table. [#6696](https://github.com/yandex/ClickHouse/issues/6696) [#6699](https://github.com/yandex/ClickHouse/pull/6699) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Avoid possible deadlock in `TRUNCATE` of Replicated table. [#6695](https://github.com/yandex/ClickHouse/pull/6695) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix case with same column names in `GLOBAL JOIN ON` section. [#6181](https://github.com/yandex/ClickHouse/pull/6181) ([Artem Zuikov](https://github.com/4ertus2))
* Fix reading in order of sorting key. [#6189](https://github.com/yandex/ClickHouse/pull/6189) ([Anton Popov](https://github.com/CurtizJ))
* Fix `ALTER TABLE ... UPDATE` query for tables with `enable_mixed_granularity_parts=1`. [#6543](https://github.com/yandex/ClickHouse/pull/6543) ([alesapin](https://github.com/alesapin))
* Fixed the case when the server could close listening sockets but not shut down, and continue serving remaining queries. You may end up with two running clickhouse-server processes. Sometimes, the server may return an error `bad_function_call` for remaining queries. [#6231](https://github.com/yandex/ClickHouse/pull/6231) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Table function `url` had a vulnerability that allowed an attacker to inject arbitrary HTTP headers into the request. This issue was found by [Nikita Tikhomirov](https://github.com/NSTikhomirov). [#6466](https://github.com/yandex/ClickHouse/pull/6466) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix a bug opened by [#4405](https://github.com/yandex/ClickHouse/pull/4405) (since 19.4.0). It reproduced in queries to Distributed tables over MergeTree tables when no columns are queried (`SELECT 1`). [#6236](https://github.com/yandex/ClickHouse/pull/6236) ([alesapin](https://github.com/alesapin))
* Fixed overflow in integer division of a signed type by an unsigned type. The behaviour was exactly as in the C or C++ language (integer promotion rules), which may be surprising. Please note that the overflow is still possible when dividing a large signed number by a large unsigned number or vice versa (but that case is less usual). The issue existed in all server versions. A hedged illustration follows this list. [#6214](https://github.com/yandex/ClickHouse/issues/6214) [#6233](https://github.com/yandex/ClickHouse/pull/6233) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Limit maximum sleep time for throttling when `max_execution_speed` or `max_execution_speed_bytes` is set. Fixed false errors like `Estimated query execution time (inf seconds) is too long`. [#5547](https://github.com/yandex/ClickHouse/issues/5547) [#6232](https://github.com/yandex/ClickHouse/pull/6232) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix useless `AST` check in Set index. [#6510](https://github.com/yandex/ClickHouse/issues/6510) [#6651](https://github.com/yandex/ClickHouse/pull/6651) ([Nikita Vasilev](https://github.com/nikvas0))
* Fixed issues about using `MATERIALIZED` columns and aliases in `MaterializedView`. [#448](https://github.com/yandex/ClickHouse/issues/448) [#3484](https://github.com/yandex/ClickHouse/issues/3484) [#3450](https://github.com/yandex/ClickHouse/issues/3450) [#2878](https://github.com/yandex/ClickHouse/issues/2878) [#2285](https://github.com/yandex/ClickHouse/issues/2285) [#3796](https://github.com/yandex/ClickHouse/pull/3796) ([Amos Bird](https://github.com/amosbird)) [#6316](https://github.com/yandex/ClickHouse/pull/6316) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix `FormatFactory` behaviour for input streams which are not implemented as processor. [#6495](https://github.com/yandex/ClickHouse/pull/6495) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fixed typo. [#6631](https://github.com/yandex/ClickHouse/pull/6631) ([Alex Ryndin](https://github.com/alexryndin))
* Fixed a typo in an error message (is -> are). [#6839](https://github.com/yandex/ClickHouse/pull/6839) ([Denis Zhuravlev](https://github.com/den-crane))
* Fixed an error while parsing a column list from a string if a type contained a comma (this issue was relevant for the `File`, `URL`, and `HDFS` storages) [#6217](https://github.com/yandex/ClickHouse/issues/6217). [#6209](https://github.com/yandex/ClickHouse/pull/6209) ([dimarub2000](https://github.com/dimarub2000))
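
For the integer-division entry above, a hedged illustration of the C/C++ integer promotion behavior; the comments describe the rule rather than reproduce the exact pre-fix output.

```sql
-- Dividing a signed value by an unsigned one follows integer promotion rules,
-- so a negative dividend could be reinterpreted as a huge unsigned number.
SELECT intDiv(toInt32(-8), toUInt32(2));  -- expected -4 with correct signed semantics
```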
### Security Fix
* Fix two vulnerabilities in codecs in decompression phase (malicious user can fabricate compressed data that will lead to buffer overflow in decompression). [#6670](https://github.com/yandex/ClickHouse/pull/6670) ([Artem Zuikov](https://github.com/4ertus2))
* If an attacker has write access to ZooKeeper and is able to run a custom server reachable from the network where ClickHouse runs, they can create a custom-built malicious server that acts as a ClickHouse replica and register it in ZooKeeper. When another replica fetches a data part from the malicious replica, it can force clickhouse-server to write to an arbitrary path on the filesystem. Found by Eldar Zaitov, information security team at Yandex. [#6247](https://github.com/yandex/ClickHouse/pull/6247) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed the possibility of a fabricated query to cause server crash due to stack overflow in SQL parser. Fixed the possibility of stack overflow in Merge and Distributed tables, materialized views and conditions for row-level security that involve subqueries. [#6433](https://github.com/yandex/ClickHouse/pull/6433) ([alexey-milovidov](https://github.com/alexey-milovidov))
### Improvement
* Correct implementation of ternary logic for `AND/OR`. [#6048](https://github.com/yandex/ClickHouse/pull/6048) ([Alexander Kazakov](https://github.com/Akazz))
* Now values and rows with expired TTL will be removed after an `OPTIMIZE ... FINAL` query from old parts without TTL infos or with outdated TTL infos, e.g. after an `ALTER ... MODIFY TTL` query. Added queries `SYSTEM STOP/START TTL MERGES` to disallow/allow assigning merges with TTL and filtering expired values in all merges (see the sketch after this list). [#6274](https://github.com/yandex/ClickHouse/pull/6274) ([Anton Popov](https://github.com/CurtizJ))
* Possibility to change the location of the ClickHouse history file for the client using the `CLICKHOUSE_HISTORY_FILE` environment variable. [#6840](https://github.com/yandex/ClickHouse/pull/6840) ([filimonov](https://github.com/filimonov))
* Remove `dry_run` flag from `InterpreterSelectQuery`. ... [#6375](https://github.com/yandex/ClickHouse/pull/6375) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Support `ASOF JOIN` with `ON` section. [#6211](https://github.com/yandex/ClickHouse/pull/6211) ([Artem Zuikov](https://github.com/4ertus2))
* Better support of skip indexes for mutations and replication. Support for `MATERIALIZE/CLEAR INDEX ... IN PARTITION` query. `UPDATE x = x` recalculates all indices that use column `x`. [#5053](https://github.com/yandex/ClickHouse/pull/5053) ([Nikita Vasilev](https://github.com/nikvas0))
* Allow to `ATTACH` live views (for example, at server startup) regardless of the `allow_experimental_live_view` setting. [#6754](https://github.com/yandex/ClickHouse/pull/6754) ([alexey-milovidov](https://github.com/alexey-milovidov))
* For stack traces gathered by query profiler, do not include stack frames generated by the query profiler itself. [#6250](https://github.com/yandex/ClickHouse/pull/6250) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Now table functions `values`, `file`, `url`, `hdfs` have support for ALIAS columns. [#6255](https://github.com/yandex/ClickHouse/pull/6255) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Throw an exception if a `config.d` file doesn't have the same root element as the config file. [#6123](https://github.com/yandex/ClickHouse/pull/6123) ([dimarub2000](https://github.com/dimarub2000))
* Print extra info in exception message for `no space left on device`. [#6182](https://github.com/yandex/ClickHouse/issues/6182), [#6252](https://github.com/yandex/ClickHouse/issues/6252) [#6352](https://github.com/yandex/ClickHouse/pull/6352) ([tavplubix](https://github.com/tavplubix))
* When determining shards of a `Distributed` table to be covered by a read query (for `optimize_skip_unused_shards` = 1) ClickHouse now checks conditions from both `prewhere` and `where` clauses of select statement. [#6521](https://github.com/yandex/ClickHouse/pull/6521) ([Alexander Kazakov](https://github.com/Akazz))
* Enabled `SIMDJSON` for machines without AVX2 but with SSE 4.2 and PCLMUL instruction set. [#6285](https://github.com/yandex/ClickHouse/issues/6285) [#6320](https://github.com/yandex/ClickHouse/pull/6320) ([alexey-milovidov](https://github.com/alexey-milovidov))
* ClickHouse can work on filesystems without `O_DIRECT` support (such as ZFS and BtrFS) without additional tuning. [#4449](https://github.com/yandex/ClickHouse/issues/4449) [#6730](https://github.com/yandex/ClickHouse/pull/6730) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Support push down predicate for final subquery. [#6120](https://github.com/yandex/ClickHouse/pull/6120) ([TCeason](https://github.com/TCeason)) [#6162](https://github.com/yandex/ClickHouse/pull/6162) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Better `JOIN ON` keys extraction [#6131](https://github.com/yandex/ClickHouse/pull/6131) ([Artem Zuikov](https://github.com/4ertus2))
* Updated `SIMDJSON`. [#6285](https://github.com/yandex/ClickHouse/issues/6285). [#6306](https://github.com/yandex/ClickHouse/pull/6306) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Optimize selecting of smallest column for `SELECT count()` query. [#6344](https://github.com/yandex/ClickHouse/pull/6344) ([Amos Bird](https://github.com/amosbird))
* Added a `strict` parameter in `windowFunnel()`. When `strict` is set, `windowFunnel()` applies conditions only to unique values (see the sketch after this list). [#6548](https://github.com/yandex/ClickHouse/pull/6548) ([achimbab](https://github.com/achimbab))
* Safer interface of `mysqlxx::Pool`. [#6150](https://github.com/yandex/ClickHouse/pull/6150) ([avasiliev](https://github.com/avasiliev))
* The line width of options printed with the `--help` option now corresponds to the terminal size. [#6590](https://github.com/yandex/ClickHouse/pull/6590) ([dimarub2000](https://github.com/dimarub2000))
* Disable "read in order" optimization for aggregation without keys. [#6599](https://github.com/yandex/ClickHouse/pull/6599) ([Anton Popov](https://github.com/CurtizJ))
* HTTP status code for `INCORRECT_DATA` and `TYPE_MISMATCH` error codes was changed from default `500 Internal Server Error` to `400 Bad Request`. [#6271](https://github.com/yandex/ClickHouse/pull/6271) ([Alexander Rodin](https://github.com/a-rodin))
* Move the Join object from `ExpressionAction` into `AnalyzedJoin`. `ExpressionAnalyzer` and `ExpressionAction` do not know about the `Join` class anymore. Its logic is hidden by the `AnalyzedJoin` interface. [#6801](https://github.com/yandex/ClickHouse/pull/6801) ([Artem Zuikov](https://github.com/4ertus2))
* Fixed possible deadlock of distributed queries when one of shards is localhost but the query is sent via network connection. [#6759](https://github.com/yandex/ClickHouse/pull/6759) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Changed semantic of multiple tables `RENAME` to avoid possible deadlocks. [#6757](https://github.com/yandex/ClickHouse/issues/6757). [#6756](https://github.com/yandex/ClickHouse/pull/6756) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Rewritten MySQL compatibility server to prevent loading full packet payload in memory. Decreased memory consumption for each connection to approximately `2 * DBMS_DEFAULT_BUFFER_SIZE` (read/write buffers). [#5811](https://github.com/yandex/ClickHouse/pull/5811) ([Yuriy Baranov](https://github.com/yurriy))
* Moved AST alias interpreting logic out of the parser, which doesn't have to know anything about query semantics. [#6108](https://github.com/yandex/ClickHouse/pull/6108) ([Artem Zuikov](https://github.com/4ertus2))
* Slightly more safe parsing of `NamesAndTypesList`. [#6408](https://github.com/yandex/ClickHouse/issues/6408). [#6410](https://github.com/yandex/ClickHouse/pull/6410) ([alexey-milovidov](https://github.com/alexey-milovidov))
* `clickhouse-copier`: Allow using `where_condition` from the config with a `partition_key` alias in the query for checking partition existence (earlier it was used only in data-reading queries). [#6577](https://github.com/yandex/ClickHouse/pull/6577) ([proller](https://github.com/proller))
* Added an optional message argument in `throwIf`. ([#5772](https://github.com/yandex/ClickHouse/issues/5772)) [#6329](https://github.com/yandex/ClickHouse/pull/6329) ([Vdimir](https://github.com/Vdimir))
* A server exception received while sending insertion data is now processed on the client as well. [#5891](https://github.com/yandex/ClickHouse/issues/5891) [#6711](https://github.com/yandex/ClickHouse/pull/6711) ([dimarub2000](https://github.com/dimarub2000))
* Added a metric `DistributedFilesToInsert` that shows the total number of files in filesystem that are selected to send to remote servers by Distributed tables. The number is summed across all shards. [#6600](https://github.com/yandex/ClickHouse/pull/6600) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Move most of JOINs prepare logic from `ExpressionAction/ExpressionAnalyzer` to `AnalyzedJoin`. [#6785](https://github.com/yandex/ClickHouse/pull/6785) ([Artem Zuikov](https://github.com/4ertus2))
* Fix TSan [warning](https://clickhouse-test-reports.s3.yandex.net/6399/c1c1d1daa98e199e620766f1bd06a5921050a00d/functional_stateful_tests_(thread).html) 'lock-order-inversion'. [#6740](https://github.com/yandex/ClickHouse/pull/6740) ([Vasily Nemkov](https://github.com/Enmk))
* Better information messages about lack of Linux capabilities. Logging fatal errors with "fatal" level, that will make it easier to find in `system.text_log`. [#6441](https://github.com/yandex/ClickHouse/pull/6441) ([alexey-milovidov](https://github.com/alexey-milovidov))
* When dumping temporary data to disk to restrict memory usage during `GROUP BY` or `ORDER BY` is enabled, the free disk space was not checked. The fix adds a new setting `min_free_disk_space`: when the free disk space is smaller than the threshold, the query will stop and throw `ErrorCodes::NOT_ENOUGH_SPACE`. [#6678](https://github.com/yandex/ClickHouse/pull/6678) ([Weiqing Xu](https://github.com/weiqxu)) [#6691](https://github.com/yandex/ClickHouse/pull/6691) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Removed recursive rwlock by thread. It makes no sense, because threads are reused between queries. A `SELECT` query may acquire a lock in one thread, hold a lock from another thread and exit from the first thread. At the same time, the first thread can be reused by a `DROP` query. This will lead to false "Attempt to acquire exclusive lock recursively" messages. [#6771](https://github.com/yandex/ClickHouse/pull/6771) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Split `ExpressionAnalyzer.appendJoin()`. Prepare a place in `ExpressionAnalyzer` for `MergeJoin`. [#6524](https://github.com/yandex/ClickHouse/pull/6524) ([Artem Zuikov](https://github.com/4ertus2))
* Added `mysql_native_password` authentication plugin to MySQL compatibility server. [#6194](https://github.com/yandex/ClickHouse/pull/6194) ([Yuriy Baranov](https://github.com/yurriy))
* Less number of `clock_gettime` calls; fixed ABI compatibility between debug/release in `Allocator` (insignificant issue). [#6197](https://github.com/yandex/ClickHouse/pull/6197) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Move `collectUsedColumns` from `ExpressionAnalyzer` to `SyntaxAnalyzer`. `SyntaxAnalyzer` makes `required_source_columns` itself now. [#6416](https://github.com/yandex/ClickHouse/pull/6416) ([Artem Zuikov](https://github.com/4ertus2))
* Add setting `joined_subquery_requires_alias` to require aliases for subselects and table functions in `FROM` when more than one table is present (i.e. queries with JOINs). [#6733](https://github.com/yandex/ClickHouse/pull/6733) ([Artem Zuikov](https://github.com/4ertus2))
* Extract `GetAggregatesVisitor` class from `ExpressionAnalyzer`. [#6458](https://github.com/yandex/ClickHouse/pull/6458) ([Artem Zuikov](https://github.com/4ertus2))
* `system.query_log`: change data type of `type` column to `Enum`. [#6265](https://github.com/yandex/ClickHouse/pull/6265) ([Nikita Mikhaylov](https://github.com/nikitamikhaylov))
* Static linking of `sha256_password` authentication plugin. [#6512](https://github.com/yandex/ClickHouse/pull/6512) ([Yuriy Baranov](https://github.com/yurriy))
* Avoid extra dependency for the setting `compile` to work. In previous versions, the user may get error like `cannot open crti.o`, `unable to find library -lc` etc. [#6309](https://github.com/yandex/ClickHouse/pull/6309) ([alexey-milovidov](https://github.com/alexey-milovidov))
* More validation of the input that may come from malicious replica. [#6303](https://github.com/yandex/ClickHouse/pull/6303) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Now `clickhouse-obfuscator` file is available in `clickhouse-client` package. In previous versions it was available as `clickhouse obfuscator` (with whitespace). [#5816](https://github.com/yandex/ClickHouse/issues/5816) [#6609](https://github.com/yandex/ClickHouse/pull/6609) ([dimarub2000](https://github.com/dimarub2000))
* Fixed deadlock when we have at least two queries that read at least two tables in different order and another query that performs DDL operation on one of tables. Fixed another very rare deadlock. [#6764](https://github.com/yandex/ClickHouse/pull/6764) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added `os_thread_ids` column to `system.processes` and `system.query_log` for better debugging possibilities. [#6763](https://github.com/yandex/ClickHouse/pull/6763) ([alexey-milovidov](https://github.com/alexey-milovidov))
* A workaround for PHP mysqlnd extension bugs which occur when `sha256_password` is used as a default authentication plugin (described in [#6031](https://github.com/yandex/ClickHouse/issues/6031)). [#6113](https://github.com/yandex/ClickHouse/pull/6113) ([Yuriy Baranov](https://github.com/yurriy))
* Remove unneeded place with changed nullability columns. [#6693](https://github.com/yandex/ClickHouse/pull/6693) ([Artem Zuikov](https://github.com/4ertus2))
* Set the default value of `queue_max_wait_ms` to zero, because the current value (five seconds) makes no sense. There are rare circumstances when this setting has any use. Added settings `replace_running_query_max_wait_ms`, `kafka_max_wait_ms` and `connection_pool_max_wait_ms` for disambiguation. [#6692](https://github.com/yandex/ClickHouse/pull/6692) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Extract `SelectQueryExpressionAnalyzer` from `ExpressionAnalyzer`. Keep the last one for non-select queries. [#6499](https://github.com/yandex/ClickHouse/pull/6499) ([Artem Zuikov](https://github.com/4ertus2))
* Removed duplicating input and output formats. [#6239](https://github.com/yandex/ClickHouse/pull/6239) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Allow user to override `poll_interval` and `idle_connection_timeout` settings on connection. [#6230](https://github.com/yandex/ClickHouse/pull/6230) ([alexey-milovidov](https://github.com/alexey-milovidov))
* `MergeTree` now has an additional option `ttl_only_drop_parts` (disabled by default) to avoid partial pruning of parts, so that parts are dropped completely when all their rows have expired. [#6191](https://github.com/yandex/ClickHouse/pull/6191) ([Sergi Vladykin](https://github.com/svladykin))
* Type checks for set index functions. Throw exception if function got a wrong type. This fixes fuzz test with UBSan. [#6511](https://github.com/yandex/ClickHouse/pull/6511) ([Nikita Vasilev](https://github.com/nikvas0))
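
A few of the improvements above in query form. This is a hedged sketch: `events` (with columns `uid`, `ts`, and `event`) is a hypothetical table used only for illustration.

```sql
-- SYSTEM STOP/START TTL MERGES: disallow/allow assigning merges with TTL.
SYSTEM STOP TTL MERGES;
SYSTEM START TTL MERGES;

-- throwIf with the new optional message argument.
SELECT throwIf(number = 3, 'number 3 is not allowed') FROM numbers(10);

-- windowFunnel with the new 'strict' parameter: conditions apply only to unique values.
SELECT uid, windowFunnel(3600, 'strict')(ts, event = 'open', event = 'buy') AS level
FROM events
GROUP BY uid;
```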
### Performance Improvement
* Optimize queries with an `ORDER BY expressions` clause, where `expressions` have a prefix coinciding with the sorting key in `MergeTree` tables. This optimization is controlled by the `optimize_read_in_order` setting (see the sketch after this list). [#6054](https://github.com/yandex/ClickHouse/pull/6054) [#6629](https://github.com/yandex/ClickHouse/pull/6629) ([Anton Popov](https://github.com/CurtizJ))
* Allow using multiple threads during parts loading and removal. [#6372](https://github.com/yandex/ClickHouse/issues/6372) [#6074](https://github.com/yandex/ClickHouse/issues/6074) [#6438](https://github.com/yandex/ClickHouse/pull/6438) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Implemented batch variant of updating aggregate function states. It may lead to performance benefits. [#6435](https://github.com/yandex/ClickHouse/pull/6435) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Using the `FastOps` library for functions `exp`, `log`, `sigmoid`, `tanh`. FastOps is a fast vector math library from Michael Parakhin (Yandex CTO). Improved performance of the `exp` and `log` functions more than 6 times. The functions `exp` and `log` from a `Float32` argument will return `Float32` (in previous versions they always returned `Float64`). Now `exp(nan)` may return `inf`. The result of the `exp` and `log` functions may not be the nearest machine-representable number to the true answer. [#6254](https://github.com/yandex/ClickHouse/pull/6254) ([alexey-milovidov](https://github.com/alexey-milovidov)) Using Danila Kutenin's variant to make FastOps work [#6317](https://github.com/yandex/ClickHouse/pull/6317) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Disable consecutive key optimization for `UInt8/16`. [#6298](https://github.com/yandex/ClickHouse/pull/6298) [#6701](https://github.com/yandex/ClickHouse/pull/6701) ([akuzm](https://github.com/akuzm))
* Improved performance of `simdjson` library by getting rid of dynamic allocation in `ParsedJson::Iterator`. [#6479](https://github.com/yandex/ClickHouse/pull/6479) ([Vitaly Baranov](https://github.com/vitlibar))
* Pre-fault pages when allocating memory with `mmap()`. [#6667](https://github.com/yandex/ClickHouse/pull/6667) ([akuzm](https://github.com/akuzm))
* Fix performance bug in `Decimal` comparison. [#6380](https://github.com/yandex/ClickHouse/pull/6380) ([Artem Zuikov](https://github.com/4ertus2))
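
A hedged sketch of the read-in-order optimization above; `t` is a hypothetical table whose sorting key matches the `ORDER BY` prefix.

```sql
CREATE TABLE t (d Date, x UInt64) ENGINE = MergeTree ORDER BY (d, x);

SET optimize_read_in_order = 1;
-- The ORDER BY prefix (d, x) coincides with the sorting key, so data can be
-- read from parts in order instead of being fully sorted after reading.
SELECT * FROM t ORDER BY d, x LIMIT 10;
```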
### Build/Testing/Packaging Improvement
* Remove Compiler (runtime template instantiation) because we've won over its performance. [#6646](https://github.com/yandex/ClickHouse/pull/6646) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a performance test to show degradation of performance in gcc-9 in a more isolated way. [#6302](https://github.com/yandex/ClickHouse/pull/6302) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added table function `numbers_mt`, which is a multithreaded version of `numbers` (see the example at the end of this list). Updated performance tests with hash functions. [#6554](https://github.com/yandex/ClickHouse/pull/6554) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Comparison mode in `clickhouse-benchmark` [#6220](https://github.com/yandex/ClickHouse/issues/6220) [#6343](https://github.com/yandex/ClickHouse/pull/6343) ([dimarub2000](https://github.com/dimarub2000))
* Best effort for printing stack traces. Also added `SIGPROF` as a debugging signal to print stack trace of a running thread. [#6529](https://github.com/yandex/ClickHouse/pull/6529) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Every function in its own file, part 10. [#6321](https://github.com/yandex/ClickHouse/pull/6321) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Remove doubled const `TABLE_IS_READ_ONLY`. [#6566](https://github.com/yandex/ClickHouse/pull/6566) ([filimonov](https://github.com/filimonov))
* Formatting changes for `StringHashMap` PR [#5417](https://github.com/yandex/ClickHouse/issues/5417). [#6700](https://github.com/yandex/ClickHouse/pull/6700) ([akuzm](https://github.com/akuzm))
* Better subquery for join creation in `ExpressionAnalyzer`. [#6824](https://github.com/yandex/ClickHouse/pull/6824) ([Artem Zuikov](https://github.com/4ertus2))
* Remove a redundant condition (found by PVS Studio). [#6775](https://github.com/yandex/ClickHouse/pull/6775) ([akuzm](https://github.com/akuzm))
* Separate the hash table interface for `ReverseIndex`. [#6672](https://github.com/yandex/ClickHouse/pull/6672) ([akuzm](https://github.com/akuzm))
* Refactoring of settings. [#6689](https://github.com/yandex/ClickHouse/pull/6689) ([alesapin](https://github.com/alesapin))
* Add comments for `set` index functions. [#6319](https://github.com/yandex/ClickHouse/pull/6319) ([Nikita Vasilev](https://github.com/nikvas0))
* Increase OOM score in debug version on Linux. [#6152](https://github.com/yandex/ClickHouse/pull/6152) ([akuzm](https://github.com/akuzm))
* HDFS HA now works in debug builds. [#6650](https://github.com/yandex/ClickHouse/pull/6650) ([Weiqing Xu](https://github.com/weiqxu))
* Added a test to `transform_query_for_external_database`. [#6388](https://github.com/yandex/ClickHouse/pull/6388) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add test for multiple materialized views for Kafka table. [#6509](https://github.com/yandex/ClickHouse/pull/6509) ([Ivan](https://github.com/abyss7))
* Make a better build scheme. [#6500](https://github.com/yandex/ClickHouse/pull/6500) ([Ivan](https://github.com/abyss7))
* Fixed the `test_external_dictionaries` integration test when executed under a non-root user. [#6507](https://github.com/yandex/ClickHouse/pull/6507) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Added coverage for the case when the total size of written packets exceeds `DBMS_DEFAULT_BUFFER_SIZE`. [#6204](https://github.com/yandex/ClickHouse/pull/6204) ([Yuriy Baranov](https://github.com/yurriy))
* Added a test for `RENAME` table race condition [#6752](https://github.com/yandex/ClickHouse/pull/6752) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Avoid data race on Settings in `KILL QUERY`. [#6753](https://github.com/yandex/ClickHouse/pull/6753) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add integration test for handling errors by a cache dictionary. [#6755](https://github.com/yandex/ClickHouse/pull/6755) ([Vitaly Baranov](https://github.com/vitlibar))
* Move `input_format_defaults_for_omitted_fields` to incompatible changes [#6573](https://github.com/yandex/ClickHouse/pull/6573) ([Artem Zuikov](https://github.com/4ertus2))
* Disable parsing of ELF object files on Mac OS, because it makes no sense. [#6578](https://github.com/yandex/ClickHouse/pull/6578) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Attempt to make changelog generator better. [#6327](https://github.com/yandex/ClickHouse/pull/6327) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Adding `-Wshadow` switch to the GCC. [#6325](https://github.com/yandex/ClickHouse/pull/6325) ([kreuzerkrieg](https://github.com/kreuzerkrieg))
* Removed obsolete code for `mimalloc` support. [#6715](https://github.com/yandex/ClickHouse/pull/6715) ([alexey-milovidov](https://github.com/alexey-milovidov))
* `zlib-ng` determines x86 capabilities and saves this info to global variables. This is done in the `deflateInit` call, which may be made by different threads simultaneously. To avoid multithreaded writes, do it on library startup. [#6141](https://github.com/yandex/ClickHouse/pull/6141) ([akuzm](https://github.com/akuzm))
* Regression test for a bug in JOIN which was fixed in [#5192](https://github.com/yandex/ClickHouse/issues/5192). [#6147](https://github.com/yandex/ClickHouse/pull/6147) ([Bakhtiyor Ruziev](https://github.com/theruziev))
* Fixed MSan report. [#6144](https://github.com/yandex/ClickHouse/pull/6144) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix flapping TTL test. [#6782](https://github.com/yandex/ClickHouse/pull/6782) ([Anton Popov](https://github.com/CurtizJ))
* Fixed false data race in `MergeTreeDataPart::is_frozen` field. [#6583](https://github.com/yandex/ClickHouse/pull/6583) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed timeouts in fuzz test. In previous version, it managed to find false hangup in query `SELECT * FROM numbers_mt(gccMurmurHash(''))`. [#6582](https://github.com/yandex/ClickHouse/pull/6582) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added debug checks to `static_cast` of columns. [#6581](https://github.com/yandex/ClickHouse/pull/6581) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Support for Oracle Linux in official RPM packages. [#6356](https://github.com/yandex/ClickHouse/issues/6356) [#6585](https://github.com/yandex/ClickHouse/pull/6585) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Changed json perftests from `once` to `loop` type. [#6536](https://github.com/yandex/ClickHouse/pull/6536) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* `odbc-bridge.cpp` defines `main()` so it should not be included in `clickhouse-lib`. [#6538](https://github.com/yandex/ClickHouse/pull/6538) ([Orivej Desh](https://github.com/orivej))
* Test for crash in `FULL|RIGHT JOIN` with nulls in right table's keys. [#6362](https://github.com/yandex/ClickHouse/pull/6362) ([Artem Zuikov](https://github.com/4ertus2))
* Added a test for the limit on expansion of aliases just in case. [#6442](https://github.com/yandex/ClickHouse/pull/6442) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added previous declaration checks for MySQL 8 integration. [#6569](https://github.com/yandex/ClickHouse/pull/6569) ([Rafael David Tinoco](https://github.com/rafaeldtinoco))
* Switched from `boost::filesystem` to `std::filesystem` where appropriate. [#6253](https://github.com/yandex/ClickHouse/pull/6253) [#6385](https://github.com/yandex/ClickHouse/pull/6385) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added RPM packages to website. [#6251](https://github.com/yandex/ClickHouse/pull/6251) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add a test for fixed `Unknown identifier` exception in `IN` section. [#6708](https://github.com/yandex/ClickHouse/pull/6708) ([Artem Zuikov](https://github.com/4ertus2))
* Simplify `shared_ptr_helper` because people face difficulties understanding it. [#6675](https://github.com/yandex/ClickHouse/pull/6675) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added performance tests for fixed Gorilla and DoubleDelta codec. [#6179](https://github.com/yandex/ClickHouse/pull/6179) ([Vasily Nemkov](https://github.com/Enmk))
* Split the integration test `test_dictionaries` into 4 separate tests. [#6776](https://github.com/yandex/ClickHouse/pull/6776) ([Vitaly Baranov](https://github.com/vitlibar))
* Fix PVS-Studio warning in `PipelineExecutor`. [#6777](https://github.com/yandex/ClickHouse/pull/6777) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Allow to use `library` dictionary source with ASan. [#6482](https://github.com/yandex/ClickHouse/pull/6482) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added option to generate changelog from a list of PRs. [#6350](https://github.com/yandex/ClickHouse/pull/6350) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Lock the `TinyLog` storage when reading. [#6226](https://github.com/yandex/ClickHouse/pull/6226) ([akuzm](https://github.com/akuzm))
* Check for broken symlinks in CI. [#6634](https://github.com/yandex/ClickHouse/pull/6634) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Increase timeout for "stack overflow" test because it may take a long time in debug build. [#6637](https://github.com/yandex/ClickHouse/pull/6637) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added a check for double whitespaces. [#6643](https://github.com/yandex/ClickHouse/pull/6643) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fix `new/delete` memory tracking when built with sanitizers. Tracking is not precise. It only prevents memory limit exceptions in tests. [#6450](https://github.com/yandex/ClickHouse/pull/6450) ([Artem Zuikov](https://github.com/4ertus2))
* Enable back the check of undefined symbols while linking. [#6453](https://github.com/yandex/ClickHouse/pull/6453) ([Ivan](https://github.com/abyss7))
* Avoid rebuilding `hyperscan` every day. [#6307](https://github.com/yandex/ClickHouse/pull/6307) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Fixed UBSan report in `ProtobufWriter`. [#6163](https://github.com/yandex/ClickHouse/pull/6163) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Don't allow using the query profiler with sanitizers because it is not compatible. [#6769](https://github.com/yandex/ClickHouse/pull/6769) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add test for reloading a dictionary after fail by timer. [#6114](https://github.com/yandex/ClickHouse/pull/6114) ([Vitaly Baranov](https://github.com/vitlibar))
* Fix inconsistency in `PipelineExecutor::prepareProcessor` argument type. [#6494](https://github.com/yandex/ClickHouse/pull/6494) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Added a test for bad URIs. [#6493](https://github.com/yandex/ClickHouse/pull/6493) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Added more checks to the `CAST` function. This should provide more information about the segmentation fault in the fuzz test. [#6346](https://github.com/yandex/ClickHouse/pull/6346) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Added `gcc-9` support to `docker/builder` container that builds image locally. [#6333](https://github.com/yandex/ClickHouse/pull/6333) ([Gleb Novikov](https://github.com/NanoBjorn))
* Test for primary key with `LowCardinality(String)`. [#5044](https://github.com/yandex/ClickHouse/issues/5044) [#6219](https://github.com/yandex/ClickHouse/pull/6219) ([dimarub2000](https://github.com/dimarub2000))
* Fixed tests affected by slow printing of stack traces. [#6315](https://github.com/yandex/ClickHouse/pull/6315) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Add a test case for crash in `groupUniqArray` fixed in [#6029](https://github.com/yandex/ClickHouse/pull/6029). [#4402](https://github.com/yandex/ClickHouse/issues/4402) [#6129](https://github.com/yandex/ClickHouse/pull/6129) ([akuzm](https://github.com/akuzm))
* Fixed indices mutations tests. [#6645](https://github.com/yandex/ClickHouse/pull/6645) ([Nikita Vasilev](https://github.com/nikvas0))
* Attempt to fix performance test. [#6392](https://github.com/yandex/ClickHouse/pull/6392) ([alexey-milovidov](https://github.com/alexey-milovidov))
* In performance test, do not read query log for queries we didn't run. [#6427](https://github.com/yandex/ClickHouse/pull/6427) ([akuzm](https://github.com/akuzm))
* Materialized views can now be created with any low cardinality types regardless of the setting for suspicious low cardinality types. [#6428](https://github.com/yandex/ClickHouse/pull/6428) ([Olga Khvostikova](https://github.com/stavrolia))
* Updated tests for `send_logs_level` setting. [#6207](https://github.com/yandex/ClickHouse/pull/6207) ([Nikolai Kochetov](https://github.com/KochetovNicolai))
* Fix build under gcc-8.2. [#6196](https://github.com/yandex/ClickHouse/pull/6196) ([Max Akhmedov](https://github.com/zlobober))
* Fix build with internal libc++. [#6724](https://github.com/yandex/ClickHouse/pull/6724) ([Ivan](https://github.com/abyss7))
* Fix shared build with the `rdkafka` library. [#6101](https://github.com/yandex/ClickHouse/pull/6101) ([Ivan](https://github.com/abyss7))
* Fixes for Mac OS build (incomplete). [#6390](https://github.com/yandex/ClickHouse/pull/6390) ([alexey-milovidov](https://github.com/alexey-milovidov)) [#6429](https://github.com/yandex/ClickHouse/pull/6429) ([alex-zaitsev](https://github.com/alex-zaitsev))
* Fix "splitted" build. [#6618](https://github.com/yandex/ClickHouse/pull/6618) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Other build fixes: [#6186](https://github.com/yandex/ClickHouse/pull/6186) ([Amos Bird](https://github.com/amosbird)) [#6486](https://github.com/yandex/ClickHouse/pull/6486) [#6348](https://github.com/yandex/ClickHouse/pull/6348) ([vxider](https://github.com/Vxider)) [#6744](https://github.com/yandex/ClickHouse/pull/6744) ([Ivan](https://github.com/abyss7)) [#6016](https://github.com/yandex/ClickHouse/pull/6016) [#6421](https://github.com/yandex/ClickHouse/pull/6421) [#6491](https://github.com/yandex/ClickHouse/pull/6491) ([proller](https://github.com/proller))
* Fix kafka tests. [#6805](https://github.com/yandex/ClickHouse/pull/6805) ([Ivan](https://github.com/abyss7))
### Backward Incompatible Change
* Removed the rarely used table function `catBoostPool` and the storage `CatBoostPool`. If you have used this table function, please write an email to `clickhouse-feedback@yandex-team.com`. Note that CatBoost integration remains and will be supported. [#6279](https://github.com/yandex/ClickHouse/pull/6279) ([alexey-milovidov](https://github.com/alexey-milovidov))
* Disable `ANY RIGHT JOIN` and `ANY FULL JOIN` by default. Set the `any_join_get_any_from_right_table` setting to enable them. [#5126](https://github.com/yandex/ClickHouse/issues/5126) [#6351](https://github.com/yandex/ClickHouse/pull/6351) ([Artem Zuikov](https://github.com/4ertus2))
## ClickHouse release 19.13.3.26, 2019-08-22
### Bug Fix

View File

@ -156,6 +156,26 @@ endif ()
# Make sure the final executable has symbols exported
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -rdynamic")
option (ADD_GDB_INDEX_FOR_GOLD "Set to add .gdb-index to resulting binaries for gold linker. NOOP if lld is used." 0)
if (NOT CMAKE_BUILD_TYPE_UC STREQUAL "RELEASE")
if (LINKER_NAME STREQUAL "lld")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gdb-index")
set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gdb-index")
message (STATUS "Adding .gdb-index via --gdb-index linker option.")
# we use another tool for gdb-index, because the gold linker removes the .debug_aranges section, which is used inside clickhouse stacktraces
# http://sourceware-org.1504.n7.nabble.com/gold-No-debug-aranges-section-when-linking-with-gdb-index-td540965.html#a556932
elseif (LINKER_NAME STREQUAL "gold" AND ADD_GDB_INDEX_FOR_GOLD)
find_program (GDB_ADD_INDEX_EXE NAMES "gdb-add-index" DOC "Path to gdb-add-index executable")
if (NOT GDB_ADD_INDEX_EXE)
set (USE_GDB_ADD_INDEX 0)
message (WARNING "Cannot add gdb index to binaries, because gold linker is used, but gdb-add-index executable not found.")
else()
set (USE_GDB_ADD_INDEX 1)
message (STATUS "gdb-add-index found: ${GDB_ADD_INDEX_EXE}")
endif()
endif ()
endif()
cmake_host_system_information(RESULT AVAILABLE_PHYSICAL_MEMORY QUERY AVAILABLE_PHYSICAL_MEMORY) # Not available under freebsd
if(NOT AVAILABLE_PHYSICAL_MEMORY OR AVAILABLE_PHYSICAL_MEMORY GREATER 8000)
option(COMPILER_PIPE "-pipe compiler option [less /tmp usage, more ram usage]" ON)
@ -290,18 +310,6 @@ else ()
set (CLICKHOUSE_ETC_DIR "${CMAKE_INSTALL_PREFIX}/etc")
endif ()
option (UNBUNDLED "Try to find all libraries in the system. We recommend avoiding this mode for production builds, because we cannot guarantee the exact versions and variants of the libraries your system has installed. This mode exists for enthusiastic developers who search for trouble. It is also useful for maintainers of OS packages." OFF)
if (UNBUNDLED)
set(NOT_UNBUNDLED 0)
else ()
set(NOT_UNBUNDLED 1)
endif ()
# Using system libs can cause a lot of warnings in includes.
if (UNBUNDLED OR NOT (OS_LINUX OR APPLE) OR ARCH_32)
option (NO_WERROR "Disable -Werror compiler option" ON)
endif ()
message (STATUS "Building for: ${CMAKE_SYSTEM} ${CMAKE_SYSTEM_PROCESSOR} ${CMAKE_LIBRARY_ARCHITECTURE} ; USE_STATIC_LIBRARIES=${USE_STATIC_LIBRARIES} MAKE_STATIC_LIBRARIES=${MAKE_STATIC_LIBRARIES} SPLIT_SHARED=${SPLIT_SHARED_LIBRARIES} UNBUNDLED=${UNBUNDLED} CCACHE=${CCACHE_FOUND} ${CCACHE_VERSION}")
include(GNUInstallDirs)

View File

@ -13,9 +13,12 @@ ClickHouse is an open-source column-oriented database management system that all
* You can also [fill this form](https://forms.yandex.com/surveys/meet-yandex-clickhouse-team/) to meet Yandex ClickHouse team in person.
## Upcoming Events
* [ClickHouse Meetup in Munich](https://www.meetup.com/ClickHouse-Meetup-Munich/events/264185199/) on September 17.
* [ClickHouse Meetup in Paris](https://www.eventbrite.com/e/clickhouse-paris-meetup-2019-registration-68493270215) on October 3.
* [ClickHouse Meetup in San Francisco](https://www.meetup.com/San-Francisco-Bay-Area-ClickHouse-Meetup/events/264242199/) on October 9.
* [ClickHouse Meetup in Hong Kong](https://www.meetup.com/Hong-Kong-Machine-Learning-Meetup/events/263580542/) on October 17.
* [ClickHouse Meetup in Shenzhen](https://www.huodongxing.com/event/3483759917300) on October 20.
* [ClickHouse Meetup in Shanghai](https://www.huodongxing.com/event/4483760336000) on October 27.
* [ClickHouse Meetup in Tokyo](https://clickhouse.connpass.com/event/147001/) on November 14.
* [ClickHouse Meetup in Istanbul](https://www.eventbrite.com/e/clickhouse-meetup-istanbul-create-blazing-fast-experiences-w-clickhouse-tickets-73101120419) on November 19.
* [ClickHouse Meetup in Ankara](https://www.eventbrite.com/e/clickhouse-meetup-ankara-create-blazing-fast-experiences-w-clickhouse-tickets-73100530655) on November 21.

View File

@ -1,13 +1,13 @@
option (USE_INTERNAL_SPARCEHASH_LIBRARY "Set to FALSE to use system sparsehash library instead of bundled" ${NOT_UNBUNDLED})
option (USE_INTERNAL_SPARSEHASH_LIBRARY "Set to FALSE to use system sparsehash library instead of bundled" ${NOT_UNBUNDLED})
if (NOT USE_INTERNAL_SPARCEHASH_LIBRARY)
find_path (SPARCEHASH_INCLUDE_DIR NAMES sparsehash/sparse_hash_map PATHS ${SPARCEHASH_INCLUDE_PATHS})
if (NOT USE_INTERNAL_SPARSEHASH_LIBRARY)
find_path (SPARSEHASH_INCLUDE_DIR NAMES sparsehash/sparse_hash_map PATHS ${SPARSEHASH_INCLUDE_PATHS})
endif ()
if (SPARCEHASH_INCLUDE_DIR)
if (SPARSEHASH_INCLUDE_DIR)
else ()
set (USE_INTERNAL_SPARCEHASH_LIBRARY 1)
set (SPARCEHASH_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/libsparsehash")
set (USE_INTERNAL_SPARSEHASH_LIBRARY 1)
set (SPARSEHASH_INCLUDE_DIR "${ClickHouse_SOURCE_DIR}/contrib/sparsehash-c11")
endif ()
message (STATUS "Using sparsehash: ${SPARCEHASH_INCLUDE_DIR}")
message (STATUS "Using sparsehash: ${SPARSEHASH_INCLUDE_DIR}")

View File

@ -1,2 +0,0 @@
google-sparsehash@googlegroups.com

View File

@ -1,28 +0,0 @@
Copyright (c) 2005, Google Inc.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@ -1,188 +0,0 @@
== 23 February 2012 ==
A backwards incompatibility arose from flattening the include headers
structure for the <google> folder.
This is now fixed in 2.0.2. You only need to upgrade if you had previously
included files from the <google/sparsehash> folder.
== 1 February 2012 ==
A minor bug related to the namespace switch from google to sparsehash
stopped the build from working when perftools is also installed.
This is now fixed in 2.0.1. You only need to upgrade if you have perftools
installed.
== 31 January 2012 ==
I've just released sparsehash 2.0.
The `google-sparsehash` project has been renamed to `sparsehash`. I
(csilvers) am stepping down as maintainer, to be replaced by the team
of Donovan Hide and Geoff Pike. Welcome to the team, Donovan and
Geoff! Donovan has been an active contributor to sparsehash bug
reports and discussions in the past, and Geoff has been closely
involved with sparsehash inside Google (in addition to writing the
[http://code.google.com/p/cityhash CityHash hash function]). The two
of them together should be a formidable force. For good.
I bumped the major version number up to 2 to reflect the new community
ownership of the project. All the
[http://sparsehash.googlecode.com/svn/tags/sparsehash-2.0/ChangeLog changes]
are related to the renaming.
The only functional change from sparsehash 1.12 is that I've renamed
the `google/` include-directory to be `sparsehash/` instead. New code
should `#include <sparsehash/sparse_hash_map>`/etc. I've kept the old
names around as forwarding headers to the new, so `#include
<google/sparse_hash_map>` will continue to work.
Note that the classes and functions remain in the `google` C++
namespace (I didn't change that to `sparsehash` as well); I think
that's a trickier transition, and can happen in a future release.
=== 18 January 2011 ===
The `google-sparsehash` Google Code page has been renamed to
`sparsehash`, in preparation for the project being renamed to
`sparsehash`. In the coming weeks, I'll be stepping down as
maintainer for the sparsehash project, and as part of that Google is
relinquishing ownership of the project; it will now be entirely
community run. The name change reflects that shift.
=== 20 December 2011 ===
I've just released sparsehash 1.12. This release features improved
I/O (serialization) support. Support is finally added to serialize
and unserialize `dense_hash_map`/`set`, paralleling the existing code
for `sparse_hash_map`/`set`. In addition, the serialization API has
gotten simpler, with a single `serialize()` method to write to disk,
and an `unserialize()` method to read from disk. Finally, support has
gotten more generic, with built-in support for both C `FILE*`s and C++
streams, and an extension mechanism to support arbitrary sources and
sinks.
There are also more minor changes, including minor bugfixes, an
improved deleted-key test, and a minor addition to the `sparsetable`
API. See the [http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.12/ChangeLog ChangeLog]
for full details.
=== 23 June 2011 ===
I've just released sparsehash 1.11. The major user-visible change is
that the default behavior is improved -- using the hash_map/set is
faster -- for hashtables where the key is a pointer. We now notice
that case and ignore the low 2-3 bits (which are almost always 0 for
pointers) when hashing.
Another user-visible change is we've removed the tests for whether the
STL (vector, pair, etc) is defined in the 'std' namespace. gcc 2.95
is the most recent compiler I know of to put STL types and functions
in the global namespace. If you need to use such an old compiler, do
not update to the latest sparsehash release.
We've also changed the internal tools we use to integrate
Googler-supplied patches to sparsehash into the opensource release.
These new tools should result in more frequent updates with better
change descriptions. They will also result in future ChangeLog
entries being much more verbose (for better or for worse).
A full list of changes is described in
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.11/ChangeLog ChangeLog].
=== 21 January 2011 ===
I've just released sparsehash 1.10. This fixes a performance
regression in sparsehash 1.8, where sparse_hash_map would copy
hashtable keys by value even when the key was explicitly a reference.
It also fixes compiler warnings from MSVC 10, which uses some c++0x
features that did not interact well with sparsehash.
There is no reason to upgrade unless you use references for your
hashtable keys, or compile with MSVC 10. A full list of changes is
described in
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.10/ChangeLog ChangeLog].
=== 24 September 2010 ===
I've just released sparsehash 1.9. This fixes a size regression in
sparsehash 1.8, where the new allocator would take up space in
`sparse_hash_map`, doubling the sparse_hash_map overhead (from 1-2
bits per bucket to 3 or so). All users are encouraged to upgrade.
This change also marks enums as being Plain Old Data, which can speed
up hashtables with enum keys and/or values. A full list of changes is
described in
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.9/ChangeLog ChangeLog].
=== 29 July 2010 ===
I've just released sparsehash 1.8. This includes improved support for
`Allocator`, including supporting the allocator constructor arg and
`get_allocator()` access method.
To work around a bug in gcc 4.0.x, I've renamed the static variables
`HT_OCCUPANCY_FLT` and `HT_SHRINK_FLT` to `HT_OCCUPANCY_PCT` and
`HT_SHRINK_PCT`, and changed their type from float to int. This
should not be a user-visible change, since these variables are only
used in the internal hashtable classes (sparsehash clients should use
`max_load_factor()` and `min_load_factor()` instead of modifying these
static variables), but if you do access these constants, you will need
to change your code.
Internally, the biggest change is a revamp of the test suite. It now
has more complete coverage, and a more capable timing tester. There
are other, more minor changes as well. A full list of changes is
described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.8/ChangeLog ChangeLog].
=== 31 March 2010 ===
I've just released sparsehash 1.7. The major news here is the
addition of `Allocator` support. Previously, these hashtable classes
would just ignore the `Allocator` template parameter. They now
respect it, and even inherit `size_type`, `pointer`, etc. from the
allocator class. By default, they use a special allocator we provide
that uses libc `malloc` and `free` to allocate. The hash classes
notice when this special allocator is being used, and use `realloc`
when it can. This means that the default allocator is significantly
faster than custom allocators are likely to be (since realloc-like
functionality is not supported by STL allocators).
There are a few more minor changes as well. A full list of changes is
described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.7/ChangeLog ChangeLog].
=== 11 January 2010 ===
I've just released sparsehash 1.6. The API has widened a bit with the
addition of `deleted_key()` and `empty_key()`, which let you query
what values these keys have. A few rather obscure bugs have been
fixed (such as an error when copying one hashtable into another when
the empty_keys differ). A full list of changes is described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.6/ChangeLog ChangeLog].
=== 9 May 2009 ===
I've just released sparsehash 1.5.1. Hot on the heels of sparsehash
1.5, this release fixes a longstanding bug in the sparsehash code,
where `equal_range` would always return an empty range. It now works
as documented. All sparsehash users are encouraged to upgrade.
=== 7 May 2009 ===
I've just released sparsehash 1.5. This release introduces tr1
compatibility: I've added `rehash`, `begin(i)`, and other methods that
are expected to be part of the `unordered_map` API once `tr1` is
introduced. This allows `sparse_hash_map`, `dense_hash_map`,
`sparse_hash_set`, and `dense_hash_set` to be (almost) drop-in
replacements for `unordered_map` and `unordered_set`.
There is no need to upgrade unless you need this functionality, or
need one of the other, more minor, changes described in the
[http://google-sparsehash.googlecode.com/svn/tags/sparsehash-1.5/ChangeLog ChangeLog].

View File

@ -1,143 +0,0 @@
This directory contains several hash-map implementations, similar in
API to SGI's hash_map class, but with different performance
characteristics. sparse_hash_map uses very little space overhead, 1-2
bits per entry. dense_hash_map is very fast, particularly on lookup.
(sparse_hash_set and dense_hash_set are the set versions of these
routines.) On the other hand, these classes have requirements that
may not make them appropriate for all applications.
All these implementations use a hashtable with internal quadratic
probing. This method is space-efficient -- there is no pointer
overhead -- and time-efficient for good hash functions.
COMPILING
---------
To compile test applications with these classes, run ./configure
followed by make. To install these header files on your system, run
'make install'. (On Windows, the instructions are different; see
README_windows.txt.) See INSTALL for more details.
This code should work on any modern C++ system. It has been tested on
Linux (Ubuntu, Fedora, RedHat, Debian), Solaris 10 x86, FreeBSD 6.0,
OS X 10.3 and 10.4, and Windows under both VC++7 and VC++8.
USING
-----
See the html files in the doc directory for small example programs
that use these classes. It's enough to just include the header file:
#include <sparsehash/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
google::sparse_hash_map<int, int> number_mapper;
and use the class the way you would other hash-map implementations.
(Though see "API" below for caveats.)
By default (you can change it via a flag to ./configure), these hash
implementations are defined in the google namespace.
API
---
The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
dense_hash_set is a superset of the API of SGI's hash_map class.
See doc/sparse_hash_map.html, et al., for more information about the
API.
The usage of these classes differs from SGI's hash_map, and other
hashtable implementations, in the following major ways:
1) dense_hash_map requires you to set aside one key value as the
'empty bucket' value, set via the set_empty_key() method. This
*MUST* be called before you can use the dense_hash_map. It is
illegal to insert any elements into a dense_hash_map whose key is
equal to the empty-key.
2) For both dense_hash_map and sparse_hash_map, if you wish to delete
elements from the hashtable, you must set aside a key value as the
'deleted bucket' value, set via the set_deleted_key() method. If
your hash-map is insert-only, there is no need to call this
method. If you call set_deleted_key(), it is illegal to insert any
elements into a dense_hash_map or sparse_hash_map whose key is
equal to the deleted-key.
3) These hash-map implementations support I/O. See below. (A short usage sketch of points 1 and 2 follows this list.)
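For example, here is a minimal sketch of points 1 and 2 above (the key
values -1 and -2 are arbitrary "impossible" keys chosen purely for
illustration):
#include <sparsehash/dense_hash_map>
google::dense_hash_map<int, int> counts;
counts.set_empty_key(-1); // must be called before any other use
counts.set_deleted_key(-2); // needed only for erase(); must differ from the empty key
counts[5] = 1;
counts.erase(5);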
There are also some smaller differences:
1) The constructor takes an optional argument that specifies the
number of elements you expect to insert into the hashtable. This
differs from SGI's hash_map implementation, which takes an optional
number of buckets.
2) erase() does not immediately reclaim memory. As a consequence,
erase() does not invalidate any iterators, making loops like this
correct:
for (it = ht.begin(); it != ht.end(); ++it)
if (...) ht.erase(it);
As another consequence, a series of erase() calls can leave your
hashtable using more memory than it needs to. The hashtable will
automatically compact at the next call to insert(), but to
manually compact a hashtable, you can call
ht.resize(0)
I/O
---
In addition to the normal hash-map operations, sparse_hash_map can
read and write hashtables to disk. (dense_hash_map also has the API,
but it has not yet been implemented, and writes will always fail.)
In the simplest case, writing a hashtable is as easy as calling two
methods on the hashtable:
ht.write_metadata(fp);
ht.write_nopointer_data(fp);
Reading in this data is equally simple:
google::sparse_hash_map<...> ht;
ht.read_metadata(fp);
ht.read_nopointer_data(fp);
The above is sufficient if the key and value do not contain any
pointers: they are basic C types or agglomerations of basic C types.
If the key and/or value do contain pointers, you can still store the
hashtable by replacing write_nopointer_data() with a custom writing
routine. See sparse_hash_map.html et al. for more information.
SPARSETABLE
-----------
In addition to the hash-map and hash-set classes, this package also
provides sparsetable.h, an array implementation that uses space
proportional to the number of elements in the array, rather than the
maximum element index. It uses very little space overhead: 1 bit per
entry. See doc/sparsetable.html for the API.
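For example, a minimal sketch (the table size and index below are
arbitrary, and the header path assumes the same <sparsehash/...> layout
as the hash-map headers above):
#include <sparsehash/sparsetable>
google::sparsetable<int> t(100); // 100 slots, all initially empty
t[5] = 7; // only occupied slots take real space
int n = t.num_nonempty(); // n == 1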
RESOURCE USAGE
--------------
* sparse_hash_map has memory overhead of about 2 bits per hash-map
entry.
* dense_hash_map has a factor of 2-3 memory overhead: if your
hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
total.
Hashtables tend to double in size when resizing, creating an
additional 50% space overhead. dense_hash_map does in fact have a
significant "high water mark" memory use requirement.
sparse_hash_map, however, is written to need very little space
overhead when resizing: only a few bits per hashtable entry.
PERFORMANCE
-----------
You can compile and run the included file time_hash_map.cc to examine
the performance of sparse_hash_map, dense_hash_map, and your native
hash_map implementation on your system. One test against the
SGI hash_map implementation gave the following timing information for
a simple find() call:
SGI hash_map: 22 ns
dense_hash_map: 13 ns
sparse_hash_map: 117 ns
SGI map: 113 ns
See doc/performance.html for more detailed charts on resource usage
and performance data.
---
16 March 2005
(Last updated: 12 September 2010)

View File

@ -1,369 +0,0 @@
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ----
//
// This is just a very thin wrapper over densehashtable.h, just
// like sgi stl's stl_hash_map is a very thin wrapper over
// stl_hashtable. The major thing we define is operator[], because
// we have a concept of a data_type which stl_hashtable doesn't
// (it only has a key and a value).
//
// NOTE: this is exactly like sparse_hash_map.h, with the word
// "sparse" replaced by "dense", except for the addition of
// set_empty_key().
//
// YOU MUST CALL SET_EMPTY_KEY() IMMEDIATELY AFTER CONSTRUCTION.
//
// Otherwise your program will die in mysterious ways. (Note if you
// use the constructor that takes an InputIterator range, you pass in
// the empty key in the constructor, rather than after. As a result,
// this constructor differs from the standard STL version.)
//
// In other respects, we adhere mostly to the STL semantics for
// hash-map. One important exception is that insert() may invalidate
// iterators entirely -- STL semantics are that insert() may reorder
// iterators, but they all still refer to something valid in the
// hashtable. Not so for us. Likewise, insert() may invalidate
// pointers into the hashtable. (Whether insert invalidates iterators
// and pointers depends on whether it results in a hashtable resize).
// On the plus side, delete() doesn't invalidate iterators or pointers
// at all, or even change the ordering of elements.
//
// Here are a few "power user" tips:
//
// 1) set_deleted_key():
// If you want to use erase() you *must* call set_deleted_key(),
// in addition to set_empty_key(), after construction.
// The deleted and empty keys must differ.
//
// 2) resize(0):
// When an item is deleted, its memory isn't freed right
// away. This allows you to iterate over a hashtable,
// and call erase(), without invalidating the iterator.
// To force the memory to be freed, call resize(0).
// For tr1 compatibility, this can also be called as rehash(0).
//
// 3) min_load_factor(0.0)
// Setting the minimum load factor to 0.0 guarantees that
// the hash table will never shrink.
//
// Roughly speaking:
// (1) dense_hash_map: fastest, uses the most memory unless entries are small
// (2) sparse_hash_map: slowest, uses the least memory
// (3) hash_map / unordered_map (STL): in the middle
//
// Typically I use sparse_hash_map when I care about space and/or when
// I need to save the hashtable on disk. I use hash_map otherwise. I
// don't personally use dense_hash_set ever; some people use it for
// small sets with lots of lookups.
//
// - dense_hash_map has, typically, about 78% memory overhead (if your
// data takes up X bytes, the hash_map uses .78X more bytes in overhead).
// - sparse_hash_map has about 4 bits overhead per entry.
// - sparse_hash_map can be 3-7 times slower than the others for lookup and,
// especially, inserts. See time_hash_map.cc for details.
//
// See /usr/(local/)?doc/sparsehash-*/dense_hash_map.html
// for information about how to use this class.
#ifndef _DENSE_HASH_MAP_H_
#define _DENSE_HASH_MAP_H_
#include <sparsehash/internal/sparseconfig.h>
#include <algorithm> // needed by stl_alloc
#include <functional> // for equal_to<>, select1st<>, etc
#include <memory> // for alloc
#include <utility> // for pair<>
#include <sparsehash/internal/densehashtable.h> // IWYU pragma: export
#include <sparsehash/internal/libc_allocator_with_realloc.h>
#include HASH_FUN_H // for hash<>
_START_GOOGLE_NAMESPACE_
template <class Key, class T,
class HashFcn = SPARSEHASH_HASH<Key>, // defined in sparseconfig.h
class EqualKey = std::equal_to<Key>,
class Alloc = libc_allocator_with_realloc<std::pair<const Key, T> > >
class dense_hash_map {
private:
// Apparently select1st is not stl-standard, so we define our own
struct SelectKey {
typedef const Key& result_type;
const Key& operator()(const std::pair<const Key, T>& p) const {
return p.first;
}
};
struct SetKey {
void operator()(std::pair<const Key, T>* value, const Key& new_key) const {
*const_cast<Key*>(&value->first) = new_key;
// It would be nice to clear the rest of value here as well, in
// case it's taking up a lot of memory. We do this by clearing
// the value. This assumes T has a zero-arg constructor!
value->second = T();
}
};
// For operator[].
struct DefaultValue {
std::pair<const Key, T> operator()(const Key& key) {
return std::make_pair(key, T());
}
};
// The actual data
typedef dense_hashtable<std::pair<const Key, T>, Key, HashFcn, SelectKey,
SetKey, EqualKey, Alloc> ht;
ht rep;
public:
typedef typename ht::key_type key_type;
typedef T data_type;
typedef T mapped_type;
typedef typename ht::value_type value_type;
typedef typename ht::hasher hasher;
typedef typename ht::key_equal key_equal;
typedef Alloc allocator_type;
typedef typename ht::size_type size_type;
typedef typename ht::difference_type difference_type;
typedef typename ht::pointer pointer;
typedef typename ht::const_pointer const_pointer;
typedef typename ht::reference reference;
typedef typename ht::const_reference const_reference;
typedef typename ht::iterator iterator;
typedef typename ht::const_iterator const_iterator;
typedef typename ht::local_iterator local_iterator;
typedef typename ht::const_local_iterator const_local_iterator;
// Iterator functions
iterator begin() { return rep.begin(); }
iterator end() { return rep.end(); }
const_iterator begin() const { return rep.begin(); }
const_iterator end() const { return rep.end(); }
// These come from tr1's unordered_map. For us, a bucket has 0 or 1 elements.
local_iterator begin(size_type i) { return rep.begin(i); }
local_iterator end(size_type i) { return rep.end(i); }
const_local_iterator begin(size_type i) const { return rep.begin(i); }
const_local_iterator end(size_type i) const { return rep.end(i); }
// Accessor functions
allocator_type get_allocator() const { return rep.get_allocator(); }
hasher hash_funct() const { return rep.hash_funct(); }
hasher hash_function() const { return hash_funct(); }
key_equal key_eq() const { return rep.key_eq(); }
// Constructors
explicit dense_hash_map(size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
}
template <class InputIterator>
dense_hash_map(InputIterator f, InputIterator l,
const key_type& empty_key_val,
size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
set_empty_key(empty_key_val);
rep.insert(f, l);
}
// We use the default copy constructor
// We use the default operator=()
// We use the default destructor
void clear() { rep.clear(); }
// This clears the hash map without resizing it down to the minimum
// bucket count, but rather keeps the number of buckets constant
void clear_no_resize() { rep.clear_no_resize(); }
void swap(dense_hash_map& hs) { rep.swap(hs.rep); }
// Functions concerning size
size_type size() const { return rep.size(); }
size_type max_size() const { return rep.max_size(); }
bool empty() const { return rep.empty(); }
size_type bucket_count() const { return rep.bucket_count(); }
size_type max_bucket_count() const { return rep.max_bucket_count(); }
// These are tr1 methods. bucket() is the bucket the key is or would be in.
size_type bucket_size(size_type i) const { return rep.bucket_size(i); }
size_type bucket(const key_type& key) const { return rep.bucket(key); }
float load_factor() const {
return size() * 1.0f / bucket_count();
}
float max_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return grow;
}
void max_load_factor(float new_grow) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(shrink, new_grow);
}
// These aren't tr1 methods but perhaps ought to be.
float min_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return shrink;
}
void min_load_factor(float new_shrink) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(new_shrink, grow);
}
// Deprecated; use min_load_factor() or max_load_factor() instead.
void set_resizing_parameters(float shrink, float grow) {
rep.set_resizing_parameters(shrink, grow);
}
void resize(size_type hint) { rep.resize(hint); }
void rehash(size_type hint) { resize(hint); } // the tr1 name
// Lookup routines
iterator find(const key_type& key) { return rep.find(key); }
const_iterator find(const key_type& key) const { return rep.find(key); }
data_type& operator[](const key_type& key) { // This is our value-add!
// If key is in the hashtable, returns find(key)->second,
// otherwise returns insert(value_type(key, T())).first->second.
// Note it does not create an empty T unless the find fails.
return rep.template find_or_insert<DefaultValue>(key).second;
}
size_type count(const key_type& key) const { return rep.count(key); }
std::pair<iterator, iterator> equal_range(const key_type& key) {
return rep.equal_range(key);
}
std::pair<const_iterator, const_iterator> equal_range(const key_type& key)
const {
return rep.equal_range(key);
}
// Insertion routines
std::pair<iterator, bool> insert(const value_type& obj) {
return rep.insert(obj);
}
template <class InputIterator> void insert(InputIterator f, InputIterator l) {
rep.insert(f, l);
}
void insert(const_iterator f, const_iterator l) {
rep.insert(f, l);
}
// Required for std::insert_iterator; the passed-in iterator is ignored.
iterator insert(iterator, const value_type& obj) {
return insert(obj).first;
}
// Deletion and empty routines
// THESE ARE NON-STANDARD! I make you specify an "impossible" key
// value to identify deleted and empty buckets. You can change the
// deleted key as time goes on, or get rid of it entirely to be insert-only.
void set_empty_key(const key_type& key) { // YOU MUST CALL THIS!
rep.set_empty_key(value_type(key, data_type())); // rep wants a value
}
key_type empty_key() const {
return rep.empty_key().first; // rep returns a value
}
void set_deleted_key(const key_type& key) { rep.set_deleted_key(key); }
void clear_deleted_key() { rep.clear_deleted_key(); }
key_type deleted_key() const { return rep.deleted_key(); }
// These are standard
size_type erase(const key_type& key) { return rep.erase(key); }
void erase(iterator it) { rep.erase(it); }
void erase(iterator f, iterator l) { rep.erase(f, l); }
// Comparison
bool operator==(const dense_hash_map& hs) const { return rep == hs.rep; }
bool operator!=(const dense_hash_map& hs) const { return rep != hs.rep; }
// I/O -- this is an add-on for writing hash map to disk
//
// For maximum flexibility, this does not assume a particular
// file type (though it will probably be a FILE *). We just pass
// the fp through to rep.
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
typedef typename ht::NopointerSerializer NopointerSerializer;
// serializer: a class providing operator()(OUTPUT*, const value_type&)
// (writing value_type to OUTPUT). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an ostream*/subclass_of_ostream*, OR a
// pointer to a class providing size_t Write(const void*, size_t),
// which writes a buffer into a stream (which fp presumably
// owns) and returns the number of bytes successfully written.
// Note basic_ostream<not_char> is not currently supported.
template <typename ValueSerializer, typename OUTPUT>
bool serialize(ValueSerializer serializer, OUTPUT* fp) {
return rep.serialize(serializer, fp);
}
// serializer: a functor providing operator()(INPUT*, value_type*)
// (reading from INPUT and into value_type). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an istream*/subclass_of_istream*, OR a
// pointer to a class providing size_t Read(void*, size_t),
// which reads into a buffer from a stream (which fp presumably
// owns) and returns the number of bytes successfully read.
// Note basic_istream<not_char> is not currently supported.
// NOTE: Since value_type is std::pair<const Key, T>, ValueSerializer
// may need to do a const cast in order to fill in the key.
template <typename ValueSerializer, typename INPUT>
bool unserialize(ValueSerializer serializer, INPUT* fp) {
return rep.unserialize(serializer, fp);
}
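// For illustration, a sketch of writing such a map (with POD key and
// value) to a FILE*; the file name is hypothetical and error handling
// is omitted:
//
// FILE* fp = fopen("map.bin", "wb");
// m.serialize(google::dense_hash_map<int, int>::NopointerSerializer(), fp);
// fclose(fp);
//
// Reading it back uses unserialize() with the same serializer on a
// freshly constructed map (remember that set_empty_key() must be
// called before any use).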
};
// We need a global swap as well
template <class Key, class T, class HashFcn, class EqualKey, class Alloc>
inline void swap(dense_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm1,
dense_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm2) {
hm1.swap(hm2);
}
_END_GOOGLE_NAMESPACE_
#endif /* _DENSE_HASH_MAP_H_ */

View File

@ -1,338 +0,0 @@
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
//
// This is just a very thin wrapper over densehashtable.h, just
// like sgi stl's stl_hash_set is a very thin wrapper over
// stl_hashtable. (Unlike dense_hash_map, this class defines no
// operator[]: a set has only keys, with no separate data_type.)
//
// This is more different from dense_hash_map than you might think,
// because all iterators for sets are const (you obviously can't
// change the key, and for sets there is no value).
//
// NOTE: this is exactly like sparse_hash_set.h, with the word
// "sparse" replaced by "dense", except for the addition of
// set_empty_key().
//
// YOU MUST CALL SET_EMPTY_KEY() IMMEDIATELY AFTER CONSTRUCTION.
//
// Otherwise your program will die in mysterious ways. (Note if you
// use the constructor that takes an InputIterator range, you pass in
// the empty key in the constructor, rather than after. As a result,
// this constructor differs from the standard STL version.)
//
// In other respects, we adhere mostly to the STL semantics for
// hash-map. One important exception is that insert() may invalidate
// iterators entirely -- STL semantics are that insert() may reorder
// iterators, but they all still refer to something valid in the
// hashtable. Not so for us. Likewise, insert() may invalidate
// pointers into the hashtable. (Whether insert invalidates iterators
// and pointers depends on whether it results in a hashtable resize).
// On the plus side, delete() doesn't invalidate iterators or pointers
// at all, or even change the ordering of elements.
//
// Here are a few "power user" tips:
//
// 1) set_deleted_key():
// If you want to use erase() you must call set_deleted_key(),
// in addition to set_empty_key(), after construction.
// The deleted and empty keys must differ.
//
// 2) resize(0):
// When an item is deleted, its memory isn't freed right
// away. This allows you to iterate over a hashtable,
// and call erase(), without invalidating the iterator.
// To force the memory to be freed, call resize(0).
// For tr1 compatibility, this can also be called as rehash(0).
//
// 3) min_load_factor(0.0)
// Setting the minimum load factor to 0.0 guarantees that
// the hash table will never shrink.
//
// Roughly speaking:
// (1) dense_hash_set: fastest, uses the most memory unless entries are small
// (2) sparse_hash_set: slowest, uses the least memory
// (3) hash_set / unordered_set (STL): in the middle
//
// Typically I use sparse_hash_set when I care about space and/or when
// I need to save the hashtable on disk. I use hash_set otherwise. I
// don't personally use dense_hash_set ever; some people use it for
// small sets with lots of lookups.
//
// - dense_hash_set has, typically, about 78% memory overhead (if your
// data takes up X bytes, the hash_set uses .78X more bytes in overhead).
// - sparse_hash_set has about 4 bits overhead per entry.
// - sparse_hash_set can be 3-7 times slower than the others for lookup and,
// especially, inserts. See time_hash_map.cc for details.
//
// See /usr/(local/)?doc/sparsehash-*/dense_hash_set.html
// for information about how to use this class.
#ifndef _DENSE_HASH_SET_H_
#define _DENSE_HASH_SET_H_
#include <sparsehash/internal/sparseconfig.h>
#include <algorithm> // needed by stl_alloc
#include <functional> // for equal_to<>, select1st<>, etc
#include <memory> // for alloc
#include <utility> // for pair<>
#include <sparsehash/internal/densehashtable.h> // IWYU pragma: export
#include <sparsehash/internal/libc_allocator_with_realloc.h>
#include HASH_FUN_H // for hash<>
_START_GOOGLE_NAMESPACE_
template <class Value,
class HashFcn = SPARSEHASH_HASH<Value>, // defined in sparseconfig.h
class EqualKey = std::equal_to<Value>,
class Alloc = libc_allocator_with_realloc<Value> >
class dense_hash_set {
private:
// Apparently identity is not stl-standard, so we define our own
struct Identity {
typedef const Value& result_type;
const Value& operator()(const Value& v) const { return v; }
};
struct SetKey {
void operator()(Value* value, const Value& new_key) const {
*value = new_key;
}
};
// The actual data
typedef dense_hashtable<Value, Value, HashFcn, Identity, SetKey,
EqualKey, Alloc> ht;
ht rep;
public:
typedef typename ht::key_type key_type;
typedef typename ht::value_type value_type;
typedef typename ht::hasher hasher;
typedef typename ht::key_equal key_equal;
typedef Alloc allocator_type;
typedef typename ht::size_type size_type;
typedef typename ht::difference_type difference_type;
typedef typename ht::const_pointer pointer;
typedef typename ht::const_pointer const_pointer;
typedef typename ht::const_reference reference;
typedef typename ht::const_reference const_reference;
typedef typename ht::const_iterator iterator;
typedef typename ht::const_iterator const_iterator;
typedef typename ht::const_local_iterator local_iterator;
typedef typename ht::const_local_iterator const_local_iterator;
// Iterator functions -- recall all iterators are const
iterator begin() const { return rep.begin(); }
iterator end() const { return rep.end(); }
// These come from tr1's unordered_set. For us, a bucket has 0 or 1 elements.
local_iterator begin(size_type i) const { return rep.begin(i); }
local_iterator end(size_type i) const { return rep.end(i); }
// Accessor functions
allocator_type get_allocator() const { return rep.get_allocator(); }
hasher hash_funct() const { return rep.hash_funct(); }
hasher hash_function() const { return hash_funct(); } // tr1 name
key_equal key_eq() const { return rep.key_eq(); }
// Constructors
explicit dense_hash_set(size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, Identity(), SetKey(), alloc) {
}
template <class InputIterator>
dense_hash_set(InputIterator f, InputIterator l,
const key_type& empty_key_val,
size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, Identity(), SetKey(), alloc) {
set_empty_key(empty_key_val);
rep.insert(f, l);
}
// We use the default copy constructor
// We use the default operator=()
// We use the default destructor
void clear() { rep.clear(); }
// This clears the hash set without resizing it down to the minimum
// bucket count, but rather keeps the number of buckets constant
void clear_no_resize() { rep.clear_no_resize(); }
void swap(dense_hash_set& hs) { rep.swap(hs.rep); }
// Functions concerning size
size_type size() const { return rep.size(); }
size_type max_size() const { return rep.max_size(); }
bool empty() const { return rep.empty(); }
size_type bucket_count() const { return rep.bucket_count(); }
size_type max_bucket_count() const { return rep.max_bucket_count(); }
// These are tr1 methods. bucket() is the bucket the key is or would be in.
size_type bucket_size(size_type i) const { return rep.bucket_size(i); }
size_type bucket(const key_type& key) const { return rep.bucket(key); }
float load_factor() const {
return size() * 1.0f / bucket_count();
}
float max_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return grow;
}
void max_load_factor(float new_grow) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(shrink, new_grow);
}
// These aren't tr1 methods but perhaps ought to be.
float min_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return shrink;
}
void min_load_factor(float new_shrink) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(new_shrink, grow);
}
// Deprecated; use min_load_factor() or max_load_factor() instead.
void set_resizing_parameters(float shrink, float grow) {
rep.set_resizing_parameters(shrink, grow);
}
void resize(size_type hint) { rep.resize(hint); }
void rehash(size_type hint) { resize(hint); } // the tr1 name
// Lookup routines
iterator find(const key_type& key) const { return rep.find(key); }
size_type count(const key_type& key) const { return rep.count(key); }
std::pair<iterator, iterator> equal_range(const key_type& key) const {
return rep.equal_range(key);
}
// Insertion routines
std::pair<iterator, bool> insert(const value_type& obj) {
std::pair<typename ht::iterator, bool> p = rep.insert(obj);
return std::pair<iterator, bool>(p.first, p.second); // const to non-const
}
template <class InputIterator> void insert(InputIterator f, InputIterator l) {
rep.insert(f, l);
}
void insert(const_iterator f, const_iterator l) {
rep.insert(f, l);
}
// Required for std::insert_iterator; the passed-in iterator is ignored.
iterator insert(iterator, const value_type& obj) {
return insert(obj).first;
}
// Deletion and empty routines
// THESE ARE NON-STANDARD! I make you specify an "impossible" key
// value to identify deleted and empty buckets. You can change the
// deleted key as time goes on, or get rid of it entirely to be insert-only.
void set_empty_key(const key_type& key) { rep.set_empty_key(key); }
key_type empty_key() const { return rep.empty_key(); }
void set_deleted_key(const key_type& key) { rep.set_deleted_key(key); }
void clear_deleted_key() { rep.clear_deleted_key(); }
key_type deleted_key() const { return rep.deleted_key(); }
// These are standard
size_type erase(const key_type& key) { return rep.erase(key); }
void erase(iterator it) { rep.erase(it); }
void erase(iterator f, iterator l) { rep.erase(f, l); }
// Comparison
bool operator==(const dense_hash_set& hs) const { return rep == hs.rep; }
bool operator!=(const dense_hash_set& hs) const { return rep != hs.rep; }
// I/O -- this is an add-on for writing metainformation to disk
//
// For maximum flexibility, this does not assume a particular
// file type (though it will probably be a FILE *). We just pass
// the fp through to rep.
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
typedef typename ht::NopointerSerializer NopointerSerializer;
// serializer: a class providing operator()(OUTPUT*, const value_type&)
// (writing value_type to OUTPUT). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an ostream*/subclass_of_ostream*, OR a
// pointer to a class providing size_t Write(const void*, size_t),
// which writes a buffer into a stream (which fp presumably
// owns) and returns the number of bytes successfully written.
// Note basic_ostream<not_char> is not currently supported.
template <typename ValueSerializer, typename OUTPUT>
bool serialize(ValueSerializer serializer, OUTPUT* fp) {
return rep.serialize(serializer, fp);
}
// serializer: a functor providing operator()(INPUT*, value_type*)
// (reading from INPUT and into value_type). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an istream*/subclass_of_istream*, OR a
// pointer to a class providing size_t Read(void*, size_t),
// which reads into a buffer from a stream (which fp presumably
// owns) and returns the number of bytes successfully read.
// Note basic_istream<not_char> is not currently supported.
template <typename ValueSerializer, typename INPUT>
bool unserialize(ValueSerializer serializer, INPUT* fp) {
return rep.unserialize(serializer, fp);
}
};
template <class Val, class HashFcn, class EqualKey, class Alloc>
inline void swap(dense_hash_set<Val, HashFcn, EqualKey, Alloc>& hs1,
dense_hash_set<Val, HashFcn, EqualKey, Alloc>& hs2) {
hs1.swap(hs2);
}
_END_GOOGLE_NAMESPACE_
#endif /* _DENSE_HASH_SET_H_ */

File diff suppressed because it is too large

View File

@ -1,381 +0,0 @@
// Copyright (c) 2010, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
//
// Provides classes shared by both sparse and dense hashtable.
//
// sh_hashtable_settings has parameters for growing and shrinking
// a hashtable. It also packages a zero-size functor (i.e. the hasher).
//
// Other functions and classes provide common code for serializing
// and deserializing hashtables to a stream (such as a FILE*).
#ifndef UTIL_GTL_HASHTABLE_COMMON_H_
#define UTIL_GTL_HASHTABLE_COMMON_H_
#include <sparsehash/internal/sparseconfig.h>
#include <assert.h>
#include <stdio.h>
#include <stddef.h> // for size_t
#include <iosfwd>
#include <stdexcept> // For length_error
_START_GOOGLE_NAMESPACE_
template <bool> struct SparsehashCompileAssert { };
#define SPARSEHASH_COMPILE_ASSERT(expr, msg) \
static_assert(expr, #msg)
namespace sparsehash_internal {
// Adaptor methods for reading/writing data from an INPUT or OUTPUT
// variable passed to serialize() or unserialize(). For now we
// have implemented INPUT/OUTPUT for FILE*, istream*/ostream* (note
// they are pointers, unlike typical use), or else a pointer to
// something that supports a Read()/Write() method.
//
// For technical reasons, we implement read_data/write_data in two
// stages. The actual work is done in *_data_internal, which takes
// the stream argument twice: once as a template type, and once with
// normal type information. (We only use the second version.) We do
// this because of how C++ picks what function overload to use. If we
// implemented this the naive way:
// bool read_data(istream* is, const void* data, size_t length);
// template<typename T> read_data(T* fp, const void* data, size_t length);
// C++ would prefer the second version for every stream type except
// istream. However, we want C++ to prefer the first version for
// streams that are *subclasses* of istream, such as istringstream.
// This is not possible given the way template types are resolved. So
// we split the stream argument in two, one of which is templated and
// one of which is not. The specialized functions (like the istream
// version above) ignore the template arg and use the second, 'type'
// arg, getting subclass matching as normal. The 'catch-all'
// functions (the second version above) use the template arg to deduce
// the type, and use a second, void* arg to achieve the desired
// 'catch-all' semantics.
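// (For instance, with the naive pair of declarations above, a call
// like read_data(&my_istringstream, buf, n) would deduce
// T = istringstream and prefer the exact-match template over the
// istream* overload; splitting the stream argument lets the second,
// plainly typed parameter drive overload resolution, so istream
// subclasses match the istream version as desired.)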
// ----- low-level I/O for FILE* ----
template<typename Ignored>
inline bool read_data_internal(Ignored*, FILE* fp,
void* data, size_t length) {
return fread(data, length, 1, fp) == 1;
}
template<typename Ignored>
inline bool write_data_internal(Ignored*, FILE* fp,
const void* data, size_t length) {
return fwrite(data, length, 1, fp) == 1;
}
// ----- low-level I/O for iostream ----
// We want the caller to be responsible for #including <iostream>, not
// us, because iostream is a big header! According to the standard,
// it's only legal to delay the instantiation the way we want to if
// the istream/ostream is a template type. So we jump through hoops.
template<typename ISTREAM>
inline bool read_data_internal_for_istream(ISTREAM* fp,
void* data, size_t length) {
return fp->read(reinterpret_cast<char*>(data), length).good();
}
template<typename Ignored>
inline bool read_data_internal(Ignored*, std::istream* fp,
void* data, size_t length) {
return read_data_internal_for_istream(fp, data, length);
}
template<typename OSTREAM>
inline bool write_data_internal_for_ostream(OSTREAM* fp,
const void* data, size_t length) {
return fp->write(reinterpret_cast<const char*>(data), length).good();
}
template<typename Ignored>
inline bool write_data_internal(Ignored*, std::ostream* fp,
const void* data, size_t length) {
return write_data_internal_for_ostream(fp, data, length);
}
// ----- low-level I/O for custom streams ----
// The INPUT type needs to support a Read() method that takes a
// buffer and a length and returns the number of bytes read.
template <typename INPUT>
inline bool read_data_internal(INPUT* fp, void*,
void* data, size_t length) {
return static_cast<size_t>(fp->Read(data, length)) == length;
}
// The OUTPUT type needs to support a Write() operation that takes
// a buffer and a length and returns the number of bytes written.
template <typename OUTPUT>
inline bool write_data_internal(OUTPUT* fp, void*,
const void* data, size_t length) {
return static_cast<size_t>(fp->Write(data, length)) == length;
}
// ----- low-level I/O: the public API ----
template <typename INPUT>
inline bool read_data(INPUT* fp, void* data, size_t length) {
return read_data_internal(fp, fp, data, length);
}
template <typename OUTPUT>
inline bool write_data(OUTPUT* fp, const void* data, size_t length) {
return write_data_internal(fp, fp, data, length);
}
// Uses read_data() and write_data() to read/write an integer.
// length is the number of bytes to read/write (which may differ
// from sizeof(IntType), allowing us to save on a 32-bit system
// and load on a 64-bit system). Excess bytes are taken to be 0.
// INPUT and OUTPUT must match legal inputs to read/write_data (above).
template <typename INPUT, typename IntType>
bool read_bigendian_number(INPUT* fp, IntType* value, size_t length) {
*value = 0;
unsigned char byte;
// We require IntType to be unsigned or else the shifting gets all screwy.
SPARSEHASH_COMPILE_ASSERT(static_cast<IntType>(-1) > static_cast<IntType>(0),
serializing_int_requires_an_unsigned_type);
for (size_t i = 0; i < length; ++i) {
if (!read_data(fp, &byte, sizeof(byte))) return false;
*value |= static_cast<IntType>(byte) << ((length - 1 - i) * 8);
}
return true;
}
template <typename OUTPUT, typename IntType>
bool write_bigendian_number(OUTPUT* fp, IntType value, size_t length) {
unsigned char byte;
// We require IntType to be unsigned or else the shifting gets all screwy.
SPARSEHASH_COMPILE_ASSERT(static_cast<IntType>(-1) > static_cast<IntType>(0),
serializing_int_requires_an_unsigned_type);
for (size_t i = 0; i < length; ++i) {
byte = (sizeof(value) <= length-1 - i)
? 0 : static_cast<unsigned char>((value >> ((length-1 - i) * 8)) & 255);
if (!write_data(fp, &byte, sizeof(byte))) return false;
}
return true;
}
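// An illustrative round trip (not from the original source): writing a
// bucket count as exactly 4 big-endian bytes, so a table saved on a
// 64-bit system can be reloaded on a 32-bit one. "fp" stands for any
// legal INPUT/OUTPUT as described above.
//
//   size_t num_buckets = 1024;
//   write_bigendian_number(fp, num_buckets, 4);  // bytes: 00 00 04 00
//   ...
//   size_t restored = 0;
//   read_bigendian_number(fp, &restored, 4);     // restored == 1024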
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
// This is the type used for NopointerSerializer.
template <typename value_type> struct pod_serializer {
template <typename INPUT>
bool operator()(INPUT* fp, value_type* value) const {
return read_data(fp, value, sizeof(*value));
}
template <typename OUTPUT>
bool operator()(OUTPUT* fp, const value_type& value) const {
return write_data(fp, &value, sizeof(value));
}
};
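// A sketch of pod_serializer in isolation ("fp" as above; each call
// resolves to the read or write overload by its second argument):
//
//   pod_serializer<int> s;
//   const int out = 7;
//   s(fp, out);    // write: forwards to write_data(fp, &out, sizeof(out))
//   int in = 0;
//   s(fp, &in);    // read:  forwards to read_data(fp, &in, sizeof(in))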
// Settings contains parameters for growing and shrinking the table.
// It also packages the zero-size functor (i.e. the hasher).
//
// It does some munging of the hash value in cases where we think
// (fear) the original hash function might not be very good. In
// particular, the default hash of pointers is the identity hash,
// so probably all the low bits are 0. We identify when we think
// we're hashing a pointer, and chop off the low bits. Note this
// isn't perfect: even when the key is a pointer, we can't tell
// for sure that the hash is the identity hash. If it's not, this
// is needless work (and possibly, though not likely, harmful).
template<typename Key, typename HashFunc,
typename SizeType, int HT_MIN_BUCKETS>
class sh_hashtable_settings : public HashFunc {
public:
typedef Key key_type;
typedef HashFunc hasher;
typedef SizeType size_type;
public:
sh_hashtable_settings(const hasher& hf,
const float ht_occupancy_flt,
const float ht_empty_flt)
: hasher(hf),
enlarge_threshold_(0),
shrink_threshold_(0),
consider_shrink_(false),
use_empty_(false),
use_deleted_(false),
num_ht_copies_(0) {
set_enlarge_factor(ht_occupancy_flt);
set_shrink_factor(ht_empty_flt);
}
size_type hash(const key_type& v) const {
// We munge the hash value when we don't trust hasher::operator().
return hash_munger<Key>::MungedHash(hasher::operator()(v));
}
float enlarge_factor() const {
return enlarge_factor_;
}
void set_enlarge_factor(float f) {
enlarge_factor_ = f;
}
float shrink_factor() const {
return shrink_factor_;
}
void set_shrink_factor(float f) {
shrink_factor_ = f;
}
size_type enlarge_threshold() const {
return enlarge_threshold_;
}
void set_enlarge_threshold(size_type t) {
enlarge_threshold_ = t;
}
size_type shrink_threshold() const {
return shrink_threshold_;
}
void set_shrink_threshold(size_type t) {
shrink_threshold_ = t;
}
size_type enlarge_size(size_type x) const {
return static_cast<size_type>(x * enlarge_factor_);
}
size_type shrink_size(size_type x) const {
return static_cast<size_type>(x * shrink_factor_);
}
bool consider_shrink() const {
return consider_shrink_;
}
void set_consider_shrink(bool t) {
consider_shrink_ = t;
}
bool use_empty() const {
return use_empty_;
}
void set_use_empty(bool t) {
use_empty_ = t;
}
bool use_deleted() const {
return use_deleted_;
}
void set_use_deleted(bool t) {
use_deleted_ = t;
}
size_type num_ht_copies() const {
return static_cast<size_type>(num_ht_copies_);
}
void inc_num_ht_copies() {
++num_ht_copies_;
}
// Reset the enlarge and shrink thresholds
void reset_thresholds(size_type num_buckets) {
set_enlarge_threshold(enlarge_size(num_buckets));
set_shrink_threshold(shrink_size(num_buckets));
// whatever caused us to reset has already been considered
set_consider_shrink(false);
}
// Caller is responsible for calling reset_thresholds() right after
// set_resizing_parameters().
void set_resizing_parameters(float shrink, float grow) {
assert(shrink >= 0.0);
assert(grow <= 1.0);
if (shrink > grow/2.0f)
shrink = grow / 2.0f; // otherwise we thrash hashtable size
set_shrink_factor(shrink);
set_enlarge_factor(grow);
}
// This is the smallest size a hashtable can be without being too crowded
// If you like, you can give a min #buckets as well as a min #elts
size_type min_buckets(size_type num_elts, size_type min_buckets_wanted) {
float enlarge = enlarge_factor();
size_type sz = HT_MIN_BUCKETS; // min buckets allowed
while ( sz < min_buckets_wanted ||
num_elts >= static_cast<size_type>(sz * enlarge) ) {
// This just prevents overflowing size_type, since sz can exceed
// max_size() here.
if (static_cast<size_type>(sz * 2) < sz) {
throw std::length_error("resize overflow"); // protect against overflow
}
sz *= 2;
}
return sz;
}
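// Worked example (assuming HT_MIN_BUCKETS == 4 and the library's
// default enlarge factor of 0.8): min_buckets(100, 0) doubles sz
// while 100 >= sz * 0.8, i.e. 4 -> 8 -> 16 -> 32 -> 64 -> 128, and
// stops at 128 because 100 < 128 * 0.8 == 102.4.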
private:
template<class HashKey> class hash_munger {
public:
static size_t MungedHash(size_t hash) {
return hash;
}
};
// This matches when the hashtable key is a pointer.
template<class HashKey> class hash_munger<HashKey*> {
public:
static size_t MungedHash(size_t hash) {
// TODO(csilvers): consider rotating instead:
// static const int shift = (sizeof(void *) == 4) ? 2 : 3;
// return (hash << ((sizeof(hash) * 8) - shift)) | (hash >> shift);
// This matters if we ever change sparse/dense_hash_* to compare
// hashes before comparing actual values. It's speedy on x86.
return hash / sizeof(void*); // get rid of known-0 bits
}
};
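// Worked example: with 8-byte pointers, a key at address 0x1230
// identity-hashes to a value whose low 3 bits are always zero;
// dividing by sizeof(void*) == 8 discards them: 0x1230 / 8 == 0x246.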
size_type enlarge_threshold_; // table.size() * enlarge_factor
size_type shrink_threshold_; // table.size() * shrink_factor
float enlarge_factor_; // how full before resize
float shrink_factor_; // how empty before resize
// consider_shrink=true if we should try to shrink before next insert
bool consider_shrink_;
bool use_empty_; // used only by densehashtable, not sparsehashtable
bool use_deleted_; // false until delkey has been set
// num_ht_copies is a counter incremented every Copy/Move
unsigned int num_ht_copies_;
};
} // namespace sparsehash_internal
#undef SPARSEHASH_COMPILE_ASSERT
_END_GOOGLE_NAMESPACE_
#endif // UTIL_GTL_HASHTABLE_COMMON_H_


@ -1,119 +0,0 @@
// Copyright (c) 2010, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
#ifndef UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
#define UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
#include <sparsehash/internal/sparseconfig.h>
#include <stdlib.h> // for malloc/realloc/free
#include <stddef.h> // for ptrdiff_t
#include <new> // for placement new
_START_GOOGLE_NAMESPACE_
template<class T>
class libc_allocator_with_realloc {
public:
typedef T value_type;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef T* pointer;
typedef const T* const_pointer;
typedef T& reference;
typedef const T& const_reference;
libc_allocator_with_realloc() {}
libc_allocator_with_realloc(const libc_allocator_with_realloc&) {}
~libc_allocator_with_realloc() {}
pointer address(reference r) const { return &r; }
const_pointer address(const_reference r) const { return &r; }
pointer allocate(size_type n, const_pointer = 0) {
return static_cast<pointer>(malloc(n * sizeof(value_type)));
}
void deallocate(pointer p, size_type) {
free(p);
}
pointer reallocate(pointer p, size_type n) {
return static_cast<pointer>(realloc(p, n * sizeof(value_type)));
}
size_type max_size() const {
return static_cast<size_type>(-1) / sizeof(value_type);
}
void construct(pointer p, const value_type& val) {
new(p) value_type(val);
}
void destroy(pointer p) { p->~value_type(); }
template <class U>
libc_allocator_with_realloc(const libc_allocator_with_realloc<U>&) {}
template<class U>
struct rebind {
typedef libc_allocator_with_realloc<U> other;
};
};
// libc_allocator_with_realloc<void> specialization.
template<>
class libc_allocator_with_realloc<void> {
public:
typedef void value_type;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef void* pointer;
typedef const void* const_pointer;
template<class U>
struct rebind {
typedef libc_allocator_with_realloc<U> other;
};
};
template<class T>
inline bool operator==(const libc_allocator_with_realloc<T>&,
const libc_allocator_with_realloc<T>&) {
return true;
}
template<class T>
inline bool operator!=(const libc_allocator_with_realloc<T>&,
const libc_allocator_with_realloc<T>&) {
return false;
}
_END_GOOGLE_NAMESPACE_
#endif // UTIL_GTL_LIBC_ALLOCATOR_WITH_REALLOC_H_
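// A sketch of the allocator in isolation (illustrative only). The
// reallocate() member is the whole point: it lets the containers grow
// their storage in place instead of allocate-copy-free.
//
//   google::libc_allocator_with_realloc<int> alloc;
//   int* p = alloc.allocate(4);   // malloc(4 * sizeof(int))
//   p = alloc.reallocate(p, 8);   // realloc(p, 8 * sizeof(int)); may move
//   alloc.deallocate(p, 8);       // free(p)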


@ -1,46 +0,0 @@
/*
* NOTE: This file is for internal use only.
* Do not use these #defines in your own program!
*/
/* Namespace for Google classes */
#define GOOGLE_NAMESPACE ::google
/* the location of the header defining hash functions */
#define HASH_FUN_H <functional>
/* the namespace of the hash<> function */
#define HASH_NAMESPACE std
/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1
/* Define to 1 if the system has the type `long long'. */
#define HAVE_LONG_LONG 1
/* Define to 1 if you have the `memcpy' function. */
#define HAVE_MEMCPY 1
/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1
/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1
/* Define to 1 if the system has the type `uint16_t'. */
#define HAVE_UINT16_T 1
/* Define to 1 if the system has the type `u_int16_t'. */
#define HAVE_U_INT16_T 1
/* Define to 1 if the system has the type `__uint16'. */
/* #undef HAVE___UINT16 */
/* The system-provided hash function including the namespace. */
#define SPARSEHASH_HASH HASH_NAMESPACE::hash
/* Stops putting the code inside the Google namespace */
#define _END_GOOGLE_NAMESPACE_ }
/* Puts following code inside the Google namespace */
#define _START_GOOGLE_NAMESPACE_ namespace google {

File diff suppressed because it is too large


@ -1,363 +0,0 @@
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
//
// This is just a very thin wrapper over sparsehashtable.h, just
// like sgi stl's stl_hash_map is a very thin wrapper over
// stl_hashtable. The major thing we define is operator[], because
// we have a concept of a data_type which stl_hashtable doesn't
// (it only has a key and a value).
//
// We adhere mostly to the STL semantics for hash-map. One important
// exception is that insert() may invalidate iterators entirely -- STL
// semantics are that insert() may reorder iterators, but they all
// still refer to something valid in the hashtable. Not so for us.
// Likewise, insert() may invalidate pointers into the hashtable.
// (Whether insert invalidates iterators and pointers depends on
// whether it results in a hashtable resize). On the plus side,
// delete() doesn't invalidate iterators or pointers at all, or even
// change the ordering of elements.
//
// Here are a few "power user" tips:
//
// 1) set_deleted_key():
// Unlike STL's hash_map, if you want to use erase() you
// *must* call set_deleted_key() after construction.
//
// 2) resize(0):
// When an item is deleted, its memory isn't freed right
// away. This is what allows you to iterate over a hashtable
// and call erase() without invalidating the iterator.
// To force the memory to be freed, call resize(0).
// For tr1 compatibility, this can also be called as rehash(0).
//
// 3) min_load_factor(0.0)
// Setting the minimum load factor to 0.0 guarantees that
// the hash table will never shrink.
//
// Roughly speaking:
// (1) dense_hash_map: fastest, uses the most memory unless entries are small
// (2) sparse_hash_map: slowest, uses the least memory
// (3) hash_map / unordered_map (STL): in the middle
//
// Typically I use sparse_hash_map when I care about space and/or when
// I need to save the hashtable on disk. I use hash_map otherwise. I
// don't personally use dense_hash_map ever; some people use it for
// small maps with lots of lookups.
//
// - dense_hash_map has, typically, about 78% memory overhead (if your
// data takes up X bytes, the hash_map uses .78X more bytes in overhead).
// - sparse_hash_map has about 4 bits overhead per entry.
// - sparse_hash_map can be 3-7 times slower than the others for lookup and,
// especially, inserts. See time_hash_map.cc for details.
//
// See /usr/(local/)?doc/sparsehash-*/sparse_hash_map.html
// for information about how to use this class.
#ifndef _SPARSE_HASH_MAP_H_
#define _SPARSE_HASH_MAP_H_
#include <sparsehash/internal/sparseconfig.h>
#include <algorithm> // needed by stl_alloc
#include <functional> // for equal_to<>, select1st<>, etc
#include <memory> // for alloc
#include <utility> // for pair<>
#include <sparsehash/internal/libc_allocator_with_realloc.h>
#include <sparsehash/internal/sparsehashtable.h> // IWYU pragma: export
#include HASH_FUN_H // for hash<>
_START_GOOGLE_NAMESPACE_
template <class Key, class T,
class HashFcn = SPARSEHASH_HASH<Key>, // defined in sparseconfig.h
class EqualKey = std::equal_to<Key>,
class Alloc = libc_allocator_with_realloc<std::pair<const Key, T> > >
class sparse_hash_map {
private:
// Apparently select1st is not stl-standard, so we define our own
struct SelectKey {
typedef const Key& result_type;
const Key& operator()(const std::pair<const Key, T>& p) const {
return p.first;
}
};
struct SetKey {
void operator()(std::pair<const Key, T>* value, const Key& new_key) const {
*const_cast<Key*>(&value->first) = new_key;
// We also clear the rest of the value, in case it's taking up a
// lot of memory, by resetting value->second. This assumes T has a
// zero-arg constructor!
value->second = T();
}
};
// For operator[].
struct DefaultValue {
std::pair<const Key, T> operator()(const Key& key) {
return std::make_pair(key, T());
}
};
// The actual data
typedef sparse_hashtable<std::pair<const Key, T>, Key, HashFcn, SelectKey,
SetKey, EqualKey, Alloc> ht;
ht rep;
public:
typedef typename ht::key_type key_type;
typedef T data_type;
typedef T mapped_type;
typedef typename ht::value_type value_type;
typedef typename ht::hasher hasher;
typedef typename ht::key_equal key_equal;
typedef Alloc allocator_type;
typedef typename ht::size_type size_type;
typedef typename ht::difference_type difference_type;
typedef typename ht::pointer pointer;
typedef typename ht::const_pointer const_pointer;
typedef typename ht::reference reference;
typedef typename ht::const_reference const_reference;
typedef typename ht::iterator iterator;
typedef typename ht::const_iterator const_iterator;
typedef typename ht::local_iterator local_iterator;
typedef typename ht::const_local_iterator const_local_iterator;
// Iterator functions
iterator begin() { return rep.begin(); }
iterator end() { return rep.end(); }
const_iterator begin() const { return rep.begin(); }
const_iterator end() const { return rep.end(); }
// These come from tr1's unordered_map. For us, a bucket has 0 or 1 elements.
local_iterator begin(size_type i) { return rep.begin(i); }
local_iterator end(size_type i) { return rep.end(i); }
const_local_iterator begin(size_type i) const { return rep.begin(i); }
const_local_iterator end(size_type i) const { return rep.end(i); }
// Accessor functions
allocator_type get_allocator() const { return rep.get_allocator(); }
hasher hash_funct() const { return rep.hash_funct(); }
hasher hash_function() const { return hash_funct(); }
key_equal key_eq() const { return rep.key_eq(); }
// Constructors
explicit sparse_hash_map(size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
}
template <class InputIterator>
sparse_hash_map(InputIterator f, InputIterator l,
size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, SelectKey(), SetKey(), alloc) {
rep.insert(f, l);
}
// We use the default copy constructor
// We use the default operator=()
// We use the default destructor
void clear() { rep.clear(); }
void swap(sparse_hash_map& hs) { rep.swap(hs.rep); }
// Functions concerning size
size_type size() const { return rep.size(); }
size_type max_size() const { return rep.max_size(); }
bool empty() const { return rep.empty(); }
size_type bucket_count() const { return rep.bucket_count(); }
size_type max_bucket_count() const { return rep.max_bucket_count(); }
// These are tr1 methods. bucket() is the bucket the key is or would be in.
size_type bucket_size(size_type i) const { return rep.bucket_size(i); }
size_type bucket(const key_type& key) const { return rep.bucket(key); }
float load_factor() const {
return size() * 1.0f / bucket_count();
}
float max_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return grow;
}
void max_load_factor(float new_grow) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(shrink, new_grow);
}
// These aren't tr1 methods but perhaps ought to be.
float min_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return shrink;
}
void min_load_factor(float new_shrink) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(new_shrink, grow);
}
// Deprecated; use min_load_factor() or max_load_factor() instead.
void set_resizing_parameters(float shrink, float grow) {
rep.set_resizing_parameters(shrink, grow);
}
void resize(size_type hint) { rep.resize(hint); }
void rehash(size_type hint) { resize(hint); } // the tr1 name
// Lookup routines
iterator find(const key_type& key) { return rep.find(key); }
const_iterator find(const key_type& key) const { return rep.find(key); }
data_type& operator[](const key_type& key) { // This is our value-add!
// If key is in the hashtable, returns find(key)->second,
// otherwise returns insert(value_type(key, T())).first->second.
// Note it does not create an empty T unless the find fails.
return rep.template find_or_insert<DefaultValue>(key).second;
}
size_type count(const key_type& key) const { return rep.count(key); }
std::pair<iterator, iterator> equal_range(const key_type& key) {
return rep.equal_range(key);
}
std::pair<const_iterator, const_iterator> equal_range(const key_type& key)
const {
return rep.equal_range(key);
}
// Insertion routines
std::pair<iterator, bool> insert(const value_type& obj) {
return rep.insert(obj);
}
template <class InputIterator> void insert(InputIterator f, InputIterator l) {
rep.insert(f, l);
}
void insert(const_iterator f, const_iterator l) {
rep.insert(f, l);
}
// Required for std::insert_iterator; the passed-in iterator is ignored.
iterator insert(iterator, const value_type& obj) {
return insert(obj).first;
}
// Deletion routines
// THESE ARE NON-STANDARD! I make you specify an "impossible" key
// value to identify deleted buckets. You can change the key as
// time goes on, or get rid of it entirely to be insert-only.
void set_deleted_key(const key_type& key) {
rep.set_deleted_key(key);
}
void clear_deleted_key() { rep.clear_deleted_key(); }
key_type deleted_key() const { return rep.deleted_key(); }
// These are standard
size_type erase(const key_type& key) { return rep.erase(key); }
void erase(iterator it) { rep.erase(it); }
void erase(iterator f, iterator l) { rep.erase(f, l); }
// Comparison
bool operator==(const sparse_hash_map& hs) const { return rep == hs.rep; }
bool operator!=(const sparse_hash_map& hs) const { return rep != hs.rep; }
// I/O -- this is an add-on for writing metainformation to disk
//
// For maximum flexibility, this does not assume a particular
// file type (though it will probably be a FILE *). We just pass
// the fp through to rep.
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
typedef typename ht::NopointerSerializer NopointerSerializer;
// serializer: a class providing operator()(OUTPUT*, const value_type&)
// (writing value_type to OUTPUT). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an ostream*/subclass_of_ostream*, OR a
// pointer to a class providing size_t Write(const void*, size_t),
// which writes a buffer into a stream (which fp presumably
// owns) and returns the number of bytes successfully written.
// Note basic_ostream<not_char> is not currently supported.
template <typename ValueSerializer, typename OUTPUT>
bool serialize(ValueSerializer serializer, OUTPUT* fp) {
return rep.serialize(serializer, fp);
}
// serializer: a functor providing operator()(INPUT*, value_type*)
// (reading from INPUT and into value_type). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an istream*/subclass_of_istream*, OR a
// pointer to a class providing size_t Read(void*, size_t),
// which reads into a buffer from a stream (which fp presumably
// owns) and returns the number of bytes successfully read.
// Note basic_istream<not_char> is not currently supported.
// NOTE: Since value_type is std::pair<const Key, T>, ValueSerializer
// may need to do a const cast in order to fill in the key.
// NOTE: if Key or T are not POD types, the serializer MUST use
// placement-new to initialize their values, rather than a normal
// equals-assignment or similar. (The value_type* passed into the
// serializer points to garbage memory.)
template <typename ValueSerializer, typename INPUT>
bool unserialize(ValueSerializer serializer, INPUT* fp) {
return rep.unserialize(serializer, fp);
}
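// A sketch of the serialize/unserialize pair (illustrative; the dump
// file name is an assumption). Here value_type is the pointer-free
// POD std::pair<const int, int>, so NopointerSerializer applies:
//
//   typedef google::sparse_hash_map<int, int> Map;
//   Map m;
//   m[1] = 10;
//   FILE* fp = fopen("map.dump", "wb");
//   m.serialize(Map::NopointerSerializer(), fp);
//   fclose(fp);
//   Map m2;
//   fp = fopen("map.dump", "rb");
//   m2.unserialize(Map::NopointerSerializer(), fp);  // m2[1] == 10
//   fclose(fp);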
// The four methods below are DEPRECATED.
// Use serialize() and unserialize() for new code.
template <typename OUTPUT>
bool write_metadata(OUTPUT *fp) { return rep.write_metadata(fp); }
template <typename INPUT>
bool read_metadata(INPUT *fp) { return rep.read_metadata(fp); }
template <typename OUTPUT>
bool write_nopointer_data(OUTPUT *fp) { return rep.write_nopointer_data(fp); }
template <typename INPUT>
bool read_nopointer_data(INPUT *fp) { return rep.read_nopointer_data(fp); }
};
// We need a global swap as well
template <class Key, class T, class HashFcn, class EqualKey, class Alloc>
inline void swap(sparse_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm1,
sparse_hash_map<Key, T, HashFcn, EqualKey, Alloc>& hm2) {
hm1.swap(hm2);
}
_END_GOOGLE_NAMESPACE_
#endif /* _SPARSE_HASH_MAP_H_ */
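// A minimal usage sketch for the class above, following the "power
// user" tips from the header comment (illustrative only; assumes
// <string> is included):
//
//   google::sparse_hash_map<int, std::string> m;
//   m.set_deleted_key(-1);  // required before erase(); must never be a real key
//   m[1] = "one";           // operator[] inserts a default T if key is absent
//   m[2] = "two";
//   m.erase(1);             // legal only after set_deleted_key()
//   m.resize(0);            // actually frees memory of deleted entries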


@ -1,338 +0,0 @@
// Copyright (c) 2005, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ---
//
// This is just a very thin wrapper over sparsehashtable.h, just
// like sgi stl's stl_hash_set is a very thin wrapper over
// stl_hashtable. Unlike sparse_hash_map, we do not define
// operator[] here, because a set has no separate data_type: the
// key is the value.
//
// This is more different from sparse_hash_map than you might think,
// because all iterators for sets are const (you obviously can't
// change the key, and for sets there is no value).
//
// We adhere mostly to the STL semantics for hash-set. One important
// exception is that insert() may invalidate iterators entirely -- STL
// semantics are that insert() may reorder iterators, but they all
// still refer to something valid in the hashtable. Not so for us.
// Likewise, insert() may invalidate pointers into the hashtable.
// (Whether insert invalidates iterators and pointers depends on
// whether it results in a hashtable resize). On the plus side,
// delete() doesn't invalidate iterators or pointers at all, or even
// change the ordering of elements.
//
// Here are a few "power user" tips:
//
// 1) set_deleted_key():
// Unlike STL's hash_set, if you want to use erase() you
// *must* call set_deleted_key() after construction.
//
// 2) resize(0):
// When an item is deleted, its memory isn't freed right
// away. This allows you to iterate over a hashtable,
// and call erase(), without invalidating the iterator.
// To force the memory to be freed, call resize(0).
// For tr1 compatibility, this can also be called as rehash(0).
//
// 3) min_load_factor(0.0)
// Setting the minimum load factor to 0.0 guarantees that
// the hash table will never shrink.
//
// Roughly speaking:
// (1) dense_hash_set: fastest, uses the most memory unless entries are small
// (2) sparse_hash_set: slowest, uses the least memory
// (3) hash_set / unordered_set (STL): in the middle
//
// Typically I use sparse_hash_set when I care about space and/or when
// I need to save the hashtable on disk. I use hash_set otherwise. I
// don't personally use dense_hash_set ever; some people use it for
// small sets with lots of lookups.
//
// - dense_hash_set has, typically, about 78% memory overhead (if your
// data takes up X bytes, the hash_set uses .78X more bytes in overhead).
// - sparse_hash_set has about 4 bits overhead per entry.
// - sparse_hash_set can be 3-7 times slower than the others for lookup and,
// especially, inserts. See time_hash_map.cc for details.
//
// See /usr/(local/)?doc/sparsehash-*/sparse_hash_set.html
// for information about how to use this class.
#ifndef _SPARSE_HASH_SET_H_
#define _SPARSE_HASH_SET_H_
#include <sparsehash/internal/sparseconfig.h>
#include <algorithm> // needed by stl_alloc
#include <functional> // for equal_to<>
#include <memory> // for alloc (which we don't use)
#include <utility> // for pair<>
#include <sparsehash/internal/libc_allocator_with_realloc.h>
#include <sparsehash/internal/sparsehashtable.h> // IWYU pragma: export
#include HASH_FUN_H // for hash<>
_START_GOOGLE_NAMESPACE_
template <class Value,
class HashFcn = SPARSEHASH_HASH<Value>, // defined in sparseconfig.h
class EqualKey = std::equal_to<Value>,
class Alloc = libc_allocator_with_realloc<Value> >
class sparse_hash_set {
private:
// Apparently identity is not stl-standard, so we define our own
struct Identity {
typedef const Value& result_type;
const Value& operator()(const Value& v) const { return v; }
};
struct SetKey {
void operator()(Value* value, const Value& new_key) const {
*value = new_key;
}
};
typedef sparse_hashtable<Value, Value, HashFcn, Identity, SetKey,
EqualKey, Alloc> ht;
ht rep;
public:
typedef typename ht::key_type key_type;
typedef typename ht::value_type value_type;
typedef typename ht::hasher hasher;
typedef typename ht::key_equal key_equal;
typedef Alloc allocator_type;
typedef typename ht::size_type size_type;
typedef typename ht::difference_type difference_type;
typedef typename ht::const_pointer pointer;
typedef typename ht::const_pointer const_pointer;
typedef typename ht::const_reference reference;
typedef typename ht::const_reference const_reference;
typedef typename ht::const_iterator iterator;
typedef typename ht::const_iterator const_iterator;
typedef typename ht::const_local_iterator local_iterator;
typedef typename ht::const_local_iterator const_local_iterator;
// Iterator functions -- recall all iterators are const
iterator begin() const { return rep.begin(); }
iterator end() const { return rep.end(); }
// These come from tr1's unordered_set. For us, a bucket has 0 or 1 elements.
local_iterator begin(size_type i) const { return rep.begin(i); }
local_iterator end(size_type i) const { return rep.end(i); }
// Accessor functions
allocator_type get_allocator() const { return rep.get_allocator(); }
hasher hash_funct() const { return rep.hash_funct(); }
hasher hash_function() const { return hash_funct(); } // tr1 name
key_equal key_eq() const { return rep.key_eq(); }
// Constructors
explicit sparse_hash_set(size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, Identity(), SetKey(), alloc) {
}
template <class InputIterator>
sparse_hash_set(InputIterator f, InputIterator l,
size_type expected_max_items_in_table = 0,
const hasher& hf = hasher(),
const key_equal& eql = key_equal(),
const allocator_type& alloc = allocator_type())
: rep(expected_max_items_in_table, hf, eql, Identity(), SetKey(), alloc) {
rep.insert(f, l);
}
// We use the default copy constructor
// We use the default operator=()
// We use the default destructor
void clear() { rep.clear(); }
void swap(sparse_hash_set& hs) { rep.swap(hs.rep); }
// Functions concerning size
size_type size() const { return rep.size(); }
size_type max_size() const { return rep.max_size(); }
bool empty() const { return rep.empty(); }
size_type bucket_count() const { return rep.bucket_count(); }
size_type max_bucket_count() const { return rep.max_bucket_count(); }
// These are tr1 methods. bucket() is the bucket the key is or would be in.
size_type bucket_size(size_type i) const { return rep.bucket_size(i); }
size_type bucket(const key_type& key) const { return rep.bucket(key); }
float load_factor() const {
return size() * 1.0f / bucket_count();
}
float max_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return grow;
}
void max_load_factor(float new_grow) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(shrink, new_grow);
}
// These aren't tr1 methods but perhaps ought to be.
float min_load_factor() const {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
return shrink;
}
void min_load_factor(float new_shrink) {
float shrink, grow;
rep.get_resizing_parameters(&shrink, &grow);
rep.set_resizing_parameters(new_shrink, grow);
}
// Deprecated; use min_load_factor() or max_load_factor() instead.
void set_resizing_parameters(float shrink, float grow) {
rep.set_resizing_parameters(shrink, grow);
}
void resize(size_type hint) { rep.resize(hint); }
void rehash(size_type hint) { resize(hint); } // the tr1 name
// Lookup routines
iterator find(const key_type& key) const { return rep.find(key); }
size_type count(const key_type& key) const { return rep.count(key); }
std::pair<iterator, iterator> equal_range(const key_type& key) const {
return rep.equal_range(key);
}
// Insertion routines
std::pair<iterator, bool> insert(const value_type& obj) {
std::pair<typename ht::iterator, bool> p = rep.insert(obj);
return std::pair<iterator, bool>(p.first, p.second); // const to non-const
}
template <class InputIterator> void insert(InputIterator f, InputIterator l) {
rep.insert(f, l);
}
void insert(const_iterator f, const_iterator l) {
rep.insert(f, l);
}
// Required for std::insert_iterator; the passed-in iterator is ignored.
iterator insert(iterator, const value_type& obj) {
return insert(obj).first;
}
// Deletion routines
// THESE ARE NON-STANDARD! I make you specify an "impossible" key
// value to identify deleted buckets. You can change the key as
// time goes on, or get rid of it entirely to be insert-only.
void set_deleted_key(const key_type& key) { rep.set_deleted_key(key); }
void clear_deleted_key() { rep.clear_deleted_key(); }
key_type deleted_key() const { return rep.deleted_key(); }
// These are standard
size_type erase(const key_type& key) { return rep.erase(key); }
void erase(iterator it) { rep.erase(it); }
void erase(iterator f, iterator l) { rep.erase(f, l); }
// Comparison
bool operator==(const sparse_hash_set& hs) const { return rep == hs.rep; }
bool operator!=(const sparse_hash_set& hs) const { return rep != hs.rep; }
// I/O -- this is an add-on for writing metainformation to disk
//
// For maximum flexibility, this does not assume a particular
// file type (though it will probably be a FILE *). We just pass
// the fp through to rep.
// If your keys and values are simple enough, you can pass this
// serializer to serialize()/unserialize(). "Simple enough" means
// value_type is a POD type that contains no pointers. Note,
// however, we don't try to normalize endianness.
typedef typename ht::NopointerSerializer NopointerSerializer;
// serializer: a class providing operator()(OUTPUT*, const value_type&)
// (writing value_type to OUTPUT). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an ostream*/subclass_of_ostream*, OR a
// pointer to a class providing size_t Write(const void*, size_t),
// which writes a buffer into a stream (which fp presumably
// owns) and returns the number of bytes successfully written.
// Note basic_ostream<not_char> is not currently supported.
template <typename ValueSerializer, typename OUTPUT>
bool serialize(ValueSerializer serializer, OUTPUT* fp) {
return rep.serialize(serializer, fp);
}
// serializer: a functor providing operator()(INPUT*, value_type*)
// (reading from INPUT and into value_type). You can specify a
// NopointerSerializer object if appropriate (see above).
// fp: either a FILE*, OR an istream*/subclass_of_istream*, OR a
// pointer to a class providing size_t Read(void*, size_t),
// which reads into a buffer from a stream (which fp presumably
// owns) and returns the number of bytes successfully read.
// Note basic_istream<not_char> is not currently supported.
// NOTE: Since value_type is const Key, ValueSerializer
// may need to do a const cast in order to fill in the key.
// NOTE: if Key is not a POD type, the serializer MUST use
// placement-new to initialize its value, rather than a normal
// equals-assignment or similar. (The value_type* passed into
// the serializer points to garbage memory.)
template <typename ValueSerializer, typename INPUT>
bool unserialize(ValueSerializer serializer, INPUT* fp) {
return rep.unserialize(serializer, fp);
}
// The four methods below are DEPRECATED.
// Use serialize() and unserialize() for new code.
template <typename OUTPUT>
bool write_metadata(OUTPUT *fp) { return rep.write_metadata(fp); }
template <typename INPUT>
bool read_metadata(INPUT *fp) { return rep.read_metadata(fp); }
template <typename OUTPUT>
bool write_nopointer_data(OUTPUT *fp) { return rep.write_nopointer_data(fp); }
template <typename INPUT>
bool read_nopointer_data(INPUT *fp) { return rep.read_nopointer_data(fp); }
};
template <class Val, class HashFcn, class EqualKey, class Alloc>
inline void swap(sparse_hash_set<Val, HashFcn, EqualKey, Alloc>& hs1,
sparse_hash_set<Val, HashFcn, EqualKey, Alloc>& hs2) {
hs1.swap(hs2);
}
_END_GOOGLE_NAMESPACE_
#endif /* _SPARSE_HASH_SET_H_ */
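// A minimal usage sketch (illustrative): the set mirrors the map,
// minus operator[]; recall all iterators are const.
//
//   google::sparse_hash_set<int> s;
//   s.set_deleted_key(-1);
//   s.insert(10);
//   s.insert(20);
//   bool has10 = s.find(10) != s.end();  // true
//   s.erase(10);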

File diff suppressed because it is too large


@ -1,134 +0,0 @@
// Copyright 2005 Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ----
//
// Template metaprogramming utility functions.
//
// This code is compiled directly on many platforms, including client
// platforms like Windows, Mac, and embedded systems. Before making
// any changes here, make sure that you're not breaking any platforms.
//
//
// The names chosen here reflect those used in tr1 and the boost::mpl
// library; there are similar operations in the Loki library as
// well. I prefer the boost names for two reasons:
// 1. I think that portions of the Boost libraries are more likely to
// be included in the c++ standard.
// 2. It is not impossible that some of the boost libraries will be
// included in our own build in the future.
// Both of these outcomes mean that we may be able to directly replace
// some of these with boost equivalents.
//
#ifndef BASE_TEMPLATE_UTIL_H_
#define BASE_TEMPLATE_UTIL_H_
#include <sparsehash/internal/sparseconfig.h>
_START_GOOGLE_NAMESPACE_
// Types small_ and big_ are guaranteed such that sizeof(small_) <
// sizeof(big_)
typedef char small_;
struct big_ {
char dummy[2];
};
// Identity metafunction.
template <class T>
struct identity_ {
typedef T type;
};
// integral_constant, defined in tr1, is a wrapper for an integer
// value. We don't really need this generality; we could get away
// with hardcoding the integer type to bool. We use the fully
// general integral_constant for compatibility with tr1.
template<class T, T v>
struct integral_constant {
static const T value = v;
typedef T value_type;
typedef integral_constant<T, v> type;
};
template <class T, T v> const T integral_constant<T, v>::value;
// Abbreviations: true_type and false_type are structs that represent boolean
// true and false values. Also define the boost::mpl versions of those names,
// true_ and false_.
typedef integral_constant<bool, true> true_type;
typedef integral_constant<bool, false> false_type;
typedef true_type true_;
typedef false_type false_;
// if_ is a templatized conditional statement.
// if_<cond, A, B> is a compile time evaluation of cond.
// if_<>::type contains A if cond is true, B otherwise.
template<bool cond, typename A, typename B>
struct if_{
typedef A type;
};
template<typename A, typename B>
struct if_<false, A, B> {
typedef B type;
};
// type_equals_ is a template type comparator, similar to Loki IsSameType.
// type_equals_<A, B>::value is true iff "A" is the same type as "B".
//
// New code should prefer base::is_same, defined in base/type_traits.h.
// It is functionally identical, but is_same is the standard spelling.
template<typename A, typename B>
struct type_equals_ : public false_ {
};
template<typename A>
struct type_equals_<A, A> : public true_ {
};
// and_ is a template && operator.
// and_<A, B>::value evaluates "A::value && B::value".
template<typename A, typename B>
struct and_ : public integral_constant<bool, (A::value && B::value)> {
};
// or_ is a template || operator.
// or_<A, B>::value evaluates "A::value || B::value".
template<typename A, typename B>
struct or_ : public integral_constant<bool, (A::value || B::value)> {
};
_END_GOOGLE_NAMESPACE_
#endif // BASE_TEMPLATE_UTIL_H_
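// A compile-time sketch of the metafunctions above (illustrative;
// static_assert is available since this package targets C++11):
//
//   static_assert(sizeof(google::if_<true, char, double>::type)
//                     == sizeof(char),
//                 "if_ selects its first branch when cond is true");
//   static_assert(!google::and_<google::true_, google::false_>::value,
//                 "and_ is a compile-time &&");
//   static_assert(google::type_equals_<int, int>::value,
//                 "type_equals_ compares types for identity");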


@ -1,342 +0,0 @@
// Copyright (c) 2006, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
// * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// ----
//
// This code is compiled directly on many platforms, including client
// platforms like Windows, Mac, and embedded systems. Before making
// any changes here, make sure that you're not breaking any platforms.
//
// Define a small subset of tr1 type traits. The traits we define are:
// is_integral
// is_floating_point
// is_pointer
// is_enum
// is_reference
// is_pod
// has_trivial_constructor
// has_trivial_copy
// has_trivial_assign
// has_trivial_destructor
// remove_const
// remove_volatile
// remove_cv
// remove_reference
// add_reference
// remove_pointer
// is_same
// is_convertible
// We can add more type traits as required.
#ifndef BASE_TYPE_TRAITS_H_
#define BASE_TYPE_TRAITS_H_
#include <sparsehash/internal/sparseconfig.h>
#include <utility> // For pair
#include <sparsehash/template_util.h> // For true_type and false_type
_START_GOOGLE_NAMESPACE_
template <class T> struct is_integral;
template <class T> struct is_floating_point;
template <class T> struct is_pointer;
// MSVC can't compile this correctly, and neither can gcc 3.3.5 (at least)
#if !defined(_MSC_VER) && !(defined(__GNUC__) && __GNUC__ <= 3)
// is_enum uses is_convertible, which is not available on MSVC.
template <class T> struct is_enum;
#endif
template <class T> struct is_reference;
template <class T> struct is_pod;
template <class T> struct has_trivial_constructor;
template <class T> struct has_trivial_copy;
template <class T> struct has_trivial_assign;
template <class T> struct has_trivial_destructor;
template <class T> struct remove_const;
template <class T> struct remove_volatile;
template <class T> struct remove_cv;
template <class T> struct remove_reference;
template <class T> struct add_reference;
template <class T> struct remove_pointer;
template <class T, class U> struct is_same;
#if !defined(_MSC_VER) && !(defined(__GNUC__) && __GNUC__ <= 3)
template <class From, class To> struct is_convertible;
#endif
// is_integral is false except for the built-in integer types. A
// cv-qualified type is integral if and only if the underlying type is.
template <class T> struct is_integral : false_type { };
template<> struct is_integral<bool> : true_type { };
template<> struct is_integral<char> : true_type { };
template<> struct is_integral<unsigned char> : true_type { };
template<> struct is_integral<signed char> : true_type { };
#if defined(_MSC_VER)
// wchar_t is not by default a distinct type from unsigned short in
// Microsoft C.
// See http://msdn2.microsoft.com/en-us/library/dh8che7s(VS.80).aspx
template<> struct is_integral<__wchar_t> : true_type { };
#else
template<> struct is_integral<wchar_t> : true_type { };
#endif
template<> struct is_integral<short> : true_type { };
template<> struct is_integral<unsigned short> : true_type { };
template<> struct is_integral<int> : true_type { };
template<> struct is_integral<unsigned int> : true_type { };
template<> struct is_integral<long> : true_type { };
template<> struct is_integral<unsigned long> : true_type { };
#ifdef HAVE_LONG_LONG
template<> struct is_integral<long long> : true_type { };
template<> struct is_integral<unsigned long long> : true_type { };
#endif
template <class T> struct is_integral<const T> : is_integral<T> { };
template <class T> struct is_integral<volatile T> : is_integral<T> { };
template <class T> struct is_integral<const volatile T> : is_integral<T> { };
// is_floating_point is false except for the built-in floating-point types.
// A cv-qualified type is floating-point if and only if the underlying type is.
template <class T> struct is_floating_point : false_type { };
template<> struct is_floating_point<float> : true_type { };
template<> struct is_floating_point<double> : true_type { };
template<> struct is_floating_point<long double> : true_type { };
template <class T> struct is_floating_point<const T>
: is_floating_point<T> { };
template <class T> struct is_floating_point<volatile T>
: is_floating_point<T> { };
template <class T> struct is_floating_point<const volatile T>
: is_floating_point<T> { };
// is_pointer is false except for pointer types. A cv-qualified type (e.g.
// "int* const", as opposed to "int const*") is cv-qualified if and only if
// the underlying type is.
template <class T> struct is_pointer : false_type { };
template <class T> struct is_pointer<T*> : true_type { };
template <class T> struct is_pointer<const T> : is_pointer<T> { };
template <class T> struct is_pointer<volatile T> : is_pointer<T> { };
template <class T> struct is_pointer<const volatile T> : is_pointer<T> { };
#if !defined(_MSC_VER) && !(defined(__GNUC__) && __GNUC__ <= 3)
namespace internal {
template <class T> struct is_class_or_union {
template <class U> static small_ tester(void (U::*)());
template <class U> static big_ tester(...);
static const bool value = sizeof(tester<T>(0)) == sizeof(small_);
};
// is_convertible chokes if the first argument is an array. That's why
// we use add_reference here.
template <bool NotUnum, class T> struct is_enum_impl
: is_convertible<typename add_reference<T>::type, int> { };
template <class T> struct is_enum_impl<true, T> : false_type { };
} // namespace internal
// Specified by TR1 [4.5.1] primary type categories.
// Implementation note:
//
// Each type is either void, integral, floating point, array, pointer,
// reference, member object pointer, member function pointer, enum,
// union or class. Out of these, only integral, floating point, reference,
// class and enum types are potentially convertible to int. Therefore,
// if a type is not a reference, integral, floating point or class and
// is convertible to int, it's an enum. Adding cv-qualification to a type
// does not change whether it's an enum.
//
// Is-convertible-to-int check is done only if all other checks pass,
// because it can't be used with some types (e.g. void or classes with
// inaccessible conversion operators).
template <class T> struct is_enum
: internal::is_enum_impl<
is_same<T, void>::value ||
is_integral<T>::value ||
is_floating_point<T>::value ||
is_reference<T>::value ||
internal::is_class_or_union<T>::value,
T> { };
template <class T> struct is_enum<const T> : is_enum<T> { };
template <class T> struct is_enum<volatile T> : is_enum<T> { };
template <class T> struct is_enum<const volatile T> : is_enum<T> { };
#endif
// is_reference is false except for reference types.
template<typename T> struct is_reference : false_type {};
template<typename T> struct is_reference<T&> : true_type {};
// We can't get is_pod right without compiler help, so fail conservatively.
// We will assume it's false except for arithmetic types, enumerations,
// pointers and cv-qualified versions thereof. Note that std::pair<T,U>
// is not a POD even if T and U are PODs.
template <class T> struct is_pod
: integral_constant<bool, (is_integral<T>::value ||
is_floating_point<T>::value ||
#if !defined(_MSC_VER) && !(defined(__GNUC__) && __GNUC__ <= 3)
// is_enum is not available on MSVC.
is_enum<T>::value ||
#endif
is_pointer<T>::value)> { };
template <class T> struct is_pod<const T> : is_pod<T> { };
template <class T> struct is_pod<volatile T> : is_pod<T> { };
template <class T> struct is_pod<const volatile T> : is_pod<T> { };
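// What this conservative is_pod reports (illustrative; note the
// deliberate false negative for aggregates):
//
//   static_assert(is_pod<int>::value, "arithmetic types are POD");
//   static_assert(is_pod<char*>::value, "pointers are POD");
//   struct Point { int x, y; };
//   static_assert(!is_pod<Point>::value,
//                 "classes are conservatively reported non-POD");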
// We can't get has_trivial_constructor right without compiler help, so
// fail conservatively. We will assume it's false except for: (1) types
// for which is_pod is true. (2) std::pair of types with trivial
// constructors. (3) array of a type with a trivial constructor.
// (4) const versions thereof.
template <class T> struct has_trivial_constructor : is_pod<T> { };
template <class T, class U> struct has_trivial_constructor<std::pair<T, U> >
: integral_constant<bool,
(has_trivial_constructor<T>::value &&
has_trivial_constructor<U>::value)> { };
template <class A, int N> struct has_trivial_constructor<A[N]>
: has_trivial_constructor<A> { };
template <class T> struct has_trivial_constructor<const T>
: has_trivial_constructor<T> { };
// We can't get has_trivial_copy right without compiler help, so fail
// conservatively. We will assume it's false except for: (1) types
// for which is_pod is true. (2) std::pair of types with trivial copy
// constructors. (3) array of a type with a trivial copy constructor.
// (4) const versions thereof.
template <class T> struct has_trivial_copy : is_pod<T> { };
template <class T, class U> struct has_trivial_copy<std::pair<T, U> >
: integral_constant<bool,
(has_trivial_copy<T>::value &&
has_trivial_copy<U>::value)> { };
template <class A, int N> struct has_trivial_copy<A[N]>
: has_trivial_copy<A> { };
template <class T> struct has_trivial_copy<const T> : has_trivial_copy<T> { };
// We can't get has_trivial_assign right without compiler help, so fail
// conservatively. We will assume it's false except for: (1) types
// for which is_pod is true. (2) std::pair of types with trivial
// assignment operators. (3) array of a type with a trivial assignment
// operator.
template <class T> struct has_trivial_assign : is_pod<T> { };
template <class T, class U> struct has_trivial_assign<std::pair<T, U> >
: integral_constant<bool,
(has_trivial_assign<T>::value &&
has_trivial_assign<U>::value)> { };
template <class A, int N> struct has_trivial_assign<A[N]>
: has_trivial_assign<A> { };
// We can't get has_trivial_destructor right without compiler help, so
// fail conservatively. We will assume it's false except for: (1) types
// for which is_pod is true. (2) std::pair of types with trivial
// destructors. (3) array of a type with a trivial destructor.
// (4) const versions thereof.
template <class T> struct has_trivial_destructor : is_pod<T> { };
template <class T, class U> struct has_trivial_destructor<std::pair<T, U> >
: integral_constant<bool,
(has_trivial_destructor<T>::value &&
has_trivial_destructor<U>::value)> { };
template <class A, int N> struct has_trivial_destructor<A[N]>
: has_trivial_destructor<A> { };
template <class T> struct has_trivial_destructor<const T>
: has_trivial_destructor<T> { };
// Specified by TR1 [4.7.1]
template<typename T> struct remove_const { typedef T type; };
template<typename T> struct remove_const<T const> { typedef T type; };
template<typename T> struct remove_volatile { typedef T type; };
template<typename T> struct remove_volatile<T volatile> { typedef T type; };
template<typename T> struct remove_cv {
typedef typename remove_const<typename remove_volatile<T>::type>::type type;
};
// Specified by TR1 [4.7.2] Reference modifications.
template<typename T> struct remove_reference { typedef T type; };
template<typename T> struct remove_reference<T&> { typedef T type; };
template <typename T> struct add_reference { typedef T& type; };
template <typename T> struct add_reference<T&> { typedef T& type; };
// Specified by TR1 [4.7.4] Pointer modifications.
template<typename T> struct remove_pointer { typedef T type; };
template<typename T> struct remove_pointer<T*> { typedef T type; };
template<typename T> struct remove_pointer<T* const> { typedef T type; };
template<typename T> struct remove_pointer<T* volatile> { typedef T type; };
template<typename T> struct remove_pointer<T* const volatile> {
typedef T type; };
// Specified by TR1 [4.6] Relationships between types
template<typename T, typename U> struct is_same : public false_type { };
template<typename T> struct is_same<T, T> : public true_type { };
// Specified by TR1 [4.6] Relationships between types
#if !defined(_MSC_VER) && !(defined(__GNUC__) && __GNUC__ <= 3)
namespace internal {
// This class is an implementation detail for is_convertible, and you
// don't need to know how it works to use is_convertible. For those
// who care: we declare two different functions, one whose argument is
// of type To and one with a variadic argument list. We give them
// return types of different size, so we can use sizeof to trick the
// compiler into telling us which function it would have chosen if we
// had called it with an argument of type From. See Alexandrescu's
// _Modern C++ Design_ for more details on this sort of trick.
template <typename From, typename To>
struct ConvertHelper {
static small_ Test(To);
static big_ Test(...);
static From Create();
};
} // namespace internal
// Inherits from true_type if From is convertible to To, false_type otherwise.
template <typename From, typename To>
struct is_convertible
: integral_constant<bool,
sizeof(internal::ConvertHelper<From, To>::Test(
internal::ConvertHelper<From, To>::Create()))
== sizeof(small_)> {
};
#endif
_END_GOOGLE_NAMESPACE_
// Right now these macros are no-ops, and mostly just document the fact
// these types are PODs, for human use. They may be made more contentful
// later. The typedef is just to make it legal to put a semicolon after
// these macros.
#define DECLARE_POD(TypeName) typedef int Dummy_Type_For_DECLARE_POD
#define DECLARE_NESTED_POD(TypeName) DECLARE_POD(TypeName)
#define PROPAGATE_POD_FROM_TEMPLATE_ARGUMENT(TemplateName) \
typedef int Dummy_Type_For_PROPAGATE_POD_FROM_TEMPLATE_ARGUMENT
#define ENFORCE_POD(TypeName) typedef int Dummy_Type_For_ENFORCE_POD
#endif // BASE_TYPE_TRAITS_H_
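The sizeof-based overload trick behind is_convertible is easier to see in isolation. Below is a minimal, self-contained sketch of the same technique, assuming nothing from the header above (small_ and big_ stand in for the library's internal size-probe types):

#include <cstdio>

// Two types with guaranteed different sizes, used as overload probes.
typedef char small_;            // sizeof(small_) == 1
struct big_ { char dummy[2]; }; // sizeof(big_)   == 2

template <typename From, typename To>
struct ConvertHelper {
    static small_ Test(To);  // chosen when From converts to To
    static big_ Test(...);   // fallback overload otherwise
    static From Create();    // never defined; only appears inside sizeof
};

template <typename From, typename To>
struct is_convertible_sketch {
    // sizeof only asks which overload *would* be selected; no code runs.
    static const bool value =
        sizeof(ConvertHelper<From, To>::Test(
                   ConvertHelper<From, To>::Create())) == sizeof(small_);
};

int main() {
    std::printf("int  -> double: %d\n", (int)is_convertible_sketch<int, double>::value);  // 1
    std::printf("int* -> double: %d\n", (int)is_convertible_sketch<int*, double>::value); // 0
}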

contrib/sparsehash-c11 vendored Submodule

@ -0,0 +1 @@
Subproject commit cf0bffaa456f23bc4174462a789b90f8b6f5f42f


@ -176,6 +176,7 @@ add_object_library(clickhouse_processors_formats_impl src/Processors/Formats/Imp
add_object_library(clickhouse_processors_transforms src/Processors/Transforms)
add_object_library(clickhouse_processors_sources src/Processors/Sources)
if (MAKE_STATIC_LIBRARIES OR NOT SPLIT_SHARED_LIBRARIES)
add_library (dbms STATIC ${dbms_headers} ${dbms_sources})
set (all_modules dbms)
@ -394,7 +395,7 @@ if (OPENSSL_CRYPTO_LIBRARY)
endif ()
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${DIVIDE_INCLUDE_DIR})
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
if (USE_PROTOBUF)
dbms_target_link_libraries (PRIVATE ${Protobuf_LIBRARY})
@ -414,6 +415,16 @@ endif()
if (USE_JEMALLOC)
dbms_target_include_directories (SYSTEM BEFORE PRIVATE ${JEMALLOC_INCLUDE_DIR}) # used in Interpreters/AsynchronousMetrics.cpp
target_include_directories (clickhouse_common_io SYSTEM BEFORE PRIVATE ${JEMALLOC_INCLUDE_DIR}) # new_delete.cpp
# common/memory.h
if (MAKE_STATIC_LIBRARIES OR NOT SPLIT_SHARED_LIBRARIES)
# skip if we have bundled build, since jemalloc is static in this case
elseif (${JEMALLOC_LIBRARIES} MATCHES "${CMAKE_STATIC_LIBRARY_SUFFIX}$")
# if the library is static we do not need to link with it,
# since in this case it will be in libs/libcommon,
# and we do not want to link with jemalloc multiple times.
else()
target_link_libraries(clickhouse_common_io PRIVATE ${JEMALLOC_LIBRARIES})
endif()
endif ()
dbms_target_include_directories (PUBLIC ${DBMS_INCLUDE_DIR} PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/src/Formats/include)


@ -205,6 +205,9 @@ else ()
add_custom_target (clickhouse-bundle ALL DEPENDS ${CLICKHOUSE_BUNDLE})
if (USE_GDB_ADD_INDEX)
add_custom_command(TARGET clickhouse POST_BUILD COMMAND ${GDB_ADD_INDEX_EXE} clickhouse COMMENT "Adding .gdb-index to clickhouse" VERBATIM)
endif()
endif ()
if (TARGET clickhouse-server AND TARGET copy-headers)


@ -4,7 +4,11 @@ set(CLICKHOUSE_CLIENT_SOURCES
)
set(CLICKHOUSE_CLIENT_LINK PRIVATE clickhouse_common_config clickhouse_functions clickhouse_aggregate_functions clickhouse_common_io clickhouse_parsers string_utils ${LINE_EDITING_LIBS} ${Boost_PROGRAM_OPTIONS_LIBRARY})
set(CLICKHOUSE_CLIENT_INCLUDE SYSTEM PRIVATE ${READLINE_INCLUDE_DIR} PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/include)
set(CLICKHOUSE_CLIENT_INCLUDE PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/include)
if (READLINE_INCLUDE_DIR)
set(CLICKHOUSE_CLIENT_INCLUDE ${CLICKHOUSE_CLIENT_INCLUDE} SYSTEM PRIVATE ${READLINE_INCLUDE_DIR})
endif ()
include(CheckSymbolExists)
check_symbol_exists(readpassphrase readpassphrase.h HAVE_READPASSPHRASE)


@ -42,6 +42,10 @@ set_target_properties(clickhouse-odbc-bridge PROPERTIES RUNTIME_OUTPUT_DIRECTORY
clickhouse_program_link_split_binary(odbc-bridge)
if (USE_GDB_ADD_INDEX)
add_custom_command(TARGET clickhouse-odbc-bridge POST_BUILD COMMAND ${GDB_ADD_INDEX_EXE} ../clickhouse-odbc-bridge COMMENT "Adding .gdb-index to clickhouse-odbc-bridge" VERBATIM)
endif()
install(TARGETS clickhouse-odbc-bridge RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR} COMPONENT clickhouse)
if(ENABLE_TESTS)


@ -31,9 +31,10 @@ void StopConditionsSet::loadFromConfig(const ConfigurationPtr & stop_conditions_
else if (key == "average_speed_not_changing_for_ms")
average_speed_not_changing_for_ms.value = stop_conditions_view->getUInt64(key);
else
throw Exception("Met unkown stop condition: " + key, ErrorCodes::LOGICAL_ERROR);
}
throw Exception("Met unknown stop condition: " + key, ErrorCodes::LOGICAL_ERROR);
++initialized_count;
}
}
void StopConditionsSet::reset()


@ -495,6 +495,48 @@ public:
return count;
}
UInt64 rb_min() const
{
UInt64 min_val = UINT32_MAX;
if (isSmall())
{
for (const auto & x : small)
{
T val = x.getValue();
if (UInt64(val) < min_val)
{
min_val = UInt64(val);
}
}
}
else
{
min_val = UInt64(roaring_bitmap_minimum(rb));
}
return min_val;
}
UInt64 rb_max() const
{
UInt64 max_val = 0;
if (isSmall())
{
for (const auto & x : small)
{
T val = x.getValue();
if (UInt64(val) > max_val)
{
max_val = UInt64(val);
}
}
}
else
{
max_val = UInt64(roaring_bitmap_maximum(rb));
}
return max_val;
}
private:
/// To read and write the DB Buffer directly, migrate code from CRoaring
void db_roaring_bitmap_add_many(DB::ReadBuffer & dbBuf, roaring_bitmap_t * r, size_t n_args)

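For reference, the non-small branches of rb_min() and rb_max() above rely on CRoaring's roaring_bitmap_minimum and roaring_bitmap_maximum. A standalone sketch of those calls, assuming the CRoaring C API under its usual roaring/roaring.h include path:

#include <cstdio>
#include <roaring/roaring.h> // CRoaring C API (assumed include path)

int main()
{
    roaring_bitmap_t * r = roaring_bitmap_create();
    roaring_bitmap_add(r, 7);
    roaring_bitmap_add(r, 3);
    roaring_bitmap_add(r, 100000);

    // The same calls rb_min()/rb_max() use once the set leaves the
    // small-array representation.
    std::printf("min=%u max=%u\n",
                (unsigned)roaring_bitmap_minimum(r),
                (unsigned)roaring_bitmap_maximum(r));

    roaring_bitmap_free(r);
}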

@ -156,6 +156,18 @@ void ColumnNullable::insertFrom(const IColumn & src, size_t n)
getNullMapData().push_back(src_concrete.getNullMapData()[n]);
}
void ColumnNullable::insertFromNotNullable(const IColumn & src, size_t n)
{
getNestedColumn().insertFrom(src, n);
getNullMapData().push_back(0);
}
void ColumnNullable::insertRangeFromNotNullable(const IColumn & src, size_t start, size_t length)
{
getNestedColumn().insertRangeFrom(src, start, length);
getNullMapData().resize_fill(getNullMapData().size() + length, 0);
}
void ColumnNullable::popBack(size_t n)
{
getNestedColumn().popBack(n);


@ -61,6 +61,9 @@ public:
void insert(const Field & x) override;
void insertFrom(const IColumn & src, size_t n) override;
void insertFromNotNullable(const IColumn & src, size_t n);
void insertRangeFromNotNullable(const IColumn & src, size_t start, size_t length);
void insertDefault() override
{
getNestedColumn().insertDefault();

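The two new methods share a simple invariant: the source column has no nulls, so every appended row gets a 0 ("not NULL") entry in the null map. A toy model of that behavior, with std::vector standing in for the real column types (all names here are illustrative):

#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for ColumnNullable: nested data plus a null map where
// null_map[i] == 1 means "row i is NULL".
struct ToyNullableColumn
{
    std::vector<int> nested;
    std::vector<uint8_t> null_map;

    // Analogue of insertFromNotNullable: mark the appended row not-NULL.
    void insertFromNotNullable(const std::vector<int> & src, size_t n)
    {
        nested.push_back(src[n]);
        null_map.push_back(0);
    }

    // Analogue of insertRangeFromNotNullable: extend the null map with
    // zeros, as resize_fill(size + length, 0) does in the real code.
    void insertRangeFromNotNullable(const std::vector<int> & src,
                                    size_t start, size_t length)
    {
        nested.insert(nested.end(),
                      src.begin() + start, src.begin() + start + length);
        null_map.resize(null_map.size() + length, 0);
    }
};

int main()
{
    ToyNullableColumn col;
    std::vector<int> src{10, 20, 30};
    col.insertFromNotNullable(src, 1);         // appends 20, null_map gets 0
    col.insertRangeFromNotNullable(src, 0, 2); // appends 10, 20; two more 0s
    return col.null_map.size() == 3 ? 0 : 1;
}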

@ -178,8 +178,13 @@ protected:
// hash tables, it makes sense to pre-fault the pages by passing
// MAP_POPULATE to mmap(). This takes some time, but should be faster
// overall than having a hot loop interrupted by page faults.
// It is only supported on Linux.
#if defined(__linux__)
static constexpr int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS
| (mmap_populate ? MAP_POPULATE : 0);
#else
static constexpr int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
#endif
private:
void * allocNoTrack(size_t size, size_t alignment)

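A minimal sketch of the pre-faulting idea the comment describes. MAP_POPULATE is a Linux extension, which is why the flag is gated by the #if above; the helper name below is made up:

#include <sys/mman.h>
#include <cstddef>

// Allocate 'size' bytes of anonymous memory, pre-faulting the pages on
// Linux so a hot loop is not interrupted by page faults later.
// Error handling is reduced to returning nullptr.
void * allocPrefaulted(size_t size)
{
#if defined(__linux__)
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE;
#else
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#endif
    void * buf = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return buf == MAP_FAILED ? nullptr : buf;
}

int main()
{
    if (void * p = allocPrefaulted(1 << 20)) // 1 MiB, resident up front on Linux
        munmap(p, 1 << 20);
}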

@ -97,13 +97,23 @@ private:
size_t size_after_grow = 0;
if (head->size() < linear_growth_threshold)
size_after_grow = head->size() * growth_factor;
{
size_after_grow = std::max(min_next_size, head->size() * growth_factor);
}
else
size_after_grow = linear_growth_threshold;
if (size_after_grow < min_next_size)
size_after_grow = min_next_size;
{
// allocContinue() combined with linear growth results in quadratic
// behavior: we append the data by small amounts, and when it
// doesn't fit, we create a new chunk and copy all the previous data
// into it. The number of times we do this is directly proportional
// to the total size of data that is going to be serialized. To make
// the copying happen less often, round the next size up to the
// linear_growth_threshold.
size_after_grow = ((min_next_size + linear_growth_threshold - 1)
/ linear_growth_threshold) * linear_growth_threshold;
}
assert(size_after_grow >= min_next_size);
return roundUpToPageSize(size_after_grow);
}
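The expression in the new branch is the usual integer round-up-to-multiple idiom: ((x + m - 1) / m) * m. A tiny worked check (the 128 MiB threshold below is only an assumed example value):

#include <cassert>
#include <cstddef>

// rounded = smallest multiple of 'multiple' that is >= 'value'.
size_t roundUpToMultiple(size_t value, size_t multiple)
{
    return ((value + multiple - 1) / multiple) * multiple;
}

int main()
{
    const size_t threshold = 128 * 1024 * 1024; // assumed linear_growth_threshold
    assert(roundUpToMultiple(1, threshold) == threshold);
    assert(roundUpToMultiple(threshold, threshold) == threshold);
    assert(roundUpToMultiple(threshold + 1, threshold) == 2 * threshold);
}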
@ -180,65 +190,68 @@ public:
return head->pos;
}
/** Begin or expand allocation of contiguous piece of memory without alignment.
* 'begin' - current begin of piece of memory, if it need to be expanded, or nullptr, if it need to be started.
* If there is no space in chunk to expand current piece of memory - then copy all piece to new chunk and change value of 'begin'.
* NOTE This method is usable only for latest allocation. For earlier allocations, see 'realloc' method.
/** Begin or expand a contiguous range of memory.
* 'range_start' is the start of range. If nullptr, a new range is
* allocated.
* If there is no space in the current chunk to expand the range,
* the entire range is copied to a new, bigger memory chunk, and the value
* of 'range_start' is updated.
* If the optional 'start_alignment' is specified, the start of range is
* kept aligned to this value.
*
* NOTE This method is usable only for the last allocation made on this
* Arena. For earlier allocations, see 'realloc' method.
*/
char * allocContinue(size_t size, char const *& begin)
char * allocContinue(size_t additional_bytes, char const *& range_start,
size_t start_alignment = 0)
{
while (unlikely(head->pos + size > head->end))
if (!range_start)
{
char * prev_end = head->pos;
addChunk(size);
// Start a new memory range.
char * result = start_alignment
? alignedAlloc(additional_bytes, start_alignment)
: alloc(additional_bytes);
if (begin)
begin = insert(begin, prev_end - begin);
else
break;
range_start = result;
return result;
}
char * res = head->pos;
head->pos += size;
// Extend an existing memory range with 'additional_bytes'.
if (!begin)
begin = res;
// This method only works for extending the last allocation. For lack of
// original size, check a weaker condition: that 'begin' is at least in
// the current Chunk.
assert(range_start >= head->begin && range_start < head->end);
ASAN_UNPOISON_MEMORY_REGION(res, size + pad_right);
return res;
if (head->pos + additional_bytes <= head->end)
{
// The new size fits into the last chunk, so just alloc the
// additional size. We can alloc without alignment here, because it
// only applies to the start of the range, and we don't change it.
return alloc(additional_bytes);
}
char * alignedAllocContinue(size_t size, char const *& begin, size_t alignment)
{
char * res;
// New range doesn't fit into this chunk, will copy to a new one.
//
// Note: among other things, this method is used to provide a hack-ish
// implementation of realloc over Arenas in ArenaAllocators. It wastes a
// lot of memory -- quadratically so when we reach the linear allocation
// threshold. This deficiency is intentionally left as is, and should be
// solved not by complicating this method, but by rethinking the
// approach to memory management for aggregate function states, so that
// we can provide a proper realloc().
const size_t existing_bytes = head->pos - range_start;
const size_t new_bytes = existing_bytes + additional_bytes;
const char * old_range = range_start;
do
{
void * head_pos = head->pos;
size_t space = head->end - head->pos;
char * new_range = start_alignment
? alignedAlloc(new_bytes, start_alignment)
: alloc(new_bytes);
res = static_cast<char *>(std::align(alignment, size, head_pos, space));
if (res)
{
head->pos = static_cast<char *>(head_pos);
head->pos += size;
break;
}
memcpy(new_range, old_range, existing_bytes);
char * prev_end = head->pos;
addChunk(size + alignment);
if (begin)
begin = alignedInsert(begin, prev_end - begin, alignment);
else
break;
} while (true);
if (!begin)
begin = res;
ASAN_UNPOISON_MEMORY_REGION(res, size + pad_right);
return res;
range_start = new_range;
return new_range + existing_bytes;
}
/// NOTE Old memory region is wasted.

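A sketch of the calling pattern the new allocContinue() signature implies: pass a null range pointer to start, keep the same pointer across calls, and let the arena relocate the range when the current chunk runs out. DB::Arena is assumed from Common/Arena.h; the helper is illustrative:

#include <cstddef>
#include <cstring>
#include <Common/Arena.h> // assumed location of DB::Arena

// Append two byte ranges into one contiguous region inside the arena.
void appendTwoParts(DB::Arena & arena,
                    const char * a, size_t a_len,
                    const char * b, size_t b_len)
{
    const char * range_start = nullptr; // nullptr => start a new range

    // First call allocates a fresh range and sets range_start.
    char * pos = arena.allocContinue(a_len, range_start);
    memcpy(pos, a, a_len);

    // Second call extends the same range; if it no longer fits in the
    // current chunk, the whole range is copied and range_start is updated.
    pos = arena.allocContinue(b_len, range_start);
    memcpy(pos, b, b_len);

    // [range_start, range_start + a_len + b_len) is now contiguous.
}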

@ -54,7 +54,7 @@ public:
if (data + old_size == arena->head->pos)
{
arena->alignedAllocContinue(new_size - old_size, data, alignment);
arena->allocContinue(new_size - old_size, data, alignment);
return reinterpret_cast<void *>(const_cast<char *>(data));
}
else


@ -224,8 +224,18 @@ private:
public:
bool hasZero() const { return has_zero; }
void setHasZero() { has_zero = true; }
void clearHasZero() { has_zero = false; }
void setHasZero()
{
has_zero = true;
new (zeroValue()) Cell();
}
void clearHasZero()
{
has_zero = false;
zeroValue()->~Cell();
}
Cell * zeroValue() { return reinterpret_cast<Cell*>(&zero_value_storage); }
const Cell * zeroValue() const { return reinterpret_cast<const Cell*>(&zero_value_storage); }

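The reworked setHasZero()/clearHasZero() manage the lifetime of the cell kept in zero_value_storage explicitly: placement new constructs it, an explicit destructor call destroys it. A self-contained sketch of that pattern (Cell here is a stand-in, not the hash table's real cell type):

#include <new>
#include <type_traits>

struct Cell
{
    int payload;
    Cell() : payload(42) {}
};

int main()
{
    // Raw, suitably aligned storage -- the analogue of zero_value_storage.
    std::aligned_storage<sizeof(Cell), alignof(Cell)>::type storage;

    Cell * cell = new (&storage) Cell(); // setHasZero(): construct in place
    int v = cell->payload;
    cell->~Cell();                       // clearHasZero(): end its lifetime
    return v == 42 ? 0 : 1;
}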

@ -61,7 +61,7 @@ add_executable (space_saving space_saving.cpp)
target_link_libraries (space_saving PRIVATE clickhouse_common_io)
add_executable (integer_hash_tables_and_hashes integer_hash_tables_and_hashes.cpp)
target_include_directories (integer_hash_tables_and_hashes SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
target_include_directories (integer_hash_tables_and_hashes SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
target_link_libraries (integer_hash_tables_and_hashes PRIVATE dbms)
add_executable (allocator allocator.cpp)


@ -331,8 +331,8 @@ void NO_INLINE testForEachMapAndHash(const Key * data, size_t size)
testForEachHash<HashMap>(data, size, nothing);
testForEachHash<std::unordered_map>(data, size, nothing);
testForEachHash<GOOGLE_NAMESPACE::dense_hash_map>(data, size, [](auto & map){ map.set_empty_key(-1); });
testForEachHash<GOOGLE_NAMESPACE::sparse_hash_map>(data, size, nothing);
testForEachHash<::google::dense_hash_map>(data, size, [](auto & map){ map.set_empty_key(-1); });
testForEachHash<::google::sparse_hash_map>(data, size, nothing);
}


@ -138,7 +138,7 @@ NamesAndTypesList NamesAndTypesList::filter(const Names & names) const
NamesAndTypesList NamesAndTypesList::addTypes(const Names & names) const
{
/// NOTE It's better to make a map in `IStorage` than to create it here every time again.
GOOGLE_NAMESPACE::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> types;
::google::dense_hash_map<StringRef, const DataTypePtr *, StringRefHash> types;
types.set_empty_key(StringRef());
for (const NameAndTypePair & column : *this)


@ -288,6 +288,7 @@ struct Settings : public SettingsCollection<Settings>
M(SettingUInt64, max_bytes_in_join, 0, "Maximum size of the hash table for JOIN (in number of bytes in memory).") \
M(SettingOverflowMode, join_overflow_mode, OverflowMode::THROW, "What to do when the limit is exceeded.") \
M(SettingBool, join_any_take_last_row, false, "When disabled (default) ANY JOIN will take the first found row for a key. When enabled, it will take the last row seen if there are multiple rows for the same key.") \
M(SettingBool, partial_merge_join, false, "Use partial merge join instead of hash join if possible.") \
\
M(SettingUInt64, max_rows_to_transfer, 0, "Maximum size (in rows) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.") \
M(SettingUInt64, max_bytes_to_transfer, 0, "Maximum size (in uncompressed bytes) of the transmitted external table obtained when the GLOBAL IN/JOIN section is executed.") \


@ -1,6 +1,6 @@
add_executable (string_pool string_pool.cpp)
target_link_libraries (string_pool PRIVATE clickhouse_common_io)
target_include_directories (string_pool SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
target_include_directories (string_pool SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
add_executable (field field.cpp)
target_link_libraries (field PRIVATE dbms)


@ -33,8 +33,8 @@ int main(int argc, char ** argv)
using Vec = std::vector<std::string>;
using Set = std::unordered_map<std::string, int>;
using RefsSet = std::unordered_map<StringRef, int, StringRefHash>;
using DenseSet = GOOGLE_NAMESPACE::dense_hash_map<std::string, int>;
using RefsDenseSet = GOOGLE_NAMESPACE::dense_hash_map<StringRef, int, StringRefHash>;
using DenseSet = ::google::dense_hash_map<std::string, int>;
using RefsDenseSet = ::google::dense_hash_map<StringRef, int, StringRefHash>;
using RefsHashMap = HashMap<StringRef, int, StringRefHash>;
Vec vec;

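One detail visible in the setup lambdas above: google::dense_hash_map requires set_empty_key() before the first insertion (sparse_hash_map does not). A minimal sketch, assuming the sparsehash-c11 headers are on the include path:

#include <cstdio>
#include <string>
#include <sparsehash/dense_hash_map>

int main()
{
    google::dense_hash_map<std::string, int> map;
    map.set_empty_key(std::string()); // mandatory; pick a never-used key value
    map["abc"] = 1;
    map["def"] = 2;
    std::printf("%zu entries\n", map.size());
}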

@ -1,5 +1,4 @@
#include <Interpreters/Set.h>
#include <Interpreters/Join.h>
#include <DataStreams/materializeBlock.h>
#include <DataStreams/IBlockOutputStream.h>
#include <DataStreams/CreatingSetsBlockInputStream.h>
@ -124,12 +123,7 @@ void CreatingSetsBlockInputStream::createOne(SubqueryForSet & subquery)
if (!done_with_join)
{
subquery.renameColumns(block);
if (subquery.joined_block_actions)
subquery.joined_block_actions->execute(block);
if (!subquery.join->insertFromBlock(block))
if (!subquery.insertJoinedBlock(block))
done_with_join = true;
}
@ -162,8 +156,7 @@ void CreatingSetsBlockInputStream::createOne(SubqueryForSet & subquery)
head_rows = profile_info.rows;
if (subquery.join)
subquery.join->setTotals(subquery.source->getTotals());
subquery.setTotals();
if (head_rows != 0)
{


@ -20,4 +20,10 @@ Block materializeBlock(const Block & block)
return res;
}
void materializeBlockInplace(Block & block)
{
for (size_t i = 0; i < block.columns(); ++i)
block.getByPosition(i).column = block.getByPosition(i).column->convertToFullColumnIfConst();
}
}


@ -9,5 +9,6 @@ namespace DB
/** Converts columns-constants to full columns ("materializes" them).
*/
Block materializeBlock(const Block & block);
void materializeBlockInplace(Block & block);
}


@ -4,17 +4,18 @@
namespace DB
{
class Context;
class Join;
using JoinPtr = std::shared_ptr<Join>;
using HashJoinPtr = std::shared_ptr<Join>;
class FunctionJoinGet final : public IFunction
{
public:
static constexpr auto name = "joinGet";
FunctionJoinGet(
TableStructureReadLockHolder table_lock_, StoragePtr storage_join_, JoinPtr join_, const String & attr_name_, DataTypePtr return_type_)
FunctionJoinGet(TableStructureReadLockHolder table_lock_, StoragePtr storage_join_, HashJoinPtr join_, const String & attr_name_,
DataTypePtr return_type_)
: table_lock(std::move(table_lock_))
, storage_join(std::move(storage_join_))
, join(std::move(join_))
@ -36,7 +37,7 @@ private:
private:
TableStructureReadLockHolder table_lock;
StoragePtr storage_join;
JoinPtr join;
HashJoinPtr join;
const String attr_name;
DataTypePtr return_type;
};


@ -12,6 +12,8 @@ void registerFunctionsBitmap(FunctionFactory & factory)
factory.registerFunction<FunctionBitmapSubsetInRange>();
factory.registerFunction<FunctionBitmapSelfCardinality>();
factory.registerFunction<FunctionBitmapMin>();
factory.registerFunction<FunctionBitmapMax>();
factory.registerFunction<FunctionBitmapAndCardinality>();
factory.registerFunction<FunctionBitmapOrCardinality>();
factory.registerFunction<FunctionBitmapXorCardinality>();


@ -49,6 +49,12 @@ namespace ErrorCodes
* Return bitmap cardinality:
* bitmapCardinality: bitmap -> integer
*
* Return the smallest value in the set:
* bitmapMin: bitmap -> integer
*
* Return the greatest value in the set:
* bitmapMax: bitmap -> integer
*
* AND of two bitmaps, returning the cardinality:
* bitmapAndCardinality: bitmap,bitmap -> integer
*
@ -357,13 +363,13 @@ private:
}
};
template <typename Name>
template <typename Impl>
class FunctionBitmapSelfCardinalityImpl : public IFunction
{
public:
static constexpr auto name = Name::name;
static constexpr auto name = Impl::name;
static FunctionPtr create(const Context &) { return std::make_shared<FunctionBitmapSelfCardinalityImpl>(); }
static FunctionPtr create(const Context &) { return std::make_shared<FunctionBitmapSelfCardinalityImpl<Impl>>(); }
String getName() const override { return name; }
@ -417,13 +423,46 @@ private:
= typeid_cast<const ColumnAggregateFunction *>(block.getByPosition(arguments[0]).column.get());
for (size_t i = 0; i < input_rows_count; ++i)
{
const AggregateFunctionGroupBitmapData<T> & bd1
const AggregateFunctionGroupBitmapData<T> & bd
= *reinterpret_cast<const AggregateFunctionGroupBitmapData<T> *>(column->getData()[i]);
vec_to[i] = bd1.rbs.size();
vec_to[i] = Impl::apply(bd);
}
}
};
struct BitmapCardinalityImpl
{
public:
static constexpr auto name = "bitmapCardinality";
template <typename T>
static UInt64 apply(const AggregateFunctionGroupBitmapData<T> & bd)
{
return bd.rbs.size();
}
};
struct BitmapMinImpl
{
public:
static constexpr auto name = "bitmapMin";
template <typename T>
static UInt64 apply(const AggregateFunctionGroupBitmapData<T> & bd)
{
return bd.rbs.rb_min();
}
};
struct BitmapMaxImpl
{
public:
static constexpr auto name = "bitmapMax";
template <typename T>
static UInt64 apply(const AggregateFunctionGroupBitmapData<T> & bd)
{
return bd.rbs.rb_max();
}
};
template <typename T>
struct BitmapAndCardinalityImpl
{
@ -840,7 +879,9 @@ struct NameBitmapHasAny
static constexpr auto name = "bitmapHasAny";
};
using FunctionBitmapSelfCardinality = FunctionBitmapSelfCardinalityImpl<NameBitmapCardinality>;
using FunctionBitmapSelfCardinality = FunctionBitmapSelfCardinalityImpl<BitmapCardinalityImpl>;
using FunctionBitmapMin = FunctionBitmapSelfCardinalityImpl<BitmapMinImpl>;
using FunctionBitmapMax = FunctionBitmapSelfCardinalityImpl<BitmapMaxImpl>;
using FunctionBitmapAndCardinality = FunctionBitmapCardinality<BitmapAndCardinalityImpl, NameBitmapAndCardinality, UInt64>;
using FunctionBitmapOrCardinality = FunctionBitmapCardinality<BitmapOrCardinalityImpl, NameBitmapOrCardinality, UInt64>;
using FunctionBitmapXorCardinality = FunctionBitmapCardinality<BitmapXorCardinalityImpl, NameBitmapXorCardinality, UInt64>;


@ -2,11 +2,13 @@
#include <Interpreters/DatabaseAndTableWithAlias.h>
#include <Interpreters/InterpreterSelectWithUnionQuery.h>
#include <Interpreters/Join.h>
#include <Interpreters/MergeJoin.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Parsers/ASTSelectQuery.h>
#include <Core/Settings.h>
#include <Core/Block.h>
#include <Storages/IStorage.h>
@ -16,6 +18,17 @@
namespace DB
{
namespace ErrorCodes
{
extern const int LOGICAL_ERROR;
}
AnalyzedJoin::AnalyzedJoin(const Settings & settings)
: size_limits(SizeLimits{settings.max_rows_in_join, settings.max_bytes_in_join, settings.join_overflow_mode})
, join_use_nulls(settings.join_use_nulls)
, partial_merge_join(settings.partial_merge_join)
{}
void AnalyzedJoin::addUsingKey(const ASTPtr & ast)
{
key_names_left.push_back(ast->getColumnName());
@ -129,6 +142,16 @@ Names AnalyzedJoin::requiredJoinedNames() const
return Names(required_columns_set.begin(), required_columns_set.end());
}
NameSet AnalyzedJoin::requiredRightKeys() const
{
NameSet required;
for (const auto & name : key_names_right)
for (const auto & column : columns_added_by_join)
if (name == column.name)
required.insert(name);
return required;
}
NamesWithAliases AnalyzedJoin::getRequiredColumns(const Block & sample, const Names & action_required_columns) const
{
NameSet required_columns(action_required_columns.begin(), action_required_columns.end());
@ -209,37 +232,7 @@ bool AnalyzedJoin::sameJoin(const AnalyzedJoin * x, const AnalyzedJoin * y)
&& x->table_join.strictness == y->table_join.strictness
&& x->key_names_left == y->key_names_left
&& x->key_names_right == y->key_names_right
&& x->columns_added_by_join == y->columns_added_by_join
&& x->hash_join == y->hash_join;
}
BlockInputStreamPtr AnalyzedJoin::createStreamWithNonJoinedDataIfFullOrRightJoin(const Block & source_header, UInt64 max_block_size) const
{
if (isRightOrFull(table_join.kind))
return hash_join->createStreamWithNonJoinedRows(source_header, *this, max_block_size);
return {};
}
JoinPtr AnalyzedJoin::makeHashJoin(const Block & sample_block, const SizeLimits & size_limits_for_join) const
{
auto join = std::make_shared<Join>(key_names_right, join_use_nulls, size_limits_for_join, table_join.kind, table_join.strictness);
join->setSampleBlock(sample_block);
return join;
}
void AnalyzedJoin::joinBlock(Block & block) const
{
hash_join->joinBlock(block, *this);
}
void AnalyzedJoin::joinTotals(Block & block) const
{
hash_join->joinTotals(block);
}
bool AnalyzedJoin::hasTotals() const
{
return hash_join->hasTotals();
&& x->columns_added_by_join == y->columns_added_by_join;
}
NamesAndTypesList getNamesAndTypeListFromTableExpression(const ASTTableExpression & table_expression, const Context & context)
@ -267,4 +260,11 @@ NamesAndTypesList getNamesAndTypeListFromTableExpression(const ASTTableExpressio
return names_and_type_list;
}
JoinPtr makeJoin(std::shared_ptr<AnalyzedJoin> table_join, const Block & right_sample_block)
{
if (table_join->partial_merge_join)
return std::make_shared<MergeJoin>(table_join, right_sample_block);
return std::make_shared<Join>(table_join, right_sample_block);
}
}


@ -4,7 +4,9 @@
#include <Core/NamesAndTypes.h>
#include <Core/SettingsCommon.h>
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Interpreters/IJoin.h>
#include <DataStreams/IBlockStream_fwd.h>
#include <DataStreams/SizeLimits.h>
#include <utility>
#include <memory>
@ -17,8 +19,7 @@ class ASTSelectQuery;
struct DatabaseAndTableWithAlias;
class Block;
class Join;
using JoinPtr = std::shared_ptr<Join>;
struct Settings;
class AnalyzedJoin
{
@ -36,12 +37,15 @@ class AnalyzedJoin
friend class SyntaxAnalyzer;
const SizeLimits size_limits;
const bool join_use_nulls;
const bool partial_merge_join;
Names key_names_left;
Names key_names_right; /// Duplicating names are qualified.
ASTs key_asts_left;
ASTs key_asts_right;
ASTTableJoin table_join;
bool join_use_nulls = false;
/// All columns which can be read from joined table. Duplicating names are qualified.
NamesAndTypesList columns_from_joined_table;
@ -53,9 +57,28 @@ class AnalyzedJoin
/// Original name -> name. Only renamed columns.
std::unordered_map<String, String> renames;
JoinPtr hash_join;
public:
AnalyzedJoin(const Settings &);
/// for StorageJoin
AnalyzedJoin(SizeLimits limits, bool use_nulls, ASTTableJoin::Kind kind, ASTTableJoin::Strictness strictness,
const Names & key_names_right_)
: size_limits(limits)
, join_use_nulls(use_nulls)
, partial_merge_join(false)
, key_names_right(key_names_right_)
{
table_join.kind = kind;
table_join.strictness = strictness;
}
ASTTableJoin::Kind kind() const { return table_join.kind; }
ASTTableJoin::Strictness strictness() const { return table_join.strictness; }
const SizeLimits & sizeLimits() const { return size_limits; }
bool forceNullableRight() const { return join_use_nulls && isLeftOrFull(table_join.kind); }
bool forceNullableLeft() const { return join_use_nulls && isRightOrFull(table_join.kind); }
void addUsingKey(const ASTPtr & ast);
void addOnKeys(ASTPtr & left_table_ast, ASTPtr & right_table_ast);
@ -69,6 +92,7 @@ public:
void deduplicateAndQualifyColumnNames(const NameSet & left_table_columns, const String & right_table_prefix);
size_t rightKeyInclusion(const String & name) const;
NameSet requiredRightKeys() const;
void addJoinedColumn(const NameAndTypePair & joined_column);
void addJoinedColumnsAndCorrectNullability(Block & sample_block) const;
@ -78,17 +102,12 @@ public:
Names requiredJoinedNames() const;
const Names & keyNamesLeft() const { return key_names_left; }
const Names & keyNamesRight() const { return key_names_right; }
const NamesAndTypesList & columnsFromJoinedTable() const { return columns_from_joined_table; }
const NamesAndTypesList & columnsAddedByJoin() const { return columns_added_by_join; }
void setHashJoin(JoinPtr join) { hash_join = join; }
JoinPtr makeHashJoin(const Block & sample_block, const SizeLimits & size_limits_for_join) const;
BlockInputStreamPtr createStreamWithNonJoinedDataIfFullOrRightJoin(const Block & source_header, UInt64 max_block_size) const;
void joinBlock(Block & block) const;
void joinTotals(Block & block) const;
bool hasTotals() const;
static bool sameJoin(const AnalyzedJoin * x, const AnalyzedJoin * y);
friend JoinPtr makeJoin(std::shared_ptr<AnalyzedJoin> table_join, const Block & right_sample_block);
};
struct ASTTableExpression;


@ -160,11 +160,12 @@ ExpressionAction ExpressionAction::arrayJoin(const NameSet & array_joined_column
return a;
}
ExpressionAction ExpressionAction::ordinaryJoin(std::shared_ptr<AnalyzedJoin> table_join)
ExpressionAction ExpressionAction::ordinaryJoin(std::shared_ptr<AnalyzedJoin> table_join, JoinPtr join)
{
ExpressionAction a;
a.type = JOIN;
a.table_join = table_join;
a.join = join;
return a;
}
@ -475,7 +476,7 @@ void ExpressionAction::execute(Block & block, bool dry_run) const
case JOIN:
{
table_join->joinBlock(block);
join->joinBlock(block);
break;
}
@ -543,7 +544,7 @@ void ExpressionAction::executeOnTotals(Block & block) const
if (type != JOIN)
execute(block, false);
else
table_join->joinTotals(block);
join->joinTotals(block);
}
@ -763,7 +764,7 @@ void ExpressionActions::execute(Block & block, bool dry_run) const
bool ExpressionActions::hasTotalsInJoin() const
{
for (const auto & action : actions)
if (action.table_join && action.table_join->hasTotals())
if (action.table_join && action.join->hasTotals())
return true;
return false;
}
@ -1157,11 +1158,11 @@ void ExpressionActions::optimizeArrayJoin()
}
std::shared_ptr<const AnalyzedJoin> ExpressionActions::getTableJoin() const
JoinPtr ExpressionActions::getTableJoinAlgo() const
{
for (const auto & action : actions)
if (action.table_join)
return action.table_join;
if (action.join)
return action.join;
return {};
}


@ -21,6 +21,8 @@ namespace ErrorCodes
}
class AnalyzedJoin;
class IJoin;
using JoinPtr = std::shared_ptr<IJoin>;
class IPreparedFunction;
using PreparedFunctionPtr = std::shared_ptr<IPreparedFunction>;
@ -101,6 +103,7 @@ public:
/// For JOIN
std::shared_ptr<const AnalyzedJoin> table_join;
JoinPtr join;
/// For PROJECT.
NamesWithAliases projection;
@ -116,7 +119,7 @@ public:
static ExpressionAction project(const Names & projected_columns_);
static ExpressionAction addAliases(const NamesWithAliases & aliased_columns_);
static ExpressionAction arrayJoin(const NameSet & array_joined_columns, bool array_join_is_left, const Context & context);
static ExpressionAction ordinaryJoin(std::shared_ptr<AnalyzedJoin> join);
static ExpressionAction ordinaryJoin(std::shared_ptr<AnalyzedJoin> table_join, JoinPtr join);
/// Which columns necessary to perform this action.
Names getNeededColumns() const;
@ -232,7 +235,7 @@ public:
static std::string getSmallestColumn(const NamesAndTypesList & columns);
std::shared_ptr<const AnalyzedJoin> getTableJoin() const;
JoinPtr getTableJoinAlgo() const;
const Settings & getSettings() const { return settings; }


@ -30,6 +30,7 @@
#include <Interpreters/ExternalDictionaries.h>
#include <Interpreters/Set.h>
#include <Interpreters/AnalyzedJoin.h>
#include <Interpreters/Join.h>
#include <AggregateFunctions/AggregateFunctionFactory.h>
#include <AggregateFunctions/parseAggregateFunctionParameters.h>
@ -407,9 +408,9 @@ bool SelectQueryExpressionAnalyzer::appendArrayJoin(ExpressionActionsChain & cha
return true;
}
void ExpressionAnalyzer::addJoinAction(ExpressionActionsPtr & actions) const
void ExpressionAnalyzer::addJoinAction(ExpressionActionsPtr & actions, JoinPtr join) const
{
actions->add(ExpressionAction::ordinaryJoin(syntax->analyzed_join));
actions->add(ExpressionAction::ordinaryJoin(syntax->analyzed_join, join));
}
bool SelectQueryExpressionAnalyzer::appendJoin(ExpressionActionsChain & chain, bool only_types)
@ -418,13 +419,13 @@ bool SelectQueryExpressionAnalyzer::appendJoin(ExpressionActionsChain & chain, b
if (!ast_join)
return false;
makeTableJoin(*ast_join);
JoinPtr table_join = makeTableJoin(*ast_join);
initChain(chain, sourceColumns());
ExpressionActionsChain::Step & step = chain.steps.back();
getRootActions(analyzedJoin().leftKeysList(), only_types, step.actions);
addJoinAction(step.actions);
addJoinAction(step.actions, table_join);
return true;
}
@ -464,40 +465,40 @@ static ExpressionActionsPtr createJoinedBlockActions(const Context & context, co
return ExpressionAnalyzer(expression_list, syntax_result, context).getActions(true, false);
}
void SelectQueryExpressionAnalyzer::makeTableJoin(const ASTTablesInSelectQueryElement & join_element)
JoinPtr SelectQueryExpressionAnalyzer::makeTableJoin(const ASTTablesInSelectQueryElement & join_element)
{
/// Two JOINs are not supported with the same subquery, but different USINGs.
auto join_hash = join_element.getTreeHash();
String join_subquery_id = toString(join_hash.first) + "_" + toString(join_hash.second);
SubqueryForSet & subquery_for_set = subqueries_for_sets[join_subquery_id];
SubqueryForSet & subquery_for_join = subqueries_for_sets[join_subquery_id];
/// Special case - if table name is specified on the right of JOIN, then the table has the type Join (the previously prepared mapping).
if (!subquery_for_set.join)
subquery_for_set.join = tryGetStorageJoin(join_element, context);
if (!subquery_for_join.join)
subquery_for_join.join = tryGetStorageJoin(join_element, context);
if (!subquery_for_set.join)
if (!subquery_for_join.join)
{
/// Actions which need to be calculated on joined block.
ExpressionActionsPtr joined_block_actions = createJoinedBlockActions(context, analyzedJoin());
if (!subquery_for_set.source)
makeSubqueryForJoin(join_element, joined_block_actions, subquery_for_set);
/// Test actions on sample block (early error detection)
Block sample_block = subquery_for_set.renamedSampleBlock();
joined_block_actions->execute(sample_block);
/// TODO You do not need to set this up when JOIN is only needed on remote servers.
subquery_for_set.join = analyzedJoin().makeHashJoin(sample_block, settings.size_limits_for_join);
subquery_for_set.joined_block_actions = joined_block_actions;
if (!subquery_for_join.source)
{
NamesWithAliases required_columns_with_aliases =
analyzedJoin().getRequiredColumns(joined_block_actions->getSampleBlock(), joined_block_actions->getRequiredColumns());
makeSubqueryForJoin(join_element, std::move(required_columns_with_aliases), subquery_for_join);
}
syntax->analyzed_join->setHashJoin(subquery_for_set.join);
/// TODO You do not need to set this up when JOIN is only needed on remote servers.
subquery_for_join.setJoinActions(joined_block_actions); /// changes subquery_for_join.sample_block inside
subquery_for_join.join = makeJoin(syntax->analyzed_join, subquery_for_join.sample_block);
}
return subquery_for_join.join;
}
void SelectQueryExpressionAnalyzer::makeSubqueryForJoin(const ASTTablesInSelectQueryElement & join_element,
const ExpressionActionsPtr & joined_block_actions,
NamesWithAliases && required_columns_with_aliases,
SubqueryForSet & subquery_for_set) const
{
/** For GLOBAL JOINs (in the case, for example, of the push method for executing GLOBAL subqueries), the following occurs
@ -505,10 +506,6 @@ void SelectQueryExpressionAnalyzer::makeSubqueryForJoin(const ASTTablesInSelectQ
* in the subquery_for_set object this subquery is exposed as source and the temporary table _data1 as the `table`.
* - this function shows the expression JOIN _data1.
*/
NamesWithAliases required_columns_with_aliases =
analyzedJoin().getRequiredColumns(joined_block_actions->getSampleBlock(), joined_block_actions->getRequiredColumns());
Names original_columns;
for (auto & pr : required_columns_with_aliases)
original_columns.push_back(pr.first);


@ -20,6 +20,8 @@ class ExpressionActions;
using ExpressionActionsPtr = std::shared_ptr<ExpressionActions>;
struct ASTTableJoin;
class IJoin;
using JoinPtr = std::shared_ptr<IJoin>;
class ASTFunction;
class ASTExpressionList;
@ -58,15 +60,11 @@ private:
struct ExtractedSettings
{
const bool use_index_for_in_with_subqueries;
const bool join_use_nulls;
const SizeLimits size_limits_for_set;
const SizeLimits size_limits_for_join;
ExtractedSettings(const Settings & settings_)
: use_index_for_in_with_subqueries(settings_.use_index_for_in_with_subqueries),
join_use_nulls(settings_.join_use_nulls),
size_limits_for_set(settings_.max_rows_in_set, settings_.max_bytes_in_set, settings_.set_overflow_mode),
size_limits_for_join(settings_.max_rows_in_join, settings_.max_bytes_in_join, settings_.join_overflow_mode)
size_limits_for_set(settings_.max_rows_in_set, settings_.max_bytes_in_set, settings_.set_overflow_mode)
{}
};
@ -127,7 +125,7 @@ protected:
void addMultipleArrayJoinAction(ExpressionActionsPtr & actions, bool is_left) const;
void addJoinAction(ExpressionActionsPtr & actions) const;
void addJoinAction(ExpressionActionsPtr & actions, JoinPtr = {}) const;
void getRootActions(const ASTPtr & ast, bool no_subqueries, ExpressionActionsPtr & actions, bool only_consts = false);
@ -219,8 +217,8 @@ private:
*/
void tryMakeSetForIndexFromSubquery(const ASTPtr & subquery_or_table_name);
void makeTableJoin(const ASTTablesInSelectQueryElement & join_element);
void makeSubqueryForJoin(const ASTTablesInSelectQueryElement & join_element, const ExpressionActionsPtr & joined_block_actions,
JoinPtr makeTableJoin(const ASTTablesInSelectQueryElement & join_element);
void makeSubqueryForJoin(const ASTTablesInSelectQueryElement & join_element, NamesWithAliases && required_columns_with_aliases,
SubqueryForSet & subquery_for_set) const;
const ASTSelectQuery * getAggregatingQuery() const;


@ -0,0 +1,39 @@
#pragma once
#include <memory>
#include <vector>
#include <Core/Names.h>
#include <Columns/IColumn.h>
#include <DataStreams/IBlockStream_fwd.h>
namespace DB
{
class Block;
class IJoin
{
public:
virtual ~IJoin() = default;
/// Add block of data from right hand of JOIN.
/// @returns false, if some limit was exceeded and you should not insert more data.
virtual bool addJoinedBlock(const Block & block) = 0;
/// Join the block with data from left hand of JOIN to the right hand data (that was previously built by calls to addJoinedBlock).
/// Could be called from different threads in parallel.
virtual void joinBlock(Block & block) = 0;
virtual bool hasTotals() const { return false; }
virtual void setTotals(const Block & block) = 0;
virtual void joinTotals(Block & block) const = 0;
virtual size_t getTotalRowCount() const = 0;
virtual BlockInputStreamPtr createStreamWithNonJoinedRows(const Block &, UInt64) const { return {}; }
};
using JoinPtr = std::shared_ptr<IJoin>;
}

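The interface splits a join into a build phase (fill from the right-hand side) and a probe phase (enrich left-hand blocks in place). A sketch of the driving loop, with hypothetical block sources declared only so the sketch is self-contained:

#include <Core/Block.h>          // assumed location of DB::Block
#include <Interpreters/IJoin.h>  // assumed location of DB::IJoin

// Hypothetical block sources; return false when exhausted.
bool readRightBlock(DB::Block & block);
bool readLeftBlock(DB::Block & block);

void runJoin(DB::IJoin & join)
{
    // Build phase: addJoinedBlock() returns false once a size limit is
    // exceeded and no more data should be inserted.
    DB::Block right;
    while (readRightBlock(right))
        if (!join.addJoinedBlock(right))
            break;

    // Probe phase: joins each left-hand block in place; per the contract
    // above, this may run from several threads in parallel.
    DB::Block left;
    while (readLeftBlock(left))
        join.joinBlock(left);
}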

@ -790,7 +790,7 @@ static std::pair<UInt64, UInt64> getLimitLengthAndOffset(const ASTSelectQuery &
if (query.limitLength())
{
length = getLimitUIntValue(query.limitLength(), context);
if (query.limitOffset())
if (query.limitOffset() && length)
offset = getLimitUIntValue(query.limitOffset(), context);
}
@ -1118,9 +1118,9 @@ void InterpreterSelectQuery::executeImpl(TPipeline & pipeline, const BlockInputS
stream = std::make_shared<ExpressionBlockInputStream>(stream, expressions.before_join);
}
if (auto join = expressions.before_join->getTableJoin())
if (JoinPtr join = expressions.before_join->getTableJoinAlgo())
{
if (auto stream = join->createStreamWithNonJoinedDataIfFullOrRightJoin(header_before_join, settings.max_block_size))
if (auto stream = join->createStreamWithNonJoinedRows(header_before_join, settings.max_block_size))
{
if constexpr (pipeline_with_processors)
{


@ -1,5 +1,6 @@
#include <IO/ReadBufferFromString.h>
#include <Parsers/ASTShowTablesQuery.h>
#include <Parsers/formatAST.h>
#include <Interpreters/Context.h>
#include <Interpreters/executeQuery.h>
#include <Interpreters/InterpreterShowTablesQuery.h>
@ -53,6 +54,9 @@ String InterpreterShowTablesQuery::getRewrittenQuery()
if (!query.like.empty())
rewritten_query << " AND name " << (query.not_like ? "NOT " : "") << "LIKE " << std::quoted(query.like, '\'');
if (query.limit_length)
rewritten_query << " LIMIT " << query.limit_length;
return rewritten_query.str();
}


@ -10,6 +10,7 @@
#include <DataTypes/DataTypeNullable.h>
#include <Interpreters/Join.h>
#include <Interpreters/join_common.h>
#include <Interpreters/AnalyzedJoin.h>
#include <Interpreters/joinDispatch.h>
#include <Interpreters/NullableUtils.h>
@ -35,35 +36,12 @@ namespace ErrorCodes
extern const int ILLEGAL_COLUMN;
}
static std::unordered_map<String, DataTypePtr> requiredRightKeys(const Names & key_names, const NamesAndTypesList & columns_added_by_join)
{
NameSet right_keys;
for (const auto & name : key_names)
right_keys.insert(name);
std::unordered_map<String, DataTypePtr> required;
for (const auto & column : columns_added_by_join)
if (right_keys.count(column.name))
required.insert({column.name, column.type});
return required;
}
static void convertColumnToNullable(ColumnWithTypeAndName & column)
{
if (column.type->isNullable() || !column.type->canBeInsideNullable())
return;
column.type = makeNullable(column.type);
if (column.column)
column.column = makeNullable(column.column);
}
/// Converts column to nullable if needed. No backward conversion.
static ColumnWithTypeAndName correctNullability(ColumnWithTypeAndName && column, bool nullable)
{
if (nullable)
convertColumnToNullable(column);
JoinCommon::convertColumnToNullable(column);
return std::move(column);
}
@ -71,7 +49,7 @@ static ColumnWithTypeAndName correctNullability(ColumnWithTypeAndName && column,
{
if (nullable)
{
convertColumnToNullable(column);
JoinCommon::convertColumnToNullable(column);
if (column.type->isNullable() && negative_null_map.size())
{
MutableColumnPtr mutable_column = (*std::move(column.column)).mutate();
@ -83,15 +61,18 @@ static ColumnWithTypeAndName correctNullability(ColumnWithTypeAndName && column,
}
Join::Join(const Names & key_names_right_, bool use_nulls_, const SizeLimits & limits_,
ASTTableJoin::Kind kind_, ASTTableJoin::Strictness strictness_, bool any_take_last_row_)
: kind(kind_), strictness(strictness_),
key_names_right(key_names_right_),
use_nulls(use_nulls_),
any_take_last_row(any_take_last_row_),
log(&Logger::get("Join")),
limits(limits_)
Join::Join(std::shared_ptr<AnalyzedJoin> table_join_, const Block & right_sample_block, bool any_take_last_row_)
: table_join(table_join_)
, kind(table_join->kind())
, strictness(table_join->strictness())
, key_names_right(table_join->keyNamesRight())
, required_right_keys(table_join->requiredRightKeys())
, nullable_right_side(table_join->forceNullableRight())
, nullable_left_side(table_join->forceNullableLeft())
, any_take_last_row(any_take_last_row_)
, log(&Logger::get("Join"))
{
setSampleBlock(right_sample_block);
}
@ -269,42 +250,15 @@ size_t Join::getTotalByteCount() const
void Join::setSampleBlock(const Block & block)
{
std::unique_lock lock(rwlock);
/// You have to restore this lock if you call the function outside of the constructor.
//std::unique_lock lock(rwlock);
LOG_DEBUG(log, "setSampleBlock: " << block.dumpStructure());
if (!empty())
return;
size_t keys_size = key_names_right.size();
ColumnRawPtrs key_columns(keys_size);
sample_block_with_columns_to_add = materializeBlock(block);
for (size_t i = 0; i < keys_size; ++i)
{
const String & column_name = key_names_right[i];
/// there could be the same key names
if (sample_block_with_keys.has(column_name))
{
key_columns[i] = sample_block_with_keys.getByName(column_name).column.get();
continue;
}
auto & col = sample_block_with_columns_to_add.getByName(column_name);
col.column = recursiveRemoveLowCardinality(col.column);
col.type = recursiveRemoveLowCardinality(col.type);
/// Extract right keys with correct keys order.
sample_block_with_keys.insert(col);
sample_block_with_columns_to_add.erase(column_name);
key_columns[i] = sample_block_with_keys.getColumns().back().get();
/// We will join only keys, where all components are not NULL.
if (auto * nullable = checkAndGetColumn<ColumnNullable>(*key_columns[i]))
key_columns[i] = &nullable->getNestedColumn();
}
ColumnRawPtrs key_columns = JoinCommon::extractKeysForJoin(key_names_right, block, right_table_keys, sample_block_with_columns_to_add);
if (strictness == ASTTableJoin::Strictness::Asof)
{
@ -343,19 +297,10 @@ void Join::setSampleBlock(const Block & block)
blocklist_sample = Block(block.getColumnsWithTypeAndName());
prepareBlockListStructure(blocklist_sample);
size_t num_columns_to_add = sample_block_with_columns_to_add.columns();
JoinCommon::createMissedColumns(sample_block_with_columns_to_add);
for (size_t i = 0; i < num_columns_to_add; ++i)
{
auto & column = sample_block_with_columns_to_add.getByPosition(i);
if (!column.column)
column.column = column.type->createColumn();
}
/// In case of LEFT and FULL joins, if use_nulls, convert joined columns to Nullable.
if (use_nulls && isLeftOrFull(kind))
for (size_t i = 0; i < num_columns_to_add; ++i)
convertColumnToNullable(sample_block_with_columns_to_add.getByPosition(i));
if (nullable_right_side)
JoinCommon::convertColumnsToNullable(sample_block_with_columns_to_add);
}
namespace
@ -504,26 +449,16 @@ void Join::prepareBlockListStructure(Block & stored_block)
}
}
bool Join::insertFromBlock(const Block & block)
bool Join::addJoinedBlock(const Block & block)
{
std::unique_lock lock(rwlock);
if (empty())
throw Exception("Logical error: Join was not initialized", ErrorCodes::LOGICAL_ERROR);
size_t keys_size = key_names_right.size();
ColumnRawPtrs key_columns(keys_size);
/// Rare case, when keys are constant. To avoid code bloat, simply materialize them.
Columns materialized_columns;
materialized_columns.reserve(keys_size);
/// Memoize key columns to work.
for (size_t i = 0; i < keys_size; ++i)
{
materialized_columns.emplace_back(recursiveRemoveLowCardinality(block.getByName(key_names_right[i]).column->convertToFullColumnIfConst()));
key_columns[i] = materialized_columns.back().get();
}
ColumnRawPtrs key_columns = JoinCommon::temporaryMaterializeColumns(block, key_names_right, materialized_columns);
/// We will insert to the map only keys, where all components are not NULL.
ConstNullMapPtr null_map{};
@ -536,20 +471,11 @@ bool Join::insertFromBlock(const Block & block)
prepareBlockListStructure(*stored_block);
size_t size = stored_block->columns();
/// Rare case, when joined columns are constant. To avoid code bloat, simply materialize them.
for (size_t i = 0; i < size; ++i)
stored_block->safeGetByPosition(i).column = stored_block->safeGetByPosition(i).column->convertToFullColumnIfConst();
materializeBlockInplace(*stored_block);
/// In case of LEFT and FULL joins, if use_nulls, convert joined columns to Nullable.
if (use_nulls && isLeftOrFull(kind))
{
for (size_t i = isFull(kind) ? keys_size : 0; i < size; ++i)
{
convertColumnToNullable(stored_block->getByPosition(i));
}
}
if (nullable_right_side)
JoinCommon::convertColumnsToNullable(*stored_block, (isFull(kind) ? key_names_right.size() : 0));
if (kind != ASTTableJoin::Kind::Cross)
{
@ -570,7 +496,7 @@ bool Join::insertFromBlock(const Block & block)
blocks_nullmaps.emplace_back(stored_block, null_map_holder);
}
return limits.check(getTotalRowCount(), getTotalByteCount(), "JOIN", ErrorCodes::SET_SIZE_LIMIT_EXCEEDED);
return table_join->sizeLimits().check(getTotalRowCount(), getTotalByteCount(), "JOIN", ErrorCodes::SET_SIZE_LIMIT_EXCEEDED);
}
@ -783,23 +709,12 @@ template <ASTTableJoin::Kind KIND, ASTTableJoin::Strictness STRICTNESS, typename
void Join::joinBlockImpl(
Block & block,
const Names & key_names_left,
const NamesAndTypesList & columns_added_by_join,
const Block & block_with_columns_to_add,
const Maps & maps_) const
{
size_t keys_size = key_names_left.size();
ColumnRawPtrs key_columns(keys_size);
/// Rare case, when keys are constant. To avoid code bloat, simply materialize them.
Columns materialized_columns;
materialized_columns.reserve(keys_size);
/// Memoize key columns to work with.
for (size_t i = 0; i < keys_size; ++i)
{
materialized_columns.emplace_back(recursiveRemoveLowCardinality(block.getByName(key_names_left[i]).column->convertToFullColumnIfConst()));
key_columns[i] = materialized_columns.back().get();
}
ColumnRawPtrs key_columns = JoinCommon::temporaryMaterializeColumns(block, key_names_left, materialized_columns);
/// Keys with NULL value in any column won't join to anything.
ConstNullMapPtr null_map{};
@ -814,12 +729,10 @@ void Join::joinBlockImpl(
constexpr bool right_or_full = static_in_v<KIND, ASTTableJoin::Kind::Right, ASTTableJoin::Kind::Full>;
if constexpr (right_or_full)
{
for (size_t i = 0; i < existing_columns; ++i)
{
block.getByPosition(i).column = block.getByPosition(i).column->convertToFullColumnIfConst();
if (use_nulls)
convertColumnToNullable(block.getByPosition(i));
}
materializeBlockInplace(block);
if (nullable_left_side)
JoinCommon::convertColumnsToNullable(block);
}
/** For LEFT/INNER JOIN, the saved blocks do not contain keys.
@ -829,7 +742,7 @@ void Join::joinBlockImpl(
*/
ColumnsWithTypeAndName extras;
if constexpr (STRICTNESS == ASTTableJoin::Strictness::Asof)
extras.push_back(sample_block_with_keys.getByName(key_names_right.back()));
extras.push_back(right_table_keys.getByName(key_names_right.back()));
AddedColumns added(sample_block_with_columns_to_add, block_with_columns_to_add, block, blocklist_sample, extras);
std::unique_ptr<IColumn::Offsets> offsets_to_replicate;
@ -841,11 +754,8 @@ void Join::joinBlockImpl(
block.insert(added.moveColumn(i));
/// Filter & insert missing rows
auto right_keys = requiredRightKeys(key_names_right, columns_added_by_join);
constexpr bool is_all_join = STRICTNESS == ASTTableJoin::Strictness::All;
constexpr bool inner_or_right = static_in_v<KIND, ASTTableJoin::Kind::Inner, ASTTableJoin::Kind::Right>;
constexpr bool left_or_full = static_in_v<KIND, ASTTableJoin::Kind::Left, ASTTableJoin::Kind::Full>;
std::vector<size_t> right_keys_to_replicate [[maybe_unused]];
@ -856,17 +766,16 @@ void Join::joinBlockImpl(
block.safeGetByPosition(i).column = block.safeGetByPosition(i).column->filter(row_filter, -1);
/// Add join key columns from right block if they have different names.
for (size_t i = 0; i < key_names_right.size(); ++i)
for (size_t i = 0; i < right_table_keys.columns(); ++i)
{
auto & right_name = key_names_right[i];
const auto & right_key = right_table_keys.getByPosition(i);
auto & left_name = key_names_left[i];
auto it = right_keys.find(right_name);
if (it != right_keys.end() && !block.has(right_name))
if (required_right_keys.count(right_key.name) && !block.has(right_key.name))
{
const auto & col = block.getByName(left_name);
bool is_nullable = it->second->isNullable();
block.insert(correctNullability({col.column, col.type, right_name}, is_nullable));
bool is_nullable = nullable_right_side || right_key.type->isNullable();
block.insert(correctNullability({col.column, col.type, right_key.name}, is_nullable));
}
}
}
@ -879,13 +788,12 @@ void Join::joinBlockImpl(
const IColumn::Filter & filter = null_map_filter.getData();
/// Add join key columns from right block if they have different names.
for (size_t i = 0; i < key_names_right.size(); ++i)
for (size_t i = 0; i < right_table_keys.columns(); ++i)
{
auto & right_name = key_names_right[i];
const auto & right_key = right_table_keys.getByPosition(i);
auto & left_name = key_names_left[i];
auto it = right_keys.find(right_name);
if (it != right_keys.end() && !block.has(right_name))
if (required_right_keys.count(right_key.name) && !block.has(right_key.name))
{
const auto & col = block.getByName(left_name);
ColumnPtr column = col.column->convertToFullColumnIfConst();
@ -900,11 +808,11 @@ void Join::joinBlockImpl(
mut_column->insertDefault();
}
bool is_nullable = (use_nulls && left_or_full) || it->second->isNullable();
block.insert(correctNullability({std::move(mut_column), col.type, right_name}, is_nullable, null_map_filter));
bool is_nullable = nullable_right_side || right_key.type->isNullable();
block.insert(correctNullability({std::move(mut_column), col.type, right_key.name}, is_nullable, null_map_filter));
if constexpr (is_all_join)
right_keys_to_replicate.push_back(block.getPositionByName(right_name));
right_keys_to_replicate.push_back(block.getPositionByName(right_key.name));
}
}
}
@ -974,27 +882,6 @@ void Join::joinBlockImplCross(Block & block) const
block = block.cloneWithColumns(std::move(dst_columns));
}
void Join::checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const Block & block_right) const
{
size_t keys_size = key_names_left.size();
for (size_t i = 0; i < keys_size; ++i)
{
/// Compare up to Nullability.
DataTypePtr left_type = removeNullable(recursiveRemoveLowCardinality(block_left.getByName(key_names_left[i]).type));
DataTypePtr right_type = removeNullable(recursiveRemoveLowCardinality(block_right.getByName(key_names_right[i]).type));
if (!left_type->equals(*right_type))
throw Exception("Type mismatch of columns to JOIN by: "
+ key_names_left[i] + " " + left_type->getName() + " at left, "
+ key_names_right[i] + " " + right_type->getName() + " at right",
ErrorCodes::TYPE_MISMATCH);
}
}
static void checkTypeOfKey(const Block & block_left, const Block & block_right)
{
auto & [c1, left_type_origin, left_name] = block_left.safeGetByPosition(0);
@ -1024,7 +911,7 @@ template <typename Maps>
void Join::joinGetImpl(Block & block, const String & column_name, const Maps & maps_) const
{
joinBlockImpl<ASTTableJoin::Kind::Left, ASTTableJoin::Strictness::Any>(
block, {block.getByPosition(0).name}, {}, {sample_block_with_columns_to_add.getByName(column_name)}, maps_);
block, {block.getByPosition(0).name}, {sample_block_with_columns_to_add.getByName(column_name)}, maps_);
}
@ -1038,7 +925,7 @@ void Join::joinGet(Block & block, const String & column_name) const
if (key_names_right.size() != 1)
throw Exception("joinGet only supports StorageJoin containing exactly one key", ErrorCodes::LOGICAL_ERROR);
checkTypeOfKey(block, sample_block_with_keys);
checkTypeOfKey(block, right_table_keys);
if (kind == ASTTableJoin::Kind::Left && strictness == ASTTableJoin::Strictness::Any)
{
@ -1049,18 +936,16 @@ void Join::joinGet(Block & block, const String & column_name) const
}
void Join::joinBlock(Block & block, const AnalyzedJoin & join_params) const
void Join::joinBlock(Block & block)
{
const Names & key_names_left = join_params.keyNamesLeft();
const NamesAndTypesList & columns_added_by_join = join_params.columnsAddedByJoin();
std::shared_lock lock(rwlock);
checkTypesOfKeys(block, key_names_left, sample_block_with_keys);
const Names & key_names_left = table_join->keyNamesLeft();
JoinCommon::checkTypesOfKeys(block, key_names_left, right_table_keys, key_names_right);
if (joinDispatch(kind, strictness, maps, [&](auto kind_, auto strictness_, auto & map)
{
joinBlockImpl<kind_, strictness_>(block, key_names_left, columns_added_by_join, sample_block_with_columns_to_add, map);
joinBlockImpl<kind_, strictness_>(block, key_names_left, sample_block_with_columns_to_add, map);
}))
{
/// Joined
@ -1157,11 +1042,12 @@ struct AdderNonJoined<ASTTableJoin::Strictness::Asof, Mapped>
class NonJoinedBlockInputStream : public IBlockInputStream
{
public:
NonJoinedBlockInputStream(const Join & parent_, const Block & left_sample_block, const Names & key_names_left,
const NamesAndTypesList & columns_added_by_join, UInt64 max_block_size_)
NonJoinedBlockInputStream(const Join & parent_, const Block & left_sample_block, UInt64 max_block_size_)
: parent(parent_)
, max_block_size(max_block_size_)
{
const Names & key_names_left = parent_.table_join->keyNamesLeft();
/** left_sample_block contains keys and "left" columns.
* result_sample_block - keys, "left" columns, and "right" columns.
*/
@ -1180,10 +1066,9 @@ public:
const Block & right_sample_block = parent.sample_block_with_columns_to_add;
std::unordered_map<size_t, size_t> left_to_right_key_map;
makeResultSampleBlock(left_sample_block, right_sample_block, columns_added_by_join,
key_positions_left, left_to_right_key_map);
makeResultSampleBlock(left_sample_block, right_sample_block, key_positions_left, left_to_right_key_map);
auto nullability_changes = getNullabilityChanges(parent.sample_block_with_keys, result_sample_block,
auto nullability_changes = getNullabilityChanges(parent.right_table_keys, result_sample_block,
key_positions_left, left_to_right_key_map);
column_indices_left.reserve(left_sample_block.columns() - key_names_left.size());
@ -1249,16 +1134,12 @@ private:
void makeResultSampleBlock(const Block & left_sample_block, const Block & right_sample_block,
const NamesAndTypesList & columns_added_by_join,
const std::vector<size_t> & key_positions_left,
std::unordered_map<size_t, size_t> & left_to_right_key_map)
{
result_sample_block = materializeBlock(left_sample_block);
/// Convert left columns to Nullable if allowed
if (parent.use_nulls)
for (size_t i = 0; i < result_sample_block.columns(); ++i)
convertColumnToNullable(result_sample_block.getByPosition(i));
if (parent.nullable_left_side)
JoinCommon::convertColumnsToNullable(result_sample_block);
/// Add columns from the right-side table to the block.
for (size_t i = 0; i < right_sample_block.columns(); ++i)
@ -1268,23 +1149,19 @@ private:
result_sample_block.insert(src_column.cloneEmpty());
}
const auto & key_names_right = parent.key_names_right;
auto right_keys = requiredRightKeys(key_names_right, columns_added_by_join);
/// Add join key columns from the right block if they have different names.
for (size_t i = 0; i < key_names_right.size(); ++i)
for (size_t i = 0; i < parent.right_table_keys.columns(); ++i)
{
auto & right_name = key_names_right[i];
const auto & right_key = parent.right_table_keys.getByPosition(i);
size_t left_key_pos = key_positions_left[i];
auto it = right_keys.find(right_name);
if (it != right_keys.end() && !result_sample_block.has(right_name))
if (parent.required_right_keys.count(right_key.name) && !result_sample_block.has(right_key.name))
{
const auto & col = result_sample_block.getByPosition(left_key_pos);
bool is_nullable = (parent.use_nulls && isFull(parent.kind)) || it->second->isNullable();
result_sample_block.insert(correctNullability({col.column, col.type, right_name}, is_nullable));
bool is_nullable = (parent.nullable_right_side && isFull(parent.kind)) || right_key.type->isNullable();
result_sample_block.insert(correctNullability({col.column, col.type, right_key.name}, is_nullable));
size_t right_key_pos = result_sample_block.getPositionByName(right_name);
size_t right_key_pos = result_sample_block.getPositionByName(right_key.name);
left_to_right_key_map[left_key_pos] = right_key_pos;
}
}
@ -1418,7 +1295,7 @@ private:
}
}
static std::unordered_set<size_t> getNullabilityChanges(const Block & sample_block_with_keys, const Block & out_block,
static std::unordered_set<size_t> getNullabilityChanges(const Block & right_table_keys, const Block & out_block,
const std::vector<size_t> & key_positions,
const std::unordered_map<size_t, size_t> & left_to_right_key_map)
{
@ -1433,7 +1310,7 @@ private:
key_pos = it->second;
const auto & dst = out_block.getByPosition(key_pos).column;
const auto & src = sample_block_with_keys.getByPosition(i).column;
const auto & src = right_table_keys.getByPosition(i).column;
if (dst->isNullable() != src->isNullable())
nullability_changes.insert(key_pos);
}
@ -1461,12 +1338,11 @@ private:
};
BlockInputStreamPtr Join::createStreamWithNonJoinedRows(const Block & left_sample_block, const AnalyzedJoin & join_params,
UInt64 max_block_size) const
BlockInputStreamPtr Join::createStreamWithNonJoinedRows(const Block & left_sample_block, UInt64 max_block_size) const
{
return std::make_shared<NonJoinedBlockInputStream>(*this, left_sample_block,
join_params.keyNamesLeft(), join_params.columnsAddedByJoin(), max_block_size);
if (isRightOrFull(table_join->kind()))
return std::make_shared<NonJoinedBlockInputStream>(*this, left_sample_block, max_block_size);
return {};
}
}
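createStreamWithNonJoinedRows now returns a stream only for RIGHT and FULL joins: right-side rows that matched nothing must still be emitted, with defaults substituted on the left. As a minimal standalone sketch of that final pass, assuming a plain vector of right keys and a per-row joined flag (illustrative names, not the ClickHouse API):
#include <cstddef>
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> right_keys{2, 4, 5};
    std::vector<bool> joined{true, false, true}; /// marked while joining the left blocks
    /// For RIGHT/FULL JOIN, emit every right row that never matched, padded with left-side defaults.
    for (size_t i = 0; i < right_keys.size(); ++i)
        if (!joined[i])
            std::cout << "left: default, right: " << right_keys[i] << "\n";
}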

@ -7,6 +7,7 @@
#include <Parsers/ASTTablesInSelectQuery.h>
#include <Interpreters/IJoin.h>
#include <Interpreters/AggregationCommon.h>
#include <Interpreters/RowRefs.h>
#include <Core/SettingsCommon.h>
@ -120,30 +121,22 @@ using MappedAsof = WithFlags<AsofRowRefs, false>;
* If it is true, we always generate a Nullable column and substitute NULLs for non-joined rows,
* as in standard SQL.
*/
class Join
class Join : public IJoin
{
public:
Join(const Names & key_names_right_, bool use_nulls_, const SizeLimits & limits_,
ASTTableJoin::Kind kind_, ASTTableJoin::Strictness strictness_, bool any_take_last_row_ = false);
Join(std::shared_ptr<AnalyzedJoin> table_join_, const Block & right_sample_block, bool any_take_last_row_ = false);
bool empty() { return type == Type::EMPTY; }
bool isNullUsedAsDefault() const { return use_nulls; }
/** Set information about the structure of the right-hand side of the JOIN (joined data).
* You must call this method before subsequent calls to insertFromBlock.
*/
void setSampleBlock(const Block & block);
/** Add a block of data from the right-hand side of the JOIN to the map.
* Returns false if some limit was exceeded and you should not insert more data.
*/
bool insertFromBlock(const Block & block);
bool addJoinedBlock(const Block & block) override;
/** Join data from the map (that was previously built by calls to insertFromBlock) to the block with data from the "left" table.
/** Join data from the map (that was previously built by calls to addJoinedBlock) to the block with data from the "left" table.
* Could be called from different threads in parallel.
*/
void joinBlock(Block & block, const AnalyzedJoin & join_params) const;
void joinBlock(Block & block) override;
/// Infer the return type for joinGet function
DataTypePtr joinGetReturnType(const String & column_name) const;
@ -153,21 +146,20 @@ public:
/** Keep "totals" (separate part of dataset, see WITH TOTALS) to use later.
*/
void setTotals(const Block & block) { totals = block; }
bool hasTotals() const { return totals; }
void setTotals(const Block & block) override { totals = block; }
bool hasTotals() const override { return totals; }
void joinTotals(Block & block) const;
void joinTotals(Block & block) const override;
/** For RIGHT and FULL JOINs.
* A stream that will contain default values from the left table, joined with rows from the right table that were not joined before.
* Use only after all calls to joinBlock were done.
* left_sample_block is passed without regard to the 'use_nulls' setting (columns will be converted to Nullable inside).
*/
BlockInputStreamPtr createStreamWithNonJoinedRows(const Block & left_sample_block, const AnalyzedJoin & join_params,
UInt64 max_block_size) const;
BlockInputStreamPtr createStreamWithNonJoinedRows(const Block & left_sample_block, UInt64 max_block_size) const override;
/// Number of keys in all built JOIN maps.
size_t getTotalRowCount() const;
size_t getTotalRowCount() const override;
/// Sum size in bytes of all buffers, used for JOIN maps and for all memory pools.
size_t getTotalByteCount() const;
@ -282,14 +274,19 @@ private:
friend class NonJoinedBlockInputStream;
friend class JoinBlockInputStream;
std::shared_ptr<AnalyzedJoin> table_join;
ASTTableJoin::Kind kind;
ASTTableJoin::Strictness strictness;
/// Names of key columns (columns for equi-JOIN) in "right" table (in the order they appear in USING clause).
const Names key_names_right;
/// Names of key columns in right-side table (in the order they appear in ON/USING clause). @note It could contain duplicates.
const Names & key_names_right;
/// Names of right-side table keys that are needed in the result (they are attached after the joined columns).
const NameSet required_right_keys;
/// Substitute NULLs for non-JOINed rows.
bool use_nulls;
/// In case of LEFT and FULL joins, if use_nulls, convert right-side columns to Nullable.
bool nullable_right_side;
/// In case of RIGHT and FULL joins, if use_nulls, convert left-side columns to Nullable.
bool nullable_left_side;
/// Overwrite existing values when encountering the same key again
bool any_take_last_row;
@ -315,17 +312,14 @@ private:
/// Block with columns from the right-side table except key columns.
Block sample_block_with_columns_to_add;
/// Block with key columns in the same order they appear in the right-side table.
Block sample_block_with_keys;
/// Block with key columns in the same order they appear in the right-side table (duplicates appear once).
Block right_table_keys;
/// Block as it would appear in the BlockList
Block blocklist_sample;
Poco::Logger * log;
/// Limits for maximum map size.
SizeLimits limits;
Block totals;
/** Protect state for concurrent use in insertFromBlock and joinBlock.
@ -337,19 +331,19 @@ private:
void init(Type type_);
/** Set information about the structure of the right-hand side of the JOIN (joined data).
*/
void setSampleBlock(const Block & block);
/** Take an inserted block and discard everything that does not need to be stored.
* For example, remove the keys, as they come from the LHS block, but do keep the ASOF timestamps.
*/
void prepareBlockListStructure(Block & stored_block);
/// Throw an exception if blocks have different types of key columns.
void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const Block & block_right) const;
template <ASTTableJoin::Kind KIND, ASTTableJoin::Strictness STRICTNESS, typename Maps>
void joinBlockImpl(
Block & block,
const Names & key_names_left,
const NamesAndTypesList & columns_added_by_join,
const Block & block_with_columns_to_add,
const Maps & maps) const;
@ -359,7 +353,4 @@ private:
void joinGetImpl(Block & block, const String & column_name, const Maps & maps) const;
};
using JoinPtr = std::shared_ptr<Join>;
using Joins = std::vector<JoinPtr>;
}

@ -0,0 +1,390 @@
#include <Core/NamesAndTypes.h>
#include <Core/SortCursor.h>
#include <Columns/ColumnNullable.h>
#include <Interpreters/MergeJoin.h>
#include <Interpreters/AnalyzedJoin.h>
#include <Interpreters/sortBlock.h>
#include <Interpreters/join_common.h>
#include <DataStreams/materializeBlock.h>
#include <DataStreams/MergeSortingBlockInputStream.h>
namespace DB
{
namespace ErrorCodes
{
extern const int SET_SIZE_LIMIT_EXCEEDED;
extern const int NOT_IMPLEMENTED;
}
struct MergeJoinEqualRange
{
size_t left_start = 0;
size_t right_start = 0;
size_t left_length = 0;
size_t right_length = 0;
bool empty() const { return !left_length && !right_length; }
};
using Range = MergeJoinEqualRange;
class MergeJoinCursor
{
public:
MergeJoinCursor(const Block & block, const SortDescription & desc_)
: impl(SortCursorImpl(block, desc_))
{}
size_t position() const { return impl.pos; }
size_t end() const { return impl.rows; }
bool atEnd() const { return impl.pos >= impl.rows; }
void nextN(size_t num) { impl.pos += num; }
int compareAt(const MergeJoinCursor & rhs, size_t lhs_pos, size_t rhs_pos) const
{
int res = 0;
for (size_t i = 0; i < impl.sort_columns_size; ++i)
{
res = impl.sort_columns[i]->compareAt(lhs_pos, rhs_pos, *(rhs.impl.sort_columns[i]), 1);
if (res)
break;
}
return res;
}
bool sameNext(size_t lhs_pos) const
{
if (lhs_pos + 1 >= impl.rows)
return false;
for (size_t i = 0; i < impl.sort_columns_size; ++i)
if (impl.sort_columns[i]->compareAt(lhs_pos, lhs_pos + 1, *(impl.sort_columns[i]), 1) != 0)
return false;
return true;
}
size_t getEqualLength()
{
if (atEnd())
return 0;
size_t pos = impl.pos;
while (sameNext(pos))
++pos;
return pos - impl.pos + 1;
}
Range getNextEqualRange(MergeJoinCursor & rhs)
{
while (!atEnd() && !rhs.atEnd())
{
int cmp = compareAt(rhs, impl.pos, rhs.impl.pos);
if (cmp < 0)
impl.next();
if (cmp > 0)
rhs.impl.next();
if (!cmp)
{
Range range{impl.pos, rhs.impl.pos, 0, 0};
range.left_length = getEqualLength();
range.right_length = rhs.getEqualLength();
return range;
}
}
return Range{impl.pos, rhs.impl.pos, 0, 0};
}
private:
SortCursorImpl impl;
};
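getNextEqualRange is the heart of this cursor: it advances whichever side currently has the smaller key and, on a match, measures the length of the run of equal keys on both sides. A standalone sketch of the same two-pointer scan, assuming plain sorted int vectors in place of SortCursorImpl (illustrative names, not the ClickHouse API):
#include <cstddef>
#include <iostream>
#include <vector>
struct EqualRange { size_t left_start = 0, right_start = 0, left_length = 0, right_length = 0; };
/// Skip past non-matching keys and return the bounds of the next run of equal keys.
EqualRange nextEqualRange(const std::vector<int> & l, const std::vector<int> & r, size_t lp, size_t rp)
{
    while (lp < l.size() && rp < r.size())
    {
        if (l[lp] < r[rp]) { ++lp; continue; }
        if (l[lp] > r[rp]) { ++rp; continue; }
        EqualRange range{lp, rp, 1, 1};
        while (lp + range.left_length < l.size() && l[lp + range.left_length] == r[rp]) ++range.left_length;
        while (rp + range.right_length < r.size() && r[rp + range.right_length] == l[lp]) ++range.right_length;
        return range;
    }
    return {lp, rp, 0, 0}; /// empty range: one side is exhausted
}
int main()
{
    std::vector<int> left{1, 2, 2, 4}, right{2, 2, 3, 4};
    size_t lp = 0, rp = 0;
    for (;;)
    {
        EqualRange range = nextEqualRange(left, right, lp, rp);
        if (!range.left_length && !range.right_length)
            break; /// both cursors are done
        std::cout << "key " << left[range.left_start] << ": " << range.left_length
                  << " left row(s) x " << range.right_length << " right row(s)\n";
        lp = range.left_start + range.left_length;
        rp = range.right_start + range.right_length;
    }
}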
namespace
{
MutableColumns makeMutableColumns(const Block & block, size_t rows_to_reserve = 0)
{
MutableColumns columns;
columns.reserve(block.columns());
for (const auto & src_column : block)
{
columns.push_back(src_column.column->cloneEmpty());
columns.back()->reserve(rows_to_reserve);
}
return columns;
}
void makeSortAndMerge(const Names & keys, SortDescription & sort, SortDescription & merge)
{
NameSet unique_keys;
for (auto & key_name : keys)
{
merge.emplace_back(SortColumnDescription(key_name, 1, 1));
if (!unique_keys.count(key_name))
{
unique_keys.insert(key_name);
sort.emplace_back(SortColumnDescription(key_name, 1, 1));
}
}
}
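makeSortAndMerge keeps every occurrence of a key in the merge description (ON/USING may name the same column more than once) but deduplicates keys in the sort description, since sorting by the same column a second time changes nothing. A standalone sketch of that split, with plain strings standing in for SortColumnDescription:
#include <iostream>
#include <set>
#include <string>
#include <vector>
int main()
{
    std::vector<std::string> keys{"x", "x", "y"}; /// duplicate key name, as allowed in ON clauses
    std::vector<std::string> sort_desc, merge_desc;
    std::set<std::string> unique_keys;
    for (const auto & key : keys)
    {
        merge_desc.push_back(key); /// merge description keeps duplicates
        if (unique_keys.insert(key).second)
            sort_desc.push_back(key); /// sort description keeps the first occurrence only
    }
    std::cout << "sort: " << sort_desc.size() << ", merge: " << merge_desc.size() << "\n"; /// sort: 2, merge: 3
}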
void copyLeftRange(const Block & block, MutableColumns & columns, size_t start, size_t rows_to_add)
{
for (size_t i = 0; i < block.columns(); ++i)
{
const auto & src_column = block.getByPosition(i).column;
columns[i]->insertRangeFrom(*src_column, start, rows_to_add);
}
}
void copyRightRange(const Block & right_block, const Block & right_columns_to_add, MutableColumns & columns,
size_t row_position, size_t rows_to_add)
{
for (size_t i = 0; i < right_columns_to_add.columns(); ++i)
{
const auto & src_column = right_block.getByName(right_columns_to_add.getByPosition(i).name).column;
auto & dst_column = columns[i];
auto * dst_nullable = typeid_cast<ColumnNullable *>(dst_column.get());
if (dst_nullable && !isColumnNullable(*src_column))
dst_nullable->insertRangeFromNotNullable(*src_column, row_position, rows_to_add);
else
dst_column->insertRangeFrom(*src_column, row_position, rows_to_add);
}
}
void joinEqualsAnyLeft(const Block & right_block, const Block & right_columns_to_add, MutableColumns & right_columns, const Range & range)
{
copyRightRange(right_block, right_columns_to_add, right_columns, range.right_start, range.left_length);
}
void joinEquals(const Block & left_block, const Block & right_block, const Block & right_columns_to_add,
MutableColumns & left_columns, MutableColumns & right_columns, const Range & range, bool is_all)
{
size_t left_rows_to_add = range.left_length;
size_t right_rows_to_add = is_all ? range.right_length : 1;
size_t row_position = range.right_start;
for (size_t right_row = 0; right_row < right_rows_to_add; ++right_row, ++row_position)
{
copyLeftRange(left_block, left_columns, range.left_start, left_rows_to_add);
copyRightRange(right_block, right_columns_to_add, right_columns, row_position, left_rows_to_add);
}
}
void appendNulls(MutableColumns & right_columns, size_t rows_to_add)
{
for (auto & column : right_columns)
for (size_t i = 0; i < rows_to_add; ++i)
column->insertDefault();
}
void joinInequalsLeft(const Block & left_block, MutableColumns & left_columns, MutableColumns & right_columns,
size_t start, size_t end, bool copy_left)
{
if (end <= start)
return;
size_t rows_to_add = end - start;
if (copy_left)
copyLeftRange(left_block, left_columns, start, rows_to_add);
appendNulls(right_columns, rows_to_add);
}
}
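Together these helpers split LEFT JOIN output into two cases: matched rows, where joinEquals copies both sides, and unmatched left rows, where joinInequalsLeft copies the left side and appendNulls pads the right side with defaults. A simplified standalone sketch of that shape (ANY-style, one right match per key, with a printed "default" standing in for the Nullable padding):
#include <cstddef>
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> left{1, 2, 2, 3}, right{2, 4}; /// both sorted by key
    size_t rp = 0;
    for (int key : left)
    {
        while (rp < right.size() && right[rp] < key)
            ++rp; /// skip right keys smaller than the current left key
        if (rp < right.size() && right[rp] == key)
            std::cout << key << " joined with " << right[rp] << "\n"; /// joinEquals case
        else
            std::cout << key << " joined with default\n"; /// joinInequalsLeft + appendNulls case
    }
}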
MergeJoin::MergeJoin(std::shared_ptr<AnalyzedJoin> table_join_, const Block & right_sample_block)
: table_join(table_join_)
, nullable_right_side(table_join->forceNullableRight())
, is_all(table_join->strictness() == ASTTableJoin::Strictness::All)
, is_inner(isInner(table_join->kind()))
, is_left(isLeft(table_join->kind()))
{
if (!isLeft(table_join->kind()) && !isInner(table_join->kind()))
throw Exception("Partial merge supported for LEFT and INNER JOINs only", ErrorCodes::NOT_IMPLEMENTED);
JoinCommon::extractKeysForJoin(table_join->keyNamesRight(), right_sample_block, right_table_keys, right_columns_to_add);
const NameSet required_right_keys = table_join->requiredRightKeys();
for (const auto & column : right_table_keys)
if (required_right_keys.count(column.name))
right_columns_to_add.insert(ColumnWithTypeAndName{nullptr, column.type, column.name});
JoinCommon::removeLowCardinalityInplace(right_columns_to_add);
JoinCommon::createMissedColumns(right_columns_to_add);
if (nullable_right_side)
JoinCommon::convertColumnsToNullable(right_columns_to_add);
makeSortAndMerge(table_join->keyNamesLeft(), left_sort_description, left_merge_description);
makeSortAndMerge(table_join->keyNamesRight(), right_sort_description, right_merge_description);
}
void MergeJoin::setTotals(const Block & totals_block)
{
totals = totals_block;
mergeRightBlocks();
}
void MergeJoin::mergeRightBlocks()
{
const size_t max_merged_block_size = 128 * 1024 * 1024;
if (right_blocks.empty())
return;
Blocks unsorted_blocks;
unsorted_blocks.reserve(right_blocks.size());
for (const auto & block : right_blocks)
unsorted_blocks.push_back(block);
/// TODO: keys should not be split across blocks for RIGHT|FULL JOIN
MergeSortingBlocksBlockInputStream stream(unsorted_blocks, right_sort_description, max_merged_block_size);
right_blocks.clear();
while (Block block = stream.read())
right_blocks.push_back(block);
}
bool MergeJoin::addJoinedBlock(const Block & src_block)
{
Block block = materializeBlock(src_block);
JoinCommon::removeLowCardinalityInplace(block);
sortBlock(block, right_sort_description);
std::unique_lock lock(rwlock);
right_blocks.push_back(block);
right_blocks_row_count += block.rows();
right_blocks_bytes += block.bytes();
return table_join->sizeLimits().check(right_blocks_row_count, right_blocks_bytes, "JOIN", ErrorCodes::SET_SIZE_LIMIT_EXCEEDED);
}
void MergeJoin::joinBlock(Block & block)
{
JoinCommon::checkTypesOfKeys(block, table_join->keyNamesLeft(), right_table_keys, table_join->keyNamesRight());
materializeBlockInplace(block);
JoinCommon::removeLowCardinalityInplace(block);
sortBlock(block, left_sort_description);
std::shared_lock lock(rwlock);
size_t rows_to_reserve = is_left ? block.rows() : 0;
MutableColumns left_columns = makeMutableColumns(block, (is_all ? rows_to_reserve : 0));
MutableColumns right_columns = makeMutableColumns(right_columns_to_add, rows_to_reserve);
MergeJoinCursor left_cursor(block, left_merge_description);
size_t left_key_tail = 0;
if (is_left)
{
for (auto it = right_blocks.begin(); it != right_blocks.end(); ++it)
{
if (left_cursor.atEnd())
break;
leftJoin(left_cursor, block, *it, left_columns, right_columns, left_key_tail);
}
left_cursor.nextN(left_key_tail);
joinInequalsLeft(block, left_columns, right_columns, left_cursor.position(), left_cursor.end(), is_all);
//left_cursor.nextN(left_cursor.end() - left_cursor.position());
changeLeftColumns(block, std::move(left_columns));
addRightColumns(block, std::move(right_columns));
}
else if (is_inner)
{
for (auto it = right_blocks.begin(); it != right_blocks.end(); ++it)
{
if (left_cursor.atEnd())
break;
innerJoin(left_cursor, block, *it, left_columns, right_columns, left_key_tail);
}
left_cursor.nextN(left_key_tail);
changeLeftColumns(block, std::move(left_columns));
addRightColumns(block, std::move(right_columns));
}
}
void MergeJoin::leftJoin(MergeJoinCursor & left_cursor, const Block & left_block, const Block & right_block,
MutableColumns & left_columns, MutableColumns & right_columns, size_t & left_key_tail)
{
MergeJoinCursor right_cursor(right_block, right_merge_description);
while (!left_cursor.atEnd() && !right_cursor.atEnd())
{
size_t left_position = left_cursor.position(); /// save inequal position
Range range = left_cursor.getNextEqualRange(right_cursor);
joinInequalsLeft(left_block, left_columns, right_columns, left_position, range.left_start, is_all);
if (range.empty())
break;
if (is_all)
joinEquals(left_block, right_block, right_columns_to_add, left_columns, right_columns, range, is_all);
else
joinEqualsAnyLeft(right_block, right_columns_to_add, right_columns, range);
right_cursor.nextN(range.right_length);
/// Do not run over the last left keys for ALL JOIN (the same key may continue in the next right block and produce duplicates)
if (is_all && right_cursor.atEnd())
{
left_key_tail = range.left_length;
break;
}
left_cursor.nextN(range.left_length);
}
}
void MergeJoin::innerJoin(MergeJoinCursor & left_cursor, const Block & left_block, const Block & right_block,
MutableColumns & left_columns, MutableColumns & right_columns, size_t & left_key_tail)
{
MergeJoinCursor right_cursor(right_block, right_merge_description);
while (!left_cursor.atEnd() && !right_cursor.atEnd())
{
Range range = left_cursor.getNextEqualRange(right_cursor);
if (range.empty())
break;
joinEquals(left_block, right_block, right_columns_to_add, left_columns, right_columns, range, is_all);
right_cursor.nextN(range.right_length);
/// Do not run over the last left keys for ALL JOIN (the same key may continue in the next right block and produce duplicates)
if (is_all && right_cursor.atEnd())
{
left_key_tail = range.left_length;
break;
}
left_cursor.nextN(range.left_length);
}
}
void MergeJoin::changeLeftColumns(Block & block, MutableColumns && columns)
{
if (is_left && !is_all)
return;
block.setColumns(std::move(columns));
}
void MergeJoin::addRightColumns(Block & block, MutableColumns && right_columns)
{
for (size_t i = 0; i < right_columns_to_add.columns(); ++i)
{
const auto & column = right_columns_to_add.getByPosition(i);
block.insert(ColumnWithTypeAndName{std::move(right_columns[i]), column.type, column.name});
}
}
}

@ -0,0 +1,57 @@
#pragma once
#include <memory>
#include <shared_mutex>
#include <Core/Block.h>
#include <Core/SortDescription.h>
#include <Interpreters/IJoin.h>
namespace DB
{
class AnalyzedJoin;
class MergeJoinCursor;
struct MergeJoinEqualRange;
class MergeJoin : public IJoin
{
public:
MergeJoin(std::shared_ptr<AnalyzedJoin> table_join_, const Block & right_sample_block);
bool addJoinedBlock(const Block & block) override;
void joinBlock(Block &) override;
void joinTotals(Block &) const override {}
void setTotals(const Block &) override;
size_t getTotalRowCount() const override { return right_blocks_row_count; }
private:
mutable std::shared_mutex rwlock;
std::shared_ptr<AnalyzedJoin> table_join;
SortDescription left_sort_description;
SortDescription right_sort_description;
SortDescription left_merge_description;
SortDescription right_merge_description;
Block right_table_keys;
Block right_columns_to_add;
BlocksList right_blocks;
Block totals;
size_t right_blocks_row_count = 0;
size_t right_blocks_bytes = 0;
const bool nullable_right_side;
const bool is_all;
const bool is_inner;
const bool is_left;
void changeLeftColumns(Block & block, MutableColumns && columns);
void addRightColumns(Block & block, MutableColumns && columns);
void mergeRightBlocks();
void leftJoin(MergeJoinCursor & left_cursor, const Block & left_block, const Block & right_block,
MutableColumns & left_columns, MutableColumns & right_columns, size_t & left_key_tail);
void innerJoin(MergeJoinCursor & left_cursor, const Block & left_block, const Block & right_block,
MutableColumns & left_columns, MutableColumns & right_columns, size_t & left_key_tail);
};
}

@ -1,5 +1,7 @@
#include <Interpreters/SubqueryForSet.h>
#include <Interpreters/InterpreterSelectWithUnionQuery.h>
#include <Interpreters/Join.h>
#include <Interpreters/MergeJoin.h>
#include <DataStreams/LazyBlockInputStream.h>
namespace DB
@ -31,4 +33,26 @@ void SubqueryForSet::renameColumns(Block & block)
}
}
void SubqueryForSet::setJoinActions(ExpressionActionsPtr actions)
{
actions->execute(sample_block);
joined_block_actions = actions;
}
bool SubqueryForSet::insertJoinedBlock(Block & block)
{
renameColumns(block);
if (joined_block_actions)
joined_block_actions->execute(block);
return join->addJoinedBlock(block);
}
void SubqueryForSet::setTotals()
{
if (join && source)
join->setTotals(source->getTotals());
}
}

@ -1,6 +1,7 @@
#pragma once
#include <Parsers/IAST.h>
#include <Interpreters/IJoin.h>
#include <Interpreters/PreparedSets.h>
#include <Interpreters/ExpressionActions.h>
@ -8,9 +9,6 @@
namespace DB
{
class Join;
using JoinPtr = std::shared_ptr<Join>;
class InterpreterSelectWithUnionQuery;
@ -25,6 +23,7 @@ struct SubqueryForSet
JoinPtr join;
/// Apply this actions to joined block.
ExpressionActionsPtr joined_block_actions;
Block sample_block; /// source->getHeader() + column renames
/// If set, put the result into the table.
/// This is a temporary table for transferring to remote servers for distributed query processing.
@ -33,12 +32,15 @@ struct SubqueryForSet
void makeSource(std::shared_ptr<InterpreterSelectWithUnionQuery> & interpreter,
NamesWithAliases && joined_block_aliases_);
Block renamedSampleBlock() const { return sample_block; }
void renameColumns(Block & block);
void setJoinActions(ExpressionActionsPtr actions);
bool insertJoinedBlock(Block & block);
void setTotals();
private:
NamesWithAliases joined_block_aliases; /// Rename column from joined block from this list.
Block sample_block; /// source->getHeader() + column renames
void renameColumns(Block & block);
};
/// ID of subquery -> what to do with it.

@ -805,8 +805,7 @@ SyntaxAnalyzerResultPtr SyntaxAnalyzer::analyze(
SyntaxAnalyzerResult result;
result.storage = storage;
result.source_columns = source_columns_;
result.analyzed_join = std::make_shared<AnalyzedJoin>(); /// TODO: move to select_query logic
result.analyzed_join->join_use_nulls = settings.join_use_nulls;
result.analyzed_join = std::make_shared<AnalyzedJoin>(settings); /// TODO: move to select_query logic
collectSourceColumns(select_query, result.storage, result.source_columns);
NameSet source_columns_set = removeDuplicateColumns(result.source_columns);

@ -181,9 +181,17 @@ Field convertFieldToTypeImpl(const Field & src, const IDataType & type, const ID
if (!which_type.isDateOrDateTime() && !which_type.isUUID() && !which_type.isEnum())
throw Exception{"Logical error: unknown numeric type " + type.getName(), ErrorCodes::LOGICAL_ERROR};
/// Numeric values for Enums should not be used directly in the IN section
if (src.getType() == Field::Types::UInt64 && !which_type.isEnum())
if (which_type.isEnum() && (src.getType() == Field::Types::UInt64 || src.getType() == Field::Types::Int64))
{
/// Convert UInt64 or Int64 to Enum's value
return dynamic_cast<const IDataTypeEnum &>(type).castToValue(src);
}
if (which_type.isDateOrDateTime() && src.getType() == Field::Types::UInt64)
{
/// We don't need any conversion: UInt64 is the underlying type of Date and DateTime
return src;
}
if (src.getType() == Field::Types::String)
{

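The new branch above converts a numeric literal into the Enum's value instead of rejecting it, which is what lets x IN (0, -1) work against an Enum8 column (see the test added later in this commit). A standalone sketch of the castToValue idea, assuming a simple value-to-name table rather than the real IDataTypeEnum interface:
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
int main()
{
    /// Enum8('hello' = 0, 'world' = 1, 'foo' = -1)
    std::map<int64_t, std::string> enum_values{{0, "hello"}, {1, "world"}, {-1, "foo"}};
    for (int64_t literal : {0, -1, 7})
    {
        auto it = enum_values.find(literal);
        if (it == enum_values.end())
            std::cout << literal << ": unknown Enum value, the conversion would throw\n";
        else
            std::cout << literal << " -> '" << it->second << "'\n";
    }
}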
@ -0,0 +1,126 @@
#include <Interpreters/join_common.h>
#include <Columns/ColumnNullable.h>
#include <DataTypes/DataTypeNullable.h>
#include <DataTypes/DataTypeLowCardinality.h>
#include <DataStreams/materializeBlock.h>
namespace DB
{
namespace ErrorCodes
{
extern const int TYPE_MISMATCH;
}
namespace JoinCommon
{
void convertColumnToNullable(ColumnWithTypeAndName & column)
{
if (column.type->isNullable() || !column.type->canBeInsideNullable())
return;
column.type = makeNullable(column.type);
if (column.column)
column.column = makeNullable(column.column);
}
void convertColumnsToNullable(Block & block, size_t starting_pos)
{
for (size_t i = starting_pos; i < block.columns(); ++i)
convertColumnToNullable(block.getByPosition(i));
}
ColumnRawPtrs temporaryMaterializeColumns(const Block & block, const Names & names, Columns & materialized)
{
ColumnRawPtrs ptrs;
ptrs.reserve(names.size());
materialized.reserve(names.size());
for (auto & column_name : names)
{
const auto & src_column = block.getByName(column_name).column;
materialized.emplace_back(recursiveRemoveLowCardinality(src_column->convertToFullColumnIfConst()));
ptrs.push_back(materialized.back().get());
}
return ptrs;
}
void removeLowCardinalityInplace(Block & block)
{
for (size_t i = 0; i < block.columns(); ++i)
{
auto & col = block.getByPosition(i);
col.column = recursiveRemoveLowCardinality(col.column);
col.type = recursiveRemoveLowCardinality(col.type);
}
}
ColumnRawPtrs extractKeysForJoin(const Names & key_names_right, const Block & right_sample_block,
Block & sample_block_with_keys, Block & sample_block_with_columns_to_add)
{
size_t keys_size = key_names_right.size();
ColumnRawPtrs key_columns(keys_size);
sample_block_with_columns_to_add = materializeBlock(right_sample_block);
for (size_t i = 0; i < keys_size; ++i)
{
const String & column_name = key_names_right[i];
/// There could be duplicate key names
if (sample_block_with_keys.has(column_name))
{
key_columns[i] = sample_block_with_keys.getByName(column_name).column.get();
continue;
}
auto & col = sample_block_with_columns_to_add.getByName(column_name);
col.column = recursiveRemoveLowCardinality(col.column);
col.type = recursiveRemoveLowCardinality(col.type);
/// Extract right keys in the correct key order.
sample_block_with_keys.insert(col);
sample_block_with_columns_to_add.erase(column_name);
key_columns[i] = sample_block_with_keys.getColumns().back().get();
/// We will join only keys where all components are non-NULL.
if (auto * nullable = checkAndGetColumn<ColumnNullable>(*key_columns[i]))
key_columns[i] = &nullable->getNestedColumn();
}
return key_columns;
}
void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const Block & block_right, const Names & key_names_right)
{
size_t keys_size = key_names_left.size();
for (size_t i = 0; i < keys_size; ++i)
{
DataTypePtr left_type = removeNullable(recursiveRemoveLowCardinality(block_left.getByName(key_names_left[i]).type));
DataTypePtr right_type = removeNullable(recursiveRemoveLowCardinality(block_right.getByName(key_names_right[i]).type));
if (!left_type->equals(*right_type))
throw Exception("Type mismatch of columns to JOIN by: "
+ key_names_left[i] + " " + left_type->getName() + " at left, "
+ key_names_right[i] + " " + right_type->getName() + " at right",
ErrorCodes::TYPE_MISMATCH);
}
}
void createMissedColumns(Block & block)
{
for (size_t i = 0; i < block.columns(); ++i)
{
auto & column = block.getByPosition(i);
if (!column.column)
column.column = column.type->createColumn();
}
}
}
}

@ -0,0 +1,32 @@
#pragma once
#include <Interpreters/IJoin.h>
namespace DB
{
struct ColumnWithTypeAndName;
class Block;
class IColumn;
using ColumnRawPtrs = std::vector<const IColumn *>;
namespace JoinCommon
{
void convertColumnToNullable(ColumnWithTypeAndName & column);
void convertColumnsToNullable(Block & block, size_t starting_pos = 0);
ColumnRawPtrs temporaryMaterializeColumns(const Block & block, const Names & names, Columns & materialized);
void removeLowCardinalityInplace(Block & block);
/// Split columns into key columns and the rest, using the list of key names
ColumnRawPtrs extractKeysForJoin(const Names & key_names_right, const Block & right_sample_block,
Block & sample_block_with_keys, Block & sample_block_with_columns_to_add);
/// Throw an exception if blocks have different types of key columns. Compare up to Nullability.
void checkTypesOfKeys(const Block & block_left, const Names & key_names_left, const Block & block_right, const Names & key_names_right);
void createMissedColumns(Block & block);
}
}

@ -11,11 +11,11 @@ add_executable (aggregate aggregate.cpp)
target_link_libraries (aggregate PRIVATE dbms)
add_executable (hash_map hash_map.cpp)
target_include_directories (hash_map SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
target_include_directories (hash_map SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
target_link_libraries (hash_map PRIVATE dbms)
add_executable (hash_map_lookup hash_map_lookup.cpp)
target_include_directories (hash_map_lookup SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
target_include_directories (hash_map_lookup SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
target_link_libraries (hash_map_lookup PRIVATE dbms)
add_executable (hash_map3 hash_map3.cpp)
@ -23,7 +23,7 @@ target_include_directories(hash_map3 SYSTEM BEFORE PRIVATE ${METROHASH_INCLUDE_D
target_link_libraries (hash_map3 PRIVATE dbms ${FARMHASH_LIBRARIES} ${METROHASH_LIBRARIES})
add_executable (hash_map_string hash_map_string.cpp)
target_include_directories (hash_map_string SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
target_include_directories (hash_map_string SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
target_link_libraries (hash_map_string PRIVATE dbms)
add_executable (hash_map_string_2 hash_map_string_2.cpp)
@ -34,11 +34,11 @@ target_include_directories(hash_map_string_3 SYSTEM BEFORE PRIVATE ${METROHASH_I
target_link_libraries (hash_map_string_3 PRIVATE dbms ${FARMHASH_LIBRARIES} ${METROHASH_LIBRARIES})
add_executable (hash_map_string_small hash_map_string_small.cpp)
target_include_directories (hash_map_string_small SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
target_include_directories (hash_map_string_small SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
target_link_libraries (hash_map_string_small PRIVATE dbms)
add_executable (two_level_hash_map two_level_hash_map.cpp)
target_include_directories (two_level_hash_map SYSTEM BEFORE PRIVATE ${SPARCEHASH_INCLUDE_DIR})
target_include_directories (two_level_hash_map SYSTEM BEFORE PRIVATE ${SPARSEHASH_INCLUDE_DIR})
target_link_libraries (two_level_hash_map PRIVATE dbms)
add_executable (logical_expressions_optimizer logical_expressions_optimizer.cpp)

@ -267,8 +267,8 @@ int main(int argc, char ** argv)
{
Stopwatch watch;
GOOGLE_NAMESPACE::dense_hash_map<Key, Value, DefaultHash<Key>> map;
GOOGLE_NAMESPACE::dense_hash_map<Key, Value, DefaultHash<Key>>::iterator it;
::google::dense_hash_map<Key, Value, DefaultHash<Key>> map;
::google::dense_hash_map<Key, Value, DefaultHash<Key>>::iterator it;
map.set_empty_key(-1ULL);
for (size_t i = 0; i < n; ++i)
{
@ -288,8 +288,8 @@ int main(int argc, char ** argv)
{
Stopwatch watch;
GOOGLE_NAMESPACE::sparse_hash_map<Key, Value, DefaultHash<Key>> map;
GOOGLE_NAMESPACE::sparse_hash_map<Key, Value, DefaultHash<Key>>::iterator it;
::google::sparse_hash_map<Key, Value, DefaultHash<Key>> map;
::google::sparse_hash_map<Key, Value, DefaultHash<Key>>::iterator it;
for (size_t i = 0; i < n; ++i)
{
map.insert(std::make_pair(data[i], value));

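The benchmark now spells ::google explicitly instead of the GOOGLE_NAMESPACE macro, matching the sparsehash-c11 submodule added by this commit. A minimal usage sketch, assuming the sparsehash-c11 headers are on the include path; note that dense_hash_map requires set_empty_key() before the first insert, exactly as the benchmark does:
#include <cstdint>
#include <iostream>
#include <sparsehash/dense_hash_map>
int main()
{
    ::google::dense_hash_map<uint64_t, uint64_t> map;
    map.set_empty_key(-1ULL); /// mandatory: reserves one key value to mark empty slots
    map[1] = 10;
    map[2] = 20;
    auto it = map.find(2);
    if (it != map.end())
        std::cout << it->first << " -> " << it->second << "\n";
}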
@ -13,7 +13,7 @@ ASTPtr ASTShowTablesQuery::clone() const
return res;
}
void ASTShowTablesQuery::formatQueryImpl(const FormatSettings & settings, FormatState &, FormatStateStacked) const
void ASTShowTablesQuery::formatQueryImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const
{
if (databases)
{
@ -30,6 +30,12 @@ void ASTShowTablesQuery::formatQueryImpl(const FormatSettings & settings, Format
if (!like.empty())
settings.ostr << (settings.hilite ? hilite_keyword : "") << " LIKE " << (settings.hilite ? hilite_none : "")
<< std::quoted(like, '\'');
if (limit_length)
{
settings.ostr << (settings.hilite ? hilite_keyword : "") << " LIMIT " << (settings.hilite ? hilite_none : "");
limit_length->formatImpl(settings, state, frame);
}
}
}

@ -19,6 +19,7 @@ public:
String from;
String like;
bool not_like{false};
ASTPtr limit_length;
/** Get the text that identifies this element. */
String getID(char) const override { return "ShowTables"; }

@ -107,6 +107,9 @@ struct ASTTableJoin : public IAST
void formatImpl(const FormatSettings & settings, FormatState & state, FormatStateStacked frame) const override;
};
inline bool isLeft(ASTTableJoin::Kind kind) { return kind == ASTTableJoin::Kind::Left; }
inline bool isRight(ASTTableJoin::Kind kind) { return kind == ASTTableJoin::Kind::Right; }
inline bool isInner(ASTTableJoin::Kind kind) { return kind == ASTTableJoin::Kind::Inner; }
inline bool isFull(ASTTableJoin::Kind kind) { return kind == ASTTableJoin::Kind::Full; }
inline bool isCross(ASTTableJoin::Kind kind) { return kind == ASTTableJoin::Kind::Cross; }
inline bool isComma(ASTTableJoin::Kind kind) { return kind == ASTTableJoin::Kind::Comma; }

@ -5,6 +5,7 @@
#include <Parsers/CommonParsers.h>
#include <Parsers/ParserShowTablesQuery.h>
#include <Parsers/ExpressionElementParsers.h>
#include <Parsers/ExpressionListParsers.h>
#include <Common/typeid_cast.h>
@ -22,8 +23,10 @@ bool ParserShowTablesQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec
ParserKeyword s_from("FROM");
ParserKeyword s_not("NOT");
ParserKeyword s_like("LIKE");
ParserKeyword s_limit("LIMIT");
ParserStringLiteral like_p;
ParserIdentifier name_p;
ParserExpressionWithOptionalAlias limit_p(false);
ASTPtr like;
ASTPtr database;
@ -60,6 +63,12 @@ bool ParserShowTablesQuery::parseImpl(Pos & pos, ASTPtr & node, Expected & expec
}
else if (query->not_like)
return false;
if (s_limit.ignore(pos, expected))
{
if (!limit_p.parse(pos, query->limit_length, expected))
return false;
}
}
else
return false;

@ -7,14 +7,14 @@ namespace DB
{
/** Query like this:
* SHOW TABLES [FROM db] [[NOT] LIKE 'str']
* SHOW TABLES [FROM db] [[NOT] LIKE 'str'] [LIMIT expr]
* or
* SHOW DATABASES.
*/
class ParserShowTablesQuery : public IParserBase
{
protected:
const char * getName() const { return "SHOW [TEMPORARY] TABLES|DATABASES [[NOT] LIKE 'str']"; }
const char * getName() const { return "SHOW [TEMPORARY] TABLES|DATABASES [[NOT] LIKE 'str'] [LIMIT expr]"; }
bool parseImpl(Pos & pos, ASTPtr & node, Expected & expected);
};

@ -82,8 +82,7 @@ void CreatingSetsTransform::finishSubquery(SubqueryForSet & subquery)
head_rows = profile_info.rows;
if (subquery.join)
subquery.join->setTotals(subquery.source->getTotals());
subquery.setTotals();
if (head_rows != 0)
{
@ -175,12 +174,7 @@ void CreatingSetsTransform::work()
if (!done_with_join)
{
subquery.renameColumns(block);
if (subquery.joined_block_actions)
subquery.joined_block_actions->execute(block);
if (!subquery.join->insertFromBlock(block))
if (!subquery.insertJoinedBlock(block))
done_with_join = true;
}

@ -120,8 +120,8 @@ Block IStorage::getSampleBlockForColumns(const Names & column_names) const
namespace
{
using NamesAndTypesMap = GOOGLE_NAMESPACE::dense_hash_map<StringRef, const IDataType *, StringRefHash>;
using UniqueStrings = GOOGLE_NAMESPACE::dense_hash_set<StringRef, StringRefHash>;
using NamesAndTypesMap = ::google::dense_hash_map<StringRef, const IDataType *, StringRefHash>;
using UniqueStrings = ::google::dense_hash_set<StringRef, StringRefHash>;
String listOfColumns(const NamesAndTypesList & available_columns)
{

@ -8,6 +8,7 @@
#include <DataStreams/IBlockInputStream.h>
#include <DataTypes/NestedUtils.h>
#include <Interpreters/joinDispatch.h>
#include <Interpreters/AnalyzedJoin.h>
#include <Common/assert_cast.h>
#include <Poco/String.h> /// toLower
@ -49,8 +50,8 @@ StorageJoin::StorageJoin(
if (!getColumns().hasPhysical(key))
throw Exception{"Key column (" + key + ") does not exist in table declaration.", ErrorCodes::NO_SUCH_COLUMN_IN_TABLE};
join = std::make_shared<Join>(key_names, use_nulls, limits, kind, strictness, overwrite);
join->setSampleBlock(getSampleBlock().sortColumns());
table_join = std::make_shared<AnalyzedJoin>(limits, use_nulls, kind, strictness, key_names);
join = std::make_shared<Join>(table_join, getSampleBlock().sortColumns(), overwrite);
restore();
}
@ -62,8 +63,7 @@ void StorageJoin::truncate(const ASTPtr &, const Context &, TableStructureWriteL
Poco::File(path + "tmp/").createDirectories();
increment = 0;
join = std::make_shared<Join>(key_names, use_nulls, limits, kind, strictness);
join->setSampleBlock(getSampleBlock().sortColumns());
join = std::make_shared<Join>(table_join, getSampleBlock().sortColumns());
}
@ -75,7 +75,7 @@ void StorageJoin::assertCompatible(ASTTableJoin::Kind kind_, ASTTableJoin::Stric
}
void StorageJoin::insertBlock(const Block & block) { join->insertFromBlock(block); }
void StorageJoin::insertBlock(const Block & block) { join->addJoinedBlock(block); }
size_t StorageJoin::getSize() const { return join->getTotalRowCount(); }
@ -209,10 +209,10 @@ public:
for (size_t i = 0; i < sample_block.columns(); ++i)
{
auto & [_, type, name] = sample_block.getByPosition(i);
if (parent.sample_block_with_keys.has(name))
if (parent.right_table_keys.has(name))
{
key_pos = i;
column_with_null[i] = parent.sample_block_with_keys.getByName(name).type->isNullable();
column_with_null[i] = parent.right_table_keys.getByName(name).type->isNullable();
}
else
{

@ -9,8 +9,9 @@
namespace DB
{
class AnalyzedJoin;
class Join;
using JoinPtr = std::shared_ptr<Join>;
using HashJoinPtr = std::shared_ptr<Join>;
/** Allows you to save the state for later use on the right side of the JOIN.
@ -29,7 +30,7 @@ public:
void truncate(const ASTPtr &, const Context &, TableStructureWriteLockHolder &) override;
/// Access the innards.
JoinPtr & getJoin() { return join; }
HashJoinPtr & getJoin() { return join; }
/// Verify that the data structure is suitable for implementing this type of JOIN.
void assertCompatible(ASTTableJoin::Kind kind_, ASTTableJoin::Strictness strictness_) const;
@ -50,7 +51,8 @@ private:
ASTTableJoin::Kind kind; /// LEFT | INNER ...
ASTTableJoin::Strictness strictness; /// ANY | ALL
JoinPtr join;
std::shared_ptr<AnalyzedJoin> table_join;
HashJoinPtr join;
void insertBlock(const Block & block) override;
size_t getSize() const override;

@ -2,11 +2,15 @@
set -x
# does not actually cd to the directory, just returns its absolute path
CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# cd to directory
cd $CUR_DIR
CONTRIBUTORS_FILE=${CONTRIBUTORS_FILE=$CUR_DIR/StorageSystemContributors.generated.cpp}
git shortlog --summary | perl -lnE 's/^\s+\d+\s+(.+)/ "$1",/; next unless $1; say $_' > $CONTRIBUTORS_FILE.tmp
# if you don't specify HEAD here, `git shortlog` run without a terminal would expect input from stdin
git shortlog HEAD --summary | perl -lnE 's/^\s+\d+\s+(.+)/ "$1",/; next unless $1; say $_' > $CONTRIBUTORS_FILE.tmp
# If git history is not available, don't create the target file
if [ ! -s $CONTRIBUTORS_FILE.tmp ]; then

@ -3,6 +3,7 @@
#include <Core/Block.h>
#include <Storages/StorageValues.h>
#include <DataTypes/DataTypeTuple.h>
#include <Parsers/ASTExpressionList.h>
#include <Parsers/ASTLiteral.h>
@ -42,6 +43,7 @@ static void parseAndInsertValues(MutableColumns & res_columns, const ASTs & args
for (size_t i = 1; i < args.size(); ++i)
{
const auto & [value_field, value_type_ptr] = evaluateConstantExpression(args[i], context);
const DataTypes & value_types_tuple = typeid_cast<const DataTypeTuple *>(value_type_ptr.get())->getElements();
const TupleBackend & value_tuple = value_field.safeGet<Tuple>().toUnderType();
if (value_tuple.size() != sample_block.columns())
@ -49,7 +51,7 @@ static void parseAndInsertValues(MutableColumns & res_columns, const ASTs & args
for (size_t j = 0; j < value_tuple.size(); ++j)
{
Field value = convertFieldToType(value_tuple[j], *sample_block.getByPosition(j).type, value_type_ptr.get());
Field value = convertFieldToType(value_tuple[j], *sample_block.getByPosition(j).type, value_types_tuple[j].get());
res_columns[j]->insert(value);
}
}

@ -0,0 +1,52 @@
#!/usr/bin/env bash
CURDIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
. $CURDIR/../shell_config.sh
. $CURDIR/mergetree_mutations.lib
${CLICKHOUSE_CLIENT} -n --query="
DROP TABLE IF EXISTS fetches_r1;
DROP TABLE IF EXISTS fetches_r2"
${CLICKHOUSE_CLIENT} --query="CREATE TABLE fetches_r1(x UInt32) ENGINE ReplicatedMergeTree('/clickhouse/tables/test/fetches', 'r1') ORDER BY x"
${CLICKHOUSE_CLIENT} --query="CREATE TABLE fetches_r2(x UInt32) ENGINE ReplicatedMergeTree('/clickhouse/tables/test/fetches', 'r2') ORDER BY x \
SETTINGS prefer_fetch_merged_part_time_threshold=0, \
prefer_fetch_merged_part_size_threshold=0"
${CLICKHOUSE_CLIENT} -n --query="
INSERT INTO fetches_r1 VALUES (1);
INSERT INTO fetches_r1 VALUES (2);
INSERT INTO fetches_r1 VALUES (3)"
${CLICKHOUSE_CLIENT} --query="SYSTEM SYNC REPLICA fetches_r2"
${CLICKHOUSE_CLIENT} --query="DETACH TABLE fetches_r2"
${CLICKHOUSE_CLIENT} --query="OPTIMIZE TABLE fetches_r1 PARTITION tuple() FINAL" --replication_alter_partitions_sync=0
${CLICKHOUSE_CLIENT} --query="SYSTEM SYNC REPLICA fetches_r1"
# After attach, replica r2 should fetch the merged part from r1.
${CLICKHOUSE_CLIENT} --query="ATTACH TABLE fetches_r2"
${CLICKHOUSE_CLIENT} --query="SYSTEM SYNC REPLICA fetches_r2"
${CLICKHOUSE_CLIENT} --query="SELECT '*** Check data after fetch of merged part ***'"
${CLICKHOUSE_CLIENT} --query="SELECT _part, * FROM fetches_r2 ORDER BY x"
${CLICKHOUSE_CLIENT} --query="DETACH TABLE fetches_r2"
# Add a mutation that doesn't change data.
${CLICKHOUSE_CLIENT} --query="ALTER TABLE fetches_r1 DELETE WHERE x = 0" --replication_alter_partitions_sync=0
wait_for_mutation "fetches_r1" "0000000000"
${CLICKHOUSE_CLIENT} --query="SYSTEM SYNC REPLICA fetches_r1"
# After attach, replica r2 should compare checksums for the mutated part and clone the local part.
${CLICKHOUSE_CLIENT} --query="ATTACH TABLE fetches_r2"
${CLICKHOUSE_CLIENT} --query="SYSTEM SYNC REPLICA fetches_r2"
${CLICKHOUSE_CLIENT} --query="SELECT '*** Check data after fetch/clone of mutated part ***'"
${CLICKHOUSE_CLIENT} --query="SELECT _part, * FROM fetches_r2 ORDER BY x"
${CLICKHOUSE_CLIENT} -n --query="
DROP TABLE fetches_r1;
DROP TABLE fetches_r2"

@ -1,42 +0,0 @@
DROP TABLE IF EXISTS fetches_r1;
DROP TABLE IF EXISTS fetches_r2;
CREATE TABLE fetches_r1(x UInt32) ENGINE ReplicatedMergeTree('/clickhouse/tables/test/fetches', 'r1') ORDER BY x;
CREATE TABLE fetches_r2(x UInt32) ENGINE ReplicatedMergeTree('/clickhouse/tables/test/fetches', 'r2') ORDER BY x
SETTINGS prefer_fetch_merged_part_time_threshold=0,
prefer_fetch_merged_part_size_threshold=0;
INSERT INTO fetches_r1 VALUES (1);
INSERT INTO fetches_r1 VALUES (2);
INSERT INTO fetches_r1 VALUES (3);
SYSTEM SYNC REPLICA fetches_r2;
DETACH TABLE fetches_r2;
SET replication_alter_partitions_sync=0;
OPTIMIZE TABLE fetches_r1 PARTITION tuple() FINAL;
SYSTEM SYNC REPLICA fetches_r1;
-- After attach, replica r2 should fetch the merged part from r1.
ATTACH TABLE fetches_r2;
SYSTEM SYNC REPLICA fetches_r2;
SELECT '*** Check data after fetch of merged part ***';
SELECT _part, * FROM fetches_r2 ORDER BY x;
DETACH TABLE fetches_r2;
-- Add a mutation that doesn't change data.
ALTER TABLE fetches_r1 DELETE WHERE x = 0;
SYSTEM SYNC REPLICA fetches_r1;
-- After attach, replica r2 should compare checksums for the mutated part and clone the local part.
ATTACH TABLE fetches_r2;
SYSTEM SYNC REPLICA fetches_r2;
SELECT '*** Check data after fetch/clone of mutated part ***';
SELECT _part, * FROM fetches_r2 ORDER BY x;
DROP TABLE fetches_r1;
DROP TABLE fetches_r2;

@ -67,3 +67,13 @@
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33]
[30,31,32,33,100]
[100]
4294967295
4294967295
4294967295
1
0
0
0
0
9
500

@ -211,3 +211,27 @@ select bitmapToArray(bitmapSubsetInRange(bitmapBuild([
select bitmapToArray(bitmapSubsetInRange(bitmapBuild([
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,
100,200,500]), toUInt32(100), toUInt32(200)));
-- bitmapMin:
---- Empty
SELECT bitmapMin(bitmapBuild(emptyArrayUInt8()));
SELECT bitmapMin(bitmapBuild(emptyArrayUInt16()));
SELECT bitmapMin(bitmapBuild(emptyArrayUInt32()));
---- Small
select bitmapMin(bitmapBuild([1,5,7,9]));
---- Large
select bitmapMin(bitmapBuild([
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,
100,200,500]));
-- bitmapMax:
---- Empty
SELECT bitmapMax(bitmapBuild(emptyArrayUInt8()));
SELECT bitmapMax(bitmapBuild(emptyArrayUInt16()));
SELECT bitmapMax(bitmapBuild(emptyArrayUInt32()));
---- Small
select bitmapMax(bitmapBuild([1,5,7,9]));
---- Large
select bitmapMax(bitmapBuild([
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,
100,200,500]));

@ -0,0 +1,6 @@
find me
and me
also me
find me
and me
also me

@ -0,0 +1,5 @@
DROP TABLE IF EXISTS enums;
CREATE TABLE enums AS VALUES('x Enum8(\'hello\' = 0, \'world\' = 1, \'foo\' = -1), y String', ('hello', 'find me'), (0, 'and me'), (-1, 'also me'), ('world', 'don\'t find me'));
SELECT y FROM enums WHERE x IN (0, -1);
SELECT y FROM enums WHERE x IN ('hello', -1);
DROP TABLE enums;

@ -0,0 +1,218 @@
t join none using
0 0 0
-
0 0 0
-
-
t join none on
0 0 0 0
-
0 0 0 0
-
-
none join t using
none join t on
/none
t join none using
0 0 \N
-
0 0 \N
-
-
t join none on
0 0 \N \N
-
0 0 \N \N
-
-
none join t using
none join t on
/none
any left
0 0 0
1 10 0
2 20 2
3 30 0
4 40 4
-
0 0 0
1 10 0
2 20 0
3 30 0
4 40 0
-
0 0 0
1 10 0
2 20 2
3 30 0
4 40 4
-
0 0 0
1 10 0
2 20 0
3 30 0
4 40 0
all left
0 0 0 0
1 10 0 0
2 20 2 21
2 20 2 22
3 30 0 0
4 40 4 41
4 40 4 42
-
0 0 0 0
1 10 0 0
2 20 0 0
3 30 0 0
4 40 0 0
-
0 0 0 0
1 10 0 0
2 20 0 0
3 30 0 0
4 40 0 0
-
0 0 0 0
1 10 0 0
2 20 2 21
2 20 2 22
3 30 0 0
4 40 4 41
4 40 4 42
-
0 0 0 0
1 10 0 0
2 20 2 21
2 20 2 22
3 30 0 0
4 40 4 41
4 40 4 42
any inner
0 0 0
2 20 2
4 40 4
-
0 0 0
-
0 0 0
2 20 2
4 40 4
-
0 0 0
all inner
0 0 0 0
2 20 2 21
2 20 2 22
4 40 4 41
4 40 4 42
-
0 0 0 0
-
0 0 0 0
-
0 0 0 0
2 20 2 21
2 20 2 22
4 40 4 41
4 40 4 42
-
0 0 0 0
2 20 2 21
2 20 2 22
4 40 4 41
4 40 4 42
any left
0 0 0
1 10 \N
2 20 2
3 30 \N
4 40 4
-
0 0 0
1 10 \N
2 20 \N
3 30 \N
4 40 \N
-
0 0 0
1 10 \N
2 20 2
3 30 \N
4 40 4
-
0 0 0
1 10 \N
2 20 \N
3 30 \N
4 40 \N
all left
0 0 0 0
1 10 \N \N
2 20 2 21
2 20 2 22
3 30 \N \N
4 40 4 41
4 40 4 42
-
0 0 0 0
1 10 \N \N
2 20 \N \N
3 30 \N \N
4 40 \N \N
-
0 0 0 0
1 10 \N \N
2 20 \N \N
3 30 \N \N
4 40 \N \N
-
0 0 0 0
1 10 \N \N
2 20 2 21
2 20 2 22
3 30 \N \N
4 40 4 41
4 40 4 42
-
0 0 0 0
1 10 \N \N
2 20 2 21
2 20 2 22
3 30 \N \N
4 40 4 41
4 40 4 42
any inner
0 0 0
2 20 2
4 40 4
-
0 0 0
-
0 0 0
2 20 2
4 40 4
-
0 0 0
all inner
0 0 0 0
2 20 2 21
2 20 2 22
4 40 4 41
4 40 4 42
-
0 0 0 0
-
0 0 0 0
-
0 0 0 0
2 20 2 21
2 20 2 22
4 40 4 41
4 40 4 42
-
0 0 0 0
2 20 2 21
2 20 2 22
4 40 4 41
4 40 4 42

@ -0,0 +1,164 @@
DROP TABLE IF EXISTS t0;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
CREATE TABLE t0 (x UInt32, y UInt64) engine = MergeTree ORDER BY (x,y);
CREATE TABLE t1 (x UInt32, y UInt64) engine = MergeTree ORDER BY (x,y);
CREATE TABLE t2 (x UInt32, y UInt64) engine = MergeTree ORDER BY (x,y);
INSERT INTO t1 (x, y) VALUES (0, 0);
SET partial_merge_join = 1;
SET any_join_distinct_right_table_keys = 1;
SELECT 't join none using';
SELECT * FROM t1 ANY LEFT JOIN t0 USING (x) ORDER BY x;
SELECT '-';
SELECT * FROM t1 LEFT JOIN t0 USING (x) ORDER BY x;
SELECT '-';
SELECT * FROM t1 ANY INNER JOIN t0 USING (x) ORDER BY x;
SELECT '-';
SELECT * FROM t1 INNER JOIN t0 USING (x) ORDER BY x;
SELECT 't join none on';
SELECT * FROM t1 ANY LEFT JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT '-';
SELECT * FROM t1 LEFT JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT '-';
SELECT * FROM t1 ANY INNER JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT '-';
SELECT * FROM t1 INNER JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT 'none join t using';
SELECT * FROM t0 ANY LEFT JOIN t1 USING (x);
SELECT * FROM t0 LEFT JOIN t1 USING (x);
SELECT * FROM t0 ANY INNER JOIN t1 USING (x);
SELECT * FROM t0 INNER JOIN t1 USING (x);
SELECT 'none join t on';
SELECT * FROM t0 ANY LEFT JOIN t1 ON t1.x = t0.x;
SELECT * FROM t0 LEFT JOIN t1 ON t1.x = t0.x;
SELECT * FROM t0 ANY INNER JOIN t1 ON t1.x = t0.x;
SELECT * FROM t0 INNER JOIN t1 ON t1.x = t0.x;
SELECT '/none';
SET join_use_nulls = 1;
SELECT 't join none using';
SELECT * FROM t1 ANY LEFT JOIN t0 USING (x) ORDER BY x;
SELECT '-';
SELECT * FROM t1 LEFT JOIN t0 USING (x) ORDER BY x;
SELECT '-';
SELECT * FROM t1 ANY INNER JOIN t0 USING (x) ORDER BY x;
SELECT '-';
SELECT * FROM t1 INNER JOIN t0 USING (x) ORDER BY x;
SELECT 't join none on';
SELECT * FROM t1 ANY LEFT JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT '-';
SELECT * FROM t1 LEFT JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT '-';
SELECT * FROM t1 ANY INNER JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT '-';
SELECT * FROM t1 INNER JOIN t0 ON t1.x = t0.x ORDER BY x;
SELECT 'none join t using';
SELECT * FROM t0 ANY LEFT JOIN t1 USING (x);
SELECT * FROM t0 LEFT JOIN t1 USING (x);
SELECT * FROM t0 ANY INNER JOIN t1 USING (x);
SELECT * FROM t0 INNER JOIN t1 USING (x);
SELECT 'none join t on';
SELECT * FROM t0 ANY LEFT JOIN t1 ON t1.x = t0.x;
SELECT * FROM t0 LEFT JOIN t1 ON t1.x = t0.x;
SELECT * FROM t0 ANY INNER JOIN t1 ON t1.x = t0.x;
SELECT * FROM t0 INNER JOIN t1 ON t1.x = t0.x;
SELECT '/none';
INSERT INTO t1 (x, y) VALUES (1, 10) (2, 20);
INSERT INTO t1 (x, y) VALUES (4, 40) (3, 30);
INSERT INTO t2 (x, y) VALUES (4, 41) (2, 21) (2, 22);
INSERT INTO t2 (x, y) VALUES (0, 0) (5, 50) (4, 42);
SET join_use_nulls = 0;
SELECT 'any left';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x,y) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x,y) ORDER BY x;
SELECT 'all left';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND toUInt32(intDiv(t1.y,10)) = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND toUInt64(t1.x) = intDiv(t2.y,10) ORDER BY x, t2.y;
SELECT 'any inner';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x,y) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x,y) ORDER BY x;
SELECT 'all inner';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x AND t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x AND toUInt32(intDiv(t1.y,10)) = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x AND toUInt64(t1.x) = intDiv(t2.y,10) ORDER BY x, t2.y;
SET join_use_nulls = 1;
SELECT 'any left';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x,y) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY LEFT JOIN t2 USING (x,y) ORDER BY x;
SELECT 'all left';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND toUInt32(intDiv(t1.y,10)) = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.x = t2.x AND toUInt64(t1.x) = intDiv(t2.y,10) ORDER BY x, t2.y;
SELECT 'any inner';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x,y) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x) ORDER BY x;
SELECT '-';
SELECT t1.*, t2.x FROM t1 ANY INNER JOIN t2 USING (x,y) ORDER BY x;
SELECT 'all inner';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x AND t1.y = t2.y ORDER BY x;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x AND toUInt32(intDiv(t1.y,10)) = t2.x ORDER BY x, t2.y;
SELECT '-';
SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ON t1.x = t2.x AND toUInt64(t1.x) = intDiv(t2.y,10) ORDER BY x, t2.y;
DROP TABLE t0;
DROP TABLE t1;
DROP TABLE t2;

@ -0,0 +1,5 @@
1 1
2
3
4
5

@ -0,0 +1,7 @@
set partial_merge_join = 1;
select s1.x, s2.x from (select 1 as x) s1 left join (select 1 as x) s2 using x;
select * from (select materialize(2) as x) s1 left join (select 2 as x) s2 using x;
select * from (select 3 as x) s1 left join (select materialize(3) as x) s2 using x;
select * from (select toLowCardinality(4) as x) s1 left join (select 4 as x) s2 using x;
select * from (select 5 as x) s1 left join (select toLowCardinality(5) as x) s2 using x;

@ -0,0 +1 @@
[[],[2]]

@ -0,0 +1 @@
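-- groupUniqArray must collect distinct Array values, including the empty array; expected output (see the reference above): [[],[2]]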
select groupUniqArray(v) from values('id int, v Array(int)', (1, [2]), (1, [])) group by id;

@ -0,0 +1 @@
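-- LIMIT 1, 0 (offset 1, length 0) must return an empty result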
SELECT count() FROM system.numbers LIMIT 1, 0;

@ -0,0 +1,3 @@
-- serialization of big arrays shouldn't use too much memory
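-- GROUP BY on the huge array key (33554433 = 2^25 + 1 elements) forces its serialization; FORMAT Null discards the result so only memory usage is exercised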
set max_memory_usage = 3000000000;
select ignore(x) from (select groupArray(number) x from numbers(33554433)) group by x format Null;

@ -0,0 +1,15 @@
*** Should show 6: ***
test1
test2
test3
test4
test5
test6
*** Should show 2: ***
test1
test2
*** Should show 4: ***
test1
test2
test3
test4

@ -0,0 +1,20 @@
DROP DATABASE IF EXISTS test_show_limit;
CREATE DATABASE test_show_limit;
CREATE TABLE test_show_limit.test1 (test UInt8) ENGINE = TinyLog;
CREATE TABLE test_show_limit.test2 (test UInt8) ENGINE = TinyLog;
CREATE TABLE test_show_limit.test3 (test UInt8) ENGINE = TinyLog;
CREATE TABLE test_show_limit.test4 (test UInt8) ENGINE = TinyLog;
CREATE TABLE test_show_limit.test5 (test UInt8) ENGINE = TinyLog;
CREATE TABLE test_show_limit.test6 (test UInt8) ENGINE = TinyLog;
SELECT '*** Should show 6: ***';
SHOW TABLES FROM test_show_limit;
SELECT '*** Should show 2: ***';
SHOW TABLES FROM test_show_limit LIMIT 2;
SELECT '*** Should show 4: ***';
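-- LIMIT accepts a constant expression, not just a literal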
SHOW TABLES FROM test_show_limit LIMIT 2 * 2;
DROP DATABASE test_show_limit;

debian/.pbuilderrc vendored

@ -11,11 +11,10 @@
# sudo ln -s gutsy /usr/share/debootstrap/scripts/bionic
# sudo ln -s sid /usr/share/debootstrap/scripts/buster
# build ubuntu:
# sudo DIST=trusty pbuilder create --configfile debian/.pbuilderrc && DIST=trusty pdebuild --configfile debian/.pbuilderrc
# sudo DIST=xenial pbuilder create --configfile debian/.pbuilderrc && DIST=xenial pdebuild --configfile debian/.pbuilderrc
# sudo DIST=zesty pbuilder create --configfile debian/.pbuilderrc && DIST=zesty pdebuild --configfile debian/.pbuilderrc
# sudo DIST=artful pbuilder create --configfile debian/.pbuilderrc && DIST=artful pdebuild --configfile debian/.pbuilderrc
# sudo DIST=bionic pbuilder create --configfile debian/.pbuilderrc && DIST=bionic pdebuild --configfile debian/.pbuilderrc
# sudo DIST=cosmic pbuilder create --configfile debian/.pbuilderrc && DIST=cosmic pdebuild --configfile debian/.pbuilderrc
# sudo DIST=disco pbuilder create --configfile debian/.pbuilderrc && DIST=disco pdebuild --configfile debian/.pbuilderrc
# sudo DIST=eoan pbuilder create --configfile debian/.pbuilderrc && DIST=eoan pdebuild --configfile debian/.pbuilderrc
# sudo DIST=devel pbuilder create --configfile debian/.pbuilderrc && DIST=devel pdebuild --configfile debian/.pbuilderrc
# build debian:
# sudo DIST=stable pbuilder create --configfile debian/.pbuilderrc && DIST=stable pdebuild --configfile debian/.pbuilderrc

debian/control vendored

@ -5,7 +5,7 @@ Maintainer: Alexey Milovidov <milovidov@yandex-team.ru>
Build-Depends: debhelper (>= 9),
cmake | cmake3,
ninja-build,
gcc-7 [amd64 i386] | gcc-8 [amd64 i386] | gcc-9 [amd64 i386], g++-7 [amd64 i386] | g++-8 [amd64 i386] | g++-9 [amd64 i386],
gcc-9 [amd64 i386] | gcc-8 [amd64 i386], g++-9 [amd64 i386] | g++-8 [amd64 i386],
clang-8 [arm64 armhf] | clang-7 [arm64 armhf] | clang-6.0 [arm64 armhf],
libc6-dev,
libicu-dev,

@ -2,4 +2,4 @@
# Try to stop parallel build after timeout
killall make gcc gcc-7 g++-7 gcc-8 g++-8 clang clang-5.0 clang++-5.0 clang-6.0 clang++-6.0 clang-7 clang++-7 ||:
killall make gcc gcc-8 g++-8 gcc-9 g++-9 clang clang-6.0 clang++-6.0 clang-7 clang++-7 ||:

debian/rules vendored

@ -32,6 +32,9 @@ endif
CMAKE_FLAGS += -DENABLE_UTILS=0
DEB_CC ?= $(shell which gcc-9 gcc-8 gcc | head -n1)
DEB_CXX ?= $(shell which g++-9 g++-8 g++ | head -n1)
ifdef DEB_CXX
DEB_BUILD_GNU_TYPE := $(shell dpkg-architecture -qDEB_BUILD_GNU_TYPE)
DEB_HOST_GNU_TYPE := $(shell dpkg-architecture -qDEB_HOST_GNU_TYPE)

Some files were not shown because too many files have changed in this diff.