ClickHouse

thevar1able/ClickHouse

Fork 0

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-05 15:21:43 +00:00

Commit Graph

Author	SHA1	Message	Date
Ivan	315ff4f0d9	ANTLR4 Grammar for ClickHouse and new parser (#11298 )	2020-12-04 05:15:44 +03:00
Azat Khuzhin	064f901ea8	Add ability to preallocate hashtables for hashed/sparsehashed dictionaries preallocation can be used only when we know number of rows, and for this we need: - source clickhouse - no filtering (i.e. lack of <where>), since filtering can filter too much rows and eventually it may allocate memory that will never be used. For sparse_hash the difference is quite significant, preallocated sparse_hash hashtable allocates ~33% faster (7.5 seconds vs 5 seconds for insert, and the difference is more significant for higher number of elements): $ ninja bench-sparse_hash-run [1/1] cd /src/ch/hashtable-bench/.cmake && ...ch/hashtable-bench/.cmake/bench-sparse_hash sparse_hash/insert: 7.574 <!-- sparse_hash/find : 2.14426 sparse_hash/maxrss: 174MiB sparse_hash/time: 9710.51 msec (user+sys) $ time ninja bench-sparse_hash-preallocate-run [1/1] cd /src/ch/hashtable-bench/.cmake && ...-bench/.cmake/bench-sparse_hash-preallocate sparse_hash/insert: 5.0522 <!-- sparse_hash/find : 2.14024 sparse_hash/maxrss: 174MiB sparse_hash/time: 7192.06 msec (user+sys) P.S. the difference for sparse_hashed dictionary with 4e9 elements (uint64, uint16) is ~18% (4975.905 vs 4103.569 sec) v2: do not reallocate the dictionary from the progress callback Since this will access hashtable in parallel. v3: drop PREALLOCATE() and do this only for source=clickhouse and empty <where>	2020-10-09 22:28:14 +03:00

Author

SHA1

Message

Date

Ivan

315ff4f0d9

ANTLR4 Grammar for ClickHouse and new parser (#11298 )

2020-12-04 05:15:44 +03:00

Azat Khuzhin

064f901ea8

Add ability to preallocate hashtables for hashed/sparsehashed dictionaries

preallocation can be used only when we know number of rows, and for this
we need:
- source clickhouse
- no filtering (i.e. lack of <where>), since filtering can filter
  too much rows and eventually it may allocate memory that will
  never be used.

For sparse_hash the difference is quite significant, preallocated
sparse_hash hashtable allocates ~33% faster (7.5 seconds vs 5 seconds
for insert, and the difference is more significant for higher number of
elements):

    $ ninja bench-sparse_hash-run
    [1/1] cd /src/ch/hashtable-bench/.cmake && ...ch/hashtable-bench/.cmake/bench-sparse_hash
    sparse_hash/insert: 7.574 <!--
    sparse_hash/find  : 2.14426
    sparse_hash/maxrss: 174MiB
    sparse_hash/time:   9710.51 msec (user+sys)

    $ time ninja bench-sparse_hash-preallocate-run
    [1/1] cd /src/ch/hashtable-bench/.cmake && ...-bench/.cmake/bench-sparse_hash-preallocate
    sparse_hash/insert: 5.0522 <!--
    sparse_hash/find  : 2.14024
    sparse_hash/maxrss: 174MiB
    sparse_hash/time:   7192.06 msec (user+sys)

P.S. the difference for sparse_hashed dictionary with 4e9 elements
(uint64, uint16) is ~18% (4975.905 vs 4103.569 sec)

v2: do not reallocate the dictionary from the progress callback
    Since this will access hashtable in parallel.
v3: drop PREALLOCATE() and do this only for source=clickhouse and empty
    <where>

2020-10-09 22:28:14 +03:00

2 Commits