The example query was missing two options:
- `LIFETIME` is not in the example; if you copy and paste it you get the exception `DB::Exception: Cannot create dictionary with empty lifetime.`
- `SOURCE` was not mentioned, and it is needed to link the dictionary to the main/source table.
- The `dictGetT` function had an extra `T`; that function does not exist (I tested it, and `dictGet` has to be used).
- The dictionary example had no attribute other than the id and the two dates, but running the queries and the `dictGet` function requires an additional attribute, so I added `advertiser_id` (the name used in the example just before). I also added an example query, because without it it was hard to understand what the `attr_name` mentioned earlier refers to.
- I added an example because a user did not know how to cast the date to `UInt64`. (Most of the time the original/raw dates used for the range are defined as Date64, so this example shows how to cast them in the query.)
Preallocation was initially implemented in #15454, but was reverted in
#21948 (due to higher memory usage).
This implementation differs from the initial one: there is now a
separate attribute to enable preallocation. Before, it was done
automatically, but that has problems with duplicates in the source.
Also, this implementation does not use dynamic_cast; instead it extends
the IDictionarySource interface.
Preallocation can be used only when the number of rows is known, and
for this we need:
- source clickhouse
- no filtering (i.e. lack of <where>), since filtering can filter out
too many rows, and the dictionary may eventually allocate memory that
will never be used.
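The two conditions above can be sketched as a small check. This is only
an illustration: SourceInfo and preallocationHint are hypothetical names
made up for this sketch, not the actual IDictionarySource interface, and
returning 0 to mean "do not preallocate" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical description of a dictionary source (not a ClickHouse type).
struct SourceInfo
{
    std::string type;        // e.g. "clickhouse", "mysql", "file"
    std::string where;       // contents of <where>, empty when absent
    std::size_t table_rows;  // row count reported by the source table
};

// Returns how many elements to reserve, or 0 when the row count cannot be
// trusted and the hashtable should simply grow on demand.
std::size_t preallocationHint(const SourceInfo & source)
{
    if (source.type != "clickhouse")
        return 0;  // the number of rows is not known up front
    if (!source.where.empty())
        return 0;  // a filter may drop rows, so the reservation could be wasted
    return source.table_rows;
}
```

With this shape the caller reserves only when the hint is non-zero, so
every other source keeps the old grow-on-demand behaviour.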
For sparse_hash the difference is quite significant: with a preallocated
sparse_hash hashtable the inserts take ~33% less time (7.5 seconds vs
5 seconds, and the difference is more significant for a higher number
of elements):
$ ninja bench-sparse_hash-run
[1/1] cd /src/ch/hashtable-bench/.cmake && ...ch/hashtable-bench/.cmake/bench-sparse_hash
sparse_hash/insert: 7.574 <!--
sparse_hash/find : 2.14426
sparse_hash/maxrss: 174MiB
sparse_hash/time: 9710.51 msec (user+sys)
$ time ninja bench-sparse_hash-preallocate-run
[1/1] cd /src/ch/hashtable-bench/.cmake && ...-bench/.cmake/bench-sparse_hash-preallocate
sparse_hash/insert: 5.0522 <!--
sparse_hash/find : 2.14024
sparse_hash/maxrss: 174MiB
sparse_hash/time: 7192.06 msec (user+sys)
P.S. The difference for a sparse_hashed dictionary with 4e9 elements
(uint64, uint16) is ~18% (4975.905 vs 4103.569 sec).
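Where the saving comes from can be seen in a minimal sketch. It uses
std::unordered_map as a stand-in, not sparse_hash, so it does not
reproduce the numbers above; it only counts how often the bucket array
is rebuilt during the inserts. countRehashes is a name made up for this
sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using Table = std::unordered_map<std::uint64_t, std::uint16_t>;

// Inserts `rows` distinct keys and counts how many times the bucket array
// changes size on the way. With `preallocate` the buckets are reserved first.
std::size_t countRehashes(std::size_t rows, bool preallocate)
{
    Table table;
    if (preallocate)
        table.reserve(rows);  // room for `rows` elements without rehashing

    std::size_t rehashes = 0;
    std::size_t buckets = table.bucket_count();
    for (std::uint64_t key = 0; key < rows; ++key)
    {
        table.emplace(key, static_cast<std::uint16_t>(key % 65536));
        if (table.bucket_count() != buckets)
        {
            ++rehashes;
            buckets = table.bucket_count();
        }
    }
    return rehashes;
}
```

Without the reserve every growth step rehashes all elements inserted so
far, which is the work that preallocation avoids.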
v2: do not reallocate the dictionary from the progress callback, since
this would access the hashtable in parallel.
v3: drop PREALLOCATE() and do this only for source=clickhouse and empty
<where>
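One way to avoid the parallel access described in the v2 note is to let
the callback only record the expected row count and to reserve from the
thread that does the inserts. The sketch below assumes this split; Loader,
onProgress and insertBlock are hypothetical names, and std::unordered_map
again stands in for the real hashtable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Loader
{
    std::unordered_map<std::uint64_t, std::uint16_t> table;
    std::atomic<std::size_t> expected_rows{0};

    // May be called from another thread: it never touches `table`,
    // it only records the hint.
    void onProgress(std::size_t total_rows)
    {
        expected_rows.store(total_rows);
    }

    // Called only from the loading thread, the single owner of `table`.
    void insertBlock(const std::vector<std::uint64_t> & keys)
    {
        // Consume the hint (if any) and reserve before inserting.
        const std::size_t hint = expected_rows.exchange(0);
        if (hint > table.size())
            table.reserve(hint);

        for (std::uint64_t key : keys)
            table.emplace(key, static_cast<std::uint16_t>(key % 65536));
    }
};
```

The atomic is the only state shared between the two threads, so the
hashtable itself is never resized while another thread is inserting.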