Commit Graph

200 Commits

Author SHA1 Message Date
johanngan
de3b08aa5b Clean up regexp tree dictionary documentation
dictGetOrNull() relies on IDictionary::hasKeys(), which
RegExpTreeDictionary doesn't implement, so this probably never worked.
If you try to use it, an exception is thrown. The docs shouldn't
indicate that this is supported.

Also fix a markdown hyperlink in the docs.
2023-05-25 14:35:24 -05:00
Denny Crane
8a00be69b3
Update index.md 2023-05-24 10:40:33 -03:00
Han Fei
2625696591
Merge branch 'master' into hanfei/regexp-doc 2023-05-21 23:42:01 +02:00
Robert Schulze
491cf8b6e1
Fix minor mistakes 2023-05-21 13:43:05 +00:00
Robert Schulze
9d9d4e3d62
Some fixups 2023-05-21 13:40:52 +00:00
Robert Schulze
312f751503
Uppercase remaining SQL keywords 2023-05-21 13:08:55 +00:00
Azat Khuzhin
2b240d3721 Improve documentation for HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with max load
factor of 0.99. By "great" I mean it least it works faster then
google sparsehash, and not to mention it's friendliness to the memory
allocator (it has zero fragmentation since it works with a continuious
memory region, in comparison to the sparsehash that doing lots of
realloc, which jemalloc does not like, due to it's slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **93.6**   | **19.883** | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As it shows, PackedHashMap with 0.95 max_load_factor, eats 2.6x less
memory then SPARSE_HASHED in upstream, and it also 2x faster for read!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Han Fei
549af4d351 address comments 2023-05-17 21:23:32 +02:00
Han Fei
7df0e9d933 fix broken link 2023-05-16 15:33:08 +02:00
Han Fei
a40d86b921
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:42 +02:00
Han Fei
ed5906f15d
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:31 +02:00
Han Fei
31b8e3c489
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:24 +02:00
Han Fei
e4e473ef30
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:14 +02:00
Han Fei
29aa960377 refine docs for regexp tree dictionary 2023-05-16 09:07:35 +02:00
Han Fei
ef74e64336 address comments 2023-05-11 22:18:08 +02:00
Ivan Takarlikov
8873856ce5 Fix some grammar mistakes in documentation, code and tests 2023-05-04 13:35:18 -03:00
MikhailBurdukov
b229a28e94
Merge branch 'master' into mongo_dict_tls 2023-04-26 23:39:27 +03:00
MikhailBurdukov
7764168bd5 Resove conflict 2023-04-26 19:50:58 +00:00
MikhailBurdukov
baaee66e85 Missing files 2023-04-26 19:29:29 +00:00
Robert Schulze
c406663442
Docs: Replace annoying three spaces in enumerations by a single space 2023-04-19 15:56:55 +00:00
DanRoscigno
6d8a2bbd48 standardize admonitions 2023-03-27 14:54:05 -04:00
rfraposa
ac5ed141d8 New nav - reverting the revert 2023-03-17 21:45:43 -05:00
Alexander Tokmakov
ec44c8293a
Revert "New navigation" 2023-03-17 21:21:11 +03:00
rfraposa
a580d7c021 Combined Dictionary pages 2023-03-08 16:52:01 -07:00
rfraposa
4b1b4a711e Fix links 2023-03-08 00:05:58 -07:00
rfraposa
fa6f3dadba Link fixes 2023-03-07 22:52:43 -07:00
rfraposa
4f67e3facf Update Dictionary links 2023-03-03 20:11:51 -07:00
rfraposa
d1045b9f11 Fix Dictionary links; update install.md 2023-03-02 07:56:03 -07:00
rfraposa
17a2d7ed45 Fixing broken links 2023-03-01 16:53:17 -07:00
rfraposa
1b6916ddd2 Condensed dictionary docs into a single page 2023-02-28 12:01:52 -07:00
rfraposa
a4a5a8a7d3 Initial copy of doc-preview 2023-02-28 11:59:05 -07:00
rfraposa
e52edd4e85 Update external-dicts-dict-layout.md 2023-02-01 09:06:21 -07:00
Denny Crane
fda47bf4f8
Update external-dicts-dict-layout.md 2023-01-24 21:31:43 -04:00
Azat Khuzhin
4366f7fb3b Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries
It does not give significant benefit, but now, you hashed/sparse_hashed
dictionaries can be filled in parallel (#40003), using sharded
dictionaries, and this should be used instead of PREALLOCATE.

Note, that dictionaries, that had been created with PREALLOCATE will
work, but simply ignore this attribute.

Fixes: #41985 (cc @alexey-milovidov)
Reverts: #23979 (cc @kitaisreal)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-18 20:18:37 +01:00
Azat Khuzhin
99063b152f Allow to configure queue backlog of the parallel hashed dictionary loader
v2: Decrease default parallel_queue_backlog to 10000 (same speed)
v3: Rename parallel_queue_backlog to per_shard_load_backlog
v3: Rename per_shard_load_backlog to shard_load_queue_backlog
v4: Fix documentation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
345c422e28 Add ability to load hashed dictionaries using multiple threads
Right now dictionaries (here I will talk about only
HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED)
can load data only in one thread, since it uses one hash table that
cannot be filled from multiple threads.

And in case you have very big dictionary (i.e. 10e9 elements), it can
take a awhile to load them, especially for SPARSE_HASHED variants (and
if you have such amount of elements there, you are likely use
SPARSE_HASHED, since it requires less memory), in my env it takes ~4
hours, which is enormous amount of time.

So this patch add support of shards for dictionaries, number of shards
determine how much hash tables will use this dictionary, also, and which
is more important, how much threads it can use to load the data.

And with 16 threads this works 2x faster, not perfect though, see the
follow up patches in this series.

v0: PARTITION BY
v1: SHARDS 1
v2: SHARDS(1)
v3: tried optimized mod - logical and, but it does not gain even 10%
v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either
v5: move SHARDS into layout parameters (unknown simply ignored)
v6: tune params for perf tests (to avoid too long queries)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:25 +01:00
Han Fei
6ed4570f73
Merge branch 'master' into regexp-tree-dictionary 2023-01-10 15:36:30 +01:00
Han Fei
5f8296b719
Update docs/en/sql-reference/dictionaries/external-dictionaries/regexp-tree.md
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2023-01-10 14:41:06 +01:00
Ivan Blinkov
61c2f23713 Remove leftover empty lines at the end of markdown files 2023-01-09 15:15:18 +01:00
Han Fei
f2a9eea995 write docs and optimize regex compile 2023-01-05 17:38:01 +01:00
Denny Crane
850f77f4d2
Update external-dicts-dict-sources.md 2022-12-26 16:21:36 -04:00
Dale Mcdiarmid
1f5e6799ec revert contents change 2022-12-16 12:03:57 +00:00
Dale Mcdiarmid
ba52210124 revert format issue 2022-12-16 12:00:12 +00:00
Dale Mcdiarmid
22e8477b2a cross link dictionaries + udf posts£ 2022-12-14 15:01:15 +00:00
Dan Roscigno
c5eb269515
Merge pull request #43943 from DanRoscigno/update-operations-docs
Update operations docs
2022-12-05 20:58:42 -05:00
Dale Mcdiarmid
5ab5aa13f4 cross link docs to blogs 2022-12-05 17:28:03 +00:00
DanRoscigno
08e8ea1bfa update link 2022-12-05 08:23:51 -05:00
DanRoscigno
5e087ae967 link to tutorial 2022-11-16 11:54:06 -05:00
DanRoscigno
c60b98f576 updates from review 2022-11-15 16:17:43 -05:00