Commit Graph

54 Commits

Author SHA1 Message Date
vdimir
398499d253
Support SHARDS for HashedArrayDictionary 2023-12-13 13:00:28 +00:00
Denny Crane
c93efc929a fix typo in the doc 2023-11-13 16:54:21 +03:00
Justin de Guzman
b51f90362d
Clarify query behavior dictionary updates 2023-10-27 00:20:32 -07:00
Robert Schulze
4ed5b903b4
Docs: remove anchor prefix 2023-09-18 18:35:59 +00:00
johanngan
bcb058f999 Add case insensitive and dot-all modes to RegExpTree dictionary
The new per-dictionary settings control regex match semantics around
case sensitivity and the '.' wildcard with newlines. They must be set at
the dictionary level since they're applied to regex engines at
pattern-compile-time.

- regexp_dict_flag_case_insensitive: case insensitive matching
- regexp_dict_flag_dotall: '.' matches all characters including newlines

They correspond to HS_FLAG_CASELESS and HS_FLAG_DOTALL in Vectorscan
and case_sensitive and dot_nl in RE2. These are the most useful options
compatible with the internal behavior of RegExpTreeDictionary around
splitting up simple and complex patterns between Vectorscan and RE2.

The alternative is to use (?i) and/or (?s) for all patterns. However,
(?s) isn't handled properly by OptimizedRegularExpression::analyze().
And while (?i) is, it still causes the dictionary to treat the pattern
as "complex" for sequential scanning with RE2 rather than multi-matching
with Vectorscan, even though Vectorscan supports case insensitive
literal matching. Setting dictionary-wide flags is both more convenient,
and circumvents these problems.
2023-09-06 11:28:53 -05:00
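
Note on the commit above: a minimal sketch of why the two flags have to be fixed when a pattern is compiled. Python's `re` module is used here only as a stand-in for Vectorscan and RE2, so `re.IGNORECASE` and `re.DOTALL` are analogues of the flags named in the message, not the dictionary's actual implementation. The sample pattern and input strings are made up for illustration.

```python
import re

# Hypothetical pattern; Python's `re` stands in for Vectorscan/RE2 here.
PATTERN = r"clickhouse.server"

# Default semantics: case sensitive, and '.' does not match a newline.
default = re.compile(PATTERN)

# The flags become part of the compiled object, which is why a setting like
# this has to be known before the patterns are compiled.
caseless = re.compile(PATTERN, re.IGNORECASE)
dotall = re.compile(PATTERN, re.DOTALL)
both = re.compile(PATTERN, re.IGNORECASE | re.DOTALL)


def matches(compiled: "re.Pattern[str]", text: str) -> bool:
    """Return True if the compiled pattern matches anywhere in text."""
    return compiled.search(text) is not None


# The per-pattern alternative mentioned in the message: inline (?i) and (?s).
inline_both = re.compile(r"(?is)" + PATTERN)
```

With these definitions, `matches(default, "ClickHouse-Server")` is false while `matches(caseless, "ClickHouse-Server")` is true, and only the dot-all variants match across a newline.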
Robert Schulze
f20dd27ba6
Clean header mess up 2023-08-17 18:47:11 +00:00
Rich Raposa
13a03c046f
Remove duplicate section from Dictionary page 2023-08-11 23:45:58 -06:00
johanngan
c0f162c5b6 Add dictGetAll function for RegExpTreeDictionary
This function outputs an array of attribute values from all regexp nodes
that matched in a regexp tree dictionary. An optional final argument can
be passed to limit the array size.
2023-06-04 23:46:04 -05:00
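
Note on the commit above: a simplified sketch of the behavior the message describes, collecting attribute values from every pattern that matches and optionally capping the result size. It assumes a flat list of (pattern, value) pairs and ignores the tree structure and result ordering of the real dictionary. The name `get_all` and the sample nodes are hypothetical and are not ClickHouse APIs.

```python
import re
from typing import List, Optional, Tuple


def get_all(
    nodes: List[Tuple[str, str]],
    key: str,
    limit: Optional[int] = None,
) -> List[str]:
    """Collect the values of all nodes whose pattern matches key.

    nodes is a flat list of (regex pattern, attribute value) pairs, a
    simplification of a regexp tree. limit, if given, caps the result size.
    """
    values: List[str] = []
    for pattern, value in nodes:
        # Stop as soon as the optional cap is reached.
        if limit is not None and len(values) >= limit:
            break
        if re.search(pattern, key):
            values.append(value)
    return values


# Hypothetical sample data for illustration only.
SAMPLE_NODES = [
    (r"Linux", "linux"),
    (r"Android", "android"),
    (r"Windows", "windows"),
]
```

For example, `get_all(SAMPLE_NODES, "Linux; Android 10")` collects both matching values, and passing `limit=1` keeps only the first.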
Robert Schulze
54872f9e7e
Typos: Follow-up to #50476 2023-06-02 13:28:09 +00:00
Robert Schulze
65cc92a78d
CI: Fix aspell on nested docs 2023-06-02 12:24:41 +00:00
johanngan
de3b08aa5b Clean up regexp tree dictionary documentation
dictGetOrNull() relies on IDictionary::hasKeys(), which
RegExpTreeDictionary doesn't implement, so this probably never worked.
If you try to use it, an exception is thrown. The docs shouldn't
indicate that this is supported.

Also fix a markdown hyperlink in the docs.
2023-05-25 14:35:24 -05:00
Denny Crane
8a00be69b3
Update index.md 2023-05-24 10:40:33 -03:00
Han Fei
2625696591
Merge branch 'master' into hanfei/regexp-doc 2023-05-21 23:42:01 +02:00
Robert Schulze
491cf8b6e1
Fix minor mistakes 2023-05-21 13:43:05 +00:00
Robert Schulze
9d9d4e3d62
Some fixups 2023-05-21 13:40:52 +00:00
Robert Schulze
312f751503
Uppercase remaining SQL keywords 2023-05-21 13:08:55 +00:00
Azat Khuzhin
2b240d3721 Improve documentation for HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
Azat Khuzhin
2996b38606 Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout
As it turns out, HashMap/PackedHashMap works great even with a max load
factor of 0.99. By "great" I mean that at least it works faster than
google sparsehash, not to mention its friendliness to the memory
allocator (it has zero fragmentation since it works with a contiguous
memory region, in comparison to sparsehash, which does lots of
realloc, which jemalloc does not like due to its slabs).

Here is a table of different setups:

settings                         | load (sec) | read (sec) | read (million rows/s) | bytes_allocated | RSS
-                                | -          | -          | -                     | -               | -
HASHED upstream                  | -          | -          | -                     | -               | 35GiB
SPARSE_HASHED upstream           | -          | -          | -                     | -               | 26GiB
-                                | -          | -          | -                     | -               | -
sparse_hash_map glibc hashbench  | -          | -          | -                     | -               | 17.5GiB
sparse_hash_map packed allocator | 101.878    | 231.48     | 4.32                  | -               | 17.7GiB
PackedHashMap 0.5                | 15.514     | 42.35      | 23.61                 | 20GiB           | 22GiB
hashed 0.95                      | 34.903     | 115.615    | 8.65                  | 16GiB           | 18.7GiB
**PackedHashMap 0.95**           | **19.883** | **93.6**   | **10.68**             | **10GiB**       | **12.8GiB**
PackedHashMap 0.99               | 26.113     | 83.6       | 11.96                 | 10GiB           | 12.3GiB

As it shows, PackedHashMap with a max_load_factor of 0.95 uses 2.6x less
memory than SPARSE_HASHED in upstream, and it is also 2x faster for reads!

v2: fix grower
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-05-19 06:07:21 +02:00
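
Note on the commit above: a rough sketch of the arithmetic behind a maximum load factor. It assumes an idealized open-addressing table that needs at least `keys / max_load_factor` slots. Real implementations may round capacity up further (for example to a power of two), so this is a lower bound, not a description of ClickHouse's HashMap. The function names are hypothetical. The second helper reproduces the "2.6x" figure under the assumption that it compares the 26GiB reported for SPARSE_HASHED upstream with the 10GiB bytes_allocated reported for PackedHashMap 0.95.

```python
import math


def min_slots(num_keys: int, max_load_factor: float) -> int:
    """Smallest slot count that keeps the table at or below the load factor."""
    if not 0 < max_load_factor <= 1:
        raise ValueError("max_load_factor must be in (0, 1]")
    return math.ceil(num_keys / max_load_factor)


def memory_ratio(baseline_gib: float, candidate_gib: float) -> float:
    """How many times less memory the candidate uses than the baseline."""
    return baseline_gib / candidate_gib


# Idealized slot counts for one million keys at two load factors.
slots_at_050 = min_slots(1_000_000, 0.5)
slots_at_095 = min_slots(1_000_000, 0.95)

# Assumed reading of the table: 26 GiB (upstream) against 10 GiB (PackedHashMap 0.95).
ratio = memory_ratio(26, 10)
```

In this idealized model, raising the limit from 0.5 to 0.95 cuts the required slots from 2,000,000 to 1,052,632 for one million keys.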
Han Fei
549af4d351 address comments 2023-05-17 21:23:32 +02:00
Han Fei
7df0e9d933 fix broken link 2023-05-16 15:33:08 +02:00
Han Fei
a40d86b921
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:42 +02:00
Han Fei
ed5906f15d
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:31 +02:00
Han Fei
31b8e3c489
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:24 +02:00
Han Fei
e4e473ef30
Update docs/en/sql-reference/dictionaries/index.md
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-05-16 11:22:14 +02:00
Han Fei
29aa960377 refine docs for regexp tree dictionary 2023-05-16 09:07:35 +02:00
Han Fei
ef74e64336 address comments 2023-05-11 22:18:08 +02:00
Ivan Takarlikov
8873856ce5 Fix some grammar mistakes in documentation, code and tests 2023-05-04 13:35:18 -03:00
MikhailBurdukov
b229a28e94
Merge branch 'master' into mongo_dict_tls 2023-04-26 23:39:27 +03:00
MikhailBurdukov
7764168bd5 Resolve conflict 2023-04-26 19:50:58 +00:00
MikhailBurdukov
baaee66e85 Missing files 2023-04-26 19:29:29 +00:00
Robert Schulze
c406663442
Docs: Replace annoying three spaces in enumerations by a single space 2023-04-19 15:56:55 +00:00
DanRoscigno
6d8a2bbd48 standardize admonitions 2023-03-27 14:54:05 -04:00
rfraposa
ac5ed141d8 New nav - reverting the revert 2023-03-17 21:45:43 -05:00
Alexander Tokmakov
ec44c8293a
Revert "New navigation" 2023-03-17 21:21:11 +03:00
rfraposa
a580d7c021 Combined Dictionary pages 2023-03-08 16:52:01 -07:00
rfraposa
4b1b4a711e Fix links 2023-03-08 00:05:58 -07:00
rfraposa
fa6f3dadba Link fixes 2023-03-07 22:52:43 -07:00
rfraposa
4f67e3facf Update Dictionary links 2023-03-03 20:11:51 -07:00
rfraposa
d1045b9f11 Fix Dictionary links; update install.md 2023-03-02 07:56:03 -07:00
rfraposa
17a2d7ed45 Fixing broken links 2023-03-01 16:53:17 -07:00
rfraposa
a4a5a8a7d3 Initial copy of doc-preview 2023-02-28 11:59:05 -07:00
Ivan Blinkov
61c2f23713 Remove leftover empty lines at the end of markdown files 2023-01-09 15:15:18 +01:00
DanRoscigno
c8f9af1afa start renaming 2022-11-02 15:47:11 -04:00
DanRoscigno
5b5fcc56aa add slugs 2022-08-28 10:53:34 -04:00
rfraposa
869967de41 Remove H1 anchor tags from docs 2022-06-02 04:55:18 -06:00
rfraposa
8f01fe9c49 Revised /en folder 2022-04-09 07:34:21 -06:00
rfraposa
5250d9ad11 Removed /ja folder, cleaned up /ru markdown 2022-04-09 07:29:05 -06:00
Alexey Milovidov
9854b55835
Revert "Format changes for new docs" 2022-04-04 02:05:35 +03:00
rfraposa
560471f991 Update /sql-reference docs 2022-03-29 22:06:21 -06:00
Olga Revyakina
adf494ae1f Nullable types 2021-04-04 11:00:48 +03:00