ClickHouse

mirror of https://github.com/ClickHouse/ClickHouse.git synced 2024-11-14 03:25:15 +00:00

Author	SHA1	Message	Date
Denny Crane	8a00be69b3	Update index.md	2023-05-24 10:40:33 -03:00
Han Fei	2625696591	Merge branch 'master' into hanfei/regexp-doc	2023-05-21 23:42:01 +02:00
Robert Schulze	491cf8b6e1	Fix minor mistakes	2023-05-21 13:43:05 +00:00
Robert Schulze	9d9d4e3d62	Some fixups	2023-05-21 13:40:52 +00:00
Robert Schulze	312f751503	Uppercase remaining SQL keywords	2023-05-21 13:08:55 +00:00
Azat Khuzhin	2b240d3721	Improve documentation for HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Azat Khuzhin	2996b38606	Add ability to configure maximum load factor for the HASHED/SPARSE_HASHED layout As it turns out, HashMap/PackedHashMap works great even with a max load factor of 0.99. By "great" I mean that at least it works faster than google sparsehash, not to mention its friendliness to the memory allocator (it has zero fragmentation since it works with a contiguous memory region, in comparison to sparsehash, which does lots of realloc, which jemalloc does not like due to its slabs). Here is a table of different setups: settings \| load (sec) \| read (sec) \| read (million rows/s) \| bytes_allocated \| RSS - \| - \| - \| - \| - \| - HASHED upstream \| - \| - \| - \| - \| 35GiB SPARSE_HASHED upstream \| - \| - \| - \| - \| 26GiB - \| - \| - \| - \| - \| - sparse_hash_map glibc hashbench \| - \| - \| - \| - \| 17.5GiB sparse_hash_map packed allocator \| 101.878 \| 231.48 \| 4.32 \| - \| 17.7GiB PackedHashMap 0.5 \| 15.514 \| 42.35 \| 23.61 \| 20GiB \| 22GiB hashed 0.95 \| 34.903 \| 115.615 \| 8.65 \| 16GiB \| 18.7GiB PackedHashMap 0.95 \| 19.883 \| 93.6 \| 10.68 \| 10GiB \| 12.8GiB PackedHashMap 0.99 \| 26.113 \| 83.6 \| 11.96 \| 10GiB \| 12.3GiB As it shows, PackedHashMap with 0.95 max_load_factor eats 2.6x less memory than SPARSE_HASHED in upstream, and it is also 2x faster for read! v2: fix grower Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-05-19 06:07:21 +02:00
Han Fei	549af4d351	address comments	2023-05-17 21:23:32 +02:00
Han Fei	7df0e9d933	fix broken link	2023-05-16 15:33:08 +02:00
Han Fei	a40d86b921	Update docs/en/sql-reference/dictionaries/index.md Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>	2023-05-16 11:22:42 +02:00
Han Fei	ed5906f15d	Update docs/en/sql-reference/dictionaries/index.md Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>	2023-05-16 11:22:31 +02:00
Han Fei	31b8e3c489	Update docs/en/sql-reference/dictionaries/index.md Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>	2023-05-16 11:22:24 +02:00
Han Fei	e4e473ef30	Update docs/en/sql-reference/dictionaries/index.md Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>	2023-05-16 11:22:14 +02:00
Han Fei	29aa960377	refine docs for regexp tree dictionary	2023-05-16 09:07:35 +02:00
Han Fei	ef74e64336	address comments	2023-05-11 22:18:08 +02:00
Ivan Takarlikov	8873856ce5	Fix some grammar mistakes in documentation, code and tests	2023-05-04 13:35:18 -03:00
MikhailBurdukov	b229a28e94	Merge branch 'master' into mongo_dict_tls	2023-04-26 23:39:27 +03:00
MikhailBurdukov	7764168bd5	Resolve conflict	2023-04-26 19:50:58 +00:00
MikhailBurdukov	baaee66e85	Missing files	2023-04-26 19:29:29 +00:00
Robert Schulze	c406663442	Docs: Replace annoying three spaces in enumerations by a single space	2023-04-19 15:56:55 +00:00
DanRoscigno	6d8a2bbd48	standardize admonitions	2023-03-27 14:54:05 -04:00
rfraposa	ac5ed141d8	New nav - reverting the revert	2023-03-17 21:45:43 -05:00
Alexander Tokmakov	ec44c8293a	Revert "New navigation"	2023-03-17 21:21:11 +03:00
rfraposa	a580d7c021	Combined Dictionary pages	2023-03-08 16:52:01 -07:00
rfraposa	4b1b4a711e	Fix links	2023-03-08 00:05:58 -07:00
rfraposa	fa6f3dadba	Link fixes	2023-03-07 22:52:43 -07:00
rfraposa	4f67e3facf	Update Dictionary links	2023-03-03 20:11:51 -07:00
rfraposa	d1045b9f11	Fix Dictionary links; update install.md	2023-03-02 07:56:03 -07:00
rfraposa	17a2d7ed45	Fixing broken links	2023-03-01 16:53:17 -07:00
rfraposa	1b6916ddd2	Condensed dictionary docs into a single page	2023-02-28 12:01:52 -07:00
rfraposa	a4a5a8a7d3	Initial copy of doc-preview	2023-02-28 11:59:05 -07:00
rfraposa	e52edd4e85	Update external-dicts-dict-layout.md	2023-02-01 09:06:21 -07:00
Denny Crane	fda47bf4f8	Update external-dicts-dict-layout.md	2023-01-24 21:31:43 -04:00
Azat Khuzhin	4366f7fb3b	Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries It does not give a significant benefit, and hashed/sparse_hashed dictionaries can now be filled in parallel (#40003) using sharded dictionaries, which should be used instead of PREALLOCATE. Note that dictionaries that had been created with PREALLOCATE will still work, but will simply ignore this attribute. Fixes: #41985 (cc @alexey-milovidov) Reverts: #23979 (cc @kitaisreal) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-18 20:18:37 +01:00
Azat Khuzhin	99063b152f	Allow to configure queue backlog of the parallel hashed dictionary loader v2: Decrease default parallel_queue_backlog to 10000 (same speed) v3: Rename parallel_queue_backlog to per_shard_load_backlog v3: Rename per_shard_load_backlog to shard_load_queue_backlog v4: Fix documentation Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:26 +01:00
Azat Khuzhin	345c422e28	Add ability to load hashed dictionaries using multiple threads Right now dictionaries (here I will talk only about HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED) can load data only in one thread, since they use one hash table that cannot be filled from multiple threads. And in case you have a very big dictionary (i.e. 10e9 elements), it can take a while to load it, especially for SPARSE_HASHED variants (and if you have such an amount of elements there, you are likely to use SPARSE_HASHED, since it requires less memory); in my env it takes ~4 hours, which is an enormous amount of time. So this patch adds support for shards in dictionaries: the number of shards determines how many hash tables the dictionary will use and, more importantly, how many threads it can use to load the data. And with 16 threads this works 2x faster, not perfect though, see the follow-up patches in this series. v0: PARTITION BY v1: SHARDS 1 v2: SHARDS(1) v3: tried optimized mod - logical and, but it does not gain even 10% v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either v5: move SHARDS into layout parameters (unknown simply ignored) v6: tune params for perf tests (to avoid too long queries) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>	2023-01-13 13:39:25 +01:00
Han Fei	6ed4570f73	Merge branch 'master' into regexp-tree-dictionary	2023-01-10 15:36:30 +01:00
Han Fei	5f8296b719	Update docs/en/sql-reference/dictionaries/external-dictionaries/regexp-tree.md Co-authored-by: Vladimir C <vdimir@clickhouse.com>	2023-01-10 14:41:06 +01:00
Ivan Blinkov	61c2f23713	Remove leftover empty lines at the end of markdown files	2023-01-09 15:15:18 +01:00
Han Fei	f2a9eea995	write docs and optimize regex compile	2023-01-05 17:38:01 +01:00
Denny Crane	850f77f4d2	Update external-dicts-dict-sources.md	2022-12-26 16:21:36 -04:00
Dale Mcdiarmid	1f5e6799ec	revert contents change	2022-12-16 12:03:57 +00:00
Dale Mcdiarmid	ba52210124	revert format issue	2022-12-16 12:00:12 +00:00
Dale Mcdiarmid	22e8477b2a	cross link dictionaries + udf posts	2022-12-14 15:01:15 +00:00
Dan Roscigno	c5eb269515	Merge pull request #43943 from DanRoscigno/update-operations-docs Update operations docs	2022-12-05 20:58:42 -05:00
Dale Mcdiarmid	5ab5aa13f4	cross link docs to blogs	2022-12-05 17:28:03 +00:00
DanRoscigno	08e8ea1bfa	update link	2022-12-05 08:23:51 -05:00
DanRoscigno	5e087ae967	link to tutorial	2022-11-16 11:54:06 -05:00
DanRoscigno	c60b98f576	updates from review	2022-11-15 16:17:43 -05:00
DanRoscigno	63ae261119	move tip to snippet	2022-11-15 12:44:54 -05:00

199 Commits