Previously a separate MergeTreeInOrder could be created for each mark,
which could be very suboptimal because each MergeTreeInOrder has some
memory overhead.
Now all marks for one part are collapsed together, which is more memory
efficient.
I've tried the query from the Altinity wiki [1], and peak memory usage
is halved:
SELECT * FROM repl_tbl FINAL WHERE key IN (SELECT toUInt32(number) FROM numbers(1000000) WHERE number % 50000 = 0) FORMAT Null
- upstream: MemoryTracker: Peak memory usage (for query): 520.27 MiB.
- patched: MemoryTracker: Peak memory usage (for query): 260.95 MiB.
[1]: https://kb.altinity.com/engines/mergetree-table-engine-family/replacingmergetree/#multiple-keys
The improvement is not necessarily 2x; it can be more or less, depending
on the gaps in the marks to read (for example, in my setup the memory
usage increased a lot, from ~16GiB of RAM to >64GiB, due to lots of marks
and gaps).
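To illustrate the idea of collapsing marks per part, here is a minimal
sketch in Python. It is not the ClickHouse implementation: the part
names, the (begin, end) mark tuples and group_ranges_by_part are all
hypothetical, and it only counts how many readers would be needed under
the assumption of one reader per entry.

```python
from collections import defaultdict

# Hypothetical model: each tuple is (part name, (begin mark, end mark)).
# These names are illustrative and are not ClickHouse identifiers.
def group_ranges_by_part(ranges):
    """Collapse all mark ranges of a part into one entry, keeping their order."""
    grouped = defaultdict(list)
    for part, mark_range in ranges:
        grouped[part].append(mark_range)
    return dict(grouped)

selected = [
    ("part_1", (0, 1)),
    ("part_1", (7, 8)),
    ("part_2", (3, 4)),
    ("part_1", (20, 21)),
]

# Assumption: one reader per mark range before, one reader per part after.
readers_before = len(selected)
readers_after = len(group_ranges_by_part(selected))
```

With many small ranges separated by gaps, the number of entries before
grouping grows with the number of gaps, while after grouping it is
bounded by the number of parts.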
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This reverts the following commits:
- e77dd81036
- e8527e720b
Additionally, functional tests are added.
When scanning complex regexp nodes sequentially with RE2, the old code
has an optimization to break out of the loop early upon finding a leaf
node that matches. This is an invalid optimization because there's no
guarantee that it's actually a VALID match, because its parents might
NOT have matched. Semantically, a user would expect this match to be
discarded and for the search to continue. Instead, since we skipped
matching after the first false positive, subsequent nodes that would
have matched are missing from the output value. This affects both
dictGet and dictGetAll.
It's difficult to distinguish a true positive from a false positive
while looping through complex_regexp_nodes because we would have to scan
all the parents of a matching node to confirm a true positive. Trying to
do this might actually end up being slower than just scanning every
complex regexp node, because complex_regexp_nodes is only a subset of
all the tree nodes; we may end up duplicating scanning work
that Vectorscan has already done, depending on whether the parent nodes
are "simple" or "complex". So instead of trying to fix this
optimization, just remove it entirely.
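The failure mode can be reproduced with a small Python model. This is a
simplified sketch, not the dictionary's actual code: the NODES tree, the
patterns and the scan function are hypothetical, and Python's re module
stands in for RE2. A leaf counts as a valid match only if it and all of
its ancestors match.

```python
import re

# Hypothetical tree: node id -> (parent id or None, pattern). Illustrative only.
NODES = {
    1: (None, r"^foo"),
    2: (1, r"bar$"),   # leaf under node 1
    3: (None, r"^baz"),
    4: (3, r"bar$"),   # leaf under node 3
}
LEAVES = [2, 4]

def matches(node_id, text):
    return re.search(NODES[node_id][1], text) is not None

def is_valid(node_id, text):
    """A match counts only if the node and every ancestor match."""
    while node_id is not None:
        if not matches(node_id, text):
            return False
        node_id = NODES[node_id][0]
    return True

def scan(text, early_break):
    candidates = []
    for leaf in LEAVES:
        if matches(leaf, text):
            candidates.append(leaf)
            if early_break:
                break  # models the removed optimization: stop at first leaf match
    # Parents are checked only afterwards, so a false positive found first
    # hides any later true positive when early_break is set.
    return [n for n in candidates if is_valid(n, text)]
```

For the input "baz bar", leaf 2 matches its own pattern but its parent
does not, so with early_break=True the scan stops there and node 4 is
never examined; with early_break=False node 4 is found and validated.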