TODO (suggested by Nikolai)
1. Build query plan fro current query (inside storage::read) up to WithMergableState
2. Check, that plan is simple enough: Aggregating - Expression - Filter - ReadFromStorage (or simplier)
3. Check, that filter is the same as filter in projection, and also expression calculates the same aggregation keys as in projection
4. Return WithMergableState if projection applies
3 will be easier to do with ActionsDAG, cause it sees all functions, and dependencies are direct (but it is possible with ExpressionActions also)
Also need to figure out how prewhere works for projections, and
row_filter_policies.
wip
Reading files using mmap() does not have any significant benefits over
plain read() [1].
[1]: https://gist.github.com/azat/3d6c8d82bdd91e7a38d997fd6bcfd574
And not only it does not have significant benefits, it also has some
issues, due to max_server_memory_usage (default to 90% of available
RAM), since when you read files with mmap() eventually process RSS may
exceed max_server_memory_usage, and in this case any allocation will
fail (with "Memory limit exceeded (total)") error (yes kernel will
unload pages, but likely it will happens after queries will starting to
fail), like in this test [2].
[2]: https://gist.github.com/azat/4813489828162e6c2ce131963c6a1acb
TL;DR;
Note that there was also an idea to take those mmap()'ed regions in
memory tracking (#23211), but there are some drawbacks (since accounting
mmap() is tricky, first of all you need to account only once per inode
for file and plus kernel can unload some pages and those memory will not
be used by the server anymore).
And as an adddition to #23211 there was #23212, that adds
max_bytes_to_use_mmap_io, but since mmap is not a subject for memory
accounting there is no need in it.
v2: fix optimize_skip_unused_shards_rewrite_in for sharding_key wrapped into function
v3: fix column name for optimize_skip_unused_shards_rewrite_in
v4: fix optimize_skip_unused_shards_rewrite_in with Null
v5:
- squash with Remove query argument for IStreamFactory::createForShard()
- use proper column after function execution (using sharding_key_column_name)
- update the test reference since (X) now is tuple(X)