This way the remote nodes will not need to send all the rows, which
decreases network IO. It also makes queries with
optimize_aggregation_in_order=1/LIMIT X and without ORDER BY faster,
since the initiator will not need to read all the rows, only the first X
(but note that for this your data needs to be sharded correctly, or you
may get inaccurate results).
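A toy model of the push-down and of the sharding caveat. This is a minimal sketch, assuming shards are plain Python lists of rows and that the initiator only concatenates what it receives. The function names are hypothetical and none of this is ClickHouse code.

```python
from collections import Counter
from itertools import chain, islice


def shard_send(rows, limit=None):
    """Rows one shard sends to the initiator (at most `limit` if pushed down)."""
    return list(rows) if limit is None else list(islice(rows, limit))


def query_with_limit(shards, limit, push_down):
    """LIMIT without ORDER BY: any `limit` rows are a valid answer.
    Returns (result, number of rows transferred over the network)."""
    sent = [shard_send(rows, limit if push_down else None) for rows in shards]
    transferred = sum(len(block) for block in sent)
    result = list(islice(chain.from_iterable(sent), limit))
    return result, transferred


def grouped_count_with_limit(shards, limit):
    """Each shard finishes GROUP BY key + LIMIT locally; the initiator only
    concatenates the results and does not merge aggregation states."""
    out = []
    for rows in shards:
        out.extend(sorted(Counter(rows).items())[:limit])
    return out[:limit]
```

With three shards of 100 rows and LIMIT 5, push-down transfers 15 rows instead of 300 for the same answer. If a key is split across shards, as in `grouped_count_with_limit([["a", "a", "b"], ["a", "c"]], 10)`, the key `"a"` is reported twice with partial counts. This is the "inaccurate results" case that correct sharding avoids.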
Note that having lots of processing stages increases the complexity of
the interpreter (it is already not that clean and simple right now),
although using a separate QueryProcessingStage looks pretty natural.
Another option is to always do this in WithMergeableStateAfterAggregation,
but in that case you would not be able to disable only this
optimization, e.g. if some issue turns up with it.
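The trade-off can be pictured as a stage selector. This is a minimal sketch, assuming a boolean `push_down_limit` switch and a hypothetical extra stage called `AfterAggregationAndLimit`. Neither the selector logic nor that stage name is taken from the ClickHouse sources.

```python
from enum import Enum


class Stage(Enum):
    # The first two names appear in the text above; the third is hypothetical.
    WITH_MERGEABLE_STATE = "WithMergeableState"
    AFTER_AGGREGATION = "WithMergeableStateAfterAggregation"
    AFTER_AGGREGATION_AND_LIMIT = "AfterAggregationAndLimit"


def remote_stage(shards_finish_aggregation: bool, push_down_limit: bool) -> Stage:
    """Hypothetical choice of how far each shard processes the query."""
    if not shards_finish_aggregation:
        return Stage.WITH_MERGEABLE_STATE
    if push_down_limit:
        return Stage.AFTER_AGGREGATION_AND_LIMIT
    # With a dedicated stage, disabling only the LIMIT optimization falls back
    # to the pre-existing stage. If LIMIT were always applied inside
    # AFTER_AGGREGATION, there would be nothing to fall back to.
    return Stage.AFTER_AGGREGATION
```

Each extra enum value is one more branch the interpreter must handle. That is the complexity cost the paragraph above weighs against being able to switch the optimization off on its own.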
v2: fix OFFSET
v3: convert 01814_distributed_push_down_limit test to .sh and add retries
v4: add test with OFFSET
v5: add new query stage into the bash completion
v6/tests: use LIMIT O,L syntax over LIMIT L OFFSET O since the latter is broken in the ANTLR parser
https://clickhouse-test-reports.s3.yandex.net/23027/a18a06399b7aeacba7c50b5d1e981ada5df19745/functional_stateless_tests_(antlr_debug).html#fail1
v7/tests: set use_hedged_requests to 0, to avoid excessive log entries on retries
https://clickhouse-test-reports.s3.yandex.net/23027/a18a06399b7aeacba7c50b5d1e981ada5df19745/functional_stateless_tests_flaky_check_(address).html#fail1
TODO (suggested by Nikolai)
1. Build a query plan for the current query (inside storage::read) up to WithMergeableState
2. Check that the plan is simple enough: Aggregating - Expression - Filter - ReadFromStorage (or simpler)
3. Check that the filter is the same as the filter in the projection, and that the expression calculates the same aggregation keys as the projection
4. Return WithMergeableState if the projection applies
Step 3 will be easier to do with ActionsDAG, because it sees all functions and dependencies are direct (but it is also possible with ExpressionActions)
Also need to figure out how prewhere and row_filter_policies work for
projections.
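Steps 2 to 4 of the TODO can be sketched as follows. This is a minimal sketch, assuming the plan is given as a list of step names from top to bottom and that "or simpler" means any in-order subset of the allowed chain that still ends in ReadFromStorage. The `Projection` type and the string comparison of filters are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Allowed chain from step 2, top to bottom.
ALLOWED_CHAIN = ["Aggregating", "Expression", "Filter", "ReadFromStorage"]


@dataclass(frozen=True)
class Projection:
    """Hypothetical summary of a projection: its filter and aggregation keys."""
    filter: Optional[str]
    keys: FrozenSet[str]


def is_simple_plan(steps: List[str]) -> bool:
    """Step 2: the plan is an in-order subsequence of ALLOWED_CHAIN that
    ends in ReadFromStorage."""
    if not steps or steps[-1] != "ReadFromStorage":
        return False
    remaining = iter(ALLOWED_CHAIN)
    # `in` consumes the iterator, so this enforces order and forbids repeats.
    return all(step in remaining for step in steps)


def stage_if_projection_applies(steps: List[str], query_filter: Optional[str],
                                query_keys: Iterable[str],
                                proj: Projection) -> Optional[str]:
    """Steps 2-4. Filters are compared as plain strings here; a real
    implementation would compare expression DAGs (e.g. ActionsDAG), not text."""
    if not is_simple_plan(steps):
        return None
    if query_filter != proj.filter or frozenset(query_keys) != proj.keys:
        return None
    return "WithMergeableState"
```

A plan such as `["Sorting", "Aggregating", "ReadFromStorage"]` is rejected at step 2. A matching plan with a different filter is rejected at step 3.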
wip
Before this patch the following query:
SELECT assumeNotNull(argMax(dummy, 1))
FROM remote('127.1', system.one)
SETTINGS distributed_group_by_no_merge = 2
Led to:
Code: 10. DB::Exception: Received from localhost:9000. DB::Exception: Not found column argMax(dummy, 1) in block: while executing 'INPUT : 0 -> argMax(dummy, 1) UInt8 : 0'.
This is because the initiator tries to execute the function one more
time, but shards do not send this column when the query is processed
with distributed_group_by_no_merge=2 (i.e. up to
WithMergeableStateAfterAggregation).
v0: no exception
v2: execute window functions
v3: throw an exception, since executing window functions in this case
would lead to messy output
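The failure and the fix can be modelled with blocks represented as `{column name: values}` dicts. This is a minimal sketch, assuming shards that stopped at WithMergeableStateAfterAggregation send only the final result column. The exception classes and function names are hypothetical stand-ins, not ClickHouse code.

```python
ARG = "argMax(dummy, 1)"
RESULT = f"assumeNotNull({ARG})"


class NotFoundColumn(Exception):
    """Stand-in for the 'Not found column ... in block' error."""


def apply_assume_not_null(block):
    """Re-execute assumeNotNull() on top of the argMax() column."""
    if ARG not in block:
        raise NotFoundColumn(f"Not found column {ARG} in block")
    # assumeNotNull() is the identity for values that are not NULL.
    return {RESULT: list(block[ARG])}


def finish_before_patch(shard_block):
    # Old behaviour: the initiator runs the expression again, but the shard
    # only sent the final column, so the input column is missing.
    return apply_assume_not_null(shard_block)


def finish_after_patch(shard_block, has_window_functions=False):
    if has_window_functions:
        # v3: refuse instead of producing messy per-shard window results.
        raise NotImplementedError("window functions with distributed_group_by_no_merge=2")
    # Shards already computed the final expressions; nothing to re-execute.
    return shard_block
```

For example, `finish_before_patch({RESULT: [0]})` raises `NotFoundColumn`, while `finish_after_patch({RESULT: [0]})` returns the block unchanged.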
* master: (155 commits)
Update version_date.tsv after release 20.8.13.15
Update version_date.tsv after release 20.12.7.3
Update version_date.tsv after release 21.1.5.4
Update version_date.tsv after release 21.2.4.6
fix
Add test to skip list
Fix WriteBufferFromHTTPServerResponse usage in other places (add missing finalize())
Fix WriteBufferFromHTTPServerResponse usage in odbc-bridge
Update config.xml
Suppress signed overflow in AggregateFunctionGroupArrayMoving 2
Update BaseDaemon.cpp
review suggestions
Fix bash syntax in 01731_async_task_queue_wait
Do not use view() in 01731_async_task_queue_wait to fix ANTLR parser
Increase buffer for uncaught exception / std::terminate
Even more better
Fix uncaught exception when HTTP client goes away
test for decimal ( p , s) in dictionaries
Just little better
Fixed style check
...
* master: (160 commits)
Make Poco HTTP Server zero-copy again (#19516)
Fixed documentation
ccache 4.2+ does not requires any quirks for SOURCE_DATE_EPOCH
Add a function `htmlOrXmlCoarseParse` to extract content from html or xml format string. (#19600)
Reinterpret function added Decimal, DateTim64 support
Add test
Update InterpreterSelectQuery.cpp
Improved serialization for data types combined of Arrays and Tuples. Improved matching enum data types to protobuf enum type. Fixed serialization of the Map data type. Omitted values are now set by default.
Log stdout and stderr when failed to start docker in integration tests.
Added comment
Don't backport base commit of branch in the same branch (#20628)
Fix fasttest retry for failed tests
Dictionary create source with functions crash fix
Added error reinterpretation tests
Update run.sh
Updated documentation
fix subquery with limit
Rename untyped function reinterpretAs into reinterpret
ignore data store files
Support vhost
...
When distributed_group_by_no_merge=2 is used (or when
optimize_distributed_group_by_sharding_key takes place), remote servers
will do a full ORDER BY, so the initiator can skip this step and only
merge the ordered blocks.
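Merging already-sorted shard results is a k-way merge. It costs O(n log k) for n rows from k shards instead of the O(n log n) a full re-sort would need, and it can stop early under LIMIT. This is a minimal sketch using Python's `heapq.merge` on rows represented as tuples. The function name and row layout are assumptions for illustration.

```python
import heapq
from itertools import islice
from operator import itemgetter


def merge_ordered_blocks(shard_rows, sort_col, descending=False, limit=None):
    """Merge per-shard results, each already sorted by `sort_col` in the same
    direction, without re-sorting everything on the initiator."""
    merged = heapq.merge(*shard_rows, key=itemgetter(sort_col), reverse=descending)
    # heapq.merge is lazy, so with a LIMIT only the first rows are consumed.
    return list(merged) if limit is None else list(islice(merged, limit))
```

For `ORDER BY ... DESC`, each shard's input must itself be sorted descending when `descending=True` is passed.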
* master: (759 commits)
Suppress UBSan report in Decimal comparison
Suppress UBSan report in Decimal comparison
Fix UBSan report in arrayDifference
Update README.md
Non significant change in AggregationCommon
Print stack trace on SIGTRAP
Fix dependent test
Fix tests for better parallel run
Add test for already working code
Revert "Fix access control manager destruction order"
Update index.md
Update index.md
Update index.md
Bit more complicated example for isIPv4String - ru
Bit more complicated example for isIPv4String
cleanup
Replace database with ordinary
Added comments
Split tests to make them stable
Fixes
...
# Conflicts:
# src/Storages/MergeTree/MergeTreeRangeReader.cpp
* master: (247 commits)
Update 01014_lazy_database_basic.sh
Update 00459_group_array_insert_at.sql
Same for ALTER
Syntax and links
Added test.
Check where and prewhere identifiers exist.
Remove redundant lsof
Add lsof to fasttest
Add new line to status
Make integration odbc tests idempotent
Minor code improvement in JOIN
Arcadia does not have bitmaps
Make Fuzzer more reliable
Add missing lsof for fasttest docker image
Update test
The test most likely would not work in Arcadia
Less parser depth
Fix stack overflow in coroutine
Fuzzer: better messages.
Fixed build.
...