This way the remote nodes will not need to send all the rows, which
decreases network I/O. It also makes queries with
optimize_aggregation_in_order=1 and LIMIT X, but without ORDER BY,
faster, since the initiator will not need to read all the rows, only
the first X (note that for this your data needs to be sharded
correctly, or you may get inaccurate results).
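A minimal sketch of the intended usage (dist_table here is a
hypothetical Distributed table; the setting name matches the
01814_distributed_push_down_limit test referenced below):

    SET distributed_push_down_limit = 1;
    -- the LIMIT is pushed down to each shard, so every remote node
    -- sends at most 10 rows to the initiator instead of its full result
    SELECT key FROM dist_table LIMIT 10;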
Note that having lots of processing stages will increase the complexity
of the interpreter (it is already not that clean and simple right now),
although using a separate QueryProcessingStage looks pretty natural.
Another option is to use WithMergeableStateAfterAggregation
unconditionally, but then it would be impossible to disable only this
optimization if some issue with it shows up.
v2: fix OFFSET
v3: convert 01814_distributed_push_down_limit test to .sh and add retries
v4: add test with OFFSET
v5: add new query stage into the bash completion
v6/tests: use the LIMIT O, L syntax instead of LIMIT L OFFSET O, since the latter is broken in the ANTLR parser (see the example after these notes)
https://clickhouse-test-reports.s3.yandex.net/23027/a18a06399b7aeacba7c50b5d1e981ada5df19745/functional_stateless_tests_(antlr_debug).html#fail1
v7/tests: set use_hedged_requests to 0, to avoid excessive log entries on retries
https://clickhouse-test-reports.s3.yandex.net/23027/a18a06399b7aeacba7c50b5d1e981ada5df19745/functional_stateless_tests_flaky_check_(address).html#fail1
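For reference, the two LIMIT spellings from the v6 note are equivalent:

    -- skip 1 row, return the next 5; same as LIMIT 5 OFFSET 1
    SELECT number FROM numbers(10) LIMIT 1, 5;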
Before this patch clickhouse-client interpreted the whole query text,
and as soon as "{ echo }" was found it started echoing queries; this
made it impossible to skip some of the lines.
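As a sketch, the hint in a stateless .sql test looks like this (the
comment form is an assumption based on how the test suite uses it):

    SELECT 1; -- not echoed
    -- { echo }
    SELECT 2; -- from here on, queries are echoed into the reference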
The year 1925 is the starting point because most of the time zones
switched to saner (mostly 15-minute-based) offsets during 1924 or
earlier, and that significantly simplifies the implementation.
2238 is chosen to simplify the arithmetic for sanitizing LUT index
access: there are fewer than 0x1ffff (131071) days from 1925 to 2238
(313 years x 366 days = 114,558 at most).
* Extended DateLUTImpl internal LUT to 0x1ffff items, some of which
represent negative (pre-1970) time values.
As a side benefit, Date now correctly supports dates up to 2149
(instead of 2106); see the example after this list.
* Added a new strong typedef ExtendedDayNum, which represents dates
pre-1970 and post-2149.
* Functions that used to return DayNum now return ExtendedDayNum.
* Refactored DateLUTImpl to untie DayNum from the dual role of being
a value and an index (due to negative time). The index is now a
different type, LUTIndex, with explicit conversion functions from
DayNum, time_t, and ExtendedDayNum.
* Updated DateLUTImpl to properly support values close to epoch start
(1970-01-01 00:00), including negative ones.
* Reduced the resolution of DateLUTImpl::Values::time_at_offset_change
to a multiple of 15 minutes to allow storing 64 bits of time_t in
DateLUTImpl::Values while keeping the same size.
* Minor performance updates to DateLUTImpl when building month LUT
by skipping non-start-of-month days.
* Fixed extractTimeZoneFromFunctionArguments to work correctly
with DateTime64.
* New unit-tests and stateless integration tests for both DateTime
and DateTime64.
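A quick illustration of the extended range (the expected results are
inferred from the description above, not verified here):

    SELECT toDate('2149-06-06');                          -- near the new Date upper bound
    SELECT toDateTime64('1925-01-01 00:00:00', 0, 'UTC'); -- pre-1970 value, now representable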
* Add query data deduplication that excludes duplicated parts in
MergeTree-family engines.
Query deduplication is based on parts' UUIDs, which must first be
enabled with the MergeTree setting assign_part_uuids=1.
The allow_experimental_query_deduplication setting enables part
deduplication; it defaults to false.
A data part UUID is a mechanism for giving a data part a unique
identifier. Having UUIDs and a deduplication mechanism opens up the
possibility of moving parts between shards while preserving data
consistency on the read path: duplicated UUIDs will cause the root
executor to retry the query against one of the replicas, explicitly
asking it to exclude the encountered duplicated fingerprints during
distributed query execution.
NOTE: this implementation doesn't provide any knobs to lock a part and
hence its UUID; any mutation/merge will update the part's UUID.
* Add a _part_uuid virtual column, allowing UUIDs to be used in
predicates (a usage sketch follows).
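A minimal usage sketch (the table name t and the zero-UUID predicate
are illustrative only):

    CREATE TABLE t (key UInt64) ENGINE = MergeTree ORDER BY key
        SETTINGS assign_part_uuids = 1;  -- give every new part a UUID

    SET allow_experimental_query_deduplication = 1;

    -- _part_uuid can also be used in predicates
    SELECT key FROM t
    WHERE _part_uuid != toUUID('00000000-0000-0000-0000-000000000000');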
Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>
address comments