The year 1925 is the starting point because most time zones switched
to saner (mostly 15-minute-based) offsets during 1924 or earlier,
which significantly simplifies the implementation.
The year 2238 was chosen to simplify the arithmetic for sanitizing
LUT index access: there are fewer than 0x1ffff days between 1925 and 2238.
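For illustration, a minimal sketch of the index sanitization this
enables (names and constants here are simplified stand-ins, not the
exact DateLUTImpl code): with a power-of-two LUT size covering every
day in [1925, 2238], any computed index can be clamped with a single
bitwise AND, no branch needed.

    #include <cstddef>
    #include <cstdint>

    // Simplified sketch, not the actual DateLUTImpl code.
    // 0x20000 slots cover all days in [1925, 2238], so masking with
    // 0x1FFFF keeps any computed index in bounds without a branch.
    static constexpr size_t DATE_LUT_SIZE = 0x20000;
    static constexpr size_t DATE_LUT_MASK = DATE_LUT_SIZE - 1; // 0x1FFFF

    inline size_t sanitizeLUTIndex(int64_t raw_index)
    {
        // Negative or oversized indices wrap into the valid range
        // instead of reading out of bounds.
        return static_cast<size_t>(raw_index) & DATE_LUT_MASK;
    }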
* Extended the DateLUTImpl internal LUT to 0x1ffff items, some of which
represent negative (pre-1970) time values.
As a side benefit, Date now correctly supports dates up to 2149
(instead of 2106).
* Added a new strong typedef ExtendedDayNum, which represents dates
pre-1970 and post-2149.
* Functions that used to return DayNum now return ExtendedDayNum.
* Refactored DateLUTImpl to untie DayNum from its dual role of being
both a value and an index (necessary because of negative time values).
The index is now a distinct type, LUTIndex, with explicit conversion
functions from DayNum, time_t, and ExtendedDayNum (see the sketch
after this list).
* Updated DateLUTImpl to properly support values close to epoch start
(1970-01-01 00:00), including negative ones.
* Reduced the resolution of DateLUTImpl::Values::time_at_offset_change
to multiples of 15 minutes, allowing 64 bits of time_t to be stored in
DateLUTImpl::Values while keeping the same size (also shown in the
sketch after this list).
* Minor performance updates to DateLUTImpl when building the month LUT
by skipping non-start-of-month days.
* Fixed extractTimeZoneFromFunctionArguments to work correctly
with DateTime64.
* New unit tests and stateless integration tests for both DateTime
and DateTime64.
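To make the LUTIndex split and the 15-minute packing above concrete,
here is a minimal sketch; all names and constants are simplified
stand-ins, not the real DateLUTImpl internals:

    #include <cstddef>
    #include <cstdint>
    #include <ctime>

    using DayNum = uint16_t;        // days since 1970-01-01, never negative
    using ExtendedDayNum = int32_t; // may be negative (pre-1970) or beyond 2149

    static constexpr size_t DATE_LUT_MASK = 0x1FFFF;
    // Days from 1925-01-01 (LUT start) to 1970-01-01: 45 * 365 + 11 leap days.
    static constexpr int64_t DAYS_BEFORE_EPOCH = 16436;

    // A distinct index type: a day number can no longer be used as an
    // array index by accident; all conversions must be explicit.
    struct LUTIndex { size_t value; };

    inline LUTIndex toLUTIndex(ExtendedDayNum d)
    {
        return {static_cast<size_t>(d + DAYS_BEFORE_EPOCH) & DATE_LUT_MASK};
    }

    inline LUTIndex toLUTIndex(DayNum d)
    {
        return {(static_cast<size_t>(d) + DAYS_BEFORE_EPOCH) & DATE_LUT_MASK};
    }

    inline LUTIndex toLUTIndex(time_t t)
    {
        // Floor division, so negative (pre-epoch) times map to the
        // correct day instead of being truncated toward zero.
        int64_t days = t / 86400 - (t % 86400 < 0 ? 1 : 0);
        return toLUTIndex(static_cast<ExtendedDayNum>(days));
    }

    // Storing the offset-change time as a count of 15-minute intervals
    // frees enough bits to hold a 64-bit date at the same struct size.
    struct Values
    {
        int64_t date;                          // start of the day, 64-bit time_t
        uint16_t time_at_offset_change_x15min; // offset-change time / 900 s
    };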
* add query data deduplication that excludes duplicated parts in
MergeTree-family engines.
Query deduplication is based on part UUIDs, which must first be enabled
with the MergeTree setting assign_part_uuids=1.
The allow_experimental_query_deduplication setting enables part
deduplication and defaults to false.
A data part UUID is a mechanism for giving a data part a unique
identifier.
Having UUIDs and a deduplication mechanism opens up the possibility of
moving parts between shards while preserving data consistency on the
read path: duplicated UUIDs will cause the root executor to retry the
query against one of the replicas, explicitly asking it to exclude the
encountered duplicate fingerprints during distributed query execution
(see the sketch after this list).
NOTE: this implementation doesn't provide any knobs to lock a part and
hence its UUID; any mutation or merge will update the part's UUID.
* add a _part_uuid virtual column, allowing UUIDs to be used in
predicates.
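As an illustration of the read-path retry described above (hypothetical
types and names, not the actual ClickHouse implementation): the root
executor collects the part UUIDs reported by each replica, detects any
fingerprint seen more than once, and retries against one of the
replicas with those fingerprints excluded.

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    using PartUUID = std::string; // stand-in for a real UUID type

    struct ReplicaAnswer
    {
        std::string replica;
        std::vector<PartUUID> part_uuids; // parts the replica plans to read
    };

    // Any UUID reported by more than one replica is a duplicate: the
    // same part would otherwise be read (and counted) twice.
    std::set<PartUUID> findDuplicates(const std::vector<ReplicaAnswer> & answers)
    {
        std::set<PartUUID> seen, duplicates;
        for (const auto & answer : answers)
            for (const auto & uuid : answer.part_uuids)
                if (!seen.insert(uuid).second)
                    duplicates.insert(uuid);
        return duplicates;
    }

    int main()
    {
        // Both replicas hold part "b", e.g. mid-move between shards.
        std::vector<ReplicaAnswer> answers = {
            {"replica-1", {"a", "b"}},
            {"replica-2", {"b", "c"}},
        };

        // The root executor would now retry against one of the replicas,
        // explicitly asking it to exclude these fingerprints.
        for (const auto & uuid : findDuplicates(answers))
            std::cout << "duplicate part uuid: " << uuid << '\n';
    }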
Signed-off-by: Aleksei Semiglazov <asemiglazov@cloudflare.com>
address comments
* Use only |name_parts| as primary name source
* Restore legacy logic for table restoration
* Fix build
* Fix tests
* Add pytest server config
* Fix tests
* Fixes due to review
With '\n...' after the query [1], clickhouse-client prefers the data
from the INSERT statement over stdin and produces a rather cryptic
message:
Code: 27. DB::Exception: Cannot parse input: expected '\n' before: ' ': (at row 1)
For TSV this is tolerable, but for RowBinary:
Code: 33. DB::Exception: Cannot read all data. Bytes read: 1. Bytes expected: 4.
So improve the error message by adding the source of the data for the
INSERT.
[1]: clickhouse-client -q "INSERT INTO data FORMAT TSV\n " <<<2
Add an inter-server cluster secret. It is used for Distributed queries
inside a cluster and can be configured in the configuration file:
<remote_servers>
<logs>
<shard>
<secret>foobar</secret> <!-- an empty secret works as before -->
...
</shard>
</logs>
</remote_servers>
This allows ClickHouse to make sure that the query was not faked and
was issued from a node that knows the secret. Since the connection is
now trusted, initial_user can be used for query execution, which
applies the correct *_for_user settings (with the inter-server secret
enabled, the query is executed on the shards as the same user as on
the initiator, unlike the "default" user without it).
v2: Change the user to initial_user for Distributed queries if the secret matches
v3: Add a Protocol::Cluster packet
v4: Drop Protocol::Cluster and use plain Protocol::Hello + a user marker
v5: Do not use the user from Hello for cluster-secure (superfluous)