Azat Khuzhin
4288d09a85
Do not write expired columns by TTL after merge w/o TTL
...
Usually the second merge does not perform TTL, since everything is up to date.
However, in this case TTLTransform is not used, hence expired_columns
is not filled for the new part, and so those columns are written
with default values.
Avoid this by filling expired_columns manually.
Simple reproducer:
```sql
create table ttl_02262 (date Date, key Int, value String TTL date + interval 1 month) engine=MergeTree order by key settings min_bytes_for_wide_part=0, min_rows_for_wide_part=0;
insert into ttl_02262 values ('2010-01-01', 2010, 'foo');
```
```sh
# ls -l .server/data/default/ttl_02262/all_*
.server/data/default/ttl_02262/all_1_1_0:
total 48
-rw-r----- 1 root root 335 May 26 14:19 checksums.txt
-rw-r----- 1 root root 76 May 26 14:19 columns.txt
-rw-r----- 1 root root 1 May 26 14:19 count.txt
-rw-r----- 1 root root 28 May 26 14:19 date.bin
-rw-r----- 1 root root 48 May 26 14:19 date.mrk2
-rw-r----- 1 root root 10 May 26 14:19 default_compression_codec.txt
-rw-r----- 1 root root 30 May 26 14:19 key.bin
-rw-r----- 1 root root 48 May 26 14:19 key.mrk2
-rw-r----- 1 root root 8 May 26 14:19 primary.idx
-rw-r----- 1 root root 99 May 26 14:19 ttl.txt
-rw-r----- 1 root root 30 May 26 14:19 value.bin
-rw-r----- 1 root root 48 May 26 14:19 value.mrk2
```
```sql
optimize table ttl_02262 final;
```
```sh
.server/data/default/ttl_02262/all_1_1_1:
total 40
-rw-r----- 1 root root 279 May 26 14:19 checksums.txt
-rw-r----- 1 root root 61 May 26 14:19 columns.txt
-rw-r----- 1 root root 1 May 26 14:19 count.txt
-rw-r----- 1 root root 28 May 26 14:19 date.bin
-rw-r----- 1 root root 48 May 26 14:19 date.mrk2
-rw-r----- 1 root root 10 May 26 14:19 default_compression_codec.txt
-rw-r----- 1 root root 30 May 26 14:19 key.bin
-rw-r----- 1 root root 48 May 26 14:19 key.mrk2
-rw-r----- 1 root root 8 May 26 14:19 primary.idx
-rw-r----- 1 root root 81 May 26 14:19 ttl.txt
```
```sql
optimize table ttl_02262 final;
```
```sh
.server/data/default/ttl_02262/all_1_1_2:
total 48
-rw-r----- 1 root root 349 May 26 14:20 checksums.txt
-rw-r----- 1 root root 76 May 26 14:20 columns.txt
-rw-r----- 1 root root 1 May 26 14:20 count.txt
-rw-r----- 1 root root 28 May 26 14:20 date.bin
-rw-r----- 1 root root 48 May 26 14:20 date.mrk2
-rw-r----- 1 root root 10 May 26 14:20 default_compression_codec.txt
-rw-r----- 1 root root 30 May 26 14:20 key.bin
-rw-r----- 1 root root 48 May 26 14:20 key.mrk2
-rw-r----- 1 root root 8 May 26 14:20 primary.idx
-rw-r----- 1 root root 81 May 26 14:20 ttl.txt
-rw-r----- 1 root root 27 May 26 14:20 value.bin
-rw-r----- 1 root root 48 May 26 14:20 value.mrk2
```
And now all_1_1_2 has `value.*` again, which should not happen.
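The idea of the fix can be sketched in C++ as follows. This is a toy model, not the actual ClickHouse code: `ColumnTTLState`, `collectExpiredColumns` and `columnsToWrite` are hypothetical names, and the `fully_expired` flag is an assumed stand-in for the per-column TTL metadata stored with a part. It only shows that the set of expired columns can be derived from stored metadata when the TTL step is skipped, so that those columns are not written again.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified per-column TTL metadata of a part (an assumption, not the real struct).
struct ColumnTTLState
{
    bool fully_expired = false;  // true if an earlier merge dropped the column by TTL
};

// Derive the expired columns from metadata alone, without running any TTL step.
std::set<std::string> collectExpiredColumns(const std::map<std::string, ColumnTTLState> & columns_ttl)
{
    std::set<std::string> expired;
    for (const auto & [name, state] : columns_ttl)
        if (state.fully_expired)
            expired.insert(name);
    return expired;
}

// Keep only the columns that are not expired, preserving the original order.
std::vector<std::string> columnsToWrite(const std::vector<std::string> & all_columns,
                                        const std::set<std::string> & expired)
{
    std::vector<std::string> result;
    for (const auto & name : all_columns)
        if (expired.count(name) == 0)
            result.push_back(name);
    return result;
}
```

In terms of the reproducer, with `value` marked as fully expired the second merge would write only `date` and `key`.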
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
8328d7068b
Fix updating of MergeTreeDataPartTTLInfo::finished
...
Previously it was impossible to distinguish a non-initialized finished from
one initialized to false, so update() could not do the correct thing.
Rename the field to avoid hidden usage.
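A minimal C++ sketch of the distinction, assuming the flag is modelled with `std::optional<bool>`. `TTLInfoSketch` and its members are hypothetical names for illustration and are not claimed to match the real `MergeTreeDataPartTTLInfo`. With a plain `bool` defaulting to false, a fresh accumulator would look the same as one that has already seen an unfinished TTL, so combining flags with a logical AND could never yield true.

```cpp
#include <optional>

// Hypothetical, simplified model of a per-part TTL record.
struct TTLInfoSketch
{
    // std::nullopt means "never set"; false means "set, and TTL is not finished".
    std::optional<bool> ttl_finished;

    // Callers that only need a yes/no answer treat "never set" as not finished.
    bool finished() const { return ttl_finished.value_or(false); }

    // Merge another record into this one: the result is finished only if
    // every record merged so far was finished.
    void update(const TTLInfoSketch & other)
    {
        if (ttl_finished.has_value())
            ttl_finished = *ttl_finished && other.finished();
        else
            ttl_finished = other.finished();  // first update just takes the other value
    }
};
```

Keeping the raw optional behind a `finished()` accessor is one way to make every reader state explicitly how the unset state should be treated.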
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
Azat Khuzhin
0de1a64436
Log empty parts in IMergedBlockOutputStream::removeEmptyColumnsFromPart()
...
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:09 +03:00
Dmitry Novik
673a521d0b
Merge pull request #34775 from azat/union-type-cast
...
RFC: Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-26 17:28:23 +02:00
Azat Khuzhin
dc9ca3d70c
Fix LOGICAL_ERROR in getMaxSourcePartsSizeForMerge during merges ( #37413 )
2022-05-26 14:14:58 +02:00
Nikolai Kochetov
fea2401f1f
Merge pull request #37532 from ClickHouse/add-separate-mutex-for-factories-info
...
Use a separate mutex for query_factories_info in Context.
2022-05-26 13:03:28 +02:00
Maksim Kita
3a92e61827
Merge pull request #37148 from kitaisreal/dictionary-get-descendants-performance-improvement
...
Dictionary getDescendants performance improvement
2022-05-26 12:29:17 +02:00
Antonio Andelic
fe236c98d5
Merge pull request #37534 from ClickHouse/revert-37036-keeper-preprocess-operations
...
Revert "Add support for preprocessing ZooKeeper operations in `clickhouse-keeper`"
2022-05-26 08:14:46 +02:00
Sergei Trifonov
eedddf86fd
Merge pull request #37552 from ClickHouse/serxa-patch-1
...
fix root CMakeLists.txt search
2022-05-26 04:41:07 +02:00
Sergei Trifonov
417296481e
fix root CMakeLists.txt search
2022-05-26 04:39:02 +02:00
Dmitry Novik
5c3c994d2a
Merge pull request #37493 from ClickHouse/grouping-sets-optimization-fix
...
Fix ORDER BY optimization in case of GROUPING SETS
2022-05-26 02:25:02 +02:00
Alexey Milovidov
f321925032
Merge pull request #36341 from ClickHouse/allow-setuid-inside-clickhouse
...
Allow to drop privileges at startup
2022-05-26 01:07:04 +03:00
Maksim Kita
58cd1bd3ec
Merge pull request #36843 from bharatnc/ncb/h3-unidirectionaledges-funcs
...
add h3 unidirectional edge functions
2022-05-25 22:46:40 +02:00
Maksim Kita
bee3c30f66
Merge pull request #37524 from kitaisreal/geo-distance-functions-improve-performance
...
Geo distance functions improve performance
2022-05-25 22:40:40 +02:00
Maksim Kita
b12b363158
Fixed build of hierarchical index for HashedArrayDictionary
2022-05-25 22:40:19 +02:00
alesapin
bf0da38d6f
Merge pull request #37402 from DanRoscigno/origin/67-replace-zookeeper-to-clickhouse-keeper-in-docs-and-tutorials
...
add ClickHouse Keeper to doc pages describing ZooKeeper use
2022-05-25 22:24:56 +02:00
Robert Schulze
7543841438
Merge pull request #37518 from ClickHouse/bump-cctz-to-2022-05-15
...
Bump cctz to 2022-05-15
2022-05-25 22:14:41 +02:00
Alexander Tokmakov
6ca6b267fa
Merge pull request #37545 from ClickHouse/revert-37424-fix_fetching_part_deadlock
...
Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part"
2022-05-25 23:11:16 +03:00
Alexander Tokmakov
47820c216d
Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part"
2022-05-25 23:10:33 +03:00
Robert Schulze
23378ab67b
Merge pull request #37520 from ClickHouse/update-3rd-party-contribution-guide
...
Update 3rd party contribution guide
2022-05-25 21:54:49 +02:00
Alexey Milovidov
abf2558fba
Merge pull request #37491 from ClickHouse/match_refactoring
...
Refactorings of LIKE/MATCH code
2022-05-25 22:05:38 +03:00
Alexey Milovidov
4482da9eb6
Update greatCircleDistance.cpp
2022-05-25 21:59:31 +03:00
Alexey Milovidov
de90c6e6c0
Merge pull request #37533 from ClickHouse/fixes-architecture-doc
...
Update architecture.md
2022-05-25 21:57:26 +03:00
Alexey Milovidov
5ecde38b40
Merge pull request #37541 from ClickHouse/blinkov-patch-23
...
Update SECURITY.md
2022-05-25 21:54:05 +03:00
alesapin
620ab399c9
Update docs/en/operations/clickhouse-keeper.md
2022-05-25 20:23:24 +02:00
alesapin
51868a9a4f
Merge pull request #37424 from metahys/fix_fetching_part_deadlock
...
(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part
2022-05-25 20:15:41 +02:00
Azat Khuzhin
a813f5996e
Fix converting types for UNION queries (may produce LOGICAL_ERROR)
...
CI found [1]:
2022.02.20 15:14:23.969247 [ 492 ] {} <Fatal> BaseDaemon: (version 22.3.1.1, build id: 6082C357CFA6FF99) (from thread 472) (query_id: a5187ff9-962a-4e7c-86f6-8d48850a47d6) (query: SELECT 0., round(avgWeighted(x, y)) FROM (SELECT toDate(toDate('214748364.8', '-922337203.6854775808', '-0.1', NULL) - NULL, 10.000100135803223, '-2147483647'), 255 AS x, -2147483647 AS y UNION ALL SELECT y, NULL AS x, 2147483646 AS y)) Received signal Aborted (6)
[1]: https://s3.amazonaws.com/clickhouse-test-reports/0/26d0e5438c86e52a145aaaf4cb523c399989a878/fuzzer_astfuzzerdebug,actions//report.html
The problem is that the subqueries return different headers:
- first query -- x, y
- second query -- y, x
v2: Make order of columns strict only for UNION
https://s3.amazonaws.com/clickhouse-test-reports/34775/9cc8c01a463d18c471853568b2f0af659a4e643f/stateless_tests__address__actions__[2/2].html
Fixes: 00597_push_down_predicate_long
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-25 20:31:47 +03:00
Nikolai Kochetov
ff98c24d44
Merge pull request #37048 from Avogar/fix-array-map-nothing
...
Add default implementation for Nothing in functions
2022-05-25 19:10:40 +02:00
Ivan Blinkov
df84be9b43
Update SECURITY.md
2022-05-25 20:04:20 +03:00
Alexey Milovidov
97c5a4c725
Update SECURITY.md
2022-05-25 20:04:15 +03:00
Nikolai Kochetov
1f9b1cf726
Fixing build.
2022-05-25 18:59:46 +02:00
alesapin
0a3597da72
Merge pull request #34915 from ianton-ru/MDB-16962
...
Fix collision of S3 operation log revision
2022-05-25 18:15:31 +02:00
Alexey Milovidov
cb92482ca5
Merge pull request #37484 from kitaisreal/function-has-all-avx2-dynamic-dispatch
...
Function hasAll added dynamic dispatch
2022-05-25 19:05:32 +03:00
Nikolai Kochetov
7b681fa8ac
Fixing build.
2022-05-25 17:15:23 +02:00
Antonio Andelic
6a962549d5
Revert "Add support for preprocessing ZooKeeper operations in clickhouse-keeper"
2022-05-25 16:45:32 +02:00
Igor Nikonov
4f09a0c431
Update architecture.md
...
Updated broken links in Functions section
2022-05-25 16:27:17 +02:00
mergify[bot]
73662b4436
Merge branch 'master' into fix_fetching_part_deadlock
2022-05-25 14:22:35 +00:00
Maksim Kita
28355114c0
Fixed tests
2022-05-25 16:19:29 +02:00
Nikolai Kochetov
6370c29049
Use a separate mutex for query_factories_info in Context.
2022-05-25 14:16:59 +00:00
mergify[bot]
f49552d48e
Merge branch 'master' into grouping-sets-optimization-fix
2022-05-25 13:03:54 +00:00
Maksim Kita
45da28ecae
Improve performance of geo distance functions
2022-05-25 14:22:22 +02:00
Robert Schulze
c743fef3ae
Update 3rd party contribution guide
...
- replace obsolete references to clickhouse-extra with references to clickhouse
- generally rewrite the guide and make it easier to understand
2022-05-25 13:46:05 +02:00
Robert Schulze
90deef1c3c
Bump cctz to 2022-05-15
2022-05-25 12:21:05 +02:00
Maksim Kita
c372c3d6aa
Fix performance tests
2022-05-25 11:49:59 +02:00
Kseniia Sumarokova
b50d4549c9
Merge pull request #37356 from amosbird/partition-prune-for-s3
...
"Partition pruning" for s3
2022-05-25 11:03:07 +02:00
Robert Schulze
05e4fa7df1
Fix special case of trivial regexp
...
Previously, we would always set 1 in case of a trivial regex (which is
correct). If someone builds a negated operator in the future, this
will produce wrong results. Right now, negation of regexp (SQL: NOT
MATCH) is implemented at a higher level, so we are safe and this is more
of a preventive fix.
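A small C++ sketch of the preventive idea, under the assumption that negation is modelled as a compile-time flag. `fillTrivialMatch` and the `negate` parameter are hypothetical names for illustration, not the actual matching code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds the result column for a pattern that matches every row.
// `negate` models an operator with inverted semantics (an assumption for illustration).
template <bool negate>
std::vector<std::uint8_t> fillTrivialMatch(std::size_t rows)
{
    // A plain match yields 1 for every row; a negated match must yield 0 instead.
    return std::vector<std::uint8_t>(rows, static_cast<std::uint8_t>(!negate));
}
```

Writing `!negate` instead of a hard-coded 1 keeps the trivial case correct for both variants.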
2022-05-25 10:05:55 +02:00
Robert Schulze
01ab7b9bad
Pass strings in some places as string_view
...
The original goal was to change
const auto & needle = String(
reinterpret_cast<const char *>(cur_needle_data),
cur_needle_length);
in Functions/MatchImpl.h into a std::string_view to save an allocation +
copy. The needle is eventually passed as search pattern into the re2
library. Re2 has an alternative constructor taking a const char * i.e. a
NULL-terminated string. Here, the needle is NULL-terminated but
1. this is only because it is passed inside a ColumnString, yet this is
not always the case (e.g. fixed string columns have a dense layout w/o
NULL terminator).
2. assuming NULL termination for users != MatchImpl of the regex code is
too dangerous.
So, for now we'll stay with copying to be on the safe side. One fine day
when re2 has a ptr/size ctor, we can use std::string_view.
This commit just changes a few other places from std::string to
std::string_view, but this will not help with performance.
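The termination concern can be illustrated with a short C++ sketch. `fixedWidthValue` and `toTerminatedCopy` are hypothetical helpers, and the dense buffer is an assumed stand-in for a fixed-width column layout. It shows why a view into such a buffer cannot be handed to an API that expects a terminated `const char *`, while an owning `std::string` copy can.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// View of the index-th value in a dense fixed-width buffer.
// Values are stored back to back, with no terminator between them.
std::string_view fixedWidthValue(std::string_view buffer, std::size_t width, std::size_t index)
{
    assert((index + 1) * width <= buffer.size());  // the requested value must lie inside the buffer
    return buffer.substr(index * width, width);
}

// Owning copy: c_str() of a std::string is guaranteed to be terminated at size().
std::string toTerminatedCopy(std::string_view value)
{
    return std::string(value);
}
```

For a buffer holding `abcdef` with width 3, the first view is `abc`, but the byte right after it is `d` rather than a terminator, so only the copy is safe to pass as a C string.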
2022-05-25 10:05:51 +02:00
Robert Schulze
e8c96777f6
Make OptimizedRegularExpression::analyze() private
2022-05-25 10:05:45 +02:00
Robert Schulze
040fbf3686
Tighter sanity checks in matching code
2022-05-25 10:05:06 +02:00
Robert Schulze
35bef17302
Introduce variables to hold the match result
...
--> nicer when debugging
2022-05-25 10:04:47 +02:00