Commit Graph

89658 Commits

Author SHA1 Message Date
Azat Khuzhin
4288d09a85 Do not write expired columns by TTL after merge w/o TTL
Usually a second merge does not apply TTL, since everything is already
up to date. However, in this case TTLTransform is not used, so
expired_columns is not filled for the new part, and those columns are
written with default values.

Avoid this by filling expired_columns manually.

Here is a simple reproducer:

    ```sql
    create table ttl_02262 (date Date, key Int, value String TTL date + interval 1 month) engine=MergeTree order by key settings min_bytes_for_wide_part=0, min_rows_for_wide_part=0;
    insert into ttl_02262 values ('2010-01-01', 2010, 'foo');
    ```

    ```sh
    # ls -l .server/data/default/ttl_02262/all_*
    .server/data/default/ttl_02262/all_1_1_0:
    total 48
    -rw-r----- 1 root root 335 May 26 14:19 checksums.txt
    -rw-r----- 1 root root  76 May 26 14:19 columns.txt
    -rw-r----- 1 root root   1 May 26 14:19 count.txt
    -rw-r----- 1 root root  28 May 26 14:19 date.bin
    -rw-r----- 1 root root  48 May 26 14:19 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:19 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:19 key.bin
    -rw-r----- 1 root root  48 May 26 14:19 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:19 primary.idx
    -rw-r----- 1 root root  99 May 26 14:19 ttl.txt
    -rw-r----- 1 root root  30 May 26 14:19 value.bin
    -rw-r----- 1 root root  48 May 26 14:19 value.mrk2
    ```

    ```sql
    optimize table ttl_02262 final;
    ```

    ```sh
    .server/data/default/ttl_02262/all_1_1_1:
    total 40
    -rw-r----- 1 root root 279 May 26 14:19 checksums.txt
    -rw-r----- 1 root root  61 May 26 14:19 columns.txt
    -rw-r----- 1 root root   1 May 26 14:19 count.txt
    -rw-r----- 1 root root  28 May 26 14:19 date.bin
    -rw-r----- 1 root root  48 May 26 14:19 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:19 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:19 key.bin
    -rw-r----- 1 root root  48 May 26 14:19 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:19 primary.idx
    -rw-r----- 1 root root  81 May 26 14:19 ttl.txt
    ```

    ```sql
    optimize table ttl_02262 final;
    ```

    ```sh
    .server/data/default/ttl_02262/all_1_1_2:
    total 48
    -rw-r----- 1 root root 349 May 26 14:20 checksums.txt
    -rw-r----- 1 root root  76 May 26 14:20 columns.txt
    -rw-r----- 1 root root   1 May 26 14:20 count.txt
    -rw-r----- 1 root root  28 May 26 14:20 date.bin
    -rw-r----- 1 root root  48 May 26 14:20 date.mrk2
    -rw-r----- 1 root root  10 May 26 14:20 default_compression_codec.txt
    -rw-r----- 1 root root  30 May 26 14:20 key.bin
    -rw-r----- 1 root root  48 May 26 14:20 key.mrk2
    -rw-r----- 1 root root   8 May 26 14:20 primary.idx
    -rw-r----- 1 root root  81 May 26 14:20 ttl.txt
    -rw-r----- 1 root root  27 May 26 14:20 value.bin
    -rw-r----- 1 root root  48 May 26 14:20 value.mrk2
    ```

And now `value.*` files exist for all_1_1_2, which should not happen.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
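The idea behind 4288d09a85 (filling expired_columns by hand when the TTL transform does not run) can be illustrated with a minimal C++ sketch. `ColumnTtlSketch`, `max_expiry`, and `collectExpiredColumns` are hypothetical names invented for this illustration; they are not ClickHouse's real types, and the actual patch may decide expiry differently.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified per-column TTL bookkeeping (not ClickHouse's real type).
struct ColumnTtlSketch
{
    std::time_t max_expiry = 0;  // latest TTL timestamp over all rows; 0 means "no TTL info"
};

// Collect the columns whose TTL has fully expired, so that a merge which does
// not run the TTL transform can still skip writing them to the new part.
inline std::set<std::string> collectExpiredColumns(
    const std::map<std::string, ColumnTtlSketch> & columns_ttl, std::time_t now)
{
    std::set<std::string> expired;
    for (const auto & [name, ttl] : columns_ttl)
        if (ttl.max_expiry != 0 && ttl.max_expiry <= now)
            expired.insert(name);
    return expired;
}
```

In the reproducer above, `value` would land in the returned set on the second merge, so no `value.*` files would be written for all_1_1_2.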
Azat Khuzhin
8328d7068b Fix updating of MergeTreeDataPartTTLInfo::finished
Previously it was impossible to distinguish a non-initialized finished
from one initialized to false, so update() could not do the correct thing.

Rename the field to avoid hidden usage.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:10 +03:00
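The ambiguity described in 8328d7068b can be illustrated with a minimal C++ sketch. `TtlInfoSketch`, `ttl_finished`, and this `update()` are hypothetical names for illustration only, not the actual ClickHouse code; the sketch assumes that `std::optional<bool>` is one way to tell "never set" apart from "set to false".

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a per-part TTL info record.
struct TtlInfoSketch
{
    // Empty means "never initialized"; false means "TTL known to be unfinished".
    std::optional<bool> ttl_finished;

    // Readers get a plain bool; an unset flag is treated as "not finished".
    bool finished() const { return ttl_finished.value_or(false); }

    // Merge the flag from another record.
    void update(const TtlInfoSketch & other)
    {
        if (!other.ttl_finished.has_value())
            return;  // nothing to merge
        if (ttl_finished.has_value())
            ttl_finished = *ttl_finished && *other.ttl_finished;
        else
            ttl_finished = other.ttl_finished;  // first real value wins
    }
};
```

With a plain `bool` defaulting to false, the same AND-style merge would always yield false for a fresh record, which is the kind of mistake the commit message describes.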
Azat Khuzhin
0de1a64436 Log empty parts in IMergedBlockOutputStream::removeEmptyColumnsFromPart()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-26 20:14:09 +03:00
Dmitry Novik
673a521d0b
Merge pull request #34775 from azat/union-type-cast
RFC: Fix converting types for UNION queries (may produce LOGICAL_ERROR)
2022-05-26 17:28:23 +02:00
Azat Khuzhin
dc9ca3d70c
Fix LOGICAL_ERROR in getMaxSourcePartsSizeForMerge during merges (#37413) 2022-05-26 14:14:58 +02:00
Nikolai Kochetov
fea2401f1f
Merge pull request #37532 from ClickHouse/add-separate-mutex-for-factories-info
Use a separate mutex for query_factories_info in Context.
2022-05-26 13:03:28 +02:00
Maksim Kita
3a92e61827
Merge pull request #37148 from kitaisreal/dictionary-get-descendants-performance-improvement
Dictionary getDescendants performance improvement
2022-05-26 12:29:17 +02:00
Antonio Andelic
fe236c98d5
Merge pull request #37534 from ClickHouse/revert-37036-keeper-preprocess-operations
Revert "Add support for preprocessing ZooKeeper operations in `clickhouse-keeper`"
2022-05-26 08:14:46 +02:00
Sergei Trifonov
eedddf86fd
Merge pull request #37552 from ClickHouse/serxa-patch-1
fix root CMakeLists.txt search
2022-05-26 04:41:07 +02:00
Sergei Trifonov
417296481e
fix root CMakeLists.txt search 2022-05-26 04:39:02 +02:00
Dmitry Novik
5c3c994d2a
Merge pull request #37493 from ClickHouse/grouping-sets-optimization-fix
Fix ORDER BY optimization in case of GROUPING SETS
2022-05-26 02:25:02 +02:00
Alexey Milovidov
f321925032
Merge pull request #36341 from ClickHouse/allow-setuid-inside-clickhouse
Allow to drop privileges at startup
2022-05-26 01:07:04 +03:00
Maksim Kita
58cd1bd3ec
Merge pull request #36843 from bharatnc/ncb/h3-unidirectionaledges-funcs
add h3 unidirectional edge functions
2022-05-25 22:46:40 +02:00
Maksim Kita
bee3c30f66
Merge pull request #37524 from kitaisreal/geo-distance-functions-improve-performance
Geo distance functions improve performance
2022-05-25 22:40:40 +02:00
Maksim Kita
b12b363158 Fixed build of hierarchical index for HashedArrayDictionary 2022-05-25 22:40:19 +02:00
alesapin
bf0da38d6f
Merge pull request #37402 from DanRoscigno/origin/67-replace-zookeeper-to-clickhouse-keeper-in-docs-and-tutorials
add ClickHouse Keeper to doc pages describing ZooKeeper use
2022-05-25 22:24:56 +02:00
Robert Schulze
7543841438
Merge pull request #37518 from ClickHouse/bump-cctz-to-2022-05-15
Bump cctz to 2022-05-15
2022-05-25 22:14:41 +02:00
Alexander Tokmakov
6ca6b267fa
Merge pull request #37545 from ClickHouse/revert-37424-fix_fetching_part_deadlock
Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part"
2022-05-25 23:11:16 +03:00
Alexander Tokmakov
47820c216d
Revert "(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part" 2022-05-25 23:10:33 +03:00
Robert Schulze
23378ab67b
Merge pull request #37520 from ClickHouse/update-3rd-party-contribution-guide
Update 3rd party contribution guide
2022-05-25 21:54:49 +02:00
Alexey Milovidov
abf2558fba
Merge pull request #37491 from ClickHouse/match_refactoring
Refactorings of LIKE/MATCH code
2022-05-25 22:05:38 +03:00
Alexey Milovidov
4482da9eb6
Update greatCircleDistance.cpp 2022-05-25 21:59:31 +03:00
Alexey Milovidov
de90c6e6c0
Merge pull request #37533 from ClickHouse/fixes-architecture-doc
Update architecture.md
2022-05-25 21:57:26 +03:00
Alexey Milovidov
5ecde38b40
Merge pull request #37541 from ClickHouse/blinkov-patch-23
Update SECURITY.md
2022-05-25 21:54:05 +03:00
alesapin
620ab399c9
Update docs/en/operations/clickhouse-keeper.md 2022-05-25 20:23:24 +02:00
alesapin
51868a9a4f
Merge pull request #37424 from metahys/fix_fetching_part_deadlock
(only with zero-copy replication, non-production experimental feature not recommended to use) fix possible deadlock during fetching part
2022-05-25 20:15:41 +02:00
Azat Khuzhin
a813f5996e Fix converting types for UNION queries (may produce LOGICAL_ERROR)
CI found [1]:

    2022.02.20 15:14:23.969247 [ 492 ] {} <Fatal> BaseDaemon: (version 22.3.1.1, build id: 6082C357CFA6FF99) (from thread 472) (query_id: a5187ff9-962a-4e7c-86f6-8d48850a47d6) (query: SELECT 0., round(avgWeighted(x, y)) FROM (SELECT toDate(toDate('214748364.8', '-922337203.6854775808', '-0.1', NULL) - NULL, 10.000100135803223, '-2147483647'), 255 AS x, -2147483647 AS y UNION ALL SELECT y, NULL AS x, 2147483646 AS y)) Received signal Aborted (6)

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/0/26d0e5438c86e52a145aaaf4cb523c399989a878/fuzzer_astfuzzerdebug,actions//report.html

The problem is that the subqueries return different headers:
- first query  -- x, y
- second query -- y, x

v2: Make order of columns strict only for UNION
    https://s3.amazonaws.com/clickhouse-test-reports/34775/9cc8c01a463d18c471853568b2f0af659a4e643f/stateless_tests__address__actions__[2/2].html
    Fixes: 00597_push_down_predicate_long
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-05-25 20:31:47 +03:00
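The header mismatch described in a813f5996e can be illustrated with a minimal C++ sketch. `Header`, `sameNamesAnyOrder`, and `sameNamesSameOrder` are hypothetical helpers written for this illustration; they do not reproduce how ClickHouse compares or converts blocks.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a query header: just the ordered list of column names.
using Header = std::vector<std::string>;

// Loose check: the same set of names, in any order.
inline bool sameNamesAnyOrder(Header a, Header b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}

// Strict check: the same names at the same positions.
inline bool sameNamesSameOrder(const Header & a, const Header & b)
{
    return a == b;
}
```

For the two subqueries above, the loose check passes while the strict one fails, which is why position matters when UNION branches are combined.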
Nikolai Kochetov
ff98c24d44
Merge pull request #37048 from Avogar/fix-array-map-nothing
Add default implementation for Nothing in functions
2022-05-25 19:10:40 +02:00
Ivan Blinkov
df84be9b43
Update SECURITY.md 2022-05-25 20:04:20 +03:00
Alexey Milovidov
97c5a4c725
Update SECURITY.md 2022-05-25 20:04:15 +03:00
Nikolai Kochetov
1f9b1cf726 Fixing build. 2022-05-25 18:59:46 +02:00
alesapin
0a3597da72
Merge pull request #34915 from ianton-ru/MDB-16962
Fix collision of S3 operation log revision
2022-05-25 18:15:31 +02:00
Alexey Milovidov
cb92482ca5
Merge pull request #37484 from kitaisreal/function-has-all-avx2-dynamic-dispatch
Function hasAll added dynamic dispatch
2022-05-25 19:05:32 +03:00
Nikolai Kochetov
7b681fa8ac Fixing build. 2022-05-25 17:15:23 +02:00
Antonio Andelic
6a962549d5
Revert "Add support for preprocessing ZooKeeper operations in clickhouse-keeper" 2022-05-25 16:45:32 +02:00
Igor Nikonov
4f09a0c431
Update architecture.md
Updated broken links in Functions section
2022-05-25 16:27:17 +02:00
mergify[bot]
73662b4436
Merge branch 'master' into fix_fetching_part_deadlock 2022-05-25 14:22:35 +00:00
Maksim Kita
28355114c0 Fixed tests 2022-05-25 16:19:29 +02:00
Nikolai Kochetov
6370c29049 Use a separate mutex for query_factories_info in Context. 2022-05-25 14:16:59 +00:00
mergify[bot]
f49552d48e
Merge branch 'master' into grouping-sets-optimization-fix 2022-05-25 13:03:54 +00:00
Maksim Kita
45da28ecae Improve performance of geo distance functions 2022-05-25 14:22:22 +02:00
Robert Schulze
c743fef3ae
Update 3rd party contribution guide
- replace obsolete references to clickhouse-extra with clickhouse

- generally rewrite the guide and make it easier to understand
2022-05-25 13:46:05 +02:00
Robert Schulze
90deef1c3c
Bump cctz to 2022-05-15 2022-05-25 12:21:05 +02:00
Maksim Kita
c372c3d6aa Fix performance tests 2022-05-25 11:49:59 +02:00
Kseniia Sumarokova
b50d4549c9
Merge pull request #37356 from amosbird/partition-prune-for-s3
"Partition pruning" for s3
2022-05-25 11:03:07 +02:00
Robert Schulze
05e4fa7df1
Fix special case of trivial regexp
Previously, we would always set 1 in case of a trivial regex (which is
correct). If someone builds a negated operator in the future, this
will produce wrong results. Right now, negation of regexp (SQL: NOT
MATCH) is implemented at a higher level, so we are safe and this is more
of a preventive fix.
2022-05-25 10:05:55 +02:00
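The preventive fix in 05e4fa7df1 can be illustrated with a minimal C++ sketch. `matchTrivialPattern` and its `negate` parameter are hypothetical and exist only for this illustration; the real matching code is structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: per-row results for a pattern that matches every row.
inline std::vector<std::uint8_t> matchTrivialPattern(std::size_t rows, bool negate)
{
    // A trivial pattern matches each row, so the answer is 1 unless the
    // operator is negated, in which case it must be 0.
    return std::vector<std::uint8_t>(rows, negate ? 0 : 1);
}
```

Hard-coding 1 is equivalent only while `negate` is always false, which matches the situation the commit message describes.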
Robert Schulze
01ab7b9bad
Pass strings in some places as string_view
The original goal was to change

  const auto & needle = String(
        reinterpret_cast<const char *>(cur_needle_data),
        cur_needle_length);

in Functions/MatchImpl.h into a std::string_view to save an allocation +
copy. The needle is eventually passed as search pattern into the re2
library. Re2 has an alternative constructor taking a const char * i.e. a
NULL-terminated string. Here, the needle is NULL-terminated but
1. this is only because it is passed inside a ColumnString, which is
   not always the case (e.g. fixed string columns have a dense layout w/o
   NULL terminator).
2. assuming NULL termination for users != MatchImpl of the regex code is
   too dangerous.

So, for now we'll stay with copying to be on the safe side. One fine day
when re2 has a ptr/size ctor, we can use std::string_view.

This commit only changes a few other places from std::string to
std::string_view, which will not help with performance.
2022-05-25 10:05:51 +02:00
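The NULL-termination concern in 01ab7b9bad can be illustrated with a minimal C++ sketch. `patternLength`, `safePatternLength`, and `firstFixedValue` are hypothetical helpers; `patternLength` merely stands in for any API that accepts only a terminated `const char *` and is not the re2 interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for an API that only accepts a terminated pattern.
inline std::size_t patternLength(const char * pattern)
{
    return std::strlen(pattern);
}

// Safe route: copy the view into a std::string, whose c_str() is always terminated.
inline std::size_t safePatternLength(std::string_view needle)
{
    const std::string copy(needle);
    return patternLength(copy.c_str());
}

// Dense fixed-width layout: values are stored back to back with no terminator
// between them, so a view of one value is not terminated at its own end.
inline std::string_view firstFixedValue(const char * dense, std::size_t width)
{
    return std::string_view(dense, width);
}
```

Passing `view.data()` straight to a terminated-string API would read past the end of the value, here into the neighbouring value, which is why the copy is kept for now.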
Robert Schulze
e8c96777f6
Make OptimizedRegularExpression::analyze() private 2022-05-25 10:05:45 +02:00
Robert Schulze
040fbf3686
Tighter sanity checks in matching code 2022-05-25 10:05:06 +02:00
Robert Schulze
35bef17302
Introduce variables to hold the match result
--> nicer when debugging
2022-05-25 10:04:47 +02:00