This use-after-free can be reproduced with distributed queries.
Also note, that this is not sumMappedArray() and friends (that
previously called sumMap()) but Map combinator.
You will find ASan report in details.
<details>
READ of size 8 at 0x62d00012d218 thread T186 (QueryPipelineEx)
2022.07.03 05:09:40.000234 [ 31956 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 1.23 GiB, peak 1.23 GiB, will set to 1.25 GiB (RSS), difference: 19.51 MiB
2022.07.03 05:09:41.000137 [ 31956 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 1.25 GiB, peak 1.25 GiB, will set to 1.26 GiB (RSS), difference: 3.76 MiB
#0 0x1233a0d8 in DB::AggregateFunctionSumData<>::get() const build_docker/../src/AggregateFunctions/AggregateFunctionSum.h:245:16
#1 0x1233a0d8 in DB::AggregateFunctionSum<>::insertResultInto(char*, DB::IColumn&, DB::Arena*) const build_docker/../src/AggregateFunctions/AggregateFunctionSum.h:536:70
#2 0x1470f910 in DB::AggregateFunctionMap<char8_t>::insertResultInto() const build_docker/../src/AggregateFunctions/AggregateFunctionMap.h:236:26
#3 0x147110ce in DB::IAggregateFunctionHelper<>::insertResultIntoBatch() const build_docker/../src/AggregateFunctions/IAggregateFunction.h:618:53
#4 0x2c4269d7 in void DB::Aggregator::convertToBlockImplFinal<>() const build_docker/../src/Interpreters/Aggregator.cpp:1878:49
#5 0x2c403b9f in void DB::Aggregator::convertToBlockImpl<>() const build_docker/../src/Interpreters/Aggregator.cpp:1714:13
#6 0x2be09b53 in DB::Aggregator::prepareBlockAndFillSingleLevel() const::$_2::operator()() const build_docker/../src/Interpreters/Aggregator.cpp:2144:9
#7 0x2be09b53 in DB::Block DB::Aggregator::prepareBlockAndFill<>() const build_docker/../src/Interpreters/Aggregator.cpp:2000:5
#8 0x2be09b53 in DB::Aggregator::prepareBlockAndFillSingleLevel() const build_docker/../src/Interpreters/Aggregator.cpp:2150:12
#9 0x2be37de3 in DB::Aggregator::mergeBlocks() build_docker/../src/Interpreters/Aggregator.cpp:3032:17
#10 0x308c27f8 in DB::MergingAggregatedBucketTransform::transform() build_docker/../src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp:360:37
0x62d00012d218 is located 3608 bytes inside of 32768-byte region [0x62d00012c400,0x62d000134400)
freed by thread T186 (QueryPipelineEx) here:
#0 0xd701312 in free (/work1/azat/tmp/upstream/clickhouse-asan+0xd701312) (BuildId: b7977aef37e9f720)
...
#8 0x2e3c22eb in DB::ColumnAggregateFunction::~ColumnAggregateFunction() build_docker/../src/Columns/ColumnAggregateFunction.cpp:89:1
...
#18 0xd9fcdd4 in std::__1::vector<DB::ColumnWithTypeAndName, std::__1::allocator<DB::ColumnWithTypeAndName> >::~vector() build_docker/../contrib/libcxx/include/vector:401:9
#19 0x2be373f4 in DB::Aggregator::mergeBlocks() build_docker/../contrib/libcxx/include/__memory/unique_ptr.h
#20 0x308c27f8 in DB::MergingAggregatedBucketTransform::transform() build_docker/../src/Processors/Transforms/MergingAggregatedMemoryEfficientTransform.cpp:360:37
previously allocated by thread T186 (QueryPipelineEx) here:
#0 0xd7015be in malloc (/work1/azat/tmp/upstream/clickhouse-asan+0xd7015be) (BuildId: b7977aef37e9f720)
#1 0xd85190a in Allocator<false, false>::allocNoTrack(unsigned long, unsigned long) build_docker/../src/Common/Allocator.h:227:27
#2 0xd988d45 in Allocator<false, false>::alloc(unsigned long, unsigned long) build_docker/../src/Common/Allocator.h:96:16
#3 0xd988d45 in DB::Arena::MemoryChunk::MemoryChunk(unsigned long, DB::Arena::MemoryChunk*) build_docker/../src/Common/Arena.h:54:64
#4 0xd98904b in DB::Arena::addMemoryChunk(unsigned long) build_docker/../src/Common/Arena.h:122:20
#5 0xec9542c in DB::Arena::alignedAlloc(unsigned long, unsigned long) build_docker/../src/Common/Arena.h:171:13
#6 0x1470f123 in DB::AggregateFunctionMap<char8_t>::deserialize() const build_docker/../src/AggregateFunctions/AggregateFunctionMap.h:205:35
</details>
P.S. Thanks to @den-crane for the reproducer.
Fixes: #35359 (cc @den-crane @dongxiao-yang)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
It can be useful to match versions, since in some tables
(system.trace_log) there is only revision column.
P.S. came to this when was digging into stress reports from CI.
P.P.S. case insensitive by analogy with version().
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
SQL functions countSubstrings(), countSubstringsCaseInsensitive(),
countSubstringsUTF8(), position(), positionCaseInsensitive(),
positionUTF8() with non-const pattern argument use fallback sorters
LibCASCIICaseSensitiveStringSearcher and LibCASCIICaseInsensitiveStringSearcher
which call ::strstr(), resp. ::strcasestr(). These functions assume that
the haystack is 0-terminated and they even document that. However, the
callers did not check if the haystack contains 0-byte (perhaps because
its sort of expensive). As a consequence, if the haystack contained a
zero byte in it's payload, matches behind this zero byte were ignored.
create table t (id UInt32, pattern String) engine = MergeTree() order by id;
insert into t values (1, 'x');
select countSubstrings('aaaxxxaa\0xxx', pattern) from t;
We returned 3 before this commit, now we return 6
Seems like the problem is that now data more often come to FillingTransform in multiple chunks. Don't know why it affects the results, will continue investigation.
Order of files on disk is not guarantee to match with:
- order of creation
- lexical order
So sometimes 02305_data4.jsonl comes first and 2 rows is enough to get
schema.
Reorganize checks a little to avoid flakiness.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>