Commit Graph

36482 Commits

Author SHA1 Message Date
Nikolai Kochetov
b877c484d2
Merge pull request #45481 from ClickHouse/fix-deadlock-with-allow_asynchronous_read_from_io_pool_for_merge_tree
Fix possible deadlock with allow_asynchronous_read_from_io_pool_for_merge_tree in case of exception from ThreadPool::schedule
2023-01-21 12:05:34 +01:00
Nikolai Kochetov
ec1e2436cc
Merge pull request #45450 from ClickHouse/fix-disabled-two-level-agg
Fix disabled two-level aggregation from HTTP
2023-01-21 12:01:59 +01:00
Sema Checherinda
962894afc8
Merge pull request #44909 from CheSema/intersect-prev-part
Do not merge over a gap with outdated undeleted parts
2023-01-21 11:51:21 +01:00
Maksim Kita
47385a19e7 Remove unnecessary getTotalRowCount function calls 2023-01-21 11:27:07 +01:00
Azat Khuzhin
a64f6b5f3e Fix possible (likely distributed) query hung
Recently I saw the following: the client executed a long distributed query
and terminated the connection. In this case query cancellation is done
from the PullingAsyncPipelineExecutor dtor, but during cancellation one of
the nodes sent ECONNRESET. This leads to an exception from
PullingAsyncPipelineExecutor::cancel(), which in turn leads to a deadlock
in which multiple threads wait for each other, because cancel() for
LazyOutputFormat wasn't called.

Here is the relevant portion of the logs:

    2023.01.04 08:26:09.236208 [ 37968 ] {f2ed6149-146d-4a3d-874a-b0b751c7b567} <Debug> executeQuery: (from 10.61.13.253:44266, user: default)  TooLongDistributedQueryToPost
    ...
    2023.01.04 08:26:09.262424 [ 37968 ] {f2ed6149-146d-4a3d-874a-b0b751c7b567} <Trace> MergeTreeInOrderSelectProcessor: Reading 1 ranges in order from part 9_330_538_18, approx. 61440 rows starting from 0
    2023.01.04 08:26:09.266399 [ 26788 ] {f2ed6149-146d-4a3d-874a-b0b751c7b567} <Trace> Connection (s4.ch:9000): Connecting. Database: (not specified). User: default
    2023.01.04 08:26:09.266849 [ 26788 ] {f2ed6149-146d-4a3d-874a-b0b751c7b567} <Trace> Connection (s4.ch:9000): Connected to ClickHouse server version 22.10.1.
    2023.01.04 08:26:09.267165 [ 26788 ] {f2ed6149-146d-4a3d-874a-b0b751c7b567} <Debug> Connection (s4.ch:9000): Sent data for 2 scalars, total 2 rows in 3.1587e-05 sec., 62635 rows/sec., 68.00 B (2.03 MiB/sec.), compressed 0.4594594594594595 times to 148.00 B (4.41 MiB/sec.)
    2023.01.04 08:39:13.047170 [ 37968 ] {f2ed6149-146d-4a3d-874a-b0b751c7b567} <Error> PullingAsyncPipelineExecutor: Code: 210. DB::NetException: Connection reset by peer, while writing to socket (10.7.142.115:9000). (NETWORK_ERROR), Stack trace (when copying this message, always include the lines below):

    0. ./.build/./contrib/libcxx/include/exception:133: Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x1818234c in /usr/lib/debug/usr/bin/clickhouse.debug
    1. ./.build/./src/Common/Exception.cpp:69: DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0x1004fbda in /usr/lib/debug/usr/bin/clickhouse.debug
    2. ./.build/./src/Common/NetException.h:12: DB::WriteBufferFromPocoSocket::nextImpl() @ 0x14e352f3 in /usr/lib/debug/usr/bin/clickhouse.debug
    3. ./.build/./src/IO/BufferBase.h:39: DB::Connection::sendCancel() @ 0x15c21e6b in /usr/lib/debug/usr/bin/clickhouse.debug
    4. ./.build/./src/Client/MultiplexedConnections.cpp:0: DB::MultiplexedConnections::sendCancel() @ 0x15c4d5b7 in /usr/lib/debug/usr/bin/clickhouse.debug
    5. ./.build/./src/QueryPipeline/RemoteQueryExecutor.cpp:627: DB::RemoteQueryExecutor::tryCancel(char const*, std::__1::unique_ptr<DB::RemoteQueryExecutorReadContext, std::__1::default_delete<DB::RemoteQueryExecutorReadContext> >*) @ 0x14446c09 in /usr/lib/debug/usr/bin/clickhouse.debug
    6. ./.build/./contrib/libcxx/include/__iterator/wrap_iter.h:100: DB::ExecutingGraph::cancel() @ 0x15d2c0de in /usr/lib/debug/usr/bin/clickhouse.debug
    7. ./.build/./contrib/libcxx/include/__memory/unique_ptr.h:300: DB::PullingAsyncPipelineExecutor::cancel() @ 0x15d32055 in /usr/lib/debug/usr/bin/clickhouse.debug
    8. ./.build/./contrib/libcxx/include/__memory/unique_ptr.h:312: DB::PullingAsyncPipelineExecutor::~PullingAsyncPipelineExecutor() @ 0x15d31f4f in /usr/lib/debug/usr/bin/clickhouse.debug
    9. ./.build/./src/Server/TCPHandler.cpp:0: DB::TCPHandler::processOrdinaryQueryWithProcessors() @ 0x15cde919 in /usr/lib/debug/usr/bin/clickhouse.debug
    10. ./.build/./src/Server/TCPHandler.cpp:0: DB::TCPHandler::runImpl() @ 0x15cd8554 in /usr/lib/debug/usr/bin/clickhouse.debug
    11. ./.build/./src/Server/TCPHandler.cpp:1904: DB::TCPHandler::run() @ 0x15ce6479 in /usr/lib/debug/usr/bin/clickhouse.debug
    12. ./.build/./contrib/poco/Net/src/TCPServerConnection.cpp:57: Poco::Net::TCPServerConnection::start() @ 0x18074f07 in /usr/lib/debug/usr/bin/clickhouse.debug
    13. ./.build/./contrib/libcxx/include/__memory/unique_ptr.h:54: Poco::Net::TCPServerDispatcher::run() @ 0x180753ed in /usr/lib/debug/usr/bin/clickhouse.debug
    14. ./.build/./contrib/poco/Foundation/src/ThreadPool.cpp:213: Poco::PooledThread::run() @ 0x181e3807 in /usr/lib/debug/usr/bin/clickhouse.debug
    15. ./.build/./contrib/poco/Foundation/include/Poco/SharedPtr.h:156: Poco::ThreadImpl::runnableEntry(void*) @ 0x181e1483 in /usr/lib/debug/usr/bin/clickhouse.debug
    16. ? @ 0x7ffff7e55fd4 in ?
    17. ? @ 0x7ffff7ed666c in ?
     (version 22.10.1.1)

And here is the state of the threads:

<details>

<summary>system.stack_trace</summary>

```sql
SELECT
    arrayStringConcat(arrayMap(x -> demangle(addressToSymbol(x)), trace), '\n') AS sym
FROM system.stack_trace
WHERE query_id = 'f2ed6149-146d-4a3d-874a-b0b751c7b567'
SETTINGS allow_introspection_functions=1

Row 1:
──────
sym:
pthread_cond_wait
std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)
bool ConcurrentBoundedQueue<DB::Chunk>::emplaceImpl<DB::Chunk>(std::__1::optional<unsigned long>, DB::Chunk&&)
DB::IOutputFormat::work()
DB::ExecutionThreadContext::executeTask()
DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*)

Row 2:
──────
sym:
pthread_cond_wait
Poco::EventImpl::waitImpl()
DB::PipelineExecutor::joinThreads()
DB::PipelineExecutor::executeImpl(unsigned long)
DB::PipelineExecutor::execute(unsigned long)

Row 3:
──────
sym:
pthread_cond_wait
Poco::EventImpl::waitImpl()
DB::PullingAsyncPipelineExecutor::Data::~Data()
DB::PullingAsyncPipelineExecutor::~PullingAsyncPipelineExecutor()
DB::TCPHandler::processOrdinaryQueryWithProcessors()
DB::TCPHandler::runImpl()
DB::TCPHandler::run()
Poco::Net::TCPServerConnection::start()
Poco::Net::TCPServerDispatcher::run()
Poco::PooledThread::run()
Poco::ThreadImpl::runnableEntry(void*)
```

</details>
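
The failure mode above can be illustrated with a minimal, self-contained sketch. It is not the actual patch: `RemoteSource`, `OutputQueue`, and `cancelAll` are hypothetical stand-ins for the remote connection, `LazyOutputFormat`, and `PullingAsyncPipelineExecutor::cancel()`. It only shows that the step that unblocks waiting threads has to run even when an earlier cancellation step throws.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a remote connection whose cancel may fail,
// e.g. with ECONNRESET while the cancel packet is being sent.
struct RemoteSource
{
    bool fail_on_cancel = false;

    void cancel()
    {
        if (fail_on_cancel)
            throw std::runtime_error("Connection reset by peer");
    }
};

// Hypothetical stand-in for the output side; in the real scenario producer
// threads stay blocked on it until it is finished.
struct OutputQueue
{
    bool finished = false;

    void finish() { finished = true; }
};

// Cancels the source and finishes the output queue no matter what.
// Returns true if the source was cancelled cleanly, false if it threw.
inline bool cancelAll(RemoteSource & source, OutputQueue & output)
{
    bool clean = true;
    try
    {
        source.cancel();
    }
    catch (const std::exception &)
    {
        // Swallowed on purpose: this path can be reached from a destructor.
        clean = false;
    }
    output.finish(); // always reached, so blocked producers can wake up
    return clean;
}
```

With `fail_on_cancel = true`, `cancelAll` returns `false` but the queue is still marked as finished, which is the property that was missing in the deadlock described above.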

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-21 08:05:56 +01:00
Azat Khuzhin
e2fcf0f072 Catch exception on query cancellation
Since we still want to join the thread, yes it will be done in dtor, but
this looks better.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-21 08:05:56 +01:00
Azat Khuzhin
0566f72d36 Cleanup PullingAsyncPipelineExecutor::cancel()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-21 08:05:56 +01:00
avogar
eed1db7e07 Fix schema inference in hdfsCluster 2023-01-20 21:17:35 +00:00
Anton Popov
41a199e175
Fix crash when ListObjects request fails (#45371) 2023-01-20 20:10:23 +01:00
Nikolai Kochetov
dcd84c152a Fix possible deadlock with allow_asynchronous_read_from_io_pool_for_merge_tree in case of exception from ThreadPool::schedule 2023-01-20 18:57:47 +00:00
Robert Schulze
e6167d6b36
Deprecate Gorilla compression of non-float columns
Reasons:

1. The original Gorilla paper proposed a compression schema for pairs of
   time stamps and double-precision FP values. ClickHouse's Gorilla
   codec only implements compression of the latter and it does not
   impose any data type restrictions.
   - Data types != Float* or (U)Int* (e.g. Decimal, Point etc.) are
     definitely not supposed to be used with Gorilla.
   - (U)Int* types are debatable. The paper only considers
     integers-stored-as-FP-values, a practical use case for which
     Gorilla works well. Standalone integers are not considered which
     makes them at least suspicious.

2. Achieve consistency with FPC, another specialized floating-point
   timeseries codec, which rejects non-float data.

3. On practical datasets, ZSTD is often "good enough" (**) so it should
   be okay to disincentivize non-ZSTD codecs a little bit. If needed,
   Delta and DoubleDelta codecs are viable alternatives for slowly
   changing (time-series-like) integer sequences.

Since on-prem and hosted users may still have Gorilla-compressed
non-float data, this combination is only deprecated for now. No warning
or error will be emitted. Users are encouraged to migrate
Gorilla-compressed non-float data to an alternative codec. It is planned
to treat Gorilla-compressed non-float columns as "suspicious" six months
after this commit (i.e. in v23.6). Even then, it will still be possible
to set "allow_suspicious_codecs = true" and read and write
Gorilla-compressed non-float data.

(*) Sec. 4.1.2, "Gorilla restricts the value element in its tuple to a
    double floating point type.", https://doi.org/10.14778/2824032.2824078

(**) https://clickhouse.com/blog/optimize-clickhouse-codecs-compression-schema
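
To make the remark about Delta and DoubleDelta more concrete, here is a minimal sketch of the underlying idea only. It is not ClickHouse's codec implementation (the real codecs also define an on-disk encoding that is not modeled here), and the helper names `deltas` and `doubleDeltas` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// First-order differences: out[i - 1] = values[i] - values[i - 1].
inline std::vector<int64_t> deltas(const std::vector<int64_t> & values)
{
    std::vector<int64_t> out;
    for (std::size_t i = 1; i < values.size(); ++i)
        out.push_back(values[i] - values[i - 1]);
    return out;
}

// Second-order differences ("delta of deltas").
inline std::vector<int64_t> doubleDeltas(const std::vector<int64_t> & values)
{
    return deltas(deltas(values));
}
```

For the slowly changing sequence `{1674200000, 1674200060, 1674200120, 1674200181}`, `deltas` yields `{60, 60, 61}` and `doubleDeltas` yields `{0, 1}`. The values to be stored get much smaller, which is why such transformations suit time-series-like integers.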
2023-01-20 17:31:16 +00:00
robot-ch-test-poll4
2066581d8f
Merge pull request #45451 from evillique/default_granularity
Add default GRANULARITY argument for secondary indexes
2023-01-20 17:46:21 +01:00
avogar
86336940f8 Better comment 2023-01-20 16:41:59 +00:00
avogar
4432ee9927 Fix aborts in arrow lib 2023-01-20 16:40:33 +00:00
vdimir
e30ab0874b
Review fixes 2023-01-20 16:30:34 +00:00
Alexander Tokmakov
910d6dc0ce
Merge pull request #45342 from ClickHouse/exception_message_patterns
Save message format strings for DB::Exception
2023-01-20 18:46:52 +03:00
Kseniia Sumarokova
01320da02b
Update BoundedReadBuffer.cpp 2023-01-20 16:25:02 +01:00
ltrk2
810c9ba50c Produce a null map of the correct size 2023-01-20 10:24:42 -05:00
ltrk2
9d798ea1bc Document functions 2023-01-20 10:24:42 -05:00
ltrk2
65b9c69c90 Introduce non-throwing variants of hasToken 2023-01-20 10:24:42 -05:00
avogar
550a703fbc Make a bit better 2023-01-20 14:58:39 +00:00
Antonio Andelic
136e4ec1b3
Merge pull request #45273 from azat/fix-test-log-level
Fix log level "Test" for send_logs_level in client
2023-01-20 15:36:05 +01:00
Alexander Tokmakov
ec5d7d0a3a
Update src/Functions/FunctionsConversion.h
Co-authored-by: Alexander Gololobov <440544+davenger@users.noreply.github.com>
2023-01-20 17:33:01 +03:00
Kruglov Pavel
28ddcc2432
Merge branch 'master' into tsv-csv-detect-header 2023-01-20 15:08:38 +01:00
Sema Checherinda
b76b612d23
fix typo 2023-01-20 14:55:58 +01:00
Nikolai Kochetov
039901b395 Fixing build 2023-01-20 13:49:50 +00:00
Robert Schulze
1a966a9590
Fix bad comparison 2023-01-20 13:05:06 +00:00
Sema Checherinda
02f22f04e8
fix typos 2023-01-20 13:35:23 +01:00
kssenii
8d20af8127 Fix 2023-01-20 13:34:23 +01:00
Azat Khuzhin
bdeb5514c5 Fix ASan builds for glibc 2.36+ (use RTLD_NEXT for ThreadFuzzer interceptors)
Recently I noticed that clickhouse compiled with ASan does not work with
newer glibc 2.36+. Before, I thought that this was only about compiling
with an old glibc but running with a new one; however, that was not
correct: ASan simply does not work with glibc 2.36+.

Here is a simple reproducer [1]:

    $ cat > test-asan.cpp <<EOL
    #include <pthread.h>
    int main()
    {
        // something broken in ASan in interceptor for __pthread_mutex_lock
        // and only since glibc 2.36, and for pthread_mutex_lock everything is OK
        pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        return __pthread_mutex_lock(&mutex);
    }
    EOL
    $ clang -g3 -o test-asan test-asan.cpp -fsanitize=address
    $ ./test-asan
    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==15659==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x000000000000 bp 0x7fffffffccb0 sp 0x7fffffffcb98 T0)
    ==15659==Hint: pc points to the zero page.
    ==15659==The signal is caused by a READ memory access.
    ==15659==Hint: address points to the zero page.
        #0 0x0  (<unknown module>)
        #1 0x7ffff7cda28f  (/usr/lib/libc.so.6+0x2328f) (BuildId: 1e94beb079e278ac4f2c8bce1f53091548ea1584)

    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (<unknown module>)
    ==15659==ABORTING

  [1]: https://gist.github.com/azat/af073e57a248e04488b21068643f079e

I started looking into the glibc code. There were some changes in glibc
that move pthread functions out of libpthread.so.0 into libc.so.6
(somewhere between 2.31 and 2.35), but
the problem pops up only with 2.36; 2.35 works fine.

After this I've looked into changes between 2.35 and 2.36, and found
this patch [2] - "dlsym: Make RTLD_NEXT prefer default version
definition [BZ #14932]", that fixes this bug [3].

  [2]: https://sourceware.org/git/?p=glibc.git;a=commit;h=efa7936e4c91b1c260d03614bb26858fbb8a0204
  [3]: https://sourceware.org/bugzilla/show_bug.cgi?id=14932

The problem with using DL_LOOKUP_RETURN_NEWEST flag for RTLD_NEXT is
that it does not resolve hidden symbols (and __pthread_mutex_lock is
indeed hidden).

Here is a sample that will show the difference [4]:

    $ cat > test-dlsym.c <<EOL
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    int main()
    {
        void *p = dlsym(RTLD_NEXT, "__pthread_mutex_lock");
        printf("__pthread_mutex_lock: %p (via RTLD_NEXT)\n", p);
        return 0;
    }
    EOL

    # glibc 2.35: __pthread_mutex_lock: 0x7ffff7e27f70 (via RTLD_NEXT)
    # glibc 2.36: __pthread_mutex_lock: (nil) (via RTLD_NEXT)

  [4]: https://gist.github.com/azat/3b5f2ae6011bef2ae86392cea7789eb7

But ThreadFuzzer uses internal symbols to wrap
pthread_mutex_lock/pthread_mutex_unlock, which are intercepted by ASan
and this leads to NULL dereference.

The fix was obvious: just use dlsym(RTLD_NEXT). However, on older
glibc versions this leads to endless recursion (see comments in the code),
but only with jemalloc [5]. Even though sanitizers do not use jemalloc,
the code of ThreadFuzzer is generic and I don't want to guard it with
more preprocessor macros.

  [5]: https://gist.github.com/azat/588d9c72c1e70fc13ebe113197883aa2

So we have to use RTLD_NEXT only for ASan.
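
A rough sketch of what such a compile-time split could look like is given below. It is an illustration under assumptions, not the ThreadFuzzer code: the macro `SKETCH_WITH_ASAN`, the alias `MutexLockFn`, and the function `resolveMutexLock` are hypothetical names, and the non-ASan branch uses the public `pthread_mutex_lock` as a placeholder instead of the internal glibc symbol.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for RTLD_NEXT
#endif
#include <cassert>
#include <dlfcn.h>
#include <pthread.h>

// Detect AddressSanitizer for both gcc (__SANITIZE_ADDRESS__)
// and clang (__has_feature(address_sanitizer)).
#if defined(__SANITIZE_ADDRESS__)
#    define SKETCH_WITH_ASAN 1
#elif defined(__has_feature)
#    if __has_feature(address_sanitizer)
#        define SKETCH_WITH_ASAN 1
#    endif
#endif
#ifndef SKETCH_WITH_ASAN
#    define SKETCH_WITH_ASAN 0
#endif

using MutexLockFn = int (*)(pthread_mutex_t *);

// Returns the function to forward lock calls to.
// The result may be nullptr if the lookup fails, so callers must check it;
// calling through a null pointer is exactly the crash shown above.
inline MutexLockFn resolveMutexLock()
{
#if SKETCH_WITH_ASAN
    // Under ASan: ask the dynamic linker for the next definition in search order.
    return reinterpret_cast<MutexLockFn>(dlsym(RTLD_NEXT, "pthread_mutex_lock"));
#else
    // Otherwise: placeholder for the direct (non-dlsym) path.
    return &pthread_mutex_lock;
#endif
}
```

A caller would resolve the pointer once, verify that it is not null, and only then forward lock requests through it.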

There is also one more interesting issue: if you compile with a clang
that was itself compiled against a newer libc (i.e. 2.36), you will get
the following error:

    $ podman run --privileged -v $PWD/.cmake-asan/programs:/root/bin -e PATH=/bin:/root/bin -e --rm -it ubuntu-dev-v3 clickhouse
    ==1==ERROR: AddressSanitizer failed to allocate 0x0 (0) bytes of SetAlternateSignalStack (error code: 22)
    ...
    ==1==End of process memory map.
    AddressSanitizer: CHECK failed: sanitizer_common.cpp:53 "((0 && "unable to mmap")) != (0)" (0x0, 0x0) (tid=1)
        <empty stack>

The problem is that since GLIBC_2.31, `SIGSTKSZ` is a call to
`getconf(_SC_MINSIGSTKSZ)`, but older glibc does not have it, so `-1`
will be returned and used as `SIGSTKSZ` instead.

The workaround is to disable the alternative signal stack:

    $ podman run --privileged -v $PWD/.cmake-asan/programs:/root/bin -e PATH=/bin:/root/bin -e ASAN_OPTIONS=use_sigaltstack=0 --rm -it ubuntu-dev-v3 clickhouse client --version
    ClickHouse client version 22.13.1.1.

Fixes: #43426
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-20 13:09:13 +01:00
Robert Schulze
bfc3b4f5ca
Suffix "GinFilter" --> "Inverted" 2023-01-20 12:02:35 +00:00
Nikolai Kochetov
1e29993aef Fixing build 2023-01-20 11:55:20 +00:00
Robert Schulze
0738b2499c
Use GinFilters typedef where possible 2023-01-20 11:52:04 +00:00
Maksim Kita
3e08a98f16
Merge pull request #45388 from azat/dict/remove-preallocate
Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries
2023-01-20 14:51:25 +03:00
Robert Schulze
0b77f07f67
Remove superfluous check (the same is checked in MergeTreeIndices.cpp) 2023-01-20 11:50:35 +00:00
Robert Schulze
d2c830ec39
Cosmetics 2023-01-20 11:49:08 +00:00
Robert Schulze
72973076c9
Rename MergeTreeIndexGin.h/cpp to MergeTreeIndexInverted.h/cpp 2023-01-20 11:42:36 +00:00
Robert Schulze
1ef2704539
Cosmetics 2023-01-20 11:39:23 +00:00
Anton Popov
9c0ba7c7ca
Merge pull request #45432 from CurtizJ/allow-json-extract-int-from-float
Allow to convert float stored in string field to integer in `JSONExtract`
2023-01-20 12:35:06 +01:00
Robert Schulze
463cc843de
"segment file" --> "segment metadata file" 2023-01-20 11:26:22 +00:00
Robert Schulze
58df3953bb
Move some code around (no other changes) 2023-01-20 11:24:23 +00:00
Kseniia Sumarokova
c066b9bddd
Update SwapHelper.h 2023-01-20 12:19:19 +01:00
Maksim Kita
e067a55b78 Fixed tests 2023-01-20 12:19:16 +01:00
Robert Schulze
3267ac2787
Prefix more typedefs in DB namespace with "Gin" 2023-01-20 11:19:07 +00:00
Robert Schulze
919b67f117
Cosmetics 2023-01-20 11:15:28 +00:00
Sema Checherinda
09f3a5c599 add a comment, add a check, fix test 2023-01-20 12:10:31 +01:00
Robert Schulze
98e117dca6
SegmentDictionary --> GinSegmentDictionary, also move typedef 2023-01-20 11:09:49 +00:00
Robert Schulze
908fa83f72
Move some typedefs around 2023-01-20 11:08:19 +00:00
Robert Schulze
44618927f9
Inline two short methods + uppercase 2023-01-20 11:04:35 +00:00
Robert Schulze
f8b446f517
Move method implementations (no other changes) 2023-01-20 10:57:16 +00:00
Robert Schulze
5c3cc5283f
"term dictionary" --> "dictionary" 2023-01-20 10:53:41 +00:00
Robert Schulze
be936b257c
Make version enum private 2023-01-20 10:48:43 +00:00
Robert Schulze
0653f86de9
Various cosmetic cleanups 2023-01-20 10:45:35 +00:00
Maksim Kita
23e26032ca
Merge pull request #45399 from aalexfvk/alexfvk/mdb-21326_fix_system_dictionaries_when_dictionary_with_bad_structure
Fix select from system.dictionaries when there is dictionary with bad structure
2023-01-20 13:36:32 +03:00
Maksim Kita
758c8f2776
Merge branch 'master' into dict/remove-preallocate 2023-01-20 13:15:37 +03:00
Maksim Kita
e6ee5554d1 Fixed tests 2023-01-20 11:15:13 +01:00
Azat Khuzhin
1f9a65b875 Modernize InternalTextLogsQueue::getPriorityName()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-20 11:09:35 +01:00
Azat Khuzhin
fc276abadd Fix log level "Test" for send_logs_level in client
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-20 11:09:35 +01:00
Antonio Andelic
0ad37ad286
Merge pull request #45320 from stigsb/system_tables_volume_config
Add <storage_policy> config parameter for system logs
2023-01-20 10:27:57 +01:00
Aleksandr Musorin
838acb22b7
added num_processed_files and processed_files_size 2023-01-20 10:20:41 +01:00
Robert Schulze
5ec6d89d43
Merge pull request #38667 from ClibMouse/ftsearch
Inverted Indices Implementation
2023-01-20 10:18:05 +01:00
SmitaRKulkarni
6aa63414db
Merge pull request #45072 from ClickHouse/43891_Disallow_concurrent_backups_and_restores
Added settings to disallow concurrent backups and restores
2023-01-20 09:17:20 +01:00
Nikolai Kochetov
3e00d18498 Merge branch 'master' into fix-disabled-two-level-agg 2023-01-19 20:54:04 +00:00
Nikolay Degterinsky
dd7fef11a2 Add default granularity 2023-01-19 20:52:38 +00:00
Nikolai Kochetov
d24be2712e Fix disabled two-level aggregation from HTTP 2023-01-19 20:50:27 +00:00
Maksim Kita
3363f7c718 Added GroupingFunctionsResolvePass 2023-01-19 19:06:02 +01:00
Maksim Kita
506f91b841 Fixed tests 2023-01-19 19:05:49 +01:00
Maksim Kita
2c56b0b2b9 Planner small fixes 2023-01-19 19:05:49 +01:00
Kseniia Sumarokova
ad4a9d2880
Update SwapHelper.h 2023-01-19 18:58:09 +01:00
kssenii
f56f515392 Fix 2023-01-19 18:45:06 +01:00
Anton Popov
089d1f5b62 fix fuzzer 2023-01-19 17:03:24 +00:00
kssenii
4ce8950712 Minor changes 2023-01-19 17:53:10 +01:00
larryluogit
52ae33dba7
Merge branch 'master' into ftsearch 2023-01-19 11:34:11 -05:00
avogar
c34c0aa22e Fix comments 2023-01-19 16:03:46 +00:00
Han Fei
3007507a8b
Merge pull request #45428 from hanfei1991/hanfei/fix-empty-expressions
fix regexp logical error in stress tests
2023-01-19 16:39:39 +01:00
Kruglov Pavel
9820beae68
Apply suggestions from code review
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2023-01-19 16:11:13 +01:00
Anton Popov
4ca359d57b
Merge pull request #45418 from CurtizJ/fix-disk-encrypted
Fix reading from encrypted disk with passed file size
2023-01-19 16:11:08 +01:00
Anton Popov
7f2e37860d allow to convert float stored in string field to integer in JSONExtract 2023-01-19 14:24:55 +00:00
Aleksei Filatov
afada0ecb3 Fix review notes 2023-01-19 17:02:57 +03:00
Alexander Tokmakov
7bb65cc002
Update StorageReplicatedMergeTree.cpp 2023-01-19 16:45:41 +03:00
Igor Nikonov
d0ce804bfc Fix: dynamic_cast -> typeid_cast for SortingStep 2023-01-19 13:40:21 +00:00
Han Fei
94336a9b66 fix typo 2023-01-19 13:55:29 +01:00
Igor Nikonov
df3776d24b Make test stable
+ disable debug logging
2023-01-19 11:43:40 +00:00
Han Fei
2884b8837b fix regexp logical error in stress tests 2023-01-19 12:03:54 +01:00
SmitaRKulkarni
67e2bf31f5
Merge branch 'master' into 43891_Disallow_concurrent_backups_and_restores 2023-01-19 11:21:37 +01:00
Han Fei
f661dad0e9
Merge pull request #45106 from hanfei1991/hanfei/async-cache
support cache for async inserts block ids
2023-01-19 10:59:25 +01:00
Ilya Yatsishin
d16b59b662
Merge pull request #45422 from Avogar/fix-s3-cluser-si 2023-01-19 10:36:54 +01:00
Ilya Yatsishin
00962b7ad5
Merge pull request #45424 from Avogar/fix-json-import-nested 2023-01-19 10:31:40 +01:00
Stig Bakken
420c179b55 Add <storage_policy> config parameter for system logs 2023-01-19 10:25:28 +01:00
SmitaRKulkarni
db03dd1bb9
Merge branch 'master' into 43891_Disallow_concurrent_backups_and_restores 2023-01-19 09:32:50 +01:00
Maksim Kita
911bb8e6ab
Merge pull request #45410 from ClickHouse/revert-45406-revert-42797-or-like-chain
Resubmit Support optimize_or_like_chain in QueryTreePassManager
2023-01-19 11:30:45 +03:00
Yakov Olkhovskiy
c6ee4c3908
Merge pull request #44686 from Algunenano/fix_uuid_parsing_in_values
Don't parse beyond the quotes when reading UUIDs
2023-01-18 19:30:53 -05:00
Igor Nikonov
57d2fd300a Fix: correct update of data stream sorting properties after removing
sorting
2023-01-19 00:11:58 +00:00
Yakov Olkhovskiy
1d58ded72b fix IP parsers to treat input as not whole string 2023-01-19 00:08:20 +00:00
avogar
a8f20363f4 Fix JSON/BSONEachRow parsing with HTTP 2023-01-18 22:49:03 +00:00
avogar
117ec13c9e Fix s3Cluster schema inference when structure from insertion table is used 2023-01-18 20:33:50 +00:00
Azat Khuzhin
4366f7fb3b Remove PREALLOCATE for HASHED/SPARSE_HASHED dictionaries
It does not give significant benefit, and now hashed/sparse_hashed
dictionaries can be filled in parallel (#40003) using sharded
dictionaries, which should be used instead of PREALLOCATE.

Note that dictionaries that were created with PREALLOCATE will still
work, but will simply ignore this attribute.

Fixes: #41985 (cc @alexey-milovidov)
Reverts: #23979 (cc @kitaisreal)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-18 20:18:37 +01:00
Igor Nikonov
1866f990de
Revert "Revert "Remove redundant sorting"" 2023-01-18 20:12:34 +01:00
Anton Popov
65a71b4431 fix reading from encrypted disk 2023-01-18 19:02:20 +00:00
Dmitry Novik
fff9fd4f00 Remove redundant group by keys with constants 2023-01-18 17:44:06 +00:00
Igor Nikonov
7ed8fec94f
Revert "Remove redundant sorting" 2023-01-18 18:38:25 +01:00
Dmitry Novik
11701d0ff5 Resolve OR function after modification 2023-01-18 17:17:16 +00:00
Dmitry Novik
df26f4fc37
Revert "Revert "Support optimize_or_like_chain in QueryTreePassManager"" 2023-01-18 18:14:03 +01:00
Anton Popov
5df0f91857
Revert "Support optimize_or_like_chain in QueryTreePassManager" 2023-01-18 17:34:19 +01:00
Maksim Kita
cabcc761ed
Merge pull request #45357 from kitaisreal/analyzer-compound-identifier-typo-correction-fix
Analyzer compound identifier typo correction fix
2023-01-18 17:59:32 +03:00
Aleksei Filatov
5e9340f682 Add integration test 2023-01-18 17:50:38 +03:00
Aleksei Filatov
7f4a01b903 Add handling of bad dictionary structure 2023-01-18 17:27:03 +03:00
Sema Checherinda
ae1dfb9ce5
Update src/Storages/MergeTree/MergeTreeData.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2023-01-18 15:21:11 +01:00
Sema Checherinda
a344b526a6
Update src/Storages/StorageMergeTree.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
2023-01-18 15:16:18 +01:00
Alexander Tokmakov
7a824af09e fix 2023-01-18 14:30:20 +01:00
Antonio Andelic
8f8b14148a
Merge pull request #45215 from ClickHouse/fix-crash-kv-store
Fix crash when prepared set with different type used in KV stores
2023-01-18 13:27:40 +01:00
Igor Nikonov
72066846cf
Merge pull request #43905 from ClickHouse/igor/remove_redundant_order_by
Remove redundant sorting
2023-01-18 13:25:03 +01:00
vdimir
b76779797a
Do not move to prewhere in select with joins 2023-01-18 12:17:30 +00:00
Vitaly Baranov
7cdb2c4c7f
Merge pull request #45351 from vitlibar/fix-backup-with-killed-mutations
Fix backup with killed mutations
2023-01-18 13:14:27 +01:00
Han Fei
e51123c9b0 fix data race 2023-01-18 13:11:07 +01:00
Maksim Kita
8225d2814c
Merge pull request #40003 from azat/dict-shards
Add ability to load hashed dictionaries using multiple threads
2023-01-18 13:37:10 +03:00
Maksim Kita
3a550691c9
Merge pull request #42797 from ClickHouse/or-like-chain
Support optimize_or_like_chain in QueryTreePassManager
2023-01-18 13:09:33 +03:00
Maksim Kita
21b94813ad Fixed code review issues 2023-01-18 11:02:29 +01:00
Maksim Kita
cacaa2372a
Merge pull request #43261 from ClickHouse/group-by-function-elimination
Support optimize_group_by_function_keys on top of QueryTree
2023-01-18 12:55:56 +03:00
Maksim Kita
21b288c620 Fixed build 2023-01-18 10:44:40 +01:00
Antonio Andelic
cfba9b19eb
Merge pull request #45360 from azat/dist/fix-startup-race
Fix race in Distributed table startup
2023-01-18 10:09:54 +01:00
Antonio Andelic
f57ee043ae
Merge pull request #45319 from ClickHouse/disable-prewhere-in-merge-different-types
Disable PREWHERE in storage Merge if types don't match
2023-01-18 10:02:06 +01:00
Antonio Andelic
f3469ee077
Merge branch 'master' into dist/fix-startup-race 2023-01-18 09:44:52 +01:00
Smita Kulkarni
d7ca742d98 Fixed style check for beginning of if - Added settings to disallow concurrent backups and restores 2023-01-18 08:59:47 +01:00
Dmitry Novik
3b0ac7272c Update reference files 2023-01-18 00:30:30 +00:00
Dmitry Novik
752aed696a Merge remote-tracking branch 'origin/master' into group-by-function-elimination 2023-01-17 23:33:33 +00:00
Sergei Trifonov
c443c1ece0
Merge branch 'master' into hanfei/async-cache 2023-01-18 00:19:49 +01:00
Robert Schulze
4f90824347
Merge remote-tracking branch 'origin/master' into query-result-cache 2023-01-17 22:49:53 +00:00
Anton Popov
f40fd7a151
Add checks for compilation of regexps (#45356) 2023-01-17 23:46:04 +01:00
Smita Kulkarni
ee526ce877 Fix style check - Added settings to disallow concurrent backups and restores 2023-01-17 22:52:55 +01:00
Smita Kulkarni
6e06af1b25 Updated strategy for handling internal backups & restores to avoid concurrent internal backups & restores - Added settings to disallow concurrent backups and restores 2023-01-17 22:27:13 +01:00
Igor Nikonov
0db9bf38a2
Merge branch 'master' into igor/remove_redundant_order_by 2023-01-17 22:26:24 +01:00
Alexander Tokmakov
1413b9537c make error patterns more useful 2023-01-17 20:04:25 +01:00
Alexander Tokmakov
5cd90c1a3e Merge branch 'master' into exception_message_patterns 2023-01-17 20:04:04 +01:00
Alexander Tokmakov
72e8615bec formatting of some exception messages 2023-01-17 20:03:56 +01:00
Maksim Kita
4f7f2ed9e1
Merge pull request #45300 from ClickHouse/revert-45299-revert-44882-function-node-validation
Revert "Revert "Validate function arguments in query tree""
2023-01-17 21:51:26 +03:00
Maksim Kita
273610ce65
Merge pull request #43640 from ClickHouse/42648_Support_scalar_subqueries_cache
Support scalar subqueries cache
2023-01-17 21:31:13 +03:00
serxa
ce7e22b87b add detailed profile events for throttling 2023-01-17 18:29:24 +00:00
alesapin
e732f510f0
Merge branch 'master' into fix_hang_during_drop_in_zero_copy_replication 2023-01-17 19:24:36 +01:00
Alexander Tokmakov
8b13b85ea0
Merge pull request #44543 from ClickHouse/text_log_add_pattern
Add a column with a message pattern to text_log
2023-01-17 20:19:32 +03:00
Vitaly Baranov
1a680b0092 Abort multipart upload faster. 2023-01-17 18:00:11 +01:00
Vitaly Baranov
2de455367a Fix using std::ios_base::end in StdStreamFromReadBuffer::seekg(). 2023-01-17 17:56:14 +01:00
Igor Nikonov
0cfa08df7a Merge remote-tracking branch 'origin/master' into igor/remove_redundant_order_by 2023-01-17 16:28:17 +00:00
Igor Nikonov
9855504403 Rename source file according to implementation 2023-01-17 16:24:51 +00:00
Nikita Mikhaylov
0fc755806e
One more attempt to fix race in TCPHandler (#45240) 2023-01-17 16:17:14 +01:00
alesapin
69925647eb Fix style 2023-01-17 15:59:55 +01:00
alesapin
f6131101bb Fix no shared id during drop for the fourth time 2023-01-17 15:51:49 +01:00
Han Fei
8a74238fe0 improve 2023-01-17 15:47:52 +01:00
Kruglov Pavel
96bb99f864
Merge branch 'master' into tsv-csv-detect-header 2023-01-17 15:33:02 +01:00
Kruglov Pavel
582aa8b770
Merge pull request #45253 from Avogar/fix-s3-heap-use-after-free
Fix heap-use-after-free in reading from s3
2023-01-17 15:32:26 +01:00
HarryLeeIBM
e7add8218f Addressed more review comments and ClangTidy errors 2023-01-17 06:29:13 -08:00
Kruglov Pavel
4183f6082f
Fix special build 2023-01-17 15:18:39 +01:00
Azat Khuzhin
54fc6859ae Fix race in Distributed table startup
Before this patch it was possible to have multiple directory monitors
for the same directory, one from the INSERT context, another one on
storage startup().

Here is an example of the logs for this scenario:

    2022.12.07 12:12:27.552485 [ 39925 ] {a47fcb32-4f44-4dbd-94fe-0070d4ea0f6b} <Debug> DDLWorker: Executed query: DETACH TABLE inc.dist_urls_in
    ...
    2022.12.07 12:12:33.228449 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> executeQuery: (from 0.0.0.0:0, user: ) /* ddl_entry=query-0000089229 */ ATTACH TABLE inc.dist_urls_in (stage: Complete)
    ... this is the DirectoryMonitor created from the context of INSERT for the old StoragePtr that had not been destroyed yet (because of "was 1" this can be done only from the context of INSERT) ...
    2022.12.07 12:12:35.556048 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 1)
    2022.12.07 12:12:35.556078 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 71004)
    2022.12.07 12:12:35.562716 [ 39536 ] {} <Trace> Connection (i13.ch:9000): Connected to ClickHouse server version 22.10.1.
    2022.12.07 12:12:35.562750 [ 39536 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes).
    ... this is the DirectoryMonitor that was created during ATTACH ...
    2022.12.07 12:12:35.802080 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Files set to 173 (was 0)
    2022.12.07 12:12:35.802107 [ 39265 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Bytes set to 29750181 (was 0)
    2022.12.07 12:12:35.834216 [ 39265 ] {} <Debug> inc.dist_urls_in.DirectoryMonitor: Sending a batch of 10 files to i13.ch:9000 (0.00 rows, 0.00 B bytes).
    ...
    2022.12.07 12:12:38.532627 [ 39536 ] {} <Trace> inc.dist_urls_in.DirectoryMonitor: Sent a batch of 10 files (took 2976 ms).
    ...
    2022.12.07 12:12:38.601051 [ 39265 ] {} <Error> inc.dist_urls_in.DirectoryMonitor: std::exception. Code: 1001, type: std::__1::__fs::filesystem::filesystem_error, e.what() = filesystem error: in file_size: No such file or directory ["/data6/clickhouse/data/inc/dist_urls_in/shard13_replica1/66827403.bin"], Stack trace (when copying this message, always include the lines below):
    ...
    2022.12.07 12:12:54.132837 [ 4408 ] {20c761d3-a46d-417b-9fcd-89a8919dd1fe} <Debug> DDLWorker: Executed query: ATTACH TABLE inc.dist_urls_in

And eventually both monitors (for a short period of time, while one
replaces the other) try to process the same batch (current_batch.txt),
and one of them fails because the file has already been removed.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-17 14:51:00 +01:00
Igor Nikonov
6328e02f22 Fix: update input/output stream properties
After removing the sorting step we need to update the sorting properties of
input/output streams
2023-01-17 13:39:18 +00:00
Maksim Kita
d758d83937 Analyzer compound identifier typo correction fix 2023-01-17 14:29:48 +01:00
vdimir
60acd5e424
fix clang tidy 2023-01-17 12:21:56 +00:00
vdimir
1e9ccfb4b9
wip 2023-01-17 12:21:56 +00:00
vdimir
40bf9939b7
Update JoinSwitcher::switchJoin 2023-01-17 12:21:55 +00:00
vdimir
e0e60bb460
wip 2023-01-17 12:21:55 +00:00
vdimir
4aecb836a9
Fix JoinMask 2023-01-17 12:21:55 +00:00
vdimir
18d751aed4
wip 2023-01-17 12:21:54 +00:00
vdimir
beb8ba7e62
wip 2023-01-17 12:21:54 +00:00
vdimir
57a35cae33
wip 2023-01-17 12:21:53 +00:00
vdimir
efcfcca545
Fix HashJoin::getTotalByteCount calculation 2023-01-17 12:21:53 +00:00
vdimir
b0c4e18464
Fix double initialization GraceHashJoin::initBuckets 2023-01-17 12:21:53 +00:00
Sema Checherinda
35431e91e3
Merge pull request #45276 from ucasfl/avro-fix
Fix some avro reading bugs
2023-01-17 12:48:44 +01:00
Kseniia Sumarokova
5586f71950
Merge pull request #41231 from kssenii/minor-change-in-remote-read
Fix assertion in async read buffer from remote
2023-01-17 12:32:57 +01:00
Maksim Kita
d6a36b1d16 Fixed code review issues 2023-01-17 12:02:50 +01:00
Maksim Kita
af716ca25d Fixed tests 2023-01-17 11:20:24 +01:00
Maksim Kita
250c93614c Revert "Revert "Validate function arguments in query tree"" 2023-01-17 11:20:24 +01:00
Vitaly Baranov
692065e5fe Fix backup if mutations got killed during the backup process. 2023-01-17 11:05:34 +01:00
Vitaly Baranov
0bea056241 Fix build. 2023-01-17 09:52:08 +01:00
Vitaly Baranov
1c845185c1 Split upload into parts of the same size for smooth uploading.
Correctly use AbortMultipleUpload request.
Support std::ios_base::end StdStreamBufFromReadBuffer::seekpos().
2023-01-17 09:35:43 +01:00
Vitaly Baranov
14a7ee8e26 Copy files to S3 during backup directly without using WriteBufferFromS3 to decrease memory consumption. 2023-01-17 09:35:41 +01:00
Vitaly Baranov
b13498d9ba
Merge pull request #45288 from vitlibar/fix-s3-requests-without-region
Fix s3 requests without region
2023-01-17 09:24:59 +01:00
Antonio Andelic
76eb3e3b3c Fix test 2023-01-17 07:34:39 +00:00
SmitaRKulkarni
bb4f251448
Merge branch 'master' into 42648_Support_scalar_subqueries_cache 2023-01-17 08:10:25 +01:00
Alexander Tokmakov
522686f78b less empty patterns 2023-01-17 01:19:44 +01:00
Kseniia Sumarokova
6a02bdc917
Update AsynchronousReadIndirectBufferFromRemoteFS.cpp 2023-01-17 00:37:47 +01:00
Alexander Tokmakov
870cfcc36a less fmt::runtime usages 2023-01-17 00:11:59 +01:00
Alexander Tokmakov
e7899825e6 save format strings for DB::Exceptions 2023-01-16 23:20:33 +01:00
Vitaly Baranov
9a52087989 More complex logic: GetObjectAttributes requests will be used
only if the endpoint is "*.amazonaws.com", otherwise HeadObject requests will be used.
2023-01-16 20:14:39 +01:00
Dmitry Novik
104e55bc22 Merge remote-tracking branch 'origin/master' into or-like-chain 2023-01-16 18:56:22 +00:00
Dmitry Novik
aa2a19eaa4 Use proper map for QueryTreeNode 2023-01-16 18:43:22 +00:00
Dmitry Novik
0aecc9ad80 Updates after the review 2023-01-16 17:43:36 +00:00
Kruglov Pavel
e9d6590926
Merge branch 'master' into tsv-csv-detect-header 2023-01-16 17:50:24 +01:00
Kruglov Pavel
bdb3517512
Merge pull request #45231 from Avogar/json-tuples
Insert default values in case of missing tuple elements in JSONEachRow
2023-01-16 17:49:50 +01:00
avogar
1c0941d72a Add docs and examples 2023-01-16 16:46:41 +00:00
Alexander Tokmakov
df75c24f01
Revert "Disallow Gorilla codec on non-float columns" 2023-01-16 19:14:28 +03:00
avogar
1d26704049 Fix 2023-01-16 15:49:59 +00:00
Sema Checherinda
dbe89cd5d8 fix that optimize final waits for currently running merges 2023-01-16 16:47:12 +01:00
Sema Checherinda
90fa1ecd49 make that old_parts_lifetime=0 deletes files instantly at drop/truncate 2023-01-16 16:47:12 +01:00
Sema Checherinda
8f660afab3 style fix 2023-01-16 16:47:12 +01:00
Sema Checherinda
c51f4d7be1 do not merge over a gap with outdated parts, delete empty parts with respect to old_parts_lifetime 2023-01-16 16:47:11 +01:00
Sema Checherinda
25e16388d7 better message in MergeTreeDataMergerMutator when parts intersect 2023-01-16 16:47:11 +01:00
Kruglov Pavel
04d95f4877
Fix 2023-01-16 16:47:04 +01:00
avogar
3ea80b0f54 Merge branch 'master' of github.com:ClickHouse/ClickHouse into tsv-csv-detect-header 2023-01-16 15:14:25 +00:00
Antonio Andelic
108b2384e7 Disable prewhere in storage merge if types don't match 2023-01-16 13:39:46 +00:00
Anton Popov
6863cd152f
Merge pull request #42181 from CurtizJ/optimize-loading-parts
Do not load inactive parts at startup
2023-01-16 14:38:50 +01:00
Kseniia Sumarokova
57c22f005b
Merge branch 'master' into minor-change-in-remote-read 2023-01-16 14:22:16 +01:00
Kseniia Sumarokova
7b612da871
Update AsynchronousReadIndirectBufferFromRemoteFS.cpp 2023-01-16 14:21:09 +01:00
Kseniia Sumarokova
4d22b49be7
Update DiskObjectStorage.cpp 2023-01-16 14:19:18 +01:00
Han Fei
30a798182a Merge branch 'master' into hanfei/async-cache 2023-01-16 14:07:36 +01:00
Nikolay Degterinsky
70e79de69b
Merge pull request #38252 from bharatnc/ncb/weighted-quantile-approx
add quantileInterpolatedWeighted function
2023-01-16 13:41:13 +01:00
Nikolay Degterinsky
88ba1b0b85
Merge pull request #42884 from evillique/better_asterisk_parser
Improve Asterisk and ColumnMatcher parsers
2023-01-16 13:29:59 +01:00
Vladimir C
0337bc7c4d
Merge pull request #45147 from rgzntrade/master 2023-01-16 13:18:18 +01:00
Igor Nikonov
a34991cb65 Merge remote-tracking branch 'origin/master' into igor/remove_redundant_order_by 2023-01-16 12:14:02 +00:00
Alexander Tokmakov
ee888f7f38
Merge pull request #44547 from ClickHouse/fix_44496
Fix too aggressive evaluation of args in default column expr
2023-01-16 15:08:58 +03:00
Kseniia Sumarokova
d859976fbd
Merge pull request #45250 from ClickHouse/43188_Record_startup_time_in_profileevents
Record server startup time in ProfileEvents
2023-01-16 12:20:37 +01:00
alesapin
190c9b3156
Merge pull request #44682 from hanfei1991/hanfei/support-advance-dedup
deduplicate async inserts in the same block earlier
2023-01-16 12:19:30 +01:00
Maksim Kita
0c7e0be0b6 Analyzer support INSERT SELECT 2023-01-16 12:17:14 +01:00
Alexander Tokmakov
94604f71b7
Merge pull request #44922 from azat/dist/async-INSERT-metrics
Optimize and fix metrics for Distributed async INSERT
2023-01-16 14:12:56 +03:00
Alexander Tokmakov
9ad6e1b129
Update logger_useful.h 2023-01-16 14:09:55 +03:00
Maksim Kita
cd2d794c99
Merge branch 'master' into 42648_Support_scalar_subqueries_cache 2023-01-16 13:49:43 +03:00
Maksim Kita
80f6a45376
Merge pull request #44641 from ClickHouse/vdimir/view_explain_2
Function viewExplain accept SELECT and settings
2023-01-16 13:39:53 +03:00
Vitaly Baranov
7030b64096 Fix build. 2023-01-16 10:46:58 +01:00
Han Fei
3481b4d50a fix style 2023-01-16 10:41:35 +01:00
Vitaly Baranov
16a20cd06e Use std::string_view instead of const std::string_view & 2023-01-16 10:18:04 +01:00
Maksim Kita
8f5250e000
Revert "Validate function arguments in query tree" 2023-01-16 10:14:34 +01:00
Vitaly Baranov
e435edb4ab Make checkObjectExists() easier. 2023-01-16 10:06:20 +01:00
Maksim Kita
60d2a0bf7f
Merge pull request #44882 from ClickHouse/function-node-validation
Validate function arguments in query tree
2023-01-16 11:31:02 +03:00
Robert Schulze
099e30ef2a
Merge remote-tracking branch 'origin/master' into query-result-cache 2023-01-16 08:04:49 +00:00
Robert Schulze
76d1fe08f9
Merge pull request #45252 from ClickHouse/block-nonfloat-gorilla
Disallow Gorilla codec on non-float columns
2023-01-16 08:55:50 +01:00
Robert Schulze
ff493c439c
Merge pull request #45244 from bigo-sg/improve_like
Add fast path for col like '%%' or col like '%'  or match(col, '.*')
2023-01-16 08:36:20 +01:00
taiyang-li
2f7ea79d94 change as request 2023-01-16 10:42:58 +08:00
simpleton
1cdd7361b0
Merge branch 'ClickHouse:master' into master 2023-01-16 09:36:38 +08:00
Han Fei
5617f7f616 address comments 2023-01-15 22:51:10 +01:00
Vitaly Baranov
a955504043 Move throw_on_error parameter to the end. 2023-01-15 20:28:16 +01:00
Vitaly Baranov
21b8aaeb8b Stop using HeadObject requests in S3
because they don't work well with endpoints without explicit region.
2023-01-15 20:28:11 +01:00
Han Fei
701dc88d6f
Merge branch 'master' into hanfei/support-advance-dedup 2023-01-15 19:46:28 +01:00
Han Fei
c859f8dbe5
Update src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp
Co-authored-by: alesapin <alesapin@gmail.com>
2023-01-15 19:46:16 +01:00
Han Fei
bb2c0914e9
Update src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp
Co-authored-by: alesapin <alesapin@gmail.com>
2023-01-15 19:46:09 +01:00
Robert Schulze
27fe7ebd93
Cosmetics 2023-01-15 16:12:48 +00:00
Robert Schulze
b6f12f9edd
Fix unit_tests_dbms 2023-01-15 14:49:04 +00:00
Robert Schulze
bd41c74ddf
Various test, code and docs fixups 2023-01-15 13:47:34 +00:00
Nikolay Degterinsky
24b686734d Fix build 2023-01-15 13:46:55 +00:00
Robert Schulze
eac9a5728d
Merge remote-tracking branch 'origin/master' into block-nonfloat-gorilla 2023-01-15 13:35:41 +00:00
Ilya Yatsishin
96987b7cd8
Merge pull request #45239 from Avogar/generate-random 2023-01-15 00:37:34 +01:00
Robert Schulze
a4a6126c9d
Prohibit manual delta compression before floating-point time series compression 2023-01-14 20:09:50 +00:00
Robert Schulze
fbdaca4e2a
Code cleanup 2023-01-14 19:21:30 +00:00
flynn
29eb30b49f Fix some reading avro format bugs
fix
2023-01-14 18:05:26 +00:00
Dmitry Novik
3d23654720 Skip validation of function IN 2023-01-13 23:10:16 +00:00
Alexander Tokmakov
d857d62a03 remove another set of macros 2023-01-13 20:34:31 +01:00
Alexander Tokmakov
2d7773fccc Merge branch 'master' into text_log_add_pattern 2023-01-13 20:33:46 +01:00
Han Fei
ed49ebf01a update setting explain 2023-01-13 20:26:08 +01:00
Han Fei
2fb2f503e3 Update src/Storages/MergeTree/MergeTreeSettings.h
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-01-13 20:20:08 +01:00
Han Fei
bcf813fedc Update src/Storages/StorageReplicatedMergeTree.cpp
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-01-13 20:19:30 +01:00
Han Fei
9e99c7e116 Update src/Storages/MergeTree/ReplicatedMergeTreeSink.cpp
Co-authored-by: Sergei Trifonov <svtrifonov@gmail.com>
2023-01-13 20:19:13 +01:00
Han Fei
a258a39eb1 Merge branch 'master' into hanfei/async-cache 2023-01-13 20:17:58 +01:00
Nikolay Degterinsky
36c20bf293 Merge remote-tracking branch 'upstream/master' into better_asterisk_parser 2023-01-13 19:15:55 +00:00
Anton Popov
487de70d01 fix locking at loading outdated data parts 2023-01-13 17:05:32 +00:00
avogar
e2470dd670 Fix tests 2023-01-13 17:03:53 +00:00
Robert Schulze
5d3f0ec4a0
Disallow Gorilla codec on non-float columns
Cf. #45195
2023-01-13 16:53:28 +00:00
avogar
6cb7c4d175 Better commit, mark noexcept 2023-01-13 16:33:11 +00:00
avogar
76c89c6d20 Fix heap-use-after-free in reading from s3 2023-01-13 16:31:30 +00:00
Smita Kulkarni
d132d30707 Addressed review comments - 42648 Support scalar subqueries cache 2023-01-13 17:28:35 +01:00
Alexander Tokmakov
6de4837580 fix 2023-01-13 16:07:20 +01:00
Maksim Kita
dc24d831cf
Merge pull request #42970 from ClickHouse/optimize-redundant-function
Implement optimize_redundant_functions_in_order_by on top of QueryTree.
2023-01-13 17:36:56 +03:00
Maksim Kita
05b1b78104
Merge pull request #44013 from kitaisreal/analyzer-aggregate-functions-passes-small-fixes
Analyzer aggregate functions passes small fixes
2023-01-13 17:31:53 +03:00
avogar
abfb6b096f Better exception message 2023-01-13 14:23:30 +00:00
Robert Schulze
4ea836b87e
Revert "Revert "update function DAYOFWEEK and add new function WEEKDAY for mysql/spark compatiability""
This reverts commit e37f572c34.
2023-01-13 14:00:16 +00:00
Smita Kulkarni
a0fe26f506 Addressed review comments and updated name to ServerStartupMilliseconds - Record server startup time in ProfileEvents 2023-01-13 14:38:54 +01:00
Alexander Tokmakov
9d5ec474a3
Merge pull request #43998 from evillique/make_system_replicas_parallel
Make `system.replicas` parallel
2023-01-13 16:33:36 +03:00
Alexander Tokmakov
b88aae9d5c Merge branch 'master' into fix_44496 2023-01-13 14:05:57 +01:00
Smita Kulkarni
cf5cb0da97 Record server startup time in ProfileEvents
Implementation:
* Added ProfileEvents::ServerStartupTime.
* Recorded time from start of main till listening to sockets.
Testing:
* Added a test 02532_profileevents_server_startup_time.sql
2023-01-13 13:47:54 +01:00
Azat Khuzhin
64e3677961 Avoid double hash calculation in HashedDictionary::getShard(StringRef)
Previously it was written this way because getShard() was a simple
modulo operation.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
2783850f08 Minor review fixes in HashedDictionary
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
6e0a7add93 Completely exception safe HashedDictionary dtor
Previously there was one (even though very unlikely) case when the dtor
could throw - logging code or ThreadPool::wait.

Just guard the dtor with try/catch and be done with it.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
74def83c5d Destroy hashtables for hashed dictionary in parallel only for sharded dict
Since there can be multiple hashtables, because each attribute uses its
own hashtable.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
1c0e0ea1e4 Disable sharded dictionaries with updatable sources
Support of sharded dictionary for updatable sources is questionable
since:
- sharded dictionaries were developed for hashed dictionaries with a huge
  number of keys
- an updatable source requires storing the whole table in memory (due to
  how reload works)
- also it is an open question whether it would benefit from the
  updatable source at all, since using an updatable source with a huge
  number of changes in the source does not look optimal, and on the
  other hand, if there is only a small number of changes then you don't
  need a sharded dictionary at all

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
c97991fce1 Use shared arena for HashedDictionary::blockToAttributes()
This should decrease number of allocations.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
01b100da61 Use shared arena in ParallelDictionaryLoader::createShardSelector() (and add missing rollback)
This should decrease number of allocations.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
64874824b4 Minor review fixes in HashedDictionary
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
77c1f07636 Make HashedDictionary::~HashedDictionary exception safe
Before, it was possible for the destructor to throw if thread
allocation failed; rewrite it to use trySchedule() and destroy
sequentially in that case.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
a3f189e191 Optimize sharded dictionaries with skewed distribution
In case of a skewed distribution, a simple modulo operation will not give
you a good distribution between shards, and eventually this can lead to
the same performance as a non-sharded dictionary (except that it will
occupy +1 thread for Block::scatter).

But if HashedDictionary::blockToAttributes() will not have calls to
HashedDictionary::getShard() this can be fixed by using a more complex
key-to-shard (getShard()) mapping. And actually you do not need to call
getShard() in blockToAttributes() you can simply use passed shard, and
that's it.

And by wrapping key with intHash64() in getShard() skewed distribution
can be fixed.

Note that previously I tried a similar approach but did not remove
getShard() from blockToAttributes(), that's why it failed.

And now it works almost as fast as with simple createBlockSelector(),
just 13.6% slower (18.75min vs 16.5min, with 16 threads).

Note that I've also tried to add libdivide for this, but it does not
improve the performance.

I've also tried the approach without scatter, and it works 20% slower
than this one (22.5min VS 18.75min, with 16 threads).

v2: Use intHashCRC32() over intHash64() for HashedDictionary::getShard()
    (with intHash64() it works much slower, almost 2x slower, there was
    18min with 32 threads)

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
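
The skew problem described in the commit above can be illustrated with a minimal sketch. This is not ClickHouse code: it is written in Python, and a simple multiplicative mixer (the hypothetical GOLDEN constant below) stands in for the intHashCRC32() call named in the commit. It only shows why mixing the key before taking the modulo helps when keys share a common factor with the shard count.

```python
# Illustrative sketch only; names and the mixer are assumptions, not ClickHouse code.
MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # odd 64-bit constant used as a stand-in mixer


def shard_plain(key: int, shards: int) -> int:
    """Key-to-shard mapping by plain modulo."""
    return key % shards


def shard_mixed(key: int, shards: int) -> int:
    """Mix the key first, then take the modulo of the upper 32 bits."""
    mixed = (key * GOLDEN) & MASK64
    return (mixed >> 32) % shards


def shard_sizes(keys, shards, shard_fn):
    """Count how many keys land in each shard."""
    sizes = [0] * shards
    for key in keys:
        sizes[shard_fn(key, shards)] += 1
    return sizes


# Skewed input: every key is a multiple of the shard count.
SHARDS = 16
keys = [i * SHARDS for i in range(10_000)]
plain_sizes = shard_sizes(keys, SHARDS, shard_plain)  # everything in shard 0
mixed_sizes = shard_sizes(keys, SHARDS, shard_mixed)  # spread over several shards
```

With plain modulo every key in this input lands in shard 0, so only one loader thread would have work; the mixed mapping sends the same keys to more than one shard.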
Azat Khuzhin
655a564280 Parallel hash tables destroy for hashed dictionaries
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
99063b152f Allow to configure queue backlog of the parallel hashed dictionary loader
v2: Decrease default parallel_queue_backlog to 10000 (same speed)
v3: Rename parallel_queue_backlog to per_shard_load_backlog
v3: Rename per_shard_load_backlog to shard_load_queue_backlog
v4: Fix documentation
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
79ad81dfdf Implement separate queue for parallel loader of hashed dictionaries
Previous patches in this series have a bottleneck in rehash(). This is
the slowest operation when inserting lots of rows into the hashtable,
and eventually the whole thread pool sometimes works as slowly as its
slowest thread, since we did not have any queue of blocks.

This patch adds such a queue and now it scales linearly, so initially with
1 thread I had ~4 hours for 10e9 elements (UInt64 key, UInt16 value),
after this patch it works in 16 minutes with 16 threads (well actually I
have to use 32 threads because of distribution of data in the source
table).

And now with 16 threads it works 16 times faster.

Also this patch adds more optimal block splitting for the non-complex
dictionaries, and usual block splitting for complex dictionaries.
But anyway this moves the overhead from the loading into the hashtable
threads out to the reader thread, and this is better, since the reader does
not use that much CPU.

v2: fix use-after-free on failed load (add missing wait in dtor)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
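
The queue-per-shard idea from the commit above can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the ClickHouse implementation: load_sharded, backlog and block_size are hypothetical names, plain dicts stand in for the hash tables, and key % shards stands in for the real key-to-shard mapping. The reader splits each block by shard and pushes the parts into bounded queues, so a slow shard only blocks the reader once its own backlog is full.

```python
import queue
import threading


def load_sharded(rows, shards, backlog=4, block_size=3):
    """Load (key, value) rows into `shards` dicts, one worker thread per shard."""
    tables = [dict() for _ in range(shards)]
    # One bounded queue per shard; `backlog` caps the number of pending blocks.
    queues = [queue.Queue(maxsize=backlog) for _ in range(shards)]

    def worker(shard):
        while True:
            block = queues[shard].get()
            if block is None:  # sentinel: no more blocks for this shard
                return
            tables[shard].update(block)  # only this thread touches this table

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(shards)]
    for t in threads:
        t.start()

    # Reader: split each block by shard and enqueue; put() blocks when a queue is full.
    for start in range(0, len(rows), block_size):
        parts = [[] for _ in range(shards)]
        for key, value in rows[start:start + block_size]:
            parts[key % shards].append((key, value))
        for shard, part in enumerate(parts):
            if part:
                queues[shard].put(part)

    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return tables
```

For example, load_sharded([(i, i * i) for i in range(100)], shards=4) returns four dicts whose sizes sum to 100, each holding only the keys that map to its shard.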
Azat Khuzhin
5d0fd3cdc4 Remove sharded overhead for non-sharded hashed dictionaries
By adding one more template parameter - HashedDictionary<sharded> (yes,
there are already too many of them for a template class that has explicit
instantiation).

Since perf tests [1] show a 20% slowdown.

  [1]: https://s3.amazonaws.com/clickhouse-test-reports/40003/8f0cf2d6b8a7df511afe901331d5e2c7b06c0b4d/performance_comparison_[1/4]/report.html

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:26 +01:00
Azat Khuzhin
345c422e28 Add ability to load hashed dictionaries using multiple threads
Right now dictionaries (here I will talk about only
HASHED/SPARSE_HASHED/COMPLEX_KEY_HASHED/COMPLEX_KEY_SPARSE_HASHED)
can load data only in one thread, since it uses one hash table that
cannot be filled from multiple threads.

And in case you have a very big dictionary (i.e. 10e9 elements), it can
take a while to load them, especially for SPARSE_HASHED variants (and
if you have such an amount of elements there, you are likely to use
SPARSE_HASHED, since it requires less memory), in my env it takes ~4
hours, which is an enormous amount of time.

So this patch adds support of shards for dictionaries; the number of shards
determines how many hash tables the dictionary will use and, more
importantly, how many threads it can use to load the data.

And with 16 threads this works 2x faster, not perfect though, see the
follow up patches in this series.

v0: PARTITION BY
v1: SHARDS 1
v2: SHARDS(1)
v3: tried optimized mod - logical and, but it does not gain even 10%
v4: tried squashing more (max_block_size * shards), but it does not gain even 10% either
v5: move SHARDS into layout parameters (unknown simply ignored)
v6: tune params for perf tests (to avoid too long queries)
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2023-01-13 13:39:25 +01:00
Vladimir C
eefbffcc5b
Merge pull request #45230 from ClickHouse/vdimir/semi_join_null_const_bug 2023-01-13 13:22:57 +01:00
Anton Popov
71188c22ee fix race on 'relative_data_path' 2023-01-13 12:19:41 +00:00
vdimir
f881a82417
Fix viewExplain, add testcases 2023-01-13 12:19:25 +00:00
vdimir
bdb9222736
Support EXPLAIN SYNTAX oneline = 1 2023-01-13 12:18:58 +00:00
Alexander Tokmakov
51d94314d6
Merge pull request #45235 from ClickHouse/more_verbose_logs_about_replication_log_entries
More verbose logs about replication log entries
2023-01-13 15:05:21 +03:00
Maksim Kita
44f4184e11
Merge pull request #44540 from kitaisreal/analyzer-support-distributed
Analyzer support distributed queries processing
2023-01-13 14:45:36 +03:00
Vitaly Baranov
00908dcc6c
Fix http requests without path for AWS. (#45238) 2023-01-13 12:35:39 +01:00
Nikolai Kochetov
6e9dd2af45
Merge pull request #42889 from guowangy/logical-optimizer-lowcardinality
Enable logical optimizer for LowCardinality regardless of short chain
2023-01-13 12:28:57 +01:00
vdimir
023162df1d
fix clang-tidy style 2023-01-13 11:25:07 +00:00
Robert Schulze
d7d3f61c73
Cleanup SourceFromChunks a bit 2023-01-13 10:57:31 +00:00
Robert Schulze
88df1df3e6
Fix Darwin build 2023-01-13 10:26:49 +00:00
Robert Schulze
9779d034eb
Merge pull request #45144 from ClibMouse/crc-power-fix
Changes to support the CRC32 in PowerPC.
2023-01-13 11:24:18 +01:00
Maksim Kita
296dc5006d Fixed tests 2023-01-13 10:59:26 +01:00
simpleton
45842da72e
Merge branch 'master' into master 2023-01-13 17:42:36 +08:00
Alexander Gololobov
d850225f6b
Merge pull request #45229 from CurtizJ/fix-rare-logical-error
Fix rare logical error: `Too large alignment`
2023-01-13 09:48:28 +01:00
Antonio Andelic
99548c8c15 Merge branch 'master' into fix-crash-kv-store 2023-01-13 08:42:08 +00:00
taiyang-li
de5474c9f9 optimize match(a, '.*') 2023-01-13 14:55:54 +08:00
taiyang-li
45df745011 add fast path for like '%%' or like '%' 2023-01-13 12:20:03 +08:00
Robert Schulze
15e11741cb
Cosmetics 2023-01-13 00:00:23 +00:00
Ilya Yatsishin
ba05646dff
Merge pull request #45222 from ClickHouse/fuzz_prewhere
Fuzz PREWHERE clause
2023-01-13 00:45:21 +01:00