Previous implementation (DBMS_MIN_REVISION_WITH_INTERSERVER_SECRET)
accepts the salt from the client, which makes it useless.
Reimplement the protocol so that the server sends the salt and the
client uses it instead.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Add new engine to ReplacingMergeTree corresponding to the ReplacingCollapsingMergeTree
* Add new test for the new ReplacingMergeTree engine
* Limit sign value to -1/1
* Replace sign column(Int8) by is_deleted(UInt8)
* Add the keyword 'CLEANUP' to OPTIMIZE
* Clean up only when it's a ReplacingMergeTree
* Propagate CLEANUP information and rename 'with_cleanup' to 'cleanup'
* Clean up data flagged as 'is_deleted' (a usage example follows at the end of this message)
* Fix merge when optimize and add a test
* Fix OPTIMIZE and INSERT + add tests
* New fix for cleanup at the merge
* Cleanup debug logs
* Add the SETTINGS option 'clean_deleted_rows' that can be 'never' or 'always'
* Fix regression bug; now ReplicatedMergeTree can be called as before without 'is_deleted'
* Add Replicated tests
* Disable tag 'long' for our test and cleanup some white spaces
* Update tests
* Fix tests and remove additional useless whitespace
* Fix replica test
* Style clean && add condition check for is_deleted values
* The clean_deleted_rows setting is now an enum
* Add a valid default value to the clean_deleted_rows setting
* Update cleanup checkers to use the enum and fix typos in the test
* Fix submodule contrib/AMQP-CPP pointer
* Add missing messages in test reference and remove a print with non-deterministic order
* fix replica test reference
* Fix edge case
* Fix a typo for the spell checker
* Fix reference
* Fix a condition to raise an error if is_deleted differs from 0/1 with cleanup
* Change tests file name and update number
* This should fix the ReplacingMergeTree parameter set
* Fix replicated parameters
* Disable allow_deprecated_syntax_for_merge_tree for our new column
* Fix a test
* Remove non deterministic order print in the test
* Test on replicas
* Remove a condition, when checking optional parameters, that should not be needed since we disabled the deprecated syntax
* Revert "Remove a condition, when checking optional parameters, that should not be needed since we disabled the deprecated syntax"
This reverts commit b65d64c05e.
* Fix replica management and limit the number of arguments to two at most, due to the possibility of deprecated table create/attach failing otherwise
* Test a fix for replicated log information error
* Try to add sync to have consistent results
* Change the path of replicas that should cause one issue and add a few prints in case it's not that
* Get cleanup info on replicas only if information found
* Fix style issues
* Try to avoid the replication error 'cannot select parts...' and fix the replica read/write field order
* Cleanup according to PR reviews
and add tests on the errors raised.
* Update src/Storages/MergeTree/registerStorageMergeTree.cpp
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
* SELECT ... FINAL no longer shows rows with is_deleted = true
* Update and fix SELECT ... FINAL merge parameter
* Remove is_deleted rows only for the version inserted when merging
* Fix (master) updates issues
* Revert changes that should not be committed
* Add changes according to review
* Revert changes that should not be committed - part 2
---------
Co-authored-by: Alexander Tokmakov <tavplubix@gmail.com>
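For illustration, the resulting feature can be used like this (a sketch only; table, column names and values are examples, and per the commits above the merge-time behavior is additionally controlled by the clean_deleted_rows setting):
CREATE TABLE t (key UInt64, value String, version UInt64, is_deleted UInt8)
ENGINE = ReplacingMergeTree(version, is_deleted) ORDER BY key;
INSERT INTO t VALUES (1, 'hello', 1, 0);
INSERT INTO t VALUES (1, 'hello', 2, 1); -- newer version flags the row as deleted
OPTIMIZE TABLE t FINAL CLEANUP; -- merge and physically drop rows flagged is_deleted
SELECT * FROM t FINAL; -- rows with is_deleted = 1 are not shown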
Previously the following query did not work correctly:
SELECT number FROM numbers(5) SETTINGS output_format_json_array_of_rows = 1 FORMAT JSONEachRow
While this one works OK:
SELECT number FROM numbers(5) FORMAT JSONEachRow SETTINGS output_format_json_array_of_rows = 1
The problem is in which AST those settings are stored; use the same
logic that executeQuery() has for applying them:
c83f701696/src/Interpreters/executeQuery.cpp (L467-L497)
Note that the only problem should be with settings for FORMAT, since the
client applies those settings (and formats) locally without the server,
while in the case of e.g. HTTP they are applied on the server and
everything works fine.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Under certain conditions it is possible for skim to overlap the prompt
(well, not overlap, but not re-render it), so the client is left without
a proper prompt.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* save format string for NetException
* format exceptions
* format exceptions 2
* format exceptions 3
* format exceptions 4
* format exceptions 5
* format exceptions 6
* fix
* format exceptions 7
* format exceptions 8
* Update MergeTreeIndexGin.cpp
* Update AggregateFunctionMap.cpp
* Update AggregateFunctionMap.cpp
* fix
ASAN report:
Code: 586. DB::ErrnoException: Cannot create file: /src/.clickhouse_history, errno: 2, strerror: No such file or directory. (CANNOT_CREATE_FILE)
=================================================================
==1==ERROR: AddressSanitizer: heap-use-after-free on address 0x6240000208f0 at pc 0x000030d22ade bp 0x7ffff2ff3f70 sp 0x7ffff2ff3f68
READ of size 8 at 0x6240000208f0 thread T2
#0 0x30d22add in DB::ProcessList::insert() build_docker/../src/Interpreters/ProcessList.cpp:89:36
#1 0x31411018 in DB::executeQueryImpl() build_docker/../src/Interpreters/executeQuery.cpp:516:60
#2 0x3140e1ab in DB::executeQuery() build_docker/../src/Interpreters/executeQuery.cpp:1083:30
#3 0x3364391e in DB::LocalConnection::sendQuery() build_docker/../src/Client/LocalConnection.cpp:119:21
#4 0x3367bab0 in DB::Suggest::fetch() build_docker/../src/Client/Suggest.cpp:141:16
#5 0x336820eb in void DB::Suggest::load<DB::LocalConnection>()::'lambda'()::operator()() const build_docker/../src/Client/Suggest.cpp:118:17
0x6240000208f0 is located 2032 bytes inside of 7056-byte region [0x624000020100,0x624000021c90)
freed by thread T0 here:
#0 0xe381ef2 in operator delete(void*, unsigned long) (/wrk/clickhouse-asan+0xe381ef2) (BuildId: 6ea6d1a5d2d5a164f60f0fd8230936305bc8d9d0)
#1 0x335509fe in DB::ClientBase::~ClientBase() build_docker/../src/Client/ClientBase.cpp:293:25
#2 0x1f809bd5 in mainEntryClickHouseLocal(int, char**) build_docker/../programs/local/LocalServer.cpp:804:5
#3 0xe3856ad in main build_docker/../programs/main.cpp:482:12
#4 0x7ffff7dc0082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
previously allocated by thread T0 here:
#0 0xe38128d in operator new(unsigned long) (/wrk/clickhouse-asan+0xe38128d) (BuildId: 6ea6d1a5d2d5a164f60f0fd8230936305bc8d9d0)
#1 0x2f34a7f3 in std::__1::__unique_if<DB::ContextSharedPart>::__unique_single std::__1::make_unique[abi:v15003]<DB::ContextSharedPart>() build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:714:28
#2 0x2f34a7f3 in DB::Context::createShared() build_docker/../src/Interpreters/Context.cpp:603:32
#3 0x1f7f901d in DB::LocalServer::processConfig() build_docker/../programs/local/LocalServer.cpp:535:22
#4 0x1f7f4d92 in DB::LocalServer::main() build_docker/../programs/local/LocalServer.cpp:419:5
#5 0x3af24ffe in Poco::Util::Application::run() build_docker/../contrib/poco/Util/src/Application.cpp:334:8
#6 0x1f809bca in mainEntryClickHouseLocal(int, char**) build_docker/../programs/local/LocalServer.cpp:803:20
#7 0xe3856ad in main build_docker/../programs/main.cpp:482:12
#8 0x7ffff7dc0082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
Thread T2 created by T0 here:
#0 0xe32fedc in pthread_create (/wrk/clickhouse-asan+0xe32fedc) (BuildId: 6ea6d1a5d2d5a164f60f0fd8230936305bc8d9d0)
#1 0x336806df in std::__1::__libcpp_thread_create[abi:v15003](unsigned long*, void* (*)(void*), void*) build_docker/../contrib/libcxx/include/__threading_support:376:10
#2 0x336806df in std::__1::thread::thread<void DB::Suggest::load<DB::LocalConnection>()::'lambda'(), void>() build_docker/../contrib/libcxx/include/thread:311:16
#3 0x3367ff5b in void DB::Suggest::load<DB::LocalConnection>(std::__1::shared_ptr<DB::Context const>, DB::ConnectionParameters const&, int) build_docker/../src/Client/Suggest.cpp:110:22
#4 0x3357fee9 in DB::ClientBase::runInteractive() build_docker/../src/Client/ClientBase.cpp:2066:22
#5 0x1f7f5264 in DB::LocalServer::main() build_docker/../programs/local/LocalServer.cpp
#6 0x3af24ffe in Poco::Util::Application::run() build_docker/../contrib/poco/Util/src/Application.cpp:334:8
#7 0x1f809bca in mainEntryClickHouseLocal(int, char**) build_docker/../programs/local/LocalServer.cpp:803:20
#8 0xe3856ad in main build_docker/../programs/main.cpp:482:12
#9 0x7ffff7dc0082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
- lots of static_cast
- add safe_cast (a sketch of the idea follows below)
- types adjustments
- config
- IStorage::read/watch
- ...
- some TODO's (to convert types in future)
P.S. That was quite a journey...
v2: fixes after rebase
v3: fix conflicts after #42308 merged
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
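A minimal sketch of the safe_cast idea mentioned above (illustrative, not the exact implementation): perform the static_cast, then verify that the round trip preserves the value.

#include <stdexcept>

template <typename To, typename From>
To safe_cast(From from)
{
    const auto to = static_cast<To>(from);
    /// If casting back does not reproduce the original value, information was lost.
    if (static_cast<From>(to) != from)
        throw std::out_of_range("safe_cast: value out of range for the target type");
    return to;
}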
In case of a possible EINTR (e.g. from the query profiler) it is
possible for select() in getReplicaForReading() (this is the stage when
the initiator is waiting for a Cancel packet from the remote shards,
which can be sent when enough rows were read or the query was cancelled
explicitly) to return without any sockets ready, and
getReplicaForReading() will assume that a timeout happened.
Here is a stacktrace example:
[ 59205 ] {04f3d3a4-7346-4ef2-bf57-928f9e55ed89} <Error> TCPHandler: Code: 159. DB::Exception: Received from b8:9000. DB::Exception: Timeout (-1000 ms) exceeded while reading from . Stack trace:
0. Poco::Exception::Exception() @ 0x17e26eac in /usr/bin/clickhouse
1. DB::Exception::Exception() @ 0xb550b9a in /usr/bin/clickhouse
2. DB::Exception::Exception<>() @ 0x15ad1c81 in /usr/bin/clickhouse
3. DB::MultiplexedConnections::getReplicaForReading(bool) @ 0x15ad16fc in /usr/bin/clickhouse
4. DB::MultiplexedConnections::receivePacketUnlocked() @ 0x15ad02fd in /usr/bin/clickhouse
5. DB::MultiplexedConnections::drain() @ 0x15ad0df8 in /usr/bin/clickhouse
6. DB::ConnectionCollector::drainConnections(DB::IConnections&, bool) @ 0x1443c205 in /usr/bin/clickhouse
7. DB::RemoteQueryExecutor::finish() @ 0x1445ea6a in /usr/bin/clickhouse
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
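The usual pattern for fixing the EINTR issue above is to retry the interrupted call instead of treating it as a timeout. A minimal sketch with illustrative names (shown with poll() for brevity; a production version should also recompute the remaining timeout on each retry):

#include <cerrno>
#include <poll.h>

int pollRetryingOnEINTR(pollfd * fds, nfds_t nfds, int timeout_ms)
{
    int rc;
    do
    {
        /// A signal (e.g. from the query profiler) interrupts the call; retry instead of giving up.
        rc = ::poll(fds, nfds, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    return rc; /// 0 now really means "timed out", not "interrupted"
}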
Previously, before the exception was thrown there was a loop that
disconnected the connections, so dumpAddress*() would not return
anything; fix this by saving the addresses beforehand.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
This makes the target location consistent with other auto-generated
files like config_formats.h, config_core.h, and config_functions.h and
simplifies the build of clickhouse_common.
DNS_ERROR would cause the replica to not be marked as unusable, resulting in the replica being repeatedly retried on subsequent queries and in connection failover breaking.
(This is common in Kubernetes setups where a replica has failed and its DNS record returns NXDOMAIN.)
On SELECT, this would additionally result in an intermittent query error if the failed replica is chosen:
"Code: 198. DB::Exception: Received from localhost:9000. DB::Exception: Not found address of host: chi-clickhouse-main-2-0: While executing Remote. (DNS_ERROR)"
This is the initial implementation of the Kusto Query Language.
In this commit, we support the following features as an MVP:
Tabular expression statements
Limit returned results
Select columns (basic project)
sort, order
Perform string equality operations
Filter using a list of elements
Filter using common string operations
Some string operators
Aggregate by columns
Base aggregate functions
(only avg, count, min, max and sum are supported)
Aggregate by time intervals
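For illustration, queries using the features above might look like this (table and column names are hypothetical):
Customers | where Occupation == 'Engineer' | project FirstName, Age | sort by Age | limit 5
Customers | summarize avg(Age), count() by Occupation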
Implementation:
- Added a new buffer, ForkWriteBuffer, which takes a vector of WriteBuffers and writes data to all of them. It uses the buffer of the first element as its own buffer and copies data from the first buffer to all the others (a self-contained sketch of the idea follows this message).
Testing:
- Updated tests/queries/0_stateless/02346_into_outfile_and_stdout.sh
Documentation:
- Updated the English documentation for SELECT ... INTO OUTFILE with AND STDOUT.
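The fan-out idea behind ForkWriteBuffer, as a self-contained plain C++ sketch (this is not the actual DB::ForkWriteBuffer interface): one logical sink that forwards every chunk of data to several underlying sinks.

#include <ostream>
#include <string_view>
#include <utility>
#include <vector>

class ForkSink
{
public:
    explicit ForkSink(std::vector<std::ostream *> sinks_) : sinks(std::move(sinks_)) {}

    /// Every chunk written to this sink is duplicated to all underlying sinks.
    void write(std::string_view data)
    {
        for (auto * sink : sinks)
            sink->write(data.data(), static_cast<std::streamsize>(data.size()));
    }

private:
    std::vector<std::ostream *> sinks;
};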
A simple HelloWorld program with zero includes except iostream triggers
a build of ca. 2000 source files. The reason is that ClickHouse's
top-level CMakeLists.txt overrides "add_executable()" to link all
binaries against "clickhouse_new_delete". This links against
"clickhouse_common_io", which in turn has lots of 3rd party library
dependencies ... Without linking "clickhouse_new_delete", the number of
compiled files for "HelloWorld" goes down to ca. 70.
As an example, the self-extracting-executable needs none of its current
dependencies but other programs may also benefit.
In order to restore access to the original "add_executable()", the
overriding version is now prefixed. There is precedent for a
"clickhouse_" prefix (as opposed to "ch_"), for example
"clickhouse_split_debug_symbols". In general, prefixing also makes sense
because overriding CMake commands relies on undocumented behavior and is
considered not-so-great practice (*).
(*) https://crascit.com/2018/09/14/do-not-redefine-cmake-commands/
Implementation:
- Added a bool to ASTQueryWithOutput and patched its usage in ClientBase.
- Added a new buffer, TeeWriteBuffer, which extends WriteBufferFromFile (used to write data to the file) and holds a WriteBufferFromFileDescriptor (used to write data to stdout). The WriteBufferFromFileDescriptor uses the same buffer as the TeeWriteBuffer (a usage example follows this message).
- Added a new bool select_into_outfile_and_stdout in ClientBase to enable/disable progress rendering.
Testing:
- Added a test tests/queries/0_stateless/02346_into_outfile_and_stdout.sh
Documentation:
- Updated the english documentation for the new option in SELECT.
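Usage, for illustration (the file name is an example); the same result rows go both to the file and to the terminal:
SELECT number FROM numbers(3) INTO OUTFILE 'result.txt' AND STDOUT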
- TSA is a static analyzer built by Google which finds race conditions
and deadlocks at compile time.
- It works by associating a shared member variable with a
synchronization primitive that protects it. The compiler can then
check at each access whether proper locking happened before. Good
introductions are [0] and [1].
- TSA requires some help from the programmer via annotations. Luckily,
LLVM's libcxx already has annotations for std::mutex, std::lock_guard,
std::shared_mutex and std::scoped_lock. This commit enables them
(--> contrib/libcxx-cmake/CMakeLists.txt).
- Further, this commit adds convenience macros for the low-level
annotations for use in ClickHouse (--> base/defines.h). For
demonstration, they are leveraged in a few places; a minimal sketch
follows at the end of this message, after the references.
- As we compile with "-Wall -Wextra -Weverything", the required compiler
flag "-Wthread-safety-analysis" was already enabled. Negative checks
are an experimental feature of TSA and disabled
(--> cmake/warnings.cmake). Compile times did not increase noticeably.
- TSA is used in a few places with simple locking. I tried TSA also
where locking is more complex. The problem was usually that it is
unclear which data is protected by which lock :-(. But there was
definitely some weird code where locking looked broken. So there is
some potential to find bugs.
*** Limitations of TSA besides the ones listed in [1]:
- The programmer needs to know which lock protects which piece of shared
data. This is not always easy for large classes.
- Two synchronization primitives used in ClickHouse are not annotated in
libcxx:
(1) std::unique_lock: A releaseable lock handle often together with
std::condition_variable, e.g. to solve producer-consumer problems.
(2) std::recursive_mutex: A re-entrant mutex variant. Its usage can be
considered a design flaw + typically it is slower than a standard
mutex. In this commit, one std::recursive_mutex was converted to
std::mutex and annotated with TSA.
- For free-standing functions (e.g. helper functions) which are passed
shared data members, it can be tricky to specify the associated lock.
This is because the annotations use the normal C++ rules for symbol
resolution.
[0] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[1] https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42958.pdf
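A minimal sketch of what these annotations look like, using clang's raw thread-safety attributes (ClickHouse wraps them in convenience macros in base/defines.h, and libcxx must be built with the annotations enabled, as done above):

#include <mutex>

class Counter
{
public:
    void increment()
    {
        std::lock_guard lock(mutex);
        ++value; /// OK: mutex is held
    }

    int get() const
    {
        std::lock_guard lock(mutex);
        return value;
    }

    /// Uncommenting the next line makes -Wthread-safety-analysis complain
    /// that reading 'value' requires holding 'mutex':
    /// int unsafeGet() const { return value; }

private:
    mutable std::mutex mutex;
    int value __attribute__((guarded_by(mutex))) = 0;
};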
In PR #37300, Alexej asked why the compiler does not warn about
unnecessary semicolons, e.g.
f()
{
}; // <-- here
The answer is surprising: in C++98, the above syntax was disallowed, but
most compilers accepted it regardless. C++11 introduced "empty
declarations", which made the syntax legal.
The previous behavior can be restored using the flag
-Wc++98-compat-extra-semi. This finds many useless semicolons which were
removed in this change. Unfortunately, there are also false positives
which would require #pragma-s and HAS_* logic (--> check_flags.cmake) to
suppress. In the end, -Wc++98-compat-extra-semi comes with extra effort
for little benefit. Therefore, this change only fixes some semicolons
but does not enable the flag.
Before, tests could fail if there was an implicit reconnect with
queries left, and without the referenced PR it required manual
debugging to find out that the reason was the reconnect.
The problem is that the server does send EndOfStream, but hangs
afterwards, before removing the query from system.processes.
After adding is_all_data_sent (#36816, #36767, #36649),
clickhouse-test can check for leftover queries only among those for
which the server did not send EndOfStream/Exception.
In other words, after adding is_all_data_sent, it should not be
possible to have queries left in such cases.
Reverts: 53be9c5d0c
Reverts: #36587
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>(), which does a single allocation. It turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
previously allowed.
Hence, this change
- removes shared_ptr_helper and as a result all inherited create() methods,
- instead, Storage objects are now created using make_shared<>() by the
caller (for that to work, many constructors had to be made public), and
- all Storage classes were marked as noncopyable using boost::noncopyable.
In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
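A sketch of the resulting pattern (the class name is hypothetical): a public constructor, boost::noncopyable, and a single allocation via std::make_shared at the call site.

#include <memory>
#include <boost/noncopyable.hpp>

class StorageExample : public boost::noncopyable
{
public:
    explicit StorageExample(int param_) : param(param_) {} /// public now, no inherited create()

private:
    int param;
};

int main()
{
    /// One heap allocation for the object plus its control block,
    /// instead of two with std::shared_ptr<T>(new T(...)).
    auto storage = std::make_shared<StorageExample>(42);
    (void)storage;
}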
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"
About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyways. I
have cleaned these up.
About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.
Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
One example of such an error is NO_ROW_DELIMITER [1]:
$ clickhouse-client --stacktrace --multiquery <<<"SELECT * FROM no_length_delimiter_protobuf_00825 FORMAT ProtobufSingle SETTINGS format_schema = '$PWD/tests/queries/0_stateless/format_schemas/00825_protobuf_format_no_length_delimiter:Message'"
Error on processing query: Code: 546. DB::Exception: The ProtobufSingle format can't be used to write multiple rows because this format doesn't have any row delimiter. (NO_ROW_DELIMITER), Stack trace (when copying this message, always include the lines below):
...
3. /build/build_docker/../src/Common/Exception.cpp:56: DB::Exception::Exception()
4. /build/build_docker/../src/Processors/Formats/Impl/ProtobufRowOutputFormat.cpp:43: DB::ProtobufRowOutputFormat::write()
5. /build/build_docker/../src/Processors/Formats/IRowOutputFormat.cpp:34: DB::IRowOutputFormat::consume()
6. /build/build_docker/../src/Processors/Formats/IOutputFormat.cpp:115: DB::IOutputFormat::write()
7. /build/build_docker/../src/Client/ClientBase.cpp:398: DB::ClientBase::onData()
[1]: https://s3.amazonaws.com/clickhouse-test-reports/35865/b862fa7f3ede7d30b7c606c06e7fe1e273b49d32/stateless_tests__thread__actions__[3/3].html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Add allow_settings_after_format_in_insert setting, OFF by default.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
v2: s/parser_settings_after_format_compact/allow_settings_after_format_in_insert/ (suggested by vitlibar)
v3: replace ParserSettings with a flag (requested by vitlibar)
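With the setting enabled, a query like the following becomes valid (the table name and the particular format setting are examples):
SET allow_settings_after_format_in_insert = 1;
INSERT INTO t FORMAT CSV SETTINGS format_csv_delimiter = ';'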
In case of a format error (e.g. select 2 format Template settings
format_template_row='/dev/null'), the client will reset the connection,
since it will not expect Data blocks.
To fix this, catch this client error and properly cancel the query. This
should also fix the query hang checks (the ones executed after each test).
v2: use getCurrentExceptionMessage()/getCurrentExceptionCode()
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Use INITIAL_QUERY for clickhouse-benchmark
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Fix parallel_reading_from_replicas with clickhouse-benchmark
Before, it produced the following error:
$ clickhouse-benchmark --stacktrace -i1 --query "select * from remote('127.1', default.data_mt) limit 10" --allow_experimental_parallel_reading_from_replicas=1 --max_parallel_replicas=3
Loaded 1 queries.
Logical error: 'Coordinator for parallel reading from replicas is not initialized'.
Aborted (core dumped)
This happens because it uses the same code path, i.e.
RemoteQueryExecutor -> MultiplexedConnections, which enables the
coordinator if it was requested via settings, but this should be done
only for non-initial queries, i.e. when one server sends a query to
another server.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
* Fix 02226_parallel_reading_from_replicas_benchmark for older shellcheck
Shellcheck 0.8 does not complain, while the CI uses shellcheck 0.7.0,
which does complain [1]:
In 02226_parallel_reading_from_replicas_benchmark.sh line 17:
--allow_experimental_parallel_reading_from_replicas=1
^-- SC2191: The = here is literal. To assign by index, use ( [index]=value ) with no spaces. To keep as literal, quote it.
Did you mean:
"--allow_experimental_parallel_reading_from_replicas=1"
[1]: https://s3.amazonaws.com/clickhouse-test-reports/34751/d883af711822faf294c876b017cbf745b1cda1b3/style_check__actions_/shellcheck_output.txt
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
There were several issues:
1) there can be events for multiple threads in one block
2) lack of a filter for thread_id = 0
3) thread_id was included in the comparator
Fixes: #34719
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>