Short-circuit evaluation was implemented when applying the
saturable operators (and, or) to a vector of ColumnUInt8. However,
its control flow is compiled into a series of conditional
branch instructions, which are hard for the hardware to predict and
also hinder vectorization by the
compiler. This commit removes the short-circuit and evaluates the
whole expression.
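
A minimal sketch of the difference, not the actual ClickHouse code: the
function names and the use of std::vector instead of ColumnUInt8 are
assumptions made for illustration, and both inputs are assumed to have the
same length. The branchless variants normalize every value to 0/1 and combine
them with bitwise operators, so the loop body has no data-dependent branch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using UInt8 = std::uint8_t;

// Hypothetical helper: short-circuit form. The right operand is only
// inspected when the left one is non-zero, which a compiler may lower
// to a conditional branch per element.
std::vector<UInt8> andShortCircuit(const std::vector<UInt8> & a, const std::vector<UInt8> & b)
{
    std::vector<UInt8> res(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        res[i] = a[i] ? static_cast<UInt8>(b[i] != 0) : UInt8{0};
    return res;
}

// Hypothetical helper: always evaluates both operands. Bitwise & on
// normalized 0/1 values gives the same result as logical AND.
std::vector<UInt8> andBranchless(const std::vector<UInt8> & a, const std::vector<UInt8> & b)
{
    std::vector<UInt8> res(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        res[i] = static_cast<UInt8>((a[i] != 0) & (b[i] != 0));
    return res;
}

// Same idea for OR: bitwise | on normalized 0/1 values.
std::vector<UInt8> orBranchless(const std::vector<UInt8> & a, const std::vector<UInt8> & b)
{
    std::vector<UInt8> res(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        res[i] = static_cast<UInt8>((a[i] != 0) | (b[i] != 0));
    return res;
}
```

Whether the branchless loop is actually vectorized depends on the compiler
and its flags, so the generated assembly is worth checking.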
And use it for:
- MetadataStorageFromPlainObjectStorage
- MetadataStorageFromStaticFilesWebServer
This allows removing ~100-200 lines of duplicated code and makes
the code less error-prone.
Note: for now I tried to do this without behaviour changes.
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
The reason for the removal is that it is not compatible with restoring
(with send_metadata set) anyway:
- HDFS - is not compatible with send_metadata, and besides its
implementation is not correct, since it is simply `ls -l`, while the
following is required: `find . -maxdepth 1 -type f`
- Web - is not compatible with send_metadata anyway
- Local - is not compatible with send_metadata anyway
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Simple reproducer:
$ cmake
$ ninja contrib/llvm-project/llvm/lib/MC/MCParser/CMakeFiles/LLVMMCParser.dir/MasmParser.cpp.o # will have -std=c++14
$ touch CMakeLists.txt
$ cmake
$ ninja contrib/llvm-project/llvm/lib/MC/MCParser/CMakeFiles/LLVMMCParser.dir/MasmParser.cpp.o # will have -std=c++20 and fail
(fails because std::vector cannot work with opaque types anymore)
Fixes: #42249 (cc @rschu1ze)
In lowerUTF8()/upperUTF8() there is an SSE optimization that handles
16 bytes at a time, but only for ASCII; UTF-8 symbols are converted
one symbol at a time.
Consider the following example:
    КВ АМ И СЖ
             ^ - offset is 15, length of sequence is 2
so the first byte of the symbol is within the first 16 bytes, but
the second byte of the symbol is not.
In this case it is handled incorrectly, because only these 16 bytes
are processed, without looking ahead.
This was broken by #41286: before that patch the code did not look at
the row boundaries, only at the end of the string, so this situation
wasn't possible.
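
To illustrate the boundary condition only (this is not the code from
lowerUTF8()/upperUTF8(); the helper names and the default chunk size parameter
are hypothetical), the sketch below walks a string by whole UTF-8 sequences
and reports how many leading bytes can be processed without cutting a
sequence at the 16-byte boundary. Invalid lead bytes are treated as
single-byte sequences for simplicity.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: length of a UTF-8 sequence derived from its lead byte.
// Continuation or invalid bytes are treated as length 1.
std::size_t utf8SeqLength(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 1;
}

// Hypothetical helper: number of leading bytes of `s` that consist of whole
// UTF-8 sequences and fit entirely into the first `chunk_size` bytes.
std::size_t wholeSequencesPrefix(std::string_view s, std::size_t chunk_size = 16)
{
    std::size_t pos = 0;
    while (pos < s.size())
    {
        const std::size_t len = utf8SeqLength(static_cast<unsigned char>(s[pos]));
        // Stop before a sequence that would cross the chunk or string end.
        if (pos + len > chunk_size || pos + len > s.size())
            break;
        pos += len;
    }
    return pos;
}
```

For the 17-byte string from the example, the last symbol starts at offset 15
and needs 2 bytes, so only the first 15 bytes form whole sequences inside the
first 16-byte chunk.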
Fixes: #42756
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
AFAICS it was there before because it was possible to overrun
expected_end: utf8.convert() was called with "src_end - src" instead of
"expected_end - src".
Refs: 5a21f3908b054a0efc90c65a12fbe151c74d90dc:dbms/include/DB/Functions/FunctionsString.h
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>