Commit Graph

1447 Commits

Author SHA1 Message Date
Azat Khuzhin
c1e70169d2 Suppress clang-analyzer-cplusplus.NewDelete in MsgPackRowInputFormat
Apparently there is some issue with clang-15, since even the following
example shows an error [1].

  [1]: https://gist.github.com/azat/027f0e949ea836fc2e6269113ceb8752

clang-tidy report [1]:

    FAILED: src/CMakeFiles/dbms.dir/Processors/Formats/Impl/MsgPackRowInputFormat.cpp.o
    /usr/bin/cmake -E __run_co_compile --launcher="prlimit;--as=10000000000;--data=5000000000;--cpu=1000;/usr/bin/ccache" --tidy=/usr/bin/clang-tidy-15 --source=/ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp -- /usr/bin/clang++-15 --target=x86_64-linux-gnu --sysroot=/ch/cmake/linux/../../contrib/sysroot/linux-x86_64/x86_64-linux-gnu/libc  -DAWS_SDK_VERSION_MAJOR=1 -DAWS_SDK_VERSION_MINOR=7 -DAWS_SDK_VERSION_PATCH=231 -DBOOST_ASIO_HAS_STD_INVOKE_RESULT=1 -DBOOST_ASIO_STANDALONE=1 -DCARES_STATICLIB -DCONFIGDIR=\"\" -DENABLE_MULTITARGET_CODE=1 -DENABLE_OPENSSL_ENCRYPTION -DHAS_RESERVED_IDENTIFIER -DHAVE_CONFIG_H -DLIBSASL_EXPORTS=1 -DLZ4_DISABLE_DEPRECATE_WARNINGS=1 -DOBSOLETE_CRAM_ATTR=1 -DOBSOLETE_DIGEST_ATTR=1 -DPLUGINDIR=\"\" -DPOCO_ENABLE_CPP11 -DPOCO_HAVE_FD_EPOLL -DPOCO_OS_FAMILY_UNIX -DSASLAUTHD_CONF_FILE_DEFAULT=\"\" -DSNAPPY_CODEC_AVAILABLE -DSTD_EXCEPTION_HAS_STACK_TRACE=1 -DUNALIGNED_OK -DWITH_COVERAGE=0 -DWITH_GZFILEOP -DX86_64 -DZLIB_COMPAT -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS -Iincludes/configs -I/ch/src -Isrc -Isrc/Core/include -I/ch/base/glibc-compatibility/memcpy -I/ch/base/base/.. -Ibase/base/.. -I/ch/contrib/cctz/include -I/ch/base/pcg-random/. -I/ch/contrib/miniselect/include -I/ch/contrib/zstd/lib -Icontrib/cyrus-sasl-cmake -I/ch/contrib/lz4/lib -I/ch/src/Common/mysqlxx/. -Icontrib/c-ares -I/ch/contrib/c-ares -I/ch/contrib/c-ares/include -isystem /ch/contrib/libcxx/include -isystem /ch/contrib/libcxxabi/include -isystem /ch/contrib/libunwind/include -isystem /ch/contrib/libdivide/. -isystem /ch/contrib/jemalloc-cmake/include -isystem /ch/contrib/llvm/llvm/include -isystem contrib/llvm/llvm/include -isystem /ch/contrib/abseil-cpp -isystem /ch/contrib/croaring/cpp -isystem /ch/contrib/croaring/include -isystem /ch/contrib/cityhash102/include -isystem /ch/contrib/boost -isystem /ch/contrib/poco/Net/include -isystem /ch/contrib/poco/Foundation/include -isystem /ch/contrib/poco/NetSSL_OpenSSL/include -isystem /ch/contrib/poco/Crypto/include -isystem /ch/contrib/boringssl/include -isystem /ch/contrib/poco/Util/include -isystem /ch/contrib/poco/JSON/include -isystem /ch/contrib/poco/XML/include -isystem /ch/contrib/replxx/include -isystem /ch/contrib/fmtlib-cmake/../fmtlib/include -isystem /ch/contrib/magic_enum/include -isystem /ch/contrib/double-conversion -isystem /ch/contrib/dragonbox/include -isystem /ch/contrib/re2 -isystem contrib/re2-cmake -isystem /ch/contrib/zlib-ng -isystem contrib/zlib-ng-cmake -isystem /ch/contrib/pdqsort -isystem /ch/contrib/xz/src/liblzma/api -isystem /ch/contrib/aws-c-common/include -isystem /ch/contrib/aws-c-event-stream/include -isystem /ch/contrib/aws/aws-cpp-sdk-s3/include -isystem /ch/contrib/aws/aws-cpp-sdk-core/include -isystem contrib/aws-s3-cmake/include -isystem /ch/contrib/snappy -isystem contrib/snappy-cmake -isystem /ch/contrib/msgpack-c/include -isystem /ch/contrib/fast_float/include -isystem /ch/contrib/librdkafka-cmake/include -isystem /ch/contrib/librdkafka/src -isystem contrib/librdkafka-cmake/auxdir -isystem /ch/contrib/cppkafka/include -isystem /ch/contrib/nats-io/src -isystem /ch/contrib/nats-io/src/adapters -isystem /ch/contrib/nats-io/src/include -isystem /ch/contrib/nats-io/src/unix -isystem /ch/contrib/libuv/include -isystem /ch/contrib/krb5/src/include -isystem contrib/krb5-cmake/include -isystem /ch/contrib/NuRaft/include -isystem /ch/contrib/poco/MongoDB/include -isystem contrib/mariadb-connector-c-cmake/include-public -isystem /ch/contrib/mariadb-connector-c/include -isystem /ch/contrib/mariadb-connector-c/libmariadb -isystem /ch/contrib/icu/icu4c/source/i18n -isystem /ch/contrib/icu/icu4c/source/common -isystem /ch/contrib/capnproto/c++/src -isystem /ch/contrib/arrow/cpp/src -isystem /ch/contrib/arrow-cmake/cpp/src -isystem contrib/arrow-cmake/cpp/src -isystem contrib/arrow-cmake/../orc/c++/include -isystem /ch/contrib/orc/c++/include -isystem contrib/avro-cmake/include -isystem /ch/contrib/avro/lang/c++/api -isystem /ch/contrib/openldap-cmake/linux_x86_64/include -isystem /ch/contrib/openldap/include -isystem /ch/contrib/sparsehash-c11 -isystem /ch/contrib/protobuf/src -isystem src/Server/grpc_protos -isystem /ch/contrib/grpc/include -isystem /ch/contrib/libhdfs3/include -isystem /ch/contrib/hive-metastore -isystem /ch/contrib/thrift/lib/cpp/src -isystem contrib/thrift-cmake -isystem /ch/contrib/azure/sdk/core/azure-core/inc -isystem /ch/contrib/azure/sdk/identity/azure-identity/inc -isystem /ch/contrib/azure/sdk/storage/azure-storage-common/inc -isystem /ch/contrib/azure/sdk/storage/azure-storage-blobs/inc -isystem /ch/contrib/s2geometry/src -isystem /ch/contrib/AMQP-CPP/include -isystem /ch/contrib/AMQP-CPP -isystem /ch/contrib/sqlite-amalgamation -isystem /ch/contrib/rocksdb/include -isystem /ch/contrib/libpqxx/include -isystem /ch/contrib/libpq -isystem /ch/contrib/libpq/include -isystem /ch/contrib/libstemmer_c/include -isystem /ch/contrib/wordnet-blast -isystem /ch/contrib/lemmagen-c/include -isystem /ch/contrib/simdjson/include -isystem /ch/contrib/rapidjson/include -isystem /ch/contrib/consistent-hashing --gcc-toolchain=/ch/cmake/linux/../../contrib/sysroot/linux-x86_64 -std=c++20 -fdiagnostics-color=always -Xclang -fuse-ctor-homing -fsized-deallocation  -UNDEBUG -gdwarf-aranges -pipe -mssse3 -msse4.1 -msse4.2 -mpclmul -mpopcnt -fasynchronous-unwind-tables -falign-functions=32 -mbranches-within-32B-boundaries -fdiagnostics-absolute-paths -fstrict-vtable-pointers -fexperimental-new-pass-manager -Wall -Wextra -Weverything -Wpedantic -Wno-zero-length-array -Wno-c++98-compat-pedantic -Wno-c++98-compat -Wno-c++20-compat -Wno-conversion -Wno-ctad-maybe-unsupported -Wno-disabled-macro-expansion -Wno-documentation-unknown-command -Wno-double-promotion -Wno-exit-time-destructors -Wno-float-equal -Wno-global-constructors -Wno-missing-prototypes -Wno-missing-variable-declarations -Wno-padded -Wno-switch-enum -Wno-undefined-func-template -Wno-unused-template -Wno-vla -Wno-weak-template-vtables -Wno-weak-vtables -Wno-thread-safety-negative -g -O0 -g -gdwarf-4 -fno-inline  -D_LIBCPP_DEBUG=0   -D OS_LINUX -I/ch/base -I/ch/contrib/magic_enum/include -include /ch/src/Core/iostream_debug_helpers.h -Werror -nostdinc++ -std=gnu++2a -MD -MT src/CMakeFiles/dbms.dir/Processors/Formats/Impl/MsgPackRowInputFormat.cpp.o -MF src/CMakeFiles/dbms.dir/Processors/Formats/Impl/MsgPackRowInputFormat.cpp.o.d -o src/CMakeFiles/dbms.dir/Processors/Formats/Impl/MsgPackRowInputFormat.cpp.o -c /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp

    /ch/contrib/msgpack-c/include/msgpack/v1/detail/cpp11_zone.hpp:195:9: error: Attempt to free released memory [clang-analyzer-cplusplus.NewDelete,-warnings-as-errors]
            ::free(p);
            ^
    /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp:509:5: note: Taking false branch
        if (buf.eof())
        ^
    /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp:514:24: note: Assuming 'i' is not equal to field 'number_of_columns'
        for (size_t i = 0; i != number_of_columns; ++i)
                           ^~~~~~~~~~~~~~~~~~~~~~
    /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp:514:5: note: Loop condition is true.  Entering loop body
        for (size_t i = 0; i != number_of_columns; ++i)
        ^
    /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp:516:30: note: Calling 'MsgPackSchemaReader::readObject'
            auto object_handle = readObject();
                                 ^~~~~~~~~~~~
    /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp:426:5: note: Taking false branch
        if (buf.eof())
        ^
    /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp:433:5: note: Loop condition is true.  Entering loop body
        while (need_more_data)
        ^
    /ch/src/Processors/Formats/Impl/MsgPackRowInputFormat.cpp:438:29: note: Calling 'unpack'
                object_handle = msgpack::unpack(buf.position(), buf.buffer().end() - buf.position(), offset);
                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /ch/contrib/msgpack-c/include/msgpack/v3/unpack.hpp:52:12: note: Calling 'unpack'
        return msgpack::v3::unpack(data, len, off, referenced, f, user_data, limit);
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /ch/contrib/msgpack-c/include/msgpack/v3/unpack.hpp:35:5: note: Control jumps to the 'default' case at line 40
        switch(ret) {
        ^
    /ch/contrib/msgpack-c/include/msgpack/v3/unpack.hpp:41:9: note:  Execution continues on line 43
            break;
            ^
    /ch/contrib/msgpack-c/include/msgpack/v3/unpack.hpp:43:35: note: Calling '~unique_ptr'
        return msgpack::object_handle();
                                      ^
    /ch/contrib/libcxx/include/__memory/unique_ptr.h:269:19: note: Calling 'unique_ptr::reset'
      ~unique_ptr() { reset(); }
                      ^~~~~~~
    /ch/contrib/libcxx/include/__memory/unique_ptr.h:314:9: note: '__tmp' is non-null
        if (__tmp)
            ^~~~~
    /ch/contrib/libcxx/include/__memory/unique_ptr.h:314:5: note: Taking true branch
        if (__tmp)
        ^
    /ch/contrib/libcxx/include/__memory/unique_ptr.h:315:7: note: Calling 'default_delete::operator()'
          __ptr_.second()(__tmp);
          ^~~~~~~~~~~~~~~~~~~~~~
    /ch/contrib/libcxx/include/__memory/unique_ptr.h:54:5: note: Memory is released
        delete __ptr;
        ^~~~~~~~~~~~
    /ch/contrib/libcxx/include/__memory/unique_ptr.h:54:5: note: Calling 'zone::operator delete'
        delete __ptr;
        ^~~~~~~~~~~~
    /ch/contrib/msgpack-c/include/msgpack/v1/detail/cpp11_zone.hpp:195:9: note: Attempt to free released memory
            ::free(p);
            ^~~~~~~~~

  [1]: https://s3.amazonaws.com/clickhouse-builds/41046/9677898b3b234a5ba0371edaf719ea8890d084ff/binary_tidy/build_log.log

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-09-10 21:38:35 +02:00
Alexey Milovidov
fa62c7e982 Fix half of trash 2022-09-10 04:08:16 +02:00
avogar
6d5f9e5554 Proper implementation for rowFormat function, delete rowFormatNoNewLine function 2022-09-09 17:42:33 +00:00
Kruglov Pavel
c33aa54032
Fix 2022-09-09 17:53:26 +02:00
Kruglov Pavel
f669d305b6
Fix comment 2022-09-09 17:45:47 +02:00
zhenjial
bd9fabc3f7 code optimization, add test 2022-09-09 23:27:42 +08:00
avogar
ad68b7be0f Better 2022-09-09 15:01:45 +00:00
avogar
46a0318a36 Support JSONColumnsWithMetadata input format 2022-09-08 17:58:44 +00:00
zhenjial
469ceaa156 code optimization 2022-09-09 00:47:43 +08:00
avogar
c380decbbb Make better, add new settings 2022-09-08 16:07:20 +00:00
avogar
545be27f81 Merge branch 'master' of github.com:ClickHouse/ClickHouse into new-json-formats 2022-09-08 13:48:10 +00:00
Anton Popov
f0a404e2c8 Merge remote-tracking branch 'upstream/master' into HEAD 2022-09-06 15:51:16 +00:00
zhenjial
0f788d98f5 new implementation 2022-09-06 20:39:54 +08:00
zhenjial
18db90dcfc Record errors while reading text formats (CSV, TSV). 2022-09-06 17:19:15 +08:00
Kruglov Pavel
77071381e4
fix build 2022-09-02 16:37:33 +02:00
avogar
afc34dca41 Add new JSON formats, add improvements and refactoring 2022-09-01 19:00:24 +00:00
Kruglov Pavel
7a4a65bc36
Make better exception message in schema inference 2022-09-01 20:36:08 +02:00
Kruglov Pavel
f53aa86a20
Merge pull request #40485 from arthurpassos/fix-parquet-chunked-array-deserialization
Add support for extended (chunked) arrays for Parquet format
2022-09-01 19:40:40 +02:00
Alexey Milovidov
6b2e227c8b Fix integration test 2022-08-27 22:28:38 +02:00
Kruglov Pavel
e6e7f5db93
Merge pull request #40491 from mini4/fix-settings-input_format_tsv_skip_first_lines
Fix bug in settings input_format_tsv_skip_first_lines of format TSV
2022-08-24 15:57:45 +02:00
Kruglov Pavel
0781e8b4f7
Merge pull request #40534 from Avogar/nested-in-avro
Support reading Array(Record) into flatten nested table in Avro
2022-08-24 13:33:12 +02:00
kgurjev
f62c2c3221 Fix bug in settings input_format_tsv_skip_first_lines of format TSV 2022-08-24 10:02:57 +03:00
avogar
29a887578b Fix 2022-08-23 11:42:57 +00:00
avogar
581e569d04 Support reading Array(Record) into flatten nested table in Avro 2022-08-23 11:05:02 +00:00
Arthur Passos
f8e2ab0a20 Use FileReader::GetRecordBatchReader instead of FileReader::ReadRowGroup to parse Parquet 2022-08-22 08:21:32 -03:00
avogar
612ffaffde Make schema inference cache better, respect format settings that can change the schema 2022-08-19 16:39:13 +00:00
Kruglov Pavel
b67cb9e378
Merge pull request #40173 from Avogar/arrow-dict
Improve and fix dictionaries in Arrow format
2022-08-18 20:54:55 +02:00
Kruglov Pavel
09a2ff8843
Merge pull request #40293 from joshuataylor/feature/arrow-large-binary-string
Add support for LARGE_BINARY/LARGE_STRING with Arrow
2022-08-18 14:01:58 +02:00
avogar
a6318cecd5 Fix hive test 2022-08-18 11:32:42 +00:00
Nikolai Kochetov
5a85531ef7
Merge pull request #38286 from Avogar/schema-inference-cache
Add schema inference cache for s3/hdfs/file/url
2022-08-18 13:07:50 +02:00
Yakov Olkhovskiy
40fd6e189a
call readColumnWithStringData 2022-08-17 09:54:01 -04:00
Kruglov Pavel
19af748737
Fix typo 2022-08-17 14:29:09 +02:00
Kruglov Pavel
00d04456ff
Try reduce code duplication 2022-08-17 14:28:15 +02:00
avogar
8dd54c043d Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-cache 2022-08-17 11:47:40 +00:00
Josh Taylor
628d2bbff5 Add support for LARGE_BINARY/LARGE_STRING with Arrow 2022-08-17 10:25:06 +08:00
avogar
99d8727335 Fix tests 2022-08-16 12:56:51 +00:00
avogar
936c457734 Remove unneeded field 2022-08-16 09:51:52 +00:00
avogar
e1ff996ec3 Allow to specify structure hints in schema inference 2022-08-16 09:46:57 +00:00
Kruglov Pavel
2c5c0d6d47
Fix typo 2022-08-15 19:55:28 +02:00
avogar
ca0d883c0f Fix possible segfault in CapnProto input format 2022-08-15 15:36:18 +00:00
avogar
c160033837 Fix 2022-08-15 11:38:28 +00:00
avogar
78e197063c Better example 2022-08-12 19:08:36 +00:00
avogar
763f84b623 Remove bad comment 2022-08-12 19:05:57 +00:00
avogar
9addded80e Remove logging 2022-08-12 19:01:02 +00:00
avogar
000336622a Remove logging 2022-08-12 18:59:52 +00:00
avogar
398576e9c9 Improve and fix dictionaries in Arrow format 2022-08-12 18:56:21 +00:00
Kseniia Sumarokova
a6cfc7bc3b
Merge pull request #34651 from alexX512/master
New caching strategies
2022-08-12 17:23:37 +02:00
Anton Popov
3fdf428834
Merge pull request #39186 from Avogar/numbers-schema-inference
Add new features in schema inference
2022-08-11 00:53:54 +02:00
Arthur Passos
c4d8ad2222 Add docs 2022-08-09 15:58:46 -03:00
Arthur Passos
e724e7bef6 Update arrow dict to lc comment 2022-08-09 15:52:37 -03:00
Arthur Passos
6eb89fd780 Fix both arrow dict de-serialization and dict of nullable de-serialization 2022-08-09 15:06:22 -03:00
Arthur Passos
be1e32c3f1
Merge branch 'ClickHouse:master' into fix_arrow_column_dictionary_to_ch_lc 2022-08-09 15:04:06 -03:00
Kruglov Pavel
088e8cf9bd
Merge branch 'master' into numbers-schema-inference 2022-08-09 14:00:36 +02:00
Kruglov Pavel
99b9e85a8f
Merge pull request #39646 from Avogar/more-formats
Add more Pretty formats
2022-08-09 13:59:47 +02:00
avogar
1304e3487c Add comments, remove unneeded stuff 2022-08-08 13:43:14 +00:00
avogar
2f95726b06 Fix comments 2022-08-08 12:41:00 +00:00
alexX512
6bf29cb610 Change class LRUCache to class CacheBase. Check running CacheBase with default cache policy SLRU 2022-08-07 19:59:30 +00:00
avogar
9b1a267203 Refactor, remove TTL, add size limit, add system table and system query 2022-08-05 16:20:15 +00:00
Arthur Passos
62d48053c0 Use insertDefault instead of insert(0) 2022-08-04 15:53:44 -03:00
Arthur Passos
c307e9a228 Fix ArrowColumn dictionary to CH low cardinality conversion 2022-08-04 15:34:44 -03:00
Kruglov Pavel
235649cb98
Merge pull request #39458 from Avogar/fix-cancel-insert-into-function
Fix WriteBuffer finalize when cancel insert into function
2022-08-04 13:02:08 +02:00
Kruglov Pavel
6b2186bfeb
Merge branch 'master' into numbers-schema-inference 2022-08-02 19:34:53 +02:00
Kruglov Pavel
42136b7630
Merge pull request #39647 from Avogar/fix-arrow-strings
Fix strings in dictionary in Arrow format
2022-08-01 12:46:07 +02:00
Alexey Milovidov
4828be7fc4 Fix double escaping in the metadata of FORMAT JSON 2022-07-30 23:56:41 +02:00
Kruglov Pavel
ccd1e1bdb8
Merge branch 'master' into fix-cancel-insert-into-function 2022-07-29 20:27:32 +02:00
avogar
01a309d4e3 Fix strings in dictionary in Arrow format 2022-07-27 12:02:27 +00:00
avogar
f925046dc4 Add more Pretty formats 2022-07-27 11:37:02 +00:00
Kruglov Pavel
381ea139c2
Merge branch 'master' into schema-inference-cache 2022-07-27 11:35:36 +02:00
Kruglov Pavel
53159db782
Merge branch 'master' into numbers-schema-inference 2022-07-26 12:32:49 +02:00
Kruglov Pavel
83c7da6e88
Merge branch 'master' into fix-protobuf-capnp-empty-message 2022-07-25 13:02:41 +02:00
Alexey Milovidov
388d06fda1
Merge pull request #39535 from ClickHouse/stringref
Less usage of StringRef
2022-07-25 04:06:11 +03:00
Robert Schulze
4333750985
Less usage of StringRef
... replaced by std::string_view, see #39262
2022-07-24 18:33:52 +00:00
Alexander Tokmakov
bed2206ae9
Merge pull request #39460 from ClickHouse/remove_some_dead_and_commented_code
Remove some dead and commented code
2022-07-22 13:24:34 +03:00
avogar
794aa691bc Merge branch 'master' of github.com:ClickHouse/ClickHouse into fix-protobuf-capnp-empty-message 2022-07-21 17:04:37 +00:00
Kruglov Pavel
9252f42b4c
Merge branch 'master' into schema-inference-cache 2022-07-21 18:59:14 +02:00
avogar
fd534aa3fa Merge branch 'master' of github.com:ClickHouse/ClickHouse into numbers-schema-inference 2022-07-21 15:43:17 +00:00
Alexander Tokmakov
a8da5d96fc remove some dead and commented code 2022-07-21 15:05:48 +02:00
avogar
6b541aa98f Fix WriteBuffer finalize when cancel insert into function 2022-07-21 12:18:37 +00:00
Nikolai Kochetov
e15967e9db
Merge pull request #38475 from ClickHouse/additional-filters
Additional filters for a table (from setting)
2022-07-21 07:52:04 +02:00
Alexey Milovidov
844042fc18
Merge pull request #39433 from ClickHouse/revert-39396-try-fix-write-buffer-terminate
Revert "Fix WriteBuffer finalize in destructor when cancel query"
2022-07-21 07:04:07 +03:00
Alexey Milovidov
dcda9d3bd1
Merge pull request #39365 from Avogar/fix-capnproto-abort
Avoid possible abort() in CapnProto on exception destruction
2022-07-21 05:20:45 +03:00
Kruglov Pavel
92995a832b
Revert "Fix WriteBuffer finalize in destructor when cancel query" 2022-07-21 01:45:16 +02:00
Nikolai Kochetov
91043351aa Fixing build. 2022-07-20 20:30:16 +00:00
Kruglov Pavel
46da17ca8c
Merge branch 'master' into numbers-schema-inference 2022-07-20 13:32:39 +02:00
Kruglov Pavel
3046cd6d29
Merge branch 'master' into schema-inference-cache 2022-07-20 13:30:42 +02:00
avogar
784ee11594 Add settings to skip fields with unsupported types in Protobuf/CapnProto schema inference 2022-07-20 11:16:25 +00:00
Kruglov Pavel
a1b63b4a02
Fix style 2022-07-20 12:07:22 +02:00
Kruglov Pavel
7722b647b7
Merge pull request #39396 from Avogar/try-fix-write-buffer-terminate
Fix WriteBuffer finalize in destructor when cancel query
2022-07-20 12:06:20 +02:00
avogar
5c16d6b553 Fix WriteBuffer finalize in destructor when cancel query 2022-07-19 19:21:30 +00:00
avogar
4f020654be Get rid of unneeded ifdefs 2022-07-19 12:12:40 +00:00
avogar
6eb234a1cc Avoid abort() in capnproto on exception destruction 2022-07-18 19:53:24 +00:00
Robert Schulze
32637cb1b9
Fix build 2022-07-18 07:58:59 +00:00
Robert Schulze
13482af4ee
First try at reducing the use of StringRef
- to be replaced by std::string_view
- suggested in #39262
2022-07-17 17:26:02 +00:00
Robert Schulze
deda29b46b
Pass const StringRef by value, not by reference
See #39224
2022-07-15 11:34:56 +00:00
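A minimal sketch of the idea behind deda29b46b above, assuming StringRef behaves like std::string_view (a non-owning pointer plus a length). The functions countSpacesByRef and countSpaces are hypothetical and not taken from the ClickHouse sources. A view of this size is typically as cheap to copy as a reference, and passing it by value spares the callee one level of indirection:

```cpp
#include <cstddef>
#include <string_view>

// By const reference: the callee receives a pointer to the view and has to
// dereference it before it can reach the characters.
inline std::size_t countSpacesByRef(const std::string_view & text)
{
    std::size_t count = 0;
    for (char c : text)
        if (c == ' ')
            ++count;
    return count;
}

// By value: the pointer and the length are copied directly,
// which is the style the commit message asks for.
inline std::size_t countSpaces(std::string_view text)
{
    std::size_t count = 0;
    for (char c : text)
        if (c == ' ')
            ++count;
    return count;
}
```

Both functions return the same result; only the parameter passing differs.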
Kruglov Pavel
b38241b08a
Merge branch 'master' into schema-inference-cache 2022-07-14 12:29:54 +02:00
avogar
7cde9d3b40 Add new features in schema inference 2022-07-13 15:57:55 +00:00
vdimir
63aebd17b2 Remove TabSeparatedSorted 2022-07-12 20:22:35 +02:00
vdimir
46df417c2e Fix empty line sorting in TabSeparatedSorted 2022-07-12 20:22:35 +02:00
vdimir
f51b25b262 clickhouse test ignore order via special format 2022-07-12 20:22:35 +02:00
Kruglov Pavel
4080f055b6
Merge pull request #38477 from Avogar/sql-insert-format
Add SQLInsert output format
2022-07-04 15:06:33 +02:00
avogar
5b0fd31c64 Put column names in quotes 2022-06-30 16:14:30 +00:00
Antonio Andelic
de264117fd
Merge pull request #38118 from bigo-sg/storagehive_struct_type
Add struct type support in `StorageHive`
2022-06-30 09:11:13 +02:00
mergify[bot]
9482c99ab8
Merge branch 'master' into sql-insert-format 2022-06-29 11:03:07 +00:00
Robert Schulze
f692ead6ad
Don't use std::unique_lock unless we have to
Replace where possible by std::lock_guard which is more light-weight.
2022-06-28 19:19:06 +00:00
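The distinction drawn in f692ead6ad above can be sketched as follows. TaskQueue is a hypothetical class for illustration, not ClickHouse code. std::lock_guard only locks in its constructor and unlocks in its destructor, while std::unique_lock can also be unlocked and relocked, which std::condition_variable::wait requires:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Hypothetical example class, not taken from the ClickHouse sources.
class TaskQueue
{
public:
    void push(int task)
    {
        {
            // Plain scoped locking: std::lock_guard is sufficient.
            std::lock_guard<std::mutex> lock(mutex);
            tasks.push(task);
        }
        has_tasks.notify_one();
    }

    int pop()
    {
        // wait() has to release and re-acquire the mutex,
        // so std::unique_lock is needed here.
        std::unique_lock<std::mutex> lock(mutex);
        has_tasks.wait(lock, [this] { return !tasks.empty(); });
        int task = tasks.front();
        tasks.pop();
        return task;
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(mutex);
        return tasks.size();
    }

private:
    std::mutex mutex;
    std::condition_variable has_tasks;
    std::queue<int> tasks;
};
```

Only pop() keeps std::unique_lock; the other two methods never unlock early, so the lighter guard is enough.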
avogar
9bb68bc6de Add SQLInsert output format 2022-06-27 18:31:57 +00:00
avogar
5155262a16 Add some additional information to cache keys 2022-06-27 12:43:24 +00:00
lgbo-ustc
cd8e5c7c49 update headers 2022-06-23 17:43:54 +08:00
lgbo-ustc
96e6f9a2d0 fixed code style 2022-06-23 16:10:01 +08:00
lgbo-ustc
c1770c22b9 Merge remote-tracking branch 'ck/master' into storagehive_struct_type 2022-06-23 15:54:20 +08:00
Kseniia Sumarokova
e48ce50863
Update ArrowBufferedStreams.cpp 2022-06-20 19:12:51 +02:00
kssenii
5dd1bb2fd8 improvements for getFileSize 2022-06-20 15:22:56 +02:00
lgbo-ustc
8c629085e4 simplified code 2022-06-17 09:36:59 +08:00
lgbo-ustc
35d534c213 nested struct in struct 2022-06-16 16:45:05 +08:00
Alexey Milovidov
5e9e5a4eaf
Merge pull request #37525 from Avogar/avro-structs
Support Maps and Records, allow to insert null as default in Avro format
2022-06-15 00:04:29 +03:00
Kseniia Sumarokova
0ae2168fb6
Merge pull request #36328 from bigo-sg/async_hdfs_read_buffer
Apply read_method 'threadpool' for StorageHive
2022-06-10 15:04:21 +02:00
taiyang-li
9fd9ff66bd remove some test code 2022-06-09 09:55:50 +08:00
taiyang-li
c65c56fd48 fix typo 2022-06-07 09:58:29 +08:00
mergify[bot]
ddf7210ecc
Merge branch 'master' into remove-useless-code-2 2022-06-03 13:58:45 +00:00
taiyang-li
f202c35311 Merge branch 'master' into async_hdfs_read_buffer 2022-06-03 17:52:09 +08:00
Paul Loyd
32d267ec6c
Stop removing UTF-8 BOM in RowBinary* formats
Fixes #37420
2022-06-01 13:12:55 +08:00
Maksim Kita
bacee7f19c
Merge pull request #37195 from kitaisreal/merging-sorted-algorithm-single-column-specialization
MergingSortedAlgorithm single column specialization
2022-05-31 16:46:18 +02:00
taiyang-li
047387bf1c fix 2 bugs: 1. select count(1) from hive_table; 2. select _file, _path from hive_table 2022-05-31 17:39:02 +08:00
avogar
4c9812d4c1 Allow to skip some of the first rows in CSV/TSV formats 2022-05-25 15:00:11 +00:00
avogar
038a422aeb Add setting to insert null as default 2022-05-25 12:56:59 +00:00
avogar
7817d6aea3 Support Maps and Records in Avro format 2022-05-25 11:20:28 +00:00
Maksim Kita
83554d1f2d Fixed style 2022-05-25 13:05:39 +02:00
Maksim Kita
9a9df26eec Fixed tests 2022-05-25 11:44:37 +02:00
Kruglov Pavel
6c9a524f6b
Merge pull request #37192 from Avogar/formats-with-names
Improve performance and memory usage for select of subset of columns for some formats
2022-05-24 13:28:14 +02:00
avogar
3651ef93fe Fix performance test 2022-05-23 17:42:13 +00:00
avogar
034c7122be Mark JSONColumns supports subset of columns 2022-05-23 15:26:01 +00:00
avogar
ce4adb447f Fix named tuples output in ORC/Arrow/Parquet formats 2022-05-23 14:21:08 +00:00
Kruglov Pavel
f539fb835d
Merge branch 'master' into formats-with-names 2022-05-23 12:14:20 +02:00
Kruglov Pavel
ce48e8e102
Merge pull request #36975 from Avogar/json-columns-formats
Add columnar JSON formats
2022-05-23 12:11:28 +02:00
Kruglov Pavel
9bc74439c1
Merge pull request #37327 from Avogar/arrow-strings
Allow to use String type instead of Binary in Arrow/Parquet/ORC formats
2022-05-23 12:05:33 +02:00
mergify[bot]
747aa5575c
Merge branch 'master' into remove-useless-code-2 2022-05-22 17:41:57 +00:00
Kruglov Pavel
704c78063f
Fix special build 2022-05-20 19:54:02 +02:00
Anton Popov
cb0e6c2718 mark all operators bool() as explicit 2022-05-20 15:29:54 +00:00
avogar
566d1b15fd Merge branch 'master' of github.com:ClickHouse/ClickHouse into formats-with-names 2022-05-20 13:54:52 +00:00
avogar
d2304f5d15 Make better 2022-05-20 12:07:29 +00:00
avogar
a6a430c5ee Merge branch 'master' of github.com:ClickHouse/ClickHouse into json-columns-formats 2022-05-20 11:08:30 +00:00
mergify[bot]
1ac4199e78
Merge branch 'master' into arrow-strings 2022-05-20 10:43:33 +00:00
avogar
cd6a29897e Apply input_format_max_rows_to_read_for_schema_inference for all files in globs in total 2022-05-18 17:56:36 +00:00
Kruglov Pavel
d81616ff65
Remove unnecessary include 2022-05-18 17:44:39 +02:00
avogar
a0369fb9a6 Allow to use String type instead of Binary in Arrow/Parquet/ORC formats 2022-05-18 14:51:21 +00:00
avogar
12010a81b7 Make better 2022-05-18 09:25:26 +00:00
Robert Schulze
0c55ac76d2
A few clangtidy updates
Enable:

- bugprone-lambda-function-name: "Checks for attempts to get the name of
  a function from within a lambda expression. The name of a lambda is
  always something like operator(), which is almost never what was
  intended."

- bugprone-unhandled-self-assignment: "Finds user-defined copy
  assignment operators which do not protect the code against
  self-assignment either by checking self-assignment explicitly or using
  the copy-and-swap or the copy-and-move method."

- hicpp-invalid-access-moved: "Warns if an object is used after it has
  been moved."

- hicpp-use-noexcept: "This check replaces deprecated dynamic exception
  specifications with the appropriate noexcept specification (introduced
  in C++11)"

- hicpp-use-override: "Adds override (introduced in C++11) to overridden
  virtual functions and removes virtual from those functions as it is
  not required."

- performance-type-promotion-in-math-fn: "Finds calls to C math library
  functions (from math.h or, in C++, cmath) with implicit float to
  double promotions."

Split up:

- cppcoreguidelines-*. Some of them may be useful (haven't checked in
  detail), therefore allow to toggle them individually.

Disable:

- linuxkernel-*. Obvious.
2022-05-17 20:56:57 +02:00
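As an illustration of one check enabled in 0c55ac76d2 above, bugprone-unhandled-self-assignment, here is a minimal sketch of a copy assignment operator written with the copy-and-swap method that the quoted description mentions. Buffer is a hypothetical class, not ClickHouse code:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer, for illustration only.
class Buffer
{
public:
    explicit Buffer(std::size_t size_) : size(size_), data(new char[size_]()) {}

    Buffer(const Buffer & other) : size(other.size), data(new char[other.size])
    {
        std::copy(other.data, other.data + other.size, data);
    }

    // Copy-and-swap: the by-value parameter is already a copy, so assigning
    // an object to itself never frees memory that is still being read.
    Buffer & operator=(Buffer other) noexcept
    {
        swap(other);
        return *this;
    }

    ~Buffer() { delete[] data; }

    void swap(Buffer & other) noexcept
    {
        std::swap(size, other.size);
        std::swap(data, other.data);
    }

    char & operator[](std::size_t i) { return data[i]; }
    std::size_t getSize() const { return size; }

private:
    std::size_t size;
    char * data;
};
```

A hand-written operator that deletes its own array before copying from the argument is the kind of code this check is meant to catch.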
Kruglov Pavel
8572879c37
Remove redundant code 2022-05-16 17:58:20 +02:00
Robert Schulze
e3cfec5b09
Merge remote-tracking branch 'origin/master' into clangtidies 2022-05-16 10:12:50 +02:00
avogar
68bb07d166 Better naming 2022-05-13 18:39:19 +00:00
avogar
cef13c2c02 Allow to skip unknown columns in Native format 2022-05-13 14:27:15 +00:00
avogar
b17fec659a Improve performance and memory usage for select of subset of columns for some formats 2022-05-13 13:51:28 +00:00
mergify[bot]
4a661b6e78
Merge branch 'master' into json-columns-formats 2022-05-13 11:32:03 +00:00
avogar
02679c7222 Fix tests 2022-05-10 16:27:59 +00:00
avogar
ea0362b3a3 Fix tests 2022-05-10 16:20:38 +00:00
avogar
9abdacdd2e Remove logging 2022-05-09 13:30:41 +00:00
avogar
054318b555 Fix invalid output LowCardinality -> ArrowDictionary 2022-05-09 13:29:42 +00:00
avogar
1e8d7ae749 Fix 2022-05-09 11:29:40 +00:00
avogar
04fdd75c56 Make JSONColumns formats mono block by default 2022-05-09 11:13:44 +00:00
Robert Schulze
1b81bb49b4
Enable clang-tidy modernize-deprecated-headers & hicpp-deprecated-headers
Official docs:

  Some headers from C library were deprecated in C++ and are no longer
  welcome in C++ codebases. Some have no effect in C++. For more details
  refer to the C++ 14 Standard [depr.c.headers] section. This check
  replaces C standard library headers with their C++ alternatives and
  removes redundant ones.
2022-05-09 08:23:33 +02:00
Robert Schulze
7d3913f350
Enable clang-tidy bugprone-assert-side-effect
Official docs:

  Finds assert() with side effect. The condition of assert() is
  evaluated only in debug builds so a condition with side effect can
  cause different behavior in debug / release builds.
2022-05-08 19:15:55 +02:00
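The problem described in 7d3913f350 above can be sketched as follows. Both functions are hypothetical examples, not ClickHouse code. When NDEBUG is defined, assert() expands to nothing, so any work done inside its condition is skipped:

```cpp
#include <cassert>

// Flagged pattern: the increment sits inside assert(), so a build with
// NDEBUG defined never increments the counter.
inline int nextIdFlagged(int & counter)
{
    assert(++counter > 0);
    return counter;
}

// Fixed pattern: perform the side effect first, then assert on the result.
inline int nextId(int & counter)
{
    ++counter;
    assert(counter > 0);
    return counter;
}
```

nextId behaves the same in debug and release builds; nextIdFlagged returns the unchanged counter whenever assertions are compiled out.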
avogar
3a13c3e372 Fix comments 2022-05-06 16:50:34 +00:00
avogar
62a7ba3f26 Add columnar JSON formats 2022-05-06 16:48:48 +00:00
Anton Popov
515f68eead Merge remote-tracking branch 'upstream/master' into dynamic-columns-14 2022-05-06 16:10:51 +00:00
Anton Popov
566c08086a support Object type inside other types 2022-05-06 14:44:00 +00:00
Anton Popov
13e8db6299
Merge pull request #36762 from CurtizJ/dynamic-columns-12
Fix insertion to columns of type `Object` from multiple files
2022-05-06 14:14:32 +02:00
Kruglov Pavel
77e55c344c
Merge pull request #36667 from Avogar/mysqldump-format
Add MySQLDump input format
2022-05-04 19:49:48 +02:00
Kruglov Pavel
ffec3655fe
Fix special build 2022-05-04 17:14:15 +02:00
mergify[bot]
64084b5e32
Merge branch 'master' into shared_ptr_helper3 2022-05-03 20:46:16 +00:00
Dmitry Novik
5ba7a55c18
Merge pull request #36650 from bigo-sg/hive_text_parallel_parsing
Parallel parsing of hive text format
2022-05-03 15:56:28 +02:00
Kruglov Pavel
d613f7eab0
Merge branch 'master' into mysqldump-format 2022-05-02 13:31:57 +02:00
Antonio Andelic
a1a22b0007
Merge pull request #35149 from ContentSquare/nullables_with_proto3
Nullables with proto3 using Google wrappers
2022-05-02 09:49:37 +02:00
Robert Schulze
330212e0f4
Remove inherited create() method + disallow copying
The original motivation for this commit was that shared_ptr_helper used
std::shared_ptr<>() which does two heap allocations instead of
make_shared<>() which does a single allocation. Turned out that
1. the affected code (--> Storages/) is not on a hot path (rendering the
performance argument moot ...)
2. yet copying Storage objects is potentially dangerous and was
   previously allowed.

Hence, this change

- removes shared_ptr_helper and as a result all inherited create() methods,

- instead, Storage objects are now created using make_shared<>() by the
  caller (for that to work, many constructors had to be made public), and

- all Storage classes were marked as noncopyable using boost::noncopyable.

In sum, we are (likely) not making things faster but the code becomes
cleaner and harder to misuse.
2022-05-02 08:46:52 +02:00
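A minimal sketch of the pattern that 330212e0f4 above describes, under stated assumptions: StorageExample and createStorage are hypothetical names, and deleted copy operations stand in for boost::noncopyable so that the example needs only the standard library:

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a Storage class.
class StorageExample
{
public:
    // Public constructor, so that std::make_shared can call it.
    explicit StorageExample(std::string name_) : name(std::move(name_)) {}

    // Same effect as inheriting from boost::noncopyable.
    StorageExample(const StorageExample &) = delete;
    StorageExample & operator=(const StorageExample &) = delete;

    const std::string & getName() const { return name; }

private:
    std::string name;
};

static_assert(!std::is_copy_constructible<StorageExample>::value, "copying must be disabled");
static_assert(!std::is_copy_assignable<StorageExample>::value, "copy assignment must be disabled");

// The caller creates the object directly; std::make_shared typically places
// the object and its control block in a single allocation.
inline std::shared_ptr<StorageExample> createStorage(const std::string & name)
{
    return std::make_shared<StorageExample>(name);
}
```

With the copy operations deleted, an accidental copy of a storage object fails at compile time instead of silently duplicating state.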
Robert Schulze
89aa9ae00f
Fixed clang-tidy check "bugprone-branch-clone"
The check is currently *not* part of .clang-tidy. It complains about:
(1) "switch has multiple consecutive identical branches"
(2) "repeated branch in conditional chain"

About (1): Lots of findings in switches were about redundant
"[[fallthrough]]" in places where the compiler would not warn anyways. I
have cleaned these up.

About (2): In if-else_if-else chains, fixing the warning would usually
mean concatenating multiple if-conditions. As this would reduce
readability in most cases, I did not fix these places.

Because of (2), I also refrained from adding "bugprone-branch-clone" to
.clang-tidy.
2022-04-30 19:40:28 +02:00
mergify[bot]
cc08ccb420
Merge branch 'master' into remove-useless-code-2 2022-04-30 12:48:15 +00:00
Jakub Kuklis
a1f2dd6d34 Adding two settings in place of one, improvements to the test clarity 2022-04-29 10:01:51 +02:00
Jakub Kuklis
507ba1042c Adding a setting to enable Google wrappers special treatment 2022-04-29 10:01:51 +02:00
Jakub Kuklis
6d5c1e2fc0 Adding a setting to enable special treatment of google wrappers 2022-04-29 10:01:50 +02:00
Amos Bird
4a5e4274f0
base should not depend on Common 2022-04-29 10:26:35 +08:00
Anton Popov
1fc51e09ff fix insertion to column of type Object from multiple files via table function 2022-04-28 18:51:13 +00:00
avogar
d295de1689 Fix comments and test 2022-04-28 14:59:35 +00:00
Kruglov Pavel
4d08587559
Merge branch 'master' into mysqldump-format 2022-04-28 15:58:18 +02:00
Kseniia Sumarokova
4c371f710e
Merge pull request #36676 from kssenii/refactor-with-size-buffer
Better version of SeekableReadBufferWithSize
2022-04-28 13:44:25 +02:00
taiyang-li
99aa5fdc81 remove useless code 2022-04-27 11:15:04 +08:00
vdimir
81b86799e7
Fixup PrometheusTextOutputFormat 2022-04-26 14:57:37 +00:00
vdimir
d5d98ed951
PrometheusTextOutputFormat: support labels, histograms and summaries 2022-04-26 14:57:36 +00:00
vdimir
be0aa06958
Add output format Prometheus 2022-04-26 14:57:35 +00:00
kssenii
9d364cdce2 Refactor 2022-04-26 15:33:53 +02:00
Kruglov Pavel
a462d94157
Fix error codes 2022-04-26 13:25:07 +02:00
Kruglov Pavel
e3b222b519
Fix typo 2022-04-26 13:24:10 +02:00
avogar
33d845dade Add MySQLDump input format 2022-04-26 10:42:56 +00:00
taiyang-li
b7cc344d62 remove useless codes 2022-04-26 14:42:43 +08:00
taiyang-li
99dee35b6e parallel parsing of hive text format 2022-04-26 14:33:10 +08:00
Kruglov Pavel
34c342fdd3
Merge pull request #36205 from Avogar/improve-globs
Some refactoring around schema inference with globs
2022-04-25 13:14:46 +02:00
avogar
80eacc8533 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-json-schema-inference 2022-04-22 17:18:44 +00:00
Kseniia Sumarokova
33bb48106f
Merge pull request #36314 from CurtizJ/print-bad-filenames
Show names of erroneous files in case of parsing errors while executing table functions
2022-04-22 13:24:55 +02:00
mergify[bot]
e38a3c3595
Merge branch 'master' into alias 2022-04-21 15:02:30 +00:00
Maksim Kita
57444fc7d3
Merge pull request #36444 from rschu1ze/clang-tidy-fixes
Clang tidy fixes
2022-04-21 16:11:27 +02:00
mergify[bot]
1ba1cad5cf
Merge branch 'master' into improve-globs 2022-04-21 11:52:13 +00:00
Kruglov Pavel
a6186f7ba4
Merge pull request #36333 from ClickHouse/bool-sync-after-error
Fix tech debt for Bool and Map data types
2022-04-21 13:32:14 +02:00
Kruglov Pavel
813e228fcc
Merge branch 'master' into improve-globs 2022-04-20 16:31:47 +02:00
Anton Popov
d4df38a0e6 fix tests 2022-04-20 14:13:04 +00:00
Alexander Tokmakov
1d30a97fd2 Merge branch 'master' into remove-useless-code-2 2022-04-20 11:45:56 +02:00
Robert Schulze
b24ca8de52
Fix various clang-tidy warnings
When I tried to add cool new clang-tidy 14 warnings, I noticed that the
current clang-tidy settings already produce a ton of warnings. This
commit addresses many of these. Almost all of them were non-critical,
e.g. C vs. C++ style casts.
2022-04-20 10:29:05 +02:00
Anton Popov
bee4ca9b62 add more tests for error diagnostics in files 2022-04-19 15:56:34 +00:00
Anton Popov
3e361c9759 Merge remote-tracking branch 'upstream/master' into HEAD 2022-04-19 14:18:04 +00:00
Alexey Milovidov
f6ab2bd523
Merge pull request #36312 from ClickHouse/remove-arcadia
Remove remaining parts of Arcadia
2022-04-18 07:02:54 +03:00
Alexey Milovidov
242919eddd Remove abbreviation 2022-04-18 01:02:49 +02:00
mergify[bot]
4fed033dca
Merge branch 'master' into alias 2022-04-17 14:37:04 +00:00
fenglv
2392d4e2b5 fix 2022-04-16 16:08:28 +00:00
Alexey Milovidov
7206838c75 Fix tech debt for Bool and Map data types 2022-04-16 16:09:04 +02:00
fenglv
58111115c5 fix style 2022-04-16 06:21:09 +00:00
fenglv
74ef1b0198 Add aliases JSONLines and NDJSON for JSONEachRow 2022-04-16 06:01:07 +00:00
Anton Popov
2de6668b3f show names of erroneous files 2022-04-16 00:10:47 +00:00
Alexey Milovidov
cbeeb7ec4f Remove Arcadia 2022-04-16 00:20:47 +02:00
avogar
42726639f3 Check ORC/Parquet/Arrow format magic bytes before loading file in memory 2022-04-13 19:27:38 +00:00
avogar
f5f1db86d9 Remove commented code 2022-04-13 19:15:52 +00:00
avogar
8b60aeb7bc Improve schema inference for json objects 2022-04-13 19:13:40 +00:00
avogar
1c065f8c7a Some refactoring around schema inference with globs 2022-04-13 17:02:48 +00:00
Alexey Milovidov
a54c01cf72 Remove useless code in ReplicatedMergeTreeRestartingThread 2022-04-11 00:44:30 +02:00
avogar
1c783ed88a Resolve conflicts 2022-04-07 12:17:48 +00:00
avogar
d2017a63b1 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference 2022-04-07 11:36:40 +00:00
Kruglov Pavel
f3f8f27db5
Merge pull request #35735 from Avogar/allow-read-bools-as-numbers
Allow to infer and parse bools as numbers in JSON input formats
2022-04-07 13:20:49 +02:00
taiyang-li
2ef316801c Merge branch 'master' into use_minmax_index 2022-04-07 10:53:25 +08:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference 2022-04-06 13:51:07 +02:00
taiyang-li
acb9f1632e support skip splits in orc and parquet 2022-04-06 16:40:22 +08:00
mergify[bot]
1e43e26fa1
Merge branch 'master' into fix-order 2022-04-02 12:00:29 +00:00
avogar
ab2a963287 Merge branch 'master' of github.com:ClickHouse/ClickHouse into allow-read-bools-as-numbers 2022-03-31 14:09:43 +00:00
Kruglov Pavel
252d66e80d
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-31 16:08:37 +02:00
mergify[bot]
24ade25d61
Merge branch 'master' into improve-schema-inference 2022-03-31 13:42:47 +00:00
avogar
836e7dae67 Fix bug in indexes of not presented columns in -WithNames formats 2022-03-31 12:24:40 +00:00
avogar
d272356324 Minor code improvement 2022-03-31 10:55:09 +00:00
avogar
74275da7ee Make better 2022-03-31 10:52:34 +00:00
avogar
000f3043e7 Make better 2022-03-29 17:40:07 +00:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
avogar
ce97ccbfb9 Improve schema inference for JSONEachRow and TSKV formats 2022-03-29 14:47:51 +00:00
Antonio Andelic
9990abb76a Use compile-time check for Exception messages, fix wrong messages 2022-03-29 13:16:11 +00:00
avogar
97f5033ea9 Fix tests 2022-03-29 13:07:37 +00:00
mergify[bot]
343588de2c
Merge branch 'master' into improve-schema-inference 2022-03-29 13:06:00 +00:00
Anton Popov
9610139477
Merge pull request #35629 from CurtizJ/dynamic-columns-5
Support schema inference for type `Object` in format `JSONEachRow`
2022-03-29 14:17:09 +02:00
Anton Popov
d677635cd8
Merge pull request #35592 from CurtizJ/dynamic-columns-4
Add parallel parsing and schema inference for format `JSONAsObject`
2022-03-28 19:29:55 +02:00
Anton Popov
67195bfdd5 support schema inference for type Object in format JSONEachRow 2022-03-25 21:51:53 +00:00
avogar
6fb3c3be04 Fix comments and build 2022-03-25 12:02:21 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference 2022-03-25 12:05:40 +01:00
Vladimir C
ae92963b15
Fix build error in Formats/ISchemaReader.cpp 2022-03-25 11:30:25 +01:00
Kruglov Pavel
287e1a6efc
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:52 +01:00
Kruglov Pavel
6a9df9d471
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:47 +01:00
Kruglov Pavel
3b801a4093
Update src/Processors/Formats/ISchemaReader.cpp
Co-authored-by: Vladimir C <vdimir@clickhouse.com>
2022-03-24 19:16:41 +01:00
Anton Popov
78100abc5f add parallel parsing and schema inference for type Object 2022-03-24 17:51:35 +00:00
avogar
557edbd172 Add some improvements and fixes in schema inference 2022-03-24 12:54:12 +00:00
mergify[bot]
bf90edc362
Merge branch 'master' into case-insensitive-column-matching 2022-03-24 08:00:42 +00:00
Kruglov Pavel
826b933b08
Merge pull request #35332 from Avogar/fix-tskv-schema-inference
Fix schema inference for TSKV format while using small max_read_buffer_size
2022-03-23 18:37:07 +01:00
Antonio Andelic
052057f2ef Address PR comments 2022-03-23 15:42:46 +00:00
Antonio Andelic
6b6190554b Fix conversion of arrow to CH column with hint header 2022-03-22 11:15:48 +00:00
Antonio Andelic
0c23cd7b94 Add support for case insensitive column matching in arrow 2022-03-22 10:55:10 +00:00
Antonio Andelic
ca7844e338 Fix tests 2022-03-22 09:27:20 +00:00
Antonio Andelic
6cebb6bc88 Merge branch 'master' into case-insensitive-column-matching 2022-03-22 07:36:35 +00:00
Antonio Andelic
cb3703b46e Style fix 2022-03-21 12:54:56 +00:00
Antonio Andelic
0457a3998a remove old test 2022-03-21 11:58:55 +00:00
Kruglov Pavel
1645b7083f
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:12 +01:00
Kruglov Pavel
0b381ebd26
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:06 +01:00
Kruglov Pavel
f67b8c0bad
Update src/Processors/Formats/Impl/TSKVRowInputFormat.cpp
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-21 12:44:00 +01:00
Antonio Andelic
0c74fa2c19 Remove unnecessary code 2022-03-21 08:38:15 +00:00
tavplubix
716c6f0ffa
Merge pull request #35406 from Avogar/fix-parquet
Fix working with unneeded columns in Arrow/Parquet/ORC formats
2022-03-21 11:36:54 +03:00
Antonio Andelic
29d2bf7d1a Merge branch 'master' into case-insensitive-column-matching 2022-03-21 08:17:27 +00:00
Antonio Andelic
d73c906e68 Format code 2022-03-21 07:50:17 +00:00
Antonio Andelic
f75b054255 Allow case insensitive column matching 2022-03-21 07:47:37 +00:00
avogar
58f2aca120 Fix tests 2022-03-18 19:04:16 +00:00
avogar
cffa2096de Fix working with unneeded columns in Arrow/Parquet/ORC formats 2022-03-18 13:07:54 +00:00
Kruglov Pavel
aa3c05e9d4
Merge pull request #35152 from rschu1ze/protobuf-batch-write
ProtobufList
2022-03-18 13:24:34 +01:00
Antonio Andelic
607f785e48 Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
Anton Popov
2ced42ed41 add experimental settings for Object type 2022-03-16 16:51:23 +00:00
Anton Popov
0ba78c3c3a Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-16 15:28:09 +00:00
avogar
f7c5fe14e4 Fix schema inference for TSKV format while using small max_read_buffer_size 2022-03-16 13:53:50 +00:00
Robert Schulze
0d2ece6d91
Merge branch 'ClickHouse:master' into protobuf-batch-write 2022-03-16 09:43:33 +01:00
Robert Schulze
23122cb327
Fix review comments
ParquetBlockOutputFormat.cpp:
- undo unrelated formatting

ProtobufSerializer.cpp:
- undef debug tracing
- simplify logic in writeRow()

ProtobufSchemas.cpp:
- restore original search in cache by message type
2022-03-15 11:27:17 +01:00
Maksim Kita
2665724301 Fix clang-tidy warnings in Parsers, Processors, QueryPipeline folders 2022-03-14 18:17:35 +00:00
Anton Popov
36ec379aeb Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-14 16:28:35 +00:00
Antonio Andelic
ebf72bf61d
Merge pull request #35145 from bigo-sg/lower-column-name
add setting to lower column case when reading parquet/orc file
2022-03-14 11:25:03 +01:00
Robert Schulze
514d4d2187
Implement ProtobufList - fixes ClickHouse#16436
Introduce IO format "ProtobufList" with protobuf schema

    // schemafile.proto
    message Envelope {
      message MessageType {
        uint32 colA = 1;
        string colB = 2;
      }
      repeated MessageType mt = 1;
    }

where "Envelope" is a hard-coded/expected top-level message and
"MessageType" is a message with user-provided name containing the table
fields to export/import, e.g.

    SELECT * FROM db1.tab1 FORMAT ProtobufList SETTINGS format_schema =
    'schemafile:MessageType'

As a result, the new format wraps a list of messages (one per row) into
a single, containing message. Compare that to the schema of the existing
IO formats "Protobuf" and "ProtobufSingle":

    message MessageType {
      uint32 colA = 1;
      string colB = 2;
    }

The new format does not save space compared to the existing formats, but
it is conceptually a bit more beautiful and also more convenient.

Implementation details:

- Created new files ProtobufList(Input|Output)Format which use the
  existing ProtobufSerializer mechanism. The goal was to reuse as much
  code as possible and avoid copypasta.

- I was torn between inheriting from I(Input|Output)Format vs.
  IRow(Input|Output)Format for ProtobufList(Input|Output)Format. The
  former is chunk-based which can be better for performance. Since the
  ProtobufSerializer mechanism is row-based but data is generally passed
  around in chunks, I decided for the latter to leverage the existing
  chunk <--> row mapping code in IRow(Input|Output)Format.

- A new ProtobufSerializer called ProtobufSerializerEnvelope was
  introduced (--> ProtobufSerializer.cpp). It represents the top-level
  message which encloses the list of inner nested messages, i.e. the
  rows.

- With the new format, parsing the schema file and matching the fields in
  the schema file to table columns works as it does for the old formats. The
  only difference is that parsing starts one level below the "Envelope"
  (--> ProtobufSchemas.cpp). This is more natural than forcing customers to
  have table columns start with "Envelope".

- Creation of the ProtobufSerializer tree also works like before. What
  is different is that we finally add a ProtobufSerializerEnvelope as
  new root of the tree. Its only purpose is to write/read the top-level
  message for the first/last row to write/read.

Caveats:

- The low-level serialization code in ProtobufWriter uses an internal
  buffer which is flushed to the output file only in endMessage().
  In the existing "Protobuf" format, this happens once per row, in the
  new format this happens only at the end of the serialization
  since row-level messages now call start/endNestedMessage(). As a
  future TODO, the buffer should also be flushed in
  start/endNestedMessage() to reduce memory consumption.
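
To make the envelope layout concrete, here is a minimal hand-written
encoder for the example schema above. It is a sketch based on the standard
protobuf wire format and is not the ProtobufSerializer code; Row,
writeVarint, encodeRow and encodeEnvelope are hypothetical names. It
encodes only the body of the Envelope message; any outer framing is out of
scope.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One table row, matching the fields of MessageType in the example schema.
struct Row
{
    uint32_t colA;
    std::string colB;
};

// Protobuf base-128 varint: 7 payload bits per byte, high bit set on all
// bytes except the last one.
inline void writeVarint(std::string & out, uint64_t value)
{
    while (value >= 0x80)
    {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Serialize one row as a MessageType message (fields are always written).
inline std::string encodeRow(const Row & row)
{
    std::string out;
    out.push_back(0x08); // field 1 (colA), wire type 0 (varint)
    writeVarint(out, row.colA);
    out.push_back(0x12); // field 2 (colB), wire type 2 (length-delimited)
    writeVarint(out, row.colB.size());
    out += row.colB;
    return out;
}

// Wrap all rows into one Envelope: each row becomes one element of the
// repeated field "mt" (field 1, wire type 2, i.e. tag byte 0x0A).
inline std::string encodeEnvelope(const std::vector<Row> & rows)
{
    std::string out;
    for (const auto & row : rows)
    {
        const std::string nested = encodeRow(row);
        out.push_back(0x0A);
        writeVarint(out, nested.size());
        out += nested;
    }
    return out;
}

// Usage: the row {colA = 1, colB = "a"} is a 5-byte nested message, which
// the envelope prefixes with 2 bytes (tag + length).
inline void exampleEnvelope()
{
    const std::string bytes = encodeEnvelope({Row{1, "a"}});
    assert(bytes.size() == 7);
    assert(static_cast<unsigned char>(bytes[0]) == 0x0A);
}
```

Like the caveat above, this sketch builds the whole envelope in memory
before anything could be written out.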
2022-03-14 08:04:58 +01:00
Maksim Kita
ce0c8e5597
Update JSONRowOutputFormat.cpp 2022-03-14 00:58:36 +01:00
Robert Schulze
f0ba39b071
Clean up some header includes and make formatting more consistent 2022-03-13 20:24:12 +01:00
zhanghuajie
53a8987b3b fix build fail with gcc --fix warnings without disabling some parameters 2022-03-11 21:59:19 +08:00
shuchaome
7a3623d216 fix bug 2022-03-11 17:26:13 +08:00
shuchaome
46cb4483a6 Optimise by lowering schema on the beginning. Add a functional test. 2022-03-11 14:34:46 +08:00
shuchaome
b7cd85df6b remove unused column_names in ORCBlockInputFormat 2022-03-09 18:16:22 +08:00
shuchaome
bb50133424
Apply suggestions from code review
Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com>
2022-03-09 17:32:27 +08:00
shuchaome
9647818adc add unlikely for performance 2022-03-09 17:02:07 +08:00
shuchaome
8027bb1e32 modify code style 2022-03-09 16:32:18 +08:00
shuchaome
56795b831d add setting to lower column case when reading parquet/orc file 2022-03-09 16:07:02 +08:00
zhanghuajie
11dde7c127 fix build fail with gcc 2022-03-08 22:34:51 +08:00
Anton Popov
df3b07fe7c Merge remote-tracking branch 'upstream/master' into HEAD 2022-03-03 22:25:28 +00:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
Anton Popov
2758db5341 add more comments 2022-03-01 19:32:55 +03:00
Anton Popov
fcdebea925 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-25 13:41:30 +03:00
taiyang-li
e53719a86b remove comments 2022-02-13 17:13:23 +08:00
taiyang-li
aabf2aac69 finish all tests 2022-02-13 17:06:58 +08:00
taiyang-li
6559941972 support datetime64 when transform ch chunk to arrow table 2022-02-13 14:56:01 +08:00
avogar
9e58ae7577 Support jsonl extension for JSONEachRow format 2022-02-10 16:00:37 +03:00
Anton Popov
18940b8637 Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-09 23:38:38 +03:00
Nikolai Kochetov
82a7d70a31
Merge branch 'master' into fix-removing-order-in-CreatingSetsTransform 2022-02-08 19:29:03 +03:00
Nikolai Kochetov
d2d47b9595 Fixing build. 2022-02-08 16:27:33 +00:00
Maksim Kita
4bb69bcb15
Merge pull request #34398 from DevTeamBK/input_format
Method called on already moved
2022-02-08 15:20:07 +01:00
Rajkumar
6b3adbb0de Method called on already moved 2022-02-07 19:50:34 -08:00
avogar
a4c7ecde87 Make better 2022-02-07 17:51:26 +03:00
avogar
c3d30fd502 Fix comments 2022-02-07 17:11:44 +03:00
Kruglov Pavel
34a17075d3 Fix error messages 2022-02-07 17:11:44 +03:00
avogar
77b42bb9ff Support UUID in MsgPack format 2022-02-07 17:11:44 +03:00
Alexey Milovidov
f98010e374 Small improvements 2022-02-06 07:14:01 +03:00
Alexey Milovidov
4a83dbc514 Fix linkage 2022-02-04 00:26:44 +03:00
Alexey Milovidov
c426f11096 Maybe better 2022-02-04 00:20:16 +03:00
Alexey Milovidov
7c12f5f37a Fix terribly low performance of LineAsString format 2022-02-04 00:07:31 +03:00
Anton Popov
836a348a9c Merge remote-tracking branch 'upstream/master' into HEAD 2022-02-01 15:23:07 +03:00
Alexey Milovidov
e4e7169277 Remove some strange code 2022-02-01 02:52:36 +03:00
Alexey Milovidov
83136f3515 Allow \r in the middle of the line in format Regexp 2022-02-01 02:49:26 +03:00
Alexey Milovidov
872d0a0fbe Improve performance of format Regexp 2022-02-01 02:07:48 +03:00
alesapin
dd61d1c2de
Merge pull request #34172 from ClickHouse/fix_race_in_some_engines
Fix benign race condition for storage HDFS, S3, URL
2022-01-31 22:41:54 +03:00
alesapin
93c0700c4c Fix typo 2022-01-31 16:46:58 +03:00
alesapin
056b9e335f Fix comment 2022-01-31 16:39:42 +03:00
alesapin
31753afb7e Fix cancel logic in parallel parsing 2022-01-31 16:38:15 +03:00
Maksim Kita
5ef83deaa6 Update sort to pdqsort 2022-01-30 19:49:48 +00:00
Anton Popov
78b9f15abb Merge remote-tracking branch 'upstream/master' into HEAD 2022-01-30 03:24:37 +03:00
Anton Popov
b950a12cb3
Merge pull request #34068 from CurtizJ/fix-async-insert-native
Fix asynchronous inserts with `Native` format
2022-01-29 01:24:53 +03:00
Azat Khuzhin
1519985c98 Fix possible "Can't attach query to the thread, it is already attached"
Now that detachQueryIfNotDetached() has been removed, it is not enough to
use attachTo() for ThreadPool (scheduleOrThrowOnError()): the query may
already be attached if the thread is doing multiple jobs, so
CurrentThread::attachToIfDetached() should be used instead.
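
The difference between the two calls can be modelled with a simplified
sketch. The types and free functions below are hypothetical stand-ins, and
the exact semantics of the real CurrentThread methods are an assumption
here: the strict variant throws when the thread is already attached, the
tolerant one is a no-op in that case.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for the thread group and the per-thread state.
struct ThreadGroup {};
using ThreadGroupPtr = std::shared_ptr<ThreadGroup>;

thread_local ThreadGroupPtr current_group; // null means "detached"

// Strict variant: attaching an already attached thread is an error.
inline void attachTo(const ThreadGroupPtr & group)
{
    if (current_group)
        throw std::logic_error("Can't attach query to the thread, it is already attached");
    current_group = group;
}

// Tolerant variant: does nothing if the thread is already attached.
inline void attachToIfDetached(const ThreadGroupPtr & group)
{
    if (!current_group)
        current_group = group;
}

// A pool thread may run several jobs in a row without detaching in between,
// so each job has to use the tolerant variant.
inline void runJob(const ThreadGroupPtr & group)
{
    attachToIfDetached(group);
    assert(current_group != nullptr);
    // ... job body ...
}
```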

This should fix all the places from the failures on CI [1]:

    $ fgrep DB::CurrentThread::attachTo -A1 ~/Downloads/47.txt  | fgrep -v attachTo | cut -d' ' -f5,6 | sort | uniq -c
         92 --
          2 /fasttest-workspace/build/../../ClickHouse/contrib/libcxx/include/deque:1393: DB::ParallelParsingInputFormat::parserThreadFunction(std::__1::shared_ptr<DB::ThreadGroupStatus>,
          4 /fasttest-workspace/build/../../ClickHouse/src/Storages/MergeTree/MergeTreeData.cpp:1595: void
         87 /fasttest-workspace/build/../../ClickHouse/src/Storages/MergeTree/MergeTreeDataSelectExecutor.cpp:993: void

  [1]: https://github.com/ClickHouse/ClickHouse/runs/4954466034?check_suite_focus=true

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-01-28 16:25:33 +03:00
Azat Khuzhin
b0c862c297 Fix memory accounting for queries that use < max_untracked_memory
MemoryTracker starts accounting memory directly only after the per-thread
allocation exceeds max_untracked_memory (or memory_profiler_step).

But memory under this limit should be accounted too, and there is
code to do this in the ThreadStatus dtor. However, because
PullingAsyncPipelineExecutor detached the query from the thread group,
that memory was not accounted.

So remove CurrentThread::detachQueryIfNotDetached() from threads that
use ThreadFromGlobalPool since it has ThreadStatus, and the query will
be detached using CurrentThread::defaultThreadDeleter.

Note that before this patch, memory accounting worked for HTTP queries,
because the memory was accounted from ParallelFormattingOutputFormat, but
not for TCP.
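
The accounting gap can be modelled in a few lines. This is a simplified,
hypothetical model and not the MemoryTracker/ThreadStatus code:
QueryMemoryTracker, ThreadState, onAlloc and detachQuery are illustrative
names, and the limit plays the role of max_untracked_memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-query tracker: the total memory accounted to the query.
struct QueryMemoryTracker
{
    int64_t accounted = 0;
};

// Hypothetical per-thread state with a small untracked buffer.
class ThreadState
{
public:
    ThreadState(QueryMemoryTracker * tracker_, int64_t untracked_limit_)
        : tracker(tracker_), untracked_limit(untracked_limit_)
    {
        assert(untracked_limit >= 0);
    }

    // The destructor accounts whatever is still below the limit.
    ~ThreadState() { flush(); }

    // Allocations are buffered and reported only once the limit is exceeded.
    void onAlloc(int64_t bytes)
    {
        untracked += bytes;
        if (untracked > untracked_limit)
            flush();
    }

    // Early detach: the buffered remainder can no longer reach the query.
    void detachQuery() { tracker = nullptr; }

private:
    void flush()
    {
        if (tracker)
            tracker->accounted += untracked;
        untracked = 0;
    }

    QueryMemoryTracker * tracker;
    int64_t untracked_limit;
    int64_t untracked = 0;
};
```

In this model, a thread that allocates 40 bytes under a limit of 100 and
is detached before destruction contributes nothing to the query, while the
same thread left attached contributes all 40 bytes on destruction.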

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-01-28 16:25:33 +03:00
Anton Popov
6c0959b907 fix asynchronous inserts with Native format 2022-01-28 03:25:15 +03:00
Kruglov Pavel
9f12f4af13
Merge pull request #33302 from Avogar/formats-with-suffixes
Allow to create new files on insert for File/S3/HDFS engines
2022-01-25 10:56:15 +03:00
avogar
1f49acc164 Better naming 2022-01-24 16:28:36 +03:00
Anton Popov
e8ce091e68 Merge remote-tracking branch 'upstream/master' into HEAD 2022-01-21 20:11:18 +03:00
avogar
67e396f8f4 Fix schema inference for JSONEachRow and JSONCompactEachRow 2022-01-20 16:31:24 +03:00
mergify[bot]
b318f9b5db
Merge branch 'master' into formats-with-suffixes 2022-01-18 12:17:07 +00:00
Anton Popov
a25f2518e3
Merge pull request #33141 from 1over/feature_default_keyword
Add support of DEFAULT keyword for INSERT
2022-01-18 02:04:37 +03:00
Kruglov Pavel
a7df9cd53a
Merge branch 'master' into formats-with-suffixes 2022-01-14 21:03:49 +03:00
avogar
253035a5df Fix 2022-01-14 19:17:06 +03:00
Kruglov Pavel
d2e9f37bee
Merge branch 'master' into format-by-extention 2022-01-14 18:36:23 +03:00
avogar
89a181bd19 Make better 2022-01-14 18:16:18 +03:00
Kruglov Pavel
5a908e8edd
Merge branch 'master' into formats-with-suffixes 2022-01-14 16:45:20 +03:00
Kruglov Pavel
d54a430d9c
Merge pull request #33566 from Avogar/fix-avro
Fix segfault in Avro
2022-01-14 16:01:56 +03:00
Kseniia Sumarokova
5da673c3a5
Merge pull request #31104 from bigo-sg/hive_table
Implement hive table engine
2022-01-14 09:39:17 +03:00
Kruglov Pavel
305d58a762
Merge pull request #33524 from Avogar/stacktrace-in-client
Don't print exception twice in client in case of exception in parallel parsing
2022-01-13 15:50:42 +03:00
Nikolai Kochetov
872ee5dc09
Update src/Processors/Formats/Impl/AvroRowOutputFormat.h
Co-authored-by: Bharat Nallan <bharatnc@gmail.com>
2022-01-13 12:55:14 +03:00
avogar
c5ea4b1bc0 Fix segfault in Avro 2022-01-12 18:34:28 +03:00
avogar
8390e9ad60 Detect format by file name in file/hdfs/s3/url table functions 2022-01-12 18:29:31 +03:00
lgbo-ustc
5c71d3687a fixed some bugs
1. integration test for test_hive_query failed
2. nullptr reference in arrowSchemaToCHHeader
2022-01-12 17:01:05 +08:00
taiyang-li
66813a3aa9 merge master 2022-01-12 16:56:29 +08:00
avogar
9915ce7ded Fix segfault in arrowSchemaToCHHeader 2022-01-11 20:30:35 +03:00
avogar
0ae0aa712b Don't print exception twice in client in case of exception in parallel parsing 2022-01-11 18:37:07 +03:00
李扬
2df2442ad0
Merge branch 'master' into hive_table 2022-01-04 01:26:16 -06:00
taiyang-li
8730dda895 fix hive text 2022-01-01 09:16:30 +08:00
taiyang-li
1e102bc1b2 merge master 2022-01-01 09:01:06 +08:00