Commit Graph

297 Commits

Author SHA1 Message Date
avogar
7375a7d429 Refactor and improve schema inference for text formats 2022-12-07 21:19:27 +00:00
Kruglov Pavel
c35b2a6495
Add a limit for string size in RowBinary format (#43842) 2022-12-02 13:57:11 +01:00
Anton Popov
fe5fff0347
Merge pull request #43329 from xiedeyantu/support_nested_column
s3 table function can support select nested column using {column_name}.{subcolumn_name}
2022-11-29 22:27:19 +01:00
xiedeyantu
304b6ebf3a s3 table function can support select nested column using {column_name}.{subcolumn_name} 2022-11-23 23:36:12 +08:00
Kruglov Pavel
98d6b96c82
Merge pull request #42033 from mark-polokhov/BSONEachRow
Add BSONEachRow input/output format
2022-11-22 14:45:21 +01:00
Vitaly Baranov
ce81166c7e Fix style. 2022-11-16 01:35:11 +01:00
Vitaly Baranov
8e99f5fea3 Move maskSensitiveInfoInQueryForLogging() to src/Parsers/ 2022-11-14 18:55:19 +01:00
avogar
9e89af28c6 Refactor BSONEachRow format, fix bugs, support more data types, support parallel parsing and schema inference 2022-11-10 20:15:14 +00:00
Kruglov Pavel
b124875257
Merge branch 'master' into improve-streaming-engines 2022-11-03 13:22:06 +01:00
avogar
8e13d1f1ec Improve and refactor Kafka/StorageMQ/NATS and data formats 2022-10-28 16:41:10 +00:00
Alexey Milovidov
f88ed8195b Fix trash 2022-10-17 04:21:08 +02:00
Kruglov Pavel
6fc12dd922
Merge pull request #41703 from Avogar/json-object-each-row
Add setting to obtain object name as column value in JSONObjectEachRow format
2022-10-14 20:11:04 +02:00
Vitaly Baranov
f65d3ff95a Fix parallel parsing: segmentator now checks max_block_size. 2022-09-30 22:34:03 +02:00
Kruglov Pavel
f1ac2d66be
Merge branch 'master' into json-object-each-row 2022-09-28 14:15:02 +02:00
avogar
76be0d2ee1 Infer Object type only when allow_experimental_object_type is enabled 2022-09-27 23:07:36 +00:00
avogar
d3d06251a3 Add setting to obtain object name as column value in JSONObjectEachRow format 2022-09-22 16:48:54 +00:00
Kruglov Pavel
22e11aef2d
Merge pull request #40910 from Avogar/new-json-formats
Add new JSON formats, add improvements and refactoring
2022-09-21 14:19:08 +02:00
avogar
868ce8bc16 Fix comments, make better naming, add docs, add setting output_format_json_quote_64bit_floats 2022-09-20 13:49:17 +00:00
Alexey Milovidov
da01982652
Merge pull request #41046 from azat/build/llvm-15
Switch to llvm/clang 15
2022-09-16 07:31:06 +03:00
Azat Khuzhin
e8d7403a38 Suppress warning in FormatFactory::getFormatFromFileDescriptor() for FreeBSD
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
2022-09-10 21:38:35 +02:00
zhenjial
bd9fabc3f7 code optimization, add test 2022-09-09 23:27:42 +08:00
zhenjial
469ceaa156 code optimization 2022-09-09 00:47:43 +08:00
avogar
c380decbbb Make better, add new settings 2022-09-08 16:07:20 +00:00
zhenjial
0f788d98f5 new implementation 2022-09-06 20:39:54 +08:00
avogar
b94e896c1c Remove logs 2022-09-01 19:01:27 +00:00
avogar
afc34dca41 Add new JSON formats, add improvements and refactoring 2022-09-01 19:00:24 +00:00
avogar
612ffaffde Make schema inference cache better, respect format settings that can change the schema 2022-08-19 16:39:13 +00:00
Nikolai Kochetov
5a85531ef7
Merge pull request #38286 from Avogar/schema-inference-cache
Add schema inference cache for s3/hdfs/file/url
2022-08-18 13:07:50 +02:00
avogar
8dd54c043d Merge branch 'master' of github.com:ClickHouse/ClickHouse into schema-inference-cache 2022-08-17 11:47:40 +00:00
avogar
e1ff996ec3 Allow to specify structure hints in schema inference 2022-08-16 09:46:57 +00:00
Kruglov Pavel
6b2186bfeb
Merge branch 'master' into numbers-schema-inference 2022-08-02 19:34:53 +02:00
Kruglov Pavel
381ea139c2
Merge branch 'master' into schema-inference-cache 2022-07-27 11:35:36 +02:00
avogar
784ee11594 Add settings to skip fields with unsupported types in Protobuf/CapnProto schema inference 2022-07-20 11:16:25 +00:00
Kruglov Pavel
b38241b08a
Merge branch 'master' into schema-inference-cache 2022-07-14 12:29:54 +02:00
avogar
7cde9d3b40 Add new features in schema inference 2022-07-13 15:57:55 +00:00
avogar
5b0fd31c64 Put column names in quotes 2022-06-30 16:14:30 +00:00
avogar
9bb68bc6de Add SQLInsert output format 2022-06-27 18:31:57 +00:00
avogar
5155262a16 Add some additional information to cache keys 2022-06-27 12:43:24 +00:00
Alexey Milovidov
5e9e5a4eaf
Merge pull request #37525 from Avogar/avro-structs
Support Maps and Records, allow to insert null as default in Avro format
2022-06-15 00:04:29 +03:00
Robert Schulze
1a0b5f33b3
More consistent use of platform macros
cmake/target.cmake defines macros for the supported platforms, this
commit changes predefined system macros to our own macros.

__linux__ --> OS_LINUX
__APPLE__ --> OS_DARWIN
__FreeBSD__ --> OS_FREEBSD
2022-06-10 10:22:31 +02:00
Kruglov Pavel
7cc87d9a65
Merge pull request #37537 from Avogar/skip-first-lines
Allow to skip some of the first lines in CSV/TSV formats
2022-05-31 14:26:21 +02:00
Kruglov Pavel
0615866aea
Merge pull request #37450 from Avogar/check-format-on-storage-creation
Check format name on storage creation
2022-05-30 14:23:20 +02:00
Alexey Milovidov
c50791dd3b Fix clang-tidy-14, part 1 2022-05-27 22:52:14 +02:00
avogar
4c9812d4c1 Allow to skip some of the first rows in CSV/TSV formats 2022-05-25 15:00:11 +00:00
avogar
038a422aeb Add setting to insert null as default 2022-05-25 12:56:59 +00:00
avogar
f782fa31c6 Merge branch 'master' of github.com:ClickHouse/ClickHouse into check-format-on-storage-creation 2022-05-25 08:42:54 +00:00
avogar
37b66c8a9e Check format name on storage creation 2022-05-23 12:48:48 +00:00
Kruglov Pavel
f539fb835d
Merge branch 'master' into formats-with-names 2022-05-23 12:14:20 +02:00
avogar
a4cf07708c Fix comments 2022-05-20 14:57:27 +00:00
avogar
a0369fb9a6 Allow to use String type instead of Binary in Arrow/Parquet/ORC formats 2022-05-18 14:51:21 +00:00
avogar
68bb07d166 Better naming 2022-05-13 18:39:19 +00:00
avogar
b17fec659a Improve performance and memory usage for select of subset of columns for some formats 2022-05-13 13:51:28 +00:00
Kruglov Pavel
77e55c344c
Merge pull request #36667 from Avogar/mysqldump-format
Add MySQLDump input format
2022-05-04 19:49:48 +02:00
Dmitry Novik
5ba7a55c18
Merge pull request #36650 from bigo-sg/hive_text_parallel_parsing
Parallel parsing of hive text format
2022-05-03 15:56:28 +02:00
Kruglov Pavel
d613f7eab0
Merge branch 'master' into mysqldump-format 2022-05-02 13:31:57 +02:00
Jakub Kuklis
a1f2dd6d34 Adding two settings in place of one, improvements to the test clarity 2022-04-29 10:01:51 +02:00
Jakub Kuklis
507ba1042c Adding a setting to enable Google wrappers special treatment 2022-04-29 10:01:51 +02:00
avogar
d295de1689 Fix comments and test 2022-04-28 14:59:35 +00:00
Kruglov Pavel
4d08587559
Merge branch 'master' into mysqldump-format 2022-04-28 15:58:18 +02:00
avogar
33d845dade Add MySQLDump input format 2022-04-26 10:42:56 +00:00
taiyang-li
b7cc344d62 remove useless codes 2022-04-26 14:42:43 +08:00
taiyang-li
99dee35b6e parallel parsing of hive text format 2022-04-26 14:33:10 +08:00
avogar
1c065f8c7a Some refactoring around schema inference with globs 2022-04-13 17:02:48 +00:00
avogar
d2017a63b1 Merge branch 'master' of github.com:ClickHouse/ClickHouse into improve-schema-inference 2022-04-07 11:36:40 +00:00
Kruglov Pavel
ec2213493f
Merge branch 'master' into allow-read-bools-as-numbers 2022-04-06 14:53:02 +02:00
Kruglov Pavel
9141066de3
Merge branch 'master' into improve-schema-inference 2022-04-06 13:51:07 +02:00
Maksim Kita
371cdc956a Added input format settings for parsing invalid IPv4, IPv6 addresses as default values 2022-03-30 12:54:19 +02:00
avogar
3fc36627b3 Allow to infer and parse bools as numbers in JSON input formats 2022-03-29 17:37:31 +00:00
Kruglov Pavel
d45143ffe0
Merge branch 'master' into improve-schema-inference 2022-03-25 12:05:40 +01:00
avogar
557edbd172 Add some improvements and fixes in schema inference 2022-03-24 12:54:12 +00:00
Antonio Andelic
0c23cd7b94 Add support for case insensitive column matching in arrow 2022-03-22 10:55:10 +00:00
Antonio Andelic
f75b054255 Allow case insensitive column matching 2022-03-21 07:47:37 +00:00
Antonio Andelic
607f785e48 Revert "Merge pull request #35145 from bigo-sg/lower-column-name"
This reverts commit ebf72bf61d, reversing
changes made to f1b812bdc1.
2022-03-17 12:31:43 +00:00
shuchaome
46cb4483a6 Optimise by lowering schema on the beginning. Add a functional test. 2022-03-11 14:34:46 +08:00
shuchaome
56795b831d add setting to lower column case when reading parquet/orc file 2022-03-09 16:07:02 +08:00
Maksim Kita
b1a956c5f1 clang-tidy check performance-move-const-arg fix 2022-03-02 18:15:27 +00:00
avogar
77b42bb9ff Support UUID in MsgPack format 2022-02-07 17:11:44 +03:00
Kruglov Pavel
7873b4475f
Merge branch 'master' into autodetect-format 2022-01-25 10:56:52 +03:00
avogar
a6740d2f9a Detect format and schema for stdin in clickhouse-local 2022-01-25 10:25:37 +03:00
avogar
1f49acc164 Better naming 2022-01-24 16:28:36 +03:00
Kruglov Pavel
a7df9cd53a
Merge branch 'master' into formats-with-suffixes 2022-01-14 21:03:49 +03:00
avogar
253035a5df Fix 2022-01-14 19:17:06 +03:00
Kruglov Pavel
d2e9f37bee
Merge branch 'master' into format-by-extention 2022-01-14 18:36:23 +03:00
avogar
89a181bd19 Make better 2022-01-14 18:16:18 +03:00
avogar
817a314263 Fix tests and style 2022-01-14 17:46:24 +03:00
Kruglov Pavel
5a908e8edd
Merge branch 'master' into formats-with-suffixes 2022-01-14 16:45:20 +03:00
Kseniia Sumarokova
5da673c3a5
Merge pull request #31104 from bigo-sg/hive_table
Implement hive table engine
2022-01-14 09:39:17 +03:00
avogar
2d7b1bfa5e Detect format in S3/HDFS/URL table engines 2022-01-13 16:14:18 +03:00
Kruglov Pavel
305d58a762
Merge pull request #33524 from Avogar/stacktrace-in-client
Don't print exception twice in client in case of exception in parallel parsing
2022-01-13 15:50:42 +03:00
avogar
8390e9ad60 Detect format by file name in file/hdfs/s3/url table functions 2022-01-12 18:29:31 +03:00
taiyang-li
66813a3aa9 merge master 2022-01-12 16:56:29 +08:00
avogar
0ae0aa712b Don't print exception twice in client in case of exception in parallel parsing 2022-01-11 18:37:07 +03:00
zhongyuankai
99279c1443 INTO OUTFILE / FROM INFILE: autodetect FORMAT by file extension 2022-01-11 21:26:14 +08:00
zhongyuankai
878e44eb97 auto format by file extension 2022-01-08 21:47:14 +08:00
taiyang-li
1e102bc1b2 merge master 2022-01-01 09:01:06 +08:00
alesapin
16c36d72b1
Merge pull request #33296 from ClickHouse/fix_clang_tidy_3
Fix clang tidy 3
2021-12-29 22:43:42 +03:00
avogar
97788b9c21 Allow to create new files on insert for File/S3/HDFS engines 2021-12-29 21:19:13 +03:00
Kruglov Pavel
489a30859f
Merge pull request #32455 from Avogar/schema-inference
Automatic schema inference for input formats
2021-12-29 21:03:48 +03:00
alesapin
34145c47da Fix clang tidy 2021-12-29 18:36:42 +03:00
avogar
8112a71233 Implement schema inference for most input formats 2021-12-29 12:18:56 +03:00